The document proposes a Hierarchical Feature Fusion Network (HFFN) for multimodal affective computing. The HFFN takes a divide-and-conquer approach: it divides the aligned input sequences into local windows and fuses the modalities within each window, then combines the locally fused representations with an attention-based LSTM to model global dependencies. By capturing interactions at both the local and global levels, the HFFN achieves state-of-the-art performance on multimodal sentiment analysis benchmarks such as CMU-MOSI.
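The divide/fuse/combine pipeline can be sketched as follows. This is a minimal illustration, not the paper's implementation: the window size, feature dimensions, and parameter names (`W_loc`, `w_att`, etc.) are made up here, the local fusion step is simplified to a projection of the concatenated window features, and the global stage is a plain unidirectional LSTM with dot-product attention pooling rather than the paper's exact attentive recurrent module.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# --- Synthetic aligned unimodal sequences (dimensions are illustrative) ---
T, WIN = 8, 2                       # sequence length, local window size
d_text, d_audio, d_video = 6, 4, 5
text  = rng.standard_normal((T, d_text))
audio = rng.standard_normal((T, d_audio))
video = rng.standard_normal((T, d_video))

# --- Divide: slice the aligned streams into local windows ---
joint = np.concatenate([text, audio, video], axis=1)   # (T, d_total)
windows = joint.reshape(T // WIN, WIN, -1)             # (n_win, WIN, d_total)

# --- Conquer: local fusion inside each window (simplified to a single
#     tanh projection of the flattened window; the paper's fusion differs) ---
d_fused = 8
W_loc = rng.standard_normal((WIN * joint.shape[1], d_fused)) * 0.1
b_loc = np.zeros(d_fused)
fused = np.tanh(windows.reshape(len(windows), -1) @ W_loc + b_loc)

# --- Combine: LSTM over the window-level fused representations ---
H = 8
Wx = rng.standard_normal((d_fused, 4 * H)) * 0.1
Wh = rng.standard_normal((H, 4 * H)) * 0.1
b  = np.zeros(4 * H)

def lstm_forward(X):
    """Run a basic LSTM cell over a sequence, returning all hidden states."""
    h, c, outs = np.zeros(H), np.zeros(H), []
    for x in X:
        z = x @ Wx + h @ Wh + b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
        g = np.tanh(z[3*H:])
        c = f * c + i * g
        h = o * np.tanh(c)
        outs.append(h)
    return np.stack(outs)

H_states = lstm_forward(fused)                  # (n_win, H)

# --- Attention pooling over window-level hidden states ---
w_att = rng.standard_normal(H)
scores = H_states @ w_att
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                            # attention weights, sum to 1
context = alpha @ H_states                      # (H,) global utterance vector

# --- Sentiment score head ---
w_out = rng.standard_normal(H) * 0.1
score = float(context @ w_out)
```

The key design point the sketch preserves is the two-level hierarchy: cross-modal interactions are modeled cheaply inside small windows first, and only the fused window vectors are passed to the sequential model, which lets the attention-based LSTM capture long-range (global) dependencies over far fewer steps.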