Divide, Conquer and Combine:
Hierarchical Feature Fusion Network
with Local and Global Perspectives for Multimodal
Affective Computing
Multimodal data
– Data that combines multiple modalities (text / visual / audio, etc.)
Multimodal learning
– Deep learning has made large advances within each single modality
– Deep learning also makes it practical to learn across modalities
– With DNNs, end-to-end (E2E) cross-modal modeling becomes possible
Multimodal models for emotion recognition
– Human emotion is expressed through words, voice, and facial expression
– DNN-based models can exploit these signals jointly
– A single modality is often ambiguous (e.g., "cool" can read as praise or sarcasm depending on tone and expression)
– The core design question = Multimodal Fusion
This talk
– An overview of representative Multimodal Fusion approaches
– Then a Multimodal Fusion paper from ACL 2019:
– Hierarchical Feature Fusion Network (HFFN)
– Its divide / conquer / combine design, experiments, and takeaways
Multimodal Fusion: Concat
■ Convolutional MKL Based Multimodal Emotion Recognition and Sentiment Analysis
– The unimodal feature vectors are simply concatenated (concat) and passed to the classifier
Multimodal Fusion: Concat
■ Context-Dependent Sentiment Analysis in User-Generated Videos
– Text/Visual/Audio features of each utterance in a video are concatenated, and utterance context is modeled by a contextual Bi-LSTM (BC-LSTM)
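The concat approach used by the two papers above can be sketched in a few lines; the feature dimensions below are illustrative, not the ones used in the cited work:

```python
import numpy as np

# Illustrative unimodal feature vectors for one utterance
# (the dimensions are made up for the example).
text = np.random.randn(100)    # e.g., from a Text-CNN
audio = np.random.randn(74)    # e.g., openSMILE descriptors
visual = np.random.randn(35)   # e.g., from a 3D-CNN

# Concat fusion: stack the unimodal vectors into one feature vector
# that is then fed to the downstream classifier (or a BC-LSTM).
fused = np.concatenate([text, audio, visual])
print(fused.shape)  # (209,)
```

The appeal is simplicity; the drawback, as the later slides argue, is that concatenation never models explicit cross-modal interactions.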
Multimodal Fusion: Weighted average
■ Deep Multimodal Fusion for Persuasiveness Prediction
– Per-modality representations are combined by a learned weighted average, with the weights produced by an FFN
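A minimal sketch of weighted-average fusion: each modality contributes a score, and combination weights mix them. In the paper the weights are learned by a network; here they are fixed constants purely for illustration.

```python
import numpy as np

# Hypothetical per-modality scores (text, audio, visual).
scores = np.array([0.8, 0.3, 0.5])

# Illustrative fixed weights that sum to 1; a trained FFN
# would produce these in the actual model.
weights = np.array([0.5, 0.2, 0.3])

fused_score = float(weights @ scores)
print(fused_score)  # approximately 0.61
```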
Multimodal Fusion: Tensor Fusion
■ Tensor Fusion Network for Multimodal Sentiment Analysis
– The outer product of the three unimodal vectors (each padded with a constant 1) yields a tensor that contains all unimodal, bimodal, and trimodal interaction terms
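The outer-product idea behind Tensor Fusion can be sketched as follows; the dimensions are toy-sized:

```python
import numpy as np

def tensor_fusion(t, a, v):
    """Tensor Fusion: outer product of the three modality vectors,
    each extended with a constant 1 so the result keeps the
    unimodal and bimodal terms alongside the trimodal ones."""
    t1 = np.concatenate([t, [1.0]])
    a1 = np.concatenate([a, [1.0]])
    v1 = np.concatenate([v, [1.0]])
    return np.einsum('i,j,k->ijk', t1, a1, v1)

fused = tensor_fusion(np.ones(3), np.ones(2), np.ones(4))
print(fused.shape)  # (4, 3, 5)
```

Note that the tensor has (d_t + 1)(d_a + 1)(d_v + 1) entries, so the cost grows multiplicatively with the feature dimensions; keeping this manageable is part of the motivation for HFFN's window-local fusion below.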
Multimodal Fusion: Multistage Fusion
■ Multimodal Language Analysis with Recurrent Multistage Fusion
– The fusion problem is split into pieces (divide) and solved by repeated local fusion over multiple stages (conquer) (Recurrent Multistage Fusion Network: RMFN)
Proposed method: Hierarchical Feature Fusion Network (HFFN)
– Split each modality's feature sequence into sliding windows (divide), fuse the modalities locally within each window (conquer), then integrate the local fusion results with an ABS-LSTM (combine)
Feature extraction & division (divide)
– Per-modality feature extraction:
– Language feature: Text-CNN
– Acoustic feature: openSMILE
– Visual feature: 3D-CNN
– Each feature sequence is then sliced into local portions with a sliding window
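The divide step, slicing a modality's feature sequence into overlapping windows, might look like this; the window and stride values are illustrative:

```python
import numpy as np

def divide(seq, window, stride):
    """Slice a (time, dim) feature sequence into overlapping local chunks."""
    starts = range(0, len(seq) - window + 1, stride)
    return np.stack([seq[s:s + window] for s in starts])

seq = np.random.randn(10, 8)             # 10 time steps, 8-dim features
chunks = divide(seq, window=4, stride=2)
print(chunks.shape)  # (4, 4, 8): four local portions of length 4
```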
Local Fusion (conquer)
– Within each portion, the 3 modalities are fused by Tensor Fusion (outer product)
– Applying the fusion per portion keeps the interaction tensor small and yields one local fusion vector per window
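A sketch of the conquer step under simplifying assumptions: each window is summarized by a mean over time before the outer product (the paper's exact pooling may differ), and all shapes are illustrative.

```python
import numpy as np

def local_fusion(t_chunks, a_chunks, v_chunks):
    """For each aligned window, summarize every modality (mean over time)
    and fuse the three summaries with an outer product, as in Tensor Fusion."""
    fused = []
    for t, a, v in zip(t_chunks, a_chunks, v_chunks):
        ts, aus, vs = t.mean(axis=0), a.mean(axis=0), v.mean(axis=0)
        fused.append(np.einsum('i,j,k->ijk', ts, aus, vs).ravel())
    return np.stack(fused)  # one fused vector per local window

t = np.random.randn(4, 3, 5)   # 4 windows, length 3, 5-dim text features
a = np.random.randn(4, 3, 2)
v = np.random.randn(4, 3, 4)
out = local_fusion(t, a, v)
print(out.shape)  # (4, 40): 4 windows, 5*2*4 interaction terms each
```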
Local Fusion → global fusion (combine)
– A Bi-LSTM equipped with 2 attention mechanisms, the ABS-LSTM, integrates the local fusion vectors into a global representation (global fusion)
– 1. Regional Interdependence Attention (RIA)
– at each step t, attends over the neighboring local fusion vectors
– 2. Global Interaction Attention (GIA)
– attends over the LSTM states to capture global interactions
– The LSTM outputs are concatenated and passed through an FFN for the final fusion
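As a rough, hypothetical sketch of what a regional attention such as RIA computes (the actual ABS-LSTM equations are in the paper; the dot-product similarity and neighborhood radius here are assumptions for illustration):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def regional_attention(local_vecs, t, radius=1):
    """Attend over the local-fusion vectors in a neighborhood around
    step t and return their attention-weighted mixture."""
    lo, hi = max(0, t - radius), min(len(local_vecs), t + radius + 1)
    region = local_vecs[lo:hi]           # neighboring local fusion results
    scores = region @ local_vecs[t]      # dot-product similarity to step t
    return softmax(scores) @ region      # weighted mixture of the region

L = np.random.randn(6, 8)               # 6 local-fusion vectors, 8-dim
mixed = regional_attention(L, t=2)
print(mixed.shape)  # (8,)
```

GIA would operate analogously but attend over all steps rather than a local neighborhood.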
Experiments
– Evaluated on 3 datasets
– CMU-MOSI
– 93 videos / 62 utterances
– CMU-MOSEI
– 2928 videos / 98 utterances
– utterances are annotated with sentiment and emotion labels
– IEMOCAP
– 151 videos / 110 utterances
– emotion labels such as anger and happiness
Summary
– HFFN fuses language, acoustic, and vision features hierarchically
– divide: each feature sequence is split into windows and fused locally (local fusion)
– combine: RIA attends around each step t over the local fusion results, and GIA captures their global interactions
Impressions
– A clear taxonomy of Multimodal Fusion approaches
– HFFN reads as a natural successor to Tensor Fusion and RMFN
– Applying attention to the Global Fusion stage appears to be the main source of the gains
References
– Kaiye Wang, Qiyue Yin, Wei Wang, Shu Wu, Liang Wang. A Comprehensive Survey on Cross-modal Retrieval.
– Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan. Show and Tell: A Neural Image Caption Generator.
– Hang Zhou, Yu Liu, Ziwei Liu, Ping Luo, Xiaogang Wang. Talking Face Generation by Adversarially Disentangled Audio-Visual Representation. In AAAI 2019.
– Sijie Mai, Haifeng Hu, Songlong Xing. Divide, Conquer and Combine: Hierarchical Feature Fusion Network with Local and Global Perspectives for Multimodal Affective Computing. In ACL 2019.
– Soujanya Poria, Iti Chaturvedi, Erik Cambria, Amir Hussain. 2016. Convolutional MKL Based Multimodal Emotion Recognition and Sentiment Analysis. In Proceedings of the IEEE International Conference on Data Mining (ICDM).
– Behnaz Nojavanasghari, Deepak Gopinath, Jayanth Koushik, Tadas Baltrušaitis, Louis-Philippe Morency. 2016. Deep Multimodal Fusion for Persuasiveness Prediction. In ICMI.
– Soujanya Poria, Erik Cambria, Devamanyu Hazarika, Navonil Majumder, Amir Zadeh, Louis-Philippe Morency. 2017. Context-Dependent Sentiment Analysis in User-Generated Videos. In ACL.
– Haohan Wang, Aaksha Meghawat, Louis-Philippe Morency, Eric P. Xing. 2016. Select-Additive Learning: Improving Generalization in Multimodal Sentiment Analysis.
– Amir Zadeh, Paul Pu Liang, Jonathan Vanbriesen, Soujanya Poria, Edmund Tong, Erik Cambria, Minghai Chen, Louis-Philippe Morency. 2018. Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph. In ACL.
– Amir Zadeh, Minghai Chen, Soujanya Poria, Erik Cambria, Louis-Philippe Morency. 2017. Tensor Fusion Network for Multimodal Sentiment Analysis. In EMNLP.
– Amir Zadeh, Rowan Zellers, Eli Pincus, Louis-Philippe Morency. 2016. Multimodal Sentiment Intensity Analysis in Videos: Facial Gestures and Verbal Messages. IEEE Intelligent Systems.
– Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower, Samuel Kim, Jeannette N. Chang, Sungbok Lee, Shrikanth S. Narayanan. 2008. IEMOCAP: Interactive Emotional Dyadic Motion Capture Database. Language Resources and Evaluation.
