MCSE: Multimodal Contrastive Learning of
Sentence Embeddings
Miaoran Zhang, Marius Mosbach, David Ifeoluwa Adelani, Michael A. Hedderich, and Dietrich Klakow,
2022
01 Introduction
02 Related Work
03 Experiments
04 Conclusions
Introduction
• MCSE: Multimodal Contrastive Learning of Sentence Embeddings
 Background: Unsupervised SimCSE (Gao et al., 2021)
 Extends SimCSE with a multimodal contrastive objective
 Evaluated on standard Semantic Textual Similarity (STS) tasks
Introduction
• Architecture of MCSE
 fv(·) is a pre-trained image encoder such as ResNet
Related Work
• Contrastive learning background: Unsupervised SimCSE
 Data augmentation strategy: dropout noise (the same sentence is encoded twice with different dropout masks to form a positive pair)
 Pulls positive sentences closer and pushes negatives apart, using cosine similarity as the similarity measure
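For illustration, a minimal PyTorch sketch of this objective (hypothetical helper; `encoder` stands for any language model kept in train mode so dropout stays active, not the authors' released code):

```python
import torch
import torch.nn.functional as F

def simcse_loss(encoder, sentences, temperature=0.05):
    """Unsupervised SimCSE sketch: encode the same batch twice; different
    dropout masks make the two views of each sentence differ, forming the
    positive pair. Other sentences in the batch serve as negatives."""
    z1 = encoder(sentences)  # first forward pass (dropout mask A)
    z2 = encoder(sentences)  # second forward pass (dropout mask B)
    # Pairwise cosine similarities between all first-view/second-view pairs.
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1)
    labels = torch.arange(sim.size(0), device=sim.device)
    # Cross-entropy over each row: the diagonal entry is the positive.
    return F.cross_entropy(sim / temperature, labels)
```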
Related Work
• Multimodal Contrastive Learning
 Sentence-image pairs: sentence x_i and image y_i
 fv(·): a pre-trained image encoder such as ResNet
 fθ(·): a pre-trained language encoder such as BERT
 Pulls semantically close image-sentence pairs together and pushes unrelated pairs apart
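A rough sketch of such a cross-modal term (assuming PyTorch; the symmetric two-direction formulation over a shared projected space is a simplification of the paper's setup, not its exact code):

```python
import torch
import torch.nn.functional as F

def multimodal_loss(text_emb, image_emb, temperature=0.05):
    """Cross-modal contrastive sketch: the matched sentence-image pair is
    the positive; all other pairings in the batch act as negatives."""
    t = F.normalize(text_emb, dim=-1)   # text features on the unit sphere
    v = F.normalize(image_emb, dim=-1)  # image features on the unit sphere
    sim = t @ v.T / temperature         # cosine similarity of every pairing
    labels = torch.arange(sim.size(0), device=sim.device)
    # Symmetric: sentence-to-image and image-to-sentence directions.
    return 0.5 * (F.cross_entropy(sim, labels) + F.cross_entropy(sim.T, labels))
```

In MCSE this cross-modal term is combined with the text-only SimCSE objective during training.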
Experiments
• Dataset
 Multimodal datasets: Flickr30k (29,783 images) and MS-COCO (82,783 images)
 Text corpus: Wiki1M (English Wikipedia, 10^6 sentences)
• Encoder
 Language encoders: BERT and RoBERTa
 Image encoder: ResNet-50
 Projection heads: single-layer MLPs
• Evaluation
 7 Semantic Textual Similarity (STS) tasks: STS 2012-2016, STS Benchmark, SICK-Relatedness
 Metric: Spearman's correlation
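For reference, a toy example of the evaluation metric (hypothetical numbers; `scipy.stats.spearmanr` computes the rank correlation between model similarity scores and human gold ratings):

```python
from scipy.stats import spearmanr

# Hypothetical cosine similarities from the model vs. human gold scores.
pred = [0.82, 0.35, 0.67, 0.12, 0.90]
gold = [4.6, 1.8, 3.9, 0.7, 4.9]
rho, _ = spearmanr(pred, gold)
print(f"Spearman's rho: {rho:.3f}")  # rank correlation in [-1, 1]
```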
Results
• MCSE (trained on Wiki1M + Flickr30k) improves the average STS score over SimCSE
 BERT: 76.3 → 77.3
 RoBERTa: 76.6 → 78.3
• On STS16, MCSE-BERT lags behind, attributed to the domain discrepancy between the caption data and the STS16 sentences
Performance comparison on STS tasks
Results
• the performances decrease
(without the large text-only corpus)
• MCSE models (0.9 – 3.8 points improvement)
• Spearman’s correlation(0.8 – 5.0 points reduction)
-> validating the efficacy of visual semantics
Average Spearman’s correlation on 7 STS tasks
Results
• Alignment-Uniformity
 Alignment: distance between paired instances (smaller is better)
Similar samples have similar features
 Uniformity: how uniformly the embeddings are distributed (more uniform is better)
Preserves maximal information
* Reference: Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere (ICML 2020)
• It is important that the embedding space is broad and evenly distributed, so that each word preserves its own distinct meaning.
• Contrastive learning, by forcing negative pairs away from positive pairs, drives the embedding space toward a uniform distribution.
Results
• Alignment-Uniformity
 p_pos: distribution of positive pairs
 p_data: data distribution
• MCSE models: visual grounding enhances the embeddings by improving the alignment property
The alignment-uniformity plot of models (BERT)
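The two metrics follow Wang and Isola (ICML 2020); a minimal sketch, assuming L2-normalized embeddings (it mirrors their reference definitions, not the MCSE authors' evaluation script):

```python
import torch

def alignment(x, y, alpha=2):
    """E ||f(x) - f(y)||^alpha over positive pairs (x_i, y_i); lower is better."""
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniformity(x, t=2):
    """log E exp(-t ||f(x) - f(y)||^2) over pairs sampled from the data;
    lower means the embeddings cover the hypersphere more uniformly."""
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()
```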
Results
• Improvements on Different Subsets
 Different subsets benefit from visual grounding to different degrees, owing to domain discrepancy
Results
• SimCSE retrieves sentences with similar syntax, whereas MCSE retrieves sentences that are syntactically diverse but share the same semantics
Results
• Cross-Modal Retrieval: metric Recall@K
 Recall@K: recall computed over the top-k retrieved results
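A small sketch of how Recall@K can be computed for cross-modal retrieval (assuming normalized embeddings where row i of each matrix forms the matched pair; illustrative, not the paper's code):

```python
import torch

def recall_at_k(text_emb, image_emb, k=5):
    """Fraction of text queries whose matched image appears among the
    top-k retrieved candidates."""
    sim = text_emb @ image_emb.T      # query-by-candidate similarity scores
    topk = sim.topk(k, dim=1).indices  # top-k image indices per query
    targets = torch.arange(sim.size(0), device=sim.device).unsqueeze(1)
    return (topk == targets).any(dim=1).float().mean().item()
```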
Conclusion
• Proposes MCSE for sentence embedding learning
• MCSE consistently improves performance on STS tasks
• The superiority of the method is demonstrated by analyzing the alignment and uniformity properties of the embedding space
• SimCSE outperforms MCSE when only limited samples are available, while MCSE surpasses SimCSE on larger datasets; this relates to training the multimodal weights