Exploring Multi-Modal Fusion for Image Manipulation Detection and Localization
Konstantinos Triaridis, Vasileios Mezaris
CERTH-ITI, Thermi - Thessaloniki, Greece
30th Int. Conf. on MultiMedia Modeling,
Amsterdam, The Netherlands, Feb. 2024
Introduction
• Easy image manipulation with modern tools → detecting manipulated media is becoming increasingly important.
• Different manipulation types: splicing, copy-move, inpainting.
• SOTA detection models use high-pass filters (HPFs) to suppress image content and expose forensic artifacts.
• Different high-pass filters: SRM, Bayar convolution, NoisePrint++.
• Hypothesis: different HPFs are better at detecting different manipulation types.
• We explore methods for combining different HPFs in a way that best exploits the complementary forensic artifacts they produce, and we propose two models that reach state-of-the-art performance for image manipulation localization and detection.
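To illustrate what "suppressing image content" means, here is a minimal sketch in plain Python. It applies a second-order 3x3 high-pass kernel of the kind found in the SRM filter bank to a toy grayscale image. The toy image, the function name `high_pass_residual`, and the valid-mode border handling are assumptions made for illustration; this is not the presenters' implementation.

```python
# Second-order 3x3 high-pass kernel of the kind used in SRM (normalised by 4).
# Its weights sum to zero, so flat or linearly varying content maps to 0.
KERNEL = [
    [-1.0, 2.0, -1.0],
    [2.0, -4.0, 2.0],
    [-1.0, 2.0, -1.0],
]


def high_pass_residual(image, kernel=KERNEL, scale=4.0):
    """Valid-mode 2D correlation of a grayscale image (list of rows) with a kernel."""
    k = len(kernel)
    height, width = len(image), len(image[0])
    residual = []
    for y in range(height - k + 1):
        row = []
        for x in range(width - k + 1):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    acc += kernel[dy][dx] * image[y + dy][x + dx]
            row.append(acc / scale)
        residual.append(row)
    return residual


# Toy image: a smooth horizontal gradient stands in for "image content" ...
image = [[float(x) for x in range(8)] for _ in range(8)]
# ... and a single disturbed pixel stands in for a local forensic artifact.
image[4][4] += 10.0

residual = high_pass_residual(image)
print(residual[0][0])  # 0.0   (smooth content is suppressed)
print(residual[3][3])  # -10.0 (window centred on the disturbed pixel)
```

The gradient disappears from the residual while the local disturbance stands out, which is the property the forensic filters rely on.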
Image manipulation detection and localization
• Detection: given an image, classify it as “manipulated” or “pristine”.
• Localization: identify the manipulated regions within an image.
• Some methods perform localization and detection simultaneously.
• Other methods, including ours, first localize anomalous regions and then classify the image as manipulated or not.
[Figure: localization output followed by detection output ("Manipulated", p = 87.23%)]
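The localize-then-detect order can be sketched as follows. The slides do not say how the image-level score is computed from the localization output, so the top-fraction pooling in `image_score_from_map`, the 0.5 threshold, and all function names are hypothetical stand-ins that only illustrate the two-stage flow.

```python
def image_score_from_map(anomaly_map, top_fraction=0.1):
    """Hypothetical detector: average the highest-scoring pixels of the map."""
    scores = sorted((p for row in anomaly_map for p in row), reverse=True)
    k = max(1, int(len(scores) * top_fraction))
    return sum(scores[:k]) / k


def detect(anomaly_map, threshold=0.5):
    """Stage 2: turn a localization map into an image-level decision."""
    score = image_score_from_map(anomaly_map)
    label = "manipulated" if score >= threshold else "pristine"
    return label, score


# Stage 1 output (would come from a localization network): per-pixel
# manipulation probabilities in [0, 1]. Both maps here are toy data.
pristine_map = [[0.05] * 10 for _ in range(10)]
forged_map = [row[:] for row in pristine_map]
for y in range(3, 6):
    for x in range(3, 7):
        forged_map[y][x] = 0.9  # a 3x4 region flagged as anomalous

print(detect(pristine_map)[0])  # pristine
print(detect(forged_map)[0])    # manipulated
```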
Methods: Base architecture
• Dual-decoder architecture, same as TruFor [1], based on CMX [2]: a model that uses two multi-scale encoders and intermediate feature fusion to perform RGB+Depth semantic segmentation.
• Combination of three forensic filters, traditionally used separately.
[1] F. Guillaro et al., "TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
[2] J. Zhang et al., "CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation With Transformers," IEEE Transactions on Intelligent Transportation Systems, 2023.
Methods: Late Fusion w/ weight sharing
• One dual encoder for each modality.
• Weights for the RGB branch are shared across them.
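A structural sketch of the weight-sharing idea, in plain Python: one dual encoder per forensic modality, all holding a reference to the same RGB branch. The classes `Branch` and `DualEncoder`, the scalar "weights", the element-wise sum standing in for intermediate fusion, and the final averaging step are all illustrative assumptions; the slide does not specify how the per-modality outputs are combined.

```python
class Branch:
    """Stand-in for one encoder branch; `weight` represents its parameters."""

    def __init__(self, name, weight):
        self.name = name
        self.weight = weight

    def __call__(self, x):
        return [self.weight * v for v in x]


class DualEncoder:
    """Pairs an RGB branch with a branch for one forensic modality."""

    def __init__(self, rgb_branch, aux_branch):
        self.rgb_branch = rgb_branch
        self.aux_branch = aux_branch

    def __call__(self, rgb, aux):
        # Placeholder for intermediate feature fusion: element-wise sum.
        return [r + a for r, a in zip(self.rgb_branch(rgb), self.aux_branch(aux))]


# The RGB branch is created once and passed to every dual encoder,
# so all of them read (and would update) the same parameters.
shared_rgb = Branch("rgb", weight=0.5)
encoders = {
    name: DualEncoder(shared_rgb, Branch(name, weight=1.0))
    for name in ("NP++", "SRM", "Bayar")
}


def late_fusion(rgb, aux_inputs):
    """Run every dual encoder, then combine the outputs (assumed: average)."""
    outputs = [encoders[name](rgb, aux) for name, aux in aux_inputs.items()]
    return [sum(vals) / len(vals) for vals in zip(*outputs)]


rgb = [2.0, 4.0]
aux_inputs = {"NP++": [1.0, 1.0], "SRM": [1.0, 1.0], "Bayar": [1.0, 1.0]}
print(late_fusion(rgb, aux_inputs))  # [2.0, 3.0]
```

Because every `DualEncoder` holds the same `shared_rgb` object, the RGB parameters are stored once, while each forensic modality keeps its own auxiliary branch.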
Methods: Early Fusion
• Fusion of all modalities into a single set of feature maps, followed by one dual encoder.
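One way to picture fusing several modalities into a single set of feature maps is channel concatenation followed by a 1x1 mixing step. The slide does not name the fusion operator, so the sketch below is an assumption: the helper names `concat_channels` and `mix_1x1`, the channel counts, and the mixing weights are hypothetical.

```python
def concat_channels(*modalities):
    """Stack modality tensors (lists of HxW channel maps) along the channel axis."""
    return [channel for modality in modalities for channel in modality]


def mix_1x1(channels, weights):
    """1x1 convolution: each output channel is a weighted sum of input channels."""
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [
            [sum(w * ch[y][x] for w, ch in zip(w_row, channels)) for x in range(width)]
            for y in range(height)
        ]
        for w_row in weights
    ]


# Toy forensic maps: one 2x2 channel per modality (real filters may emit more).
np_map = [[[1.0, 1.0], [1.0, 1.0]]]
srm_map = [[[2.0, 2.0], [2.0, 2.0]]]
bayar_map = [[[3.0, 3.0], [3.0, 3.0]]]

stacked = concat_channels(np_map, srm_map, bayar_map)  # 3 channels
mix_weights = [[1.0, 1.0, 1.0], [0.5, 0.0, 0.0]]       # 3 inputs -> 2 outputs
fused = mix_1x1(stacked, mix_weights)

# `fused` would then be the single auxiliary input of one dual encoder,
# alongside the RGB image.
print(fused[0][0][0], fused[1][0][0])  # 6.0 0.5
```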
Implementation
• Two-phase training, as proposed in TruFor [1].
• Backbone: SegFormer pre-trained on ImageNet
• Training/validation datasets: Casiav2, IMD2020, FantasticReality, tampCOCO
• Testing datasets:
  - COVER: copy-move forgeries
  - Columbia: splicing forgeries
  - Casiav1+: splicing and copy-move forgeries
  - DSO-1: splicing forgeries w/ post-processing
  - CocoGlide: diffusion-based inpainting
• Supervised training for 100 epochs
Experimental results
• Localization performance (metric is average pixel F1)
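For reference, a minimal sketch of the pixel-level F1 metric, computed per image on binary masks and then averaged over a dataset. The masks below are toy data, and returning 0.0 when both masks are empty is one possible convention assumed here. The slides do not state how predicted maps are binarised.

```python
def pixel_f1(pred_mask, gt_mask):
    """F1 over pixels for one image; masks are lists of rows of 0/1 values."""
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1
            elif p and not g:
                fp += 1
            elif g and not p:
                fn += 1
    denom = 2 * tp + fp + fn
    # Assumed convention: an empty prediction on an empty ground truth scores 0.
    return 2 * tp / denom if denom else 0.0


def average_pixel_f1(pred_masks, gt_masks):
    """Mean of the per-image F1 scores."""
    scores = [pixel_f1(p, g) for p, g in zip(pred_masks, gt_masks)]
    return sum(scores) / len(scores)


gt_a = [[1, 1, 0], [0, 0, 0]]
pred_a = [[1, 0, 0], [0, 1, 0]]  # 1 TP, 1 FP, 1 FN -> F1 = 0.5
gt_b = [[0, 1], [0, 1]]
pred_b = [[0, 1], [0, 1]]        # perfect prediction -> F1 = 1.0

print(average_pixel_f1([pred_a, pred_b], [gt_a, gt_b]))  # 0.75
```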
Experimental results
• Detection performance (metrics are AUC and balanced accuracy)
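The two detection metrics can be sketched in plain Python as follows. AUC is computed here as the fraction of (manipulated, pristine) image pairs that are ranked correctly, with ties counting one half. Balanced accuracy is the mean of the true-positive and true-negative rates. The 0.5 decision threshold and the toy scores are assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels, scores, threshold=0.5):
    """Mean of sensitivity (TPR) and specificity (TNR) at a fixed threshold."""
    tp = tn = fp = fn = 0
    for l, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if l == 1 and pred == 1:
            tp += 1
        elif l == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("balanced accuracy needs both classes present")
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


labels = [1, 1, 1, 0, 0]            # 1 = manipulated, 0 = pristine
scores = [0.9, 0.8, 0.3, 0.4, 0.1]  # toy image-level scores

print(round(roc_auc(labels, scores), 4))            # 0.8333
print(round(balanced_accuracy(labels, scores), 4))  # 0.8333
```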
Experimental results
• Ablation study
Qualitative results
[Figure panels: Image, Ground Truth, RGB+NP++, RGB+Bayar, Early Fusion, RGB+SRM]
Conclusions
• We show that different forensic filters used in image manipulation localization and detection (IMLD) tasks display complementary performance.
• We explore approaches for expanding existing IMLD models to support multiple forensic filters as inputs.
• We propose two different modal-fusion paradigms and demonstrate that both reach state-of-the-art performance across several datasets. Importantly, the Early Fusion method achieves those results without significant computational cost, so it is our preferred method.
• Experiments show that both approaches are effective at leveraging and combining diverse forensic artifacts from different filters.
Thank you for your attention!
Questions?
Konstantinos Triaridis, triaridis@iti.gr
Code publicly available at:
https://github.com/IDT-ITI/MMFusion-IML
This work was supported by the EU's Horizon 2020 research and innovation programme under grant agreement 101021866 CRiTERIA.
