Multi-Modal Fusion for Image Manipulation Detection and Localization

Exploring Multi-Modal Fusion for Image Manipulation Detection and
Localization
Konstantinos Triaridis, Vasileios Mezaris
CERTH-ITI, Thermi - Thessaloniki, Greece
30th Int. Conf. on MultiMedia Modeling,
Amsterdam, The Netherlands, Feb. 2024

• Easy image manipulation with modern tools → Detecting manipulated media is
becoming increasingly important.
• Different manipulation types: splicing, copy-move, inpainting.
Introduction
• SOTA detection models use high-pass filters (HPFs) to suppress image content and expose forensic artifacts.
• Different high-pass filters: SRM, Bayar convolution, NoisePrint++
• Hypothesis: different HPFs → better at detecting different manipulation types
• We explore methods for combining different HPFs in a way that best exploits the complementary forensic artifacts they produce, and we propose two models that reach SOTA performance for image manipulation localization and detection.
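To make the role of a high-pass filter concrete, here is a minimal sketch in plain Python. It applies a fixed 5x5 kernel of the kind commonly associated with SRM filtering and shows that a flat region yields a zero residual while a single altered pixel stands out. It also sketches the Bayar constraint (center weight fixed to -1, remaining weights normalized to sum to 1). The function names, the toy image, and the choice of this one kernel are illustrative assumptions, not the filter bank used in the presented models.

```python
# Assumption: one fixed 5x5 second-order high-pass kernel (SRM-style).
# Its weights sum to zero, so constant image content is suppressed.
SRM_KERNEL = [
    [-1,  2,  -2,  2, -1],
    [ 2, -6,   8, -6,  2],
    [-2,  8, -12,  8, -2],
    [ 2, -6,   8, -6,  2],
    [-1,  2,  -2,  2, -1],
]
SCALE = 12.0


def high_pass_residual(image, kernel=SRM_KERNEL, scale=SCALE):
    """Valid-mode 2D correlation of a grayscale image given as a list of rows."""
    k = len(kernel)
    height, width = len(image), len(image[0])
    residual = []
    for r in range(height - k + 1):
        row = []
        for c in range(width - k + 1):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += kernel[i][j] * image[r + i][c + j]
            row.append(acc / scale)
        residual.append(row)
    return residual


def bayar_constrain(weights):
    """Project a square kernel onto the Bayar constraint.

    The center weight becomes -1 and the other weights are rescaled to sum
    to 1 (this sketch assumes their sum is non-zero).
    """
    k = len(weights)
    mid = k // 2
    off_center_sum = sum(
        weights[i][j] for i in range(k) for j in range(k) if (i, j) != (mid, mid)
    )
    return [
        [-1.0 if (i, j) == (mid, mid) else weights[i][j] / off_center_sum
         for j in range(k)]
        for i in range(k)
    ]


# Toy data (hypothetical): a flat 7x7 image and a copy with one altered pixel.
flat = [[100.0] * 7 for _ in range(7)]
tampered = [row[:] for row in flat]
tampered[3][3] += 12.0

flat_residual = high_pass_residual(flat)
tampered_residual = high_pass_residual(tampered)
constrained = bayar_constrain([[1.0] * 3 for _ in range(3)])
```

In this toy case the image content (the constant value 100) disappears from the residual, and only the local inconsistency remains.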

• Detection: given an image, classify it as “manipulated” or “pristine”
• Localization: identify the manipulated regions within an image
Image manipulation detection and localization
• Some methods perform localization and detection simultaneously.
• Other methods, including ours, first localize anomalous regions, then classify the image as manipulated or not.
[Figure: pipeline illustration, Localization → Detection; example output: "Manipulated", p = 87.23%]
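The localize-then-classify idea can be illustrated with a deliberately simple stand-in. The sketch below turns a per-pixel manipulation probability map into an image-level score by averaging its highest values and then thresholding. This pooling rule, the `top_k` and `threshold` parameters, and the toy maps are all assumptions made for illustration; the presented models use a learned detection stage, which the slides do not detail.

```python
def detection_score(loc_map, top_k=5):
    """Image-level score: mean of the top_k pixel probabilities (assumed rule)."""
    pixels = sorted((p for row in loc_map for p in row), reverse=True)
    k = max(1, min(top_k, len(pixels)))
    return sum(pixels[:k]) / k


def classify(loc_map, threshold=0.5):
    """Return ('manipulated' | 'pristine', score) for a localization map."""
    score = detection_score(loc_map)
    label = "manipulated" if score >= threshold else "pristine"
    return label, score


# Hypothetical localization maps (values are per-pixel probabilities).
pristine_map = [[0.02] * 10 for _ in range(10)]
forged_map = [row[:] for row in pristine_map]
for r in range(3, 6):
    for c in range(3, 6):
        forged_map[r][c] = 0.9  # a small region flagged as anomalous

pristine_label, pristine_score = classify(pristine_map)
forged_label, forged_score = classify(forged_map)
```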

• Dual-decoder architecture, same as TruFor [1], based on CMX [2]: a model that uses two multi-scale encoders and intermediate feature fusion to perform RGB+Depth semantic segmentation.
• Combines three forensic filters that are traditionally used separately.
Methods: Base architecture
[1] F. Guillaro et al. "TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
[2] J. Zhang et al. "CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation With Transformers." IEEE Transactions on Intelligent Transportation Systems, 2023.

• One dual encoder for each modality
• Weights of the RGB branch are shared across the dual encoders
Methods: Late Fusion with weight sharing
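The weight-sharing structure can be sketched with toy objects: three dual encoders, one per forensic modality, that all hold a reference to the same RGB branch. The branch classes, the element-wise sum inside each dual encoder, and the final averaging step are placeholders chosen for this illustration; the slides do not specify the actual fusion operators.

```python
class ToyBranch:
    """Stand-in for an encoder branch: scales its input by a single weight."""

    def __init__(self, weight):
        self.weight = weight

    def __call__(self, features):
        return [self.weight * x for x in features]


class ToyDualEncoder:
    """Pairs an RGB branch with one auxiliary (forensic filter) branch."""

    def __init__(self, rgb_branch, aux_branch):
        self.rgb_branch = rgb_branch
        self.aux_branch = aux_branch

    def __call__(self, rgb, aux):
        # Placeholder for intermediate feature fusion: element-wise sum.
        return [a + b for a, b in zip(self.rgb_branch(rgb), self.aux_branch(aux))]


# One RGB branch object, reused by every dual encoder (weight sharing).
shared_rgb = ToyBranch(0.5)
encoders = {
    "NP++": ToyDualEncoder(shared_rgb, ToyBranch(1.0)),
    "SRM": ToyDualEncoder(shared_rgb, ToyBranch(2.0)),
    "Bayar": ToyDualEncoder(shared_rgb, ToyBranch(3.0)),
}


def late_fusion(rgb, aux_inputs):
    """Run every dual encoder, then combine their outputs (here: average)."""
    outputs = [enc(rgb, aux_inputs[name]) for name, enc in encoders.items()]
    return [sum(values) / len(values) for values in zip(*outputs)]


rgb_features = [2.0, 4.0]
aux_features = {name: [1.0, 1.0] for name in encoders}
fused = late_fusion(rgb_features, aux_features)

# Count distinct branch objects: 1 shared RGB + 3 auxiliary = 4 (6 without sharing).
distinct_branches = len(
    {id(b) for enc in encoders.values() for b in (enc.rgb_branch, enc.aux_branch)}
)
```

Sharing the RGB branch means its parameters are stored and updated once, even though three dual encoders use it.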

• Fusion of all modalities into a single set of feature maps, processed by one dual encoder.
Methods: Early Fusion
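One common way to merge several modalities into a single set of feature maps is channel-wise concatenation followed by a per-pixel linear mix (a 1x1 convolution). The sketch below shows that pattern on tiny hand-made maps. Whether the presented model uses exactly this operator is not stated on the slide, so the functions, the mixing weights, and the toy inputs are assumptions.

```python
def concat_channels(*modalities):
    """Stack the channels of several modalities into one channel list.

    Each modality is a list of channels; each channel is a list of rows.
    """
    fused = []
    for modality in modalities:
        fused.extend(modality)
    return fused


def pointwise_mix(channels, weights):
    """1x1 convolution: every output channel is a weighted sum of the input
    channels, computed independently at each pixel."""
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [
            [sum(wt * ch[r][c] for wt, ch in zip(weight_row, channels))
             for c in range(width)]
            for r in range(height)
        ]
        for weight_row in weights
    ]


# Hypothetical single-channel 2x2 outputs of three forensic filters.
np_map = [[[1.0, 1.0], [1.0, 1.0]]]
srm_map = [[[2.0, 2.0], [2.0, 2.0]]]
bayar_map = [[[3.0, 3.0], [3.0, 3.0]]]

stacked = concat_channels(np_map, srm_map, bayar_map)  # 3 channels
mix_weights = [
    [1.0, 1.0, 1.0],   # output channel 0: sum of the three inputs
    [0.5, 0.0, -0.5],  # output channel 1: scaled difference NP++ vs. Bayar
]
fused_maps = pointwise_mix(stacked, mix_weights)  # 2 channels
```

The fused maps then act as a single auxiliary input, so only one dual encoder is needed.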

• Two-phase training, as proposed in TruFor [1]
• Backbone: SegFormer pre-trained on ImageNet
• Training/validation datasets: CASIAv2, IMD2020, FantasticReality, tampCOCO
• Testing datasets:
  - COVER: copy-move forgeries
  - Columbia: splicing forgeries
  - CASIAv1+: splicing and copy-move forgeries
  - DSO-1: splicing forgeries with post-processing
  - CocoGlide: diffusion-based inpainting
• Supervised training for 100 epochs
Implementation

• Localization performance (metric is average pixel F1)
Experimental results
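For reference, pixel-level F1 compares a binary predicted mask with the ground-truth mask, and the reported figure is the average over images. The sketch below assumes already-binarized masks and returns 1.0 when both masks are empty; both the binarization step and that empty-mask convention are assumptions, since the slide does not state them.

```python
def pixel_f1(pred_mask, gt_mask):
    """F1 between two binary masks given as lists of rows of 0/1 values."""
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1
            elif p and not g:
                fp += 1
            elif g and not p:
                fn += 1
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 2 * tp / denom if denom else 1.0


def average_pixel_f1(mask_pairs):
    """Mean F1 over an iterable of (pred_mask, gt_mask) pairs."""
    scores = [pixel_f1(pred, gt) for pred, gt in mask_pairs]
    return sum(scores) / len(scores)


# Hypothetical masks: one true positive, one false positive, one false negative.
pred = [[1, 1, 0],
        [0, 0, 0]]
gt = [[1, 0, 0],
      [1, 0, 0]]

single = pixel_f1(pred, gt)
average = average_pixel_f1([(pred, gt), (gt, gt)])
```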

• Detection performance (metrics are AUC and balanced accuracy)
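The two detection metrics can be computed from image-level scores and labels as sketched below. AUC is evaluated through its pairwise-ranking definition (the fraction of manipulated/pristine pairs ranked correctly, with ties counted as half), and balanced accuracy is the mean of the true-positive and true-negative rates. The 0.5 decision threshold and the toy scores are assumptions for illustration.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def balanced_accuracy(scores, labels, threshold=0.5):
    """Mean of true-positive rate and true-negative rate at a fixed threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return (tp / n_pos + tn / n_neg) / 2


# Hypothetical detector outputs: label 1 = manipulated, 0 = pristine.
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(scores, labels)
bal_acc = balanced_accuracy(scores, labels)
```

AUC does not depend on a threshold, while balanced accuracy does, which is why the two are usually reported together.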

• Ablation study

Qualitative results
[Figure: qualitative comparison. Panels: Image, Ground Truth, RGB+NP++, RGB+Bayar, Early Fusion, RGB+SRM]

• We show that different forensic filters used in IMLD tasks display complementary performance.
• We explore approaches for expanding existing IMLD models to support multiple forensic filters as inputs.
• We propose two different modal-fusion paradigms and demonstrate that both approaches reach state-of-the-art performance across several datasets. Importantly, the Early Fusion method achieves those results without significant added computational cost, so it is our preferred method.
• Experiments show that both approaches are effective at leveraging and combining diverse forensic artifacts from different filters.
Conclusions

Thank you for your attention!
Questions?
Konstantinos Triaridis, triaridis@iti.gr
Code publicly available at:
https://github.com/IDT-ITI/MMFusion-IML
This work was supported by the EU's Horizon 2020 research and innovation programme under grant agreement 101021866 CRiTERIA.
