The document presents a method for detecting discrepancies between the visual and audio streams of multimedia content by adapting visual and audio scene classification techniques. Its key contributions are a novel experimental protocol, a benchmark dataset, and a baseline method for audio-visual discrepancy detection. Experimental results show high accuracy in identifying manipulated samples, indicating the method's potential for combating digital disinformation.
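
To make the general idea concrete, the following is a minimal sketch of how the outputs of a visual and an audio scene classifier might be compared to flag a mismatch. It is not the baseline method described in the document. The function names, the use of total variation distance as the score, the example scene classes, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from typing import Sequence


def discrepancy_score(visual_probs: Sequence[float],
                      audio_probs: Sequence[float]) -> float:
    """Total variation distance between two class distributions (0 to 1).

    Hypothetical scoring rule: both inputs are assumed to be probability
    vectors over the same ordered set of scene classes.
    """
    if len(visual_probs) != len(audio_probs):
        raise ValueError("both classifiers must score the same classes")
    return 0.5 * sum(abs(v - a) for v, a in zip(visual_probs, audio_probs))


def is_discrepant(visual_probs: Sequence[float],
                  audio_probs: Sequence[float],
                  threshold: float = 0.5) -> bool:
    """Flag a sample when the two modalities disagree strongly.

    The threshold is an illustrative assumption, not a value from the source.
    """
    return discrepancy_score(visual_probs, audio_probs) > threshold


# Assumed scene classes, in order: street, beach, office.
consistent = is_discrepant([0.8, 0.1, 0.1], [0.7, 0.2, 0.1])
mismatched = is_discrepant([0.8, 0.1, 0.1], [0.1, 0.8, 0.1])
```

Here `consistent` is false because both modalities favor the same scene, while `mismatched` is true because the visual stream suggests one scene and the audio stream another. In practice the threshold would be tuned on held-out data rather than fixed in advance.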