The document presents a scalable, graph-based multimodal clustering approach to social event detection (SED) built on an item-to-item model. The approach is evaluated on a dataset from the 2012 MediaEval SED task, where it achieves high accuracy and clustering quality; the evaluation also examines the importance of time and other features. Future work includes improving the handling of non-event images and exploring the quality of multimedia data.
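
To make the idea of graph-based clustering from an item-to-item model concrete, the following is a minimal sketch and not the document's actual method. It assumes that the item-to-item model decides, for a pair of items, whether they belong to the same event, and that clusters are then read off the resulting graph. Everything in it is hypothetical: the toy items, the two features (time closeness and tag overlap), the hand-set weights and threshold standing in for a learned model, and the use of connected components as the graph clustering step. It also compares all pairs, so it does not reflect whatever the document does to achieve scalability.

```python
from itertools import combinations

# Hypothetical toy items: (id, timestamp in hours, set of tags).
items = [
    ("a", 0.0, {"concert", "berlin"}),
    ("b", 1.0, {"concert", "stage"}),
    ("c", 50.0, {"football", "madrid"}),
    ("d", 51.5, {"football", "stadium"}),
    ("e", 200.0, {"cat"}),
]


def pair_features(x, y):
    """Return per-modality similarities for one pair: time closeness, tag Jaccard."""
    _, tx, tags_x = x
    _, ty, tags_y = y
    time_sim = 1.0 / (1.0 + abs(tx - ty))
    jaccard = len(tags_x & tags_y) / len(tags_x | tags_y)
    return time_sim, jaccard


def same_event(x, y, w_time=0.6, w_tags=0.4, threshold=0.3):
    """Stand-in item-to-item model: a hand-set weighted sum, not a trained classifier."""
    time_sim, jaccard = pair_features(x, y)
    return w_time * time_sim + w_tags * jaccard >= threshold


def cluster(items):
    """Link pairs predicted as 'same event' and return connected components."""
    parent = {item[0]: item[0] for item in items}

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    # Assumption: all pairs are compared, which is quadratic in the number of items.
    for x, y in combinations(items, 2):
        if same_event(x, y):
            parent[find(x[0])] = find(y[0])

    groups = {}
    for item_id in parent:
        groups.setdefault(find(item_id), set()).add(item_id)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    print(cluster(items))
```

In this toy setup the isolated item `e` ends up in a cluster of its own, which hints at why non-event images need separate handling: a graph built from pairwise decisions has no built-in notion of "not an event".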