Prof. Dr. Cees Snoek discusses the limitations of multimodal foundation models in perceiving scarcity, space, time, and human values. The talk outlines ongoing research efforts to improve these models' capabilities through techniques such as synthetic data generation and specialized adapters. Ultimately, while these models are powerful, they still struggle with perceptual challenges that require innovative solutions.