This document discusses user intentions in visual information retrieval and multimedia information systems. It begins by introducing query-by-example search, noting that different low-level visual features work better in some domains than in others. It then explains why choosing the right features and defining visual similarity are challenging. The document defines context and intention, and discusses how a user's intention relates to their information need. It reviews taxonomies of user intentions in web search and proposes that intentions in multimedia may include search, production, sharing, and archiving. Finally, it proposes several open PhD thesis topics: developing a general model of user intentions in multimedia, using games and human computation to infer intentions, bringing context to queries, and creating applications that adapt to user intentions.