From the course: Introduction to Multimodal Prompting for Generative AI

Unlock the full course today

Join today to access over 24,800 courses taught by industry experts.

Visual modality

Visual modality

- [Instructor] When it comes to visual modalities, when we have imagery, we can have an input that has an image as well as a text prompt and contains some sort of question about the image. So we can have an image of ingredients or an image of a web interface, and we can ask different questions about that image and get the model to reason about that. This task is called visual question answering. There's also image generation that involves giving the model an image along with a text prompt and getting a new image from the model. Now, visual question answering isn't exclusive to images. Some models can take an input that consists of a video as well as text, and use the text to reason about the video. When it comes to video generation, very recent models, such as Sora, can take a video along with text and generate a new video.

Contents