
From the course: Introduction to Multimodal Prompting for Generative AI


Visual modality


- [Instructor] When it comes to visual modalities, the input can combine an image with a text prompt that asks a question about that image. For example, we can give the model an image of ingredients or an image of a web interface, ask different questions about it, and get the model to reason about what it shows. This task is called visual question answering. There's also image generation, where we give the model an image along with a text prompt and get a new image back. Now, visual question answering isn't exclusive to images. Some models can take an input that consists of a video as well as text, and use the text to reason about the video. When it comes to video generation, very recent models, such as Sora, can take a video along with text and generate a new video.
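
As a rough illustration of the tasks described in the transcript, here is a minimal Python sketch that pairs a piece of visual media with a text prompt and names the task by input and output modality. The payload layout, the `build_prompt` and `task_name` helpers, and the label "video question answering" are assumptions made for this example; they are not taken from the course or from any particular model's API.

```python
import base64

# Hypothetical lookup: (input media, output media) -> task name.
# The image and video-generation entries follow the transcript; the
# "video question answering" label is an assumed name for the video case.
TASKS = {
    ("image", "text"): "visual question answering",
    ("image", "image"): "image generation",
    ("video", "text"): "video question answering",
    ("video", "video"): "video generation",
}


def task_name(input_media: str, output_media: str) -> str:
    """Return the task name for a media-plus-text input and a given output."""
    return TASKS[(input_media, output_media)]


def build_prompt(media_bytes: bytes, mime_type: str, text: str) -> dict:
    """Bundle raw media and a text prompt into one illustrative payload.

    The dictionary shape is made up for this sketch; real model APIs
    define their own request formats.
    """
    return {
        "parts": [
            {
                "type": "media",
                "mime_type": mime_type,
                # Base64 keeps binary media safe inside a text-based payload.
                "data": base64.b64encode(media_bytes).decode("ascii"),
            },
            {"type": "text", "text": text},
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real photo of ingredients.
    prompt = build_prompt(b"\x89PNG", "image/png", "What can I cook with these?")
    print(task_name("image", "text"))
    print(prompt["parts"][1]["text"])
```

The point of the sketch is only that the media and the text travel together as one input, and that the same pairing covers both question answering and generation depending on what kind of output is requested.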
