Building the Next-Gen Apps with
Multimodal Retrieval using Twelve
Labs & Milvus
Hrishikesh Yadav
ABOUT
Developer Advocate @TwelveLabs
Ex AI Engineer @Shaga
Kaggle 2x Expert
Applied Gen AI Researcher
Member @SuperTeamDao
Hrishikesh Yadav
About Myself
Deep Learning and Applied Generative AI Researcher
I like to participate in and judge hackathons
Worked on the products Shaga and CrimeDekho
Published research work on Predictive Policing and Time
Forecasting
Agenda
1. Discussion of Multimodal Embeddings
2. Embed API Exploration
3. Use Cases to Explore and Build
4. Demo of Multimodal RAG with Twelve Labs and Milvus
5. Visual Similarity (Image to Video Segments)
6. Ideas to Explore and Build
7. Q&A
Embeddings Power Applications
• Users can find relevant content across any format,
regardless of query type:
⚬ search with text to find videos
⚬ search with images to find videos
⚬ search with voice to find visuals, breaking down traditional
search limitations.
“And many more with any-to-any applications”
Multimodal Embeddings
[Diagram: Video, Image, Audio, and Text inputs each pass through an encoder and come out as an embedding vector, e.g. [-0.037437707, -0.015245657, …]]
Any-to-Any Search!
[Diagram: data or a query of any modality goes in (e.g. the text "Crowd of men with a horse"); results of any modality are retrieved]
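The any-to-any idea above can be sketched in a few lines. This is a minimal illustration, assuming all modalities have already been encoded into one shared vector space; the three-dimensional vectors, file names, and the `search` helper are made up for the example and do not come from any real model or API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=3):
    """Rank every indexed item, whatever its modality, against the query."""
    scored = [(cosine_similarity(query_vec, item["vector"]), item) for item in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(round(score, 3), item["modality"], item["label"])
            for score, item in scored[:top_k]]

# Hypothetical index: toy vectors standing in for real encoder output.
index = [
    {"modality": "video", "label": "horse_parade.mp4", "vector": [0.9, 0.1, 0.0]},
    {"modality": "image", "label": "city_skyline.jpg", "vector": [0.0, 0.2, 0.9]},
    {"modality": "audio", "label": "hooves_and_crowd.wav", "vector": [0.8, 0.3, 0.1]},
]

# Made-up stand-in for the embedding of the text "Crowd of men with a horse".
text_query_vec = [1.0, 0.2, 0.0]
results = search(text_query_vec, index, top_k=2)
for score, modality, label in results:
    print(score, modality, label)
```

Because the query and the indexed items live in the same space, the same `search` call works whether the query vector came from text, an image, or audio.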
Why Empower Products with
Multimodal?
More modalities, more power to the user,
more personalization
Multimodal Embeddings
Embed API
Video-level Embedding
POST Embed
input_type: video, audio, image, text
file: video.mp4
GET Embed
task_id: 61e1127861c43d6d9b736194
GET Embeddings
[0.6,-0.2,0.3,0.4,...]
Video → Video embeddings (semantic representation)
[0.6,-0.2,0.3,0.4,...]
Clip-level Embeddings
GET
[0.6,-0.2,0.3,0.4,...], [0.6,-0.2,0.3,0.4,...], …
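The POST, GET status, GET embeddings sequence on this slide is an asynchronous task flow. The sketch below imitates it with an in-memory stand-in, so it runs without network access. `FakeEmbedService`, its method names, the "processing"/"ready" status strings, and the clip fields are all hypothetical and are not the real Twelve Labs client or response schema.

```python
import itertools
import time

class FakeEmbedService:
    """In-memory stand-in for an embedding service (hypothetical, not a real API)."""

    def __init__(self, polls_until_ready=2):
        self._ids = itertools.count(1)
        self._tasks = {}
        self._polls_until_ready = polls_until_ready

    def create_task(self, input_type, file):
        # Mirrors the POST step: submit the media, receive a task id.
        task_id = f"task-{next(self._ids)}"
        self._tasks[task_id] = {"polls": 0, "input_type": input_type, "file": file}
        return task_id

    def get_status(self, task_id):
        # Mirrors the first GET: the task becomes ready after a few polls.
        task = self._tasks[task_id]
        task["polls"] += 1
        return "ready" if task["polls"] >= self._polls_until_ready else "processing"

    def get_embeddings(self, task_id):
        # Mirrors the second GET: one vector per clip, with assumed time offsets.
        if self._tasks[task_id]["polls"] < self._polls_until_ready:
            raise RuntimeError("task not ready")
        return [
            {"start_sec": 0.0, "end_sec": 6.0, "vector": [0.6, -0.2, 0.3, 0.4]},
            {"start_sec": 6.0, "end_sec": 12.0, "vector": [0.5, -0.1, 0.2, 0.4]},
        ]

def embed_file(service, input_type, file, poll_interval=0.0, max_polls=10):
    """Create a task, poll until it is ready, then fetch the embeddings."""
    task_id = service.create_task(input_type=input_type, file=file)
    for _ in range(max_polls):
        if service.get_status(task_id) == "ready":
            return task_id, service.get_embeddings(task_id)
        time.sleep(poll_interval)
    raise TimeoutError(f"{task_id} not ready after {max_polls} polls")

task_id, clips = embed_file(FakeEmbedService(), "video", "video.mp4")
print(task_id, len(clips), "clip embeddings")
```

Against a real service, only the three service methods would change; the create, poll, fetch loop in `embed_file` stays the same, ideally with a non-zero `poll_interval`.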
Marengo 2.6 Benchmarks
Get the entire report: https://www.twelvelabs.io/blog/introducing-marengo-2-6
Use Cases to Explore and Build
Surveillance Analysis Assistant
• In surveillance, a knowledge base containing
CCTV video footage and related details can be
loaded as embeddings.
• Video understanding with embeddings
saves a lot of time when searching for a
particular video in the surveillance footage.
Organization Documentation Archive Assistant
• Knowledge base containing the
organization's documentation in all
modalities.
• Employees can ask natural-language
questions and instantly receive relevant
results across all formats.
• Automatically connects related content
across formats: when viewing a
technical specification, instantly see
related implementation videos.
Museum Guide Assistant
• Visitors can take a photo of any artwork
or simply describe what interests them
to receive instant insights about the
piece and related artworks across the
museum.
• Delivers audio-visual tours by
understanding the visual elements of
artwork and finding the relevant information.
More Personalization, More Engagement
Demo of Multimodal RAG with Twelve
Labs and Milvus
Fashion Assistant with LLM
https://github.com/Hrishikesh332/Twelve-Labs-Fashion-chat-assistant
• Application details:
⚬ Vector Database: Milvus
⚬ Embedding Model: Marengo-retrieval-2.6
⚬ LLM: gpt-3.5-turbo (OpenAI)
⚬ Deployment: Streamlit Cloud
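In a RAG setup like the one listed above, the step between the vector database and the LLM is assembling the retrieved results into a prompt. The sketch below shows one possible way to do that. The `build_prompt` helper, the shape of each hit, and the prompt wording are assumptions for illustration and are not taken from the demo repository; the actual calls to Milvus and gpt-3.5-turbo are deliberately left out.

```python
def build_prompt(question, hits, max_hits=3):
    """Turn retrieved clip hits into a grounded prompt for a chat model."""
    lines = []
    for rank, hit in enumerate(hits[:max_hits], start=1):
        lines.append(
            f"{rank}. {hit['title']} "
            f"[{hit['start_sec']:.0f}s-{hit['end_sec']:.0f}s] "
            f"(score {hit['score']:.2f})"
        )
    context = "\n".join(lines) if lines else "(no matching clips)"
    return (
        "You are a fashion assistant. Answer using only the retrieved video clips.\n"
        f"Retrieved clips:\n{context}\n"
        f"Question: {question}\n"
    )

# Hypothetical search results, shaped like what a vector database query
# might return after embedding the user's question.
hits = [
    {"title": "summer_collection.mp4", "start_sec": 12.0, "end_sec": 18.0, "score": 0.91},
    {"title": "street_style.mp4", "start_sec": 40.0, "end_sec": 46.0, "score": 0.84},
]

prompt = build_prompt("What can I wear to a summer wedding?", hits)
print(prompt)
```

The resulting string would be sent as the user or system message to the LLM; keeping the timestamps in the context lets the answer point the user to the exact moment in a video.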
Fashion Assistant with LLM and Multimodal
Retrieval
Demo of Image to Video Segment
Image Query to Video Segment Retrieval
https://github.com/Hrishikesh332/Twelve-Labs-Fashion-chat-assistant
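When an image query is matched against clip-level embeddings, neighbouring clips from the same video often match together. One possible post-processing step, sketched below, is to merge them into a single playable segment. The threshold, the hit fields, and the `merge_segments` helper are assumptions for illustration, not the demo's actual logic.

```python
def merge_segments(clip_hits, min_score=0.8):
    """Merge touching above-threshold clips from the same video into segments."""
    kept = sorted(
        (hit for hit in clip_hits if hit["score"] >= min_score),
        key=lambda hit: (hit["video"], hit["start_sec"]),
    )
    segments = []
    for hit in kept:
        last = segments[-1] if segments else None
        if last and last["video"] == hit["video"] and hit["start_sec"] <= last["end_sec"]:
            # Extend the current segment and keep its best score.
            last["end_sec"] = max(last["end_sec"], hit["end_sec"])
            last["score"] = max(last["score"], hit["score"])
        else:
            segments.append(dict(hit))
    # Best-matching segment first.
    return sorted(segments, key=lambda seg: seg["score"], reverse=True)

# Hypothetical clip-level hits for one image query.
clip_hits = [
    {"video": "runway.mp4", "start_sec": 0, "end_sec": 6, "score": 0.91},
    {"video": "runway.mp4", "start_sec": 6, "end_sec": 12, "score": 0.88},
    {"video": "runway.mp4", "start_sec": 30, "end_sec": 36, "score": 0.40},
    {"video": "lookbook.mp4", "start_sec": 12, "end_sec": 18, "score": 0.83},
]

segments = merge_segments(clip_hits)
for seg in segments:
    print(seg["video"], seg["start_sec"], seg["end_sec"], seg["score"])
```

Here the two adjacent runway clips collapse into one 0 to 12 second segment, the low-scoring clip is dropped, and the lookbook clip is returned as its own segment.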
For the detailed working tutorial blog:
https://github.com/Hrishikesh332/Twelve-Labs-Fashion-chat-assistant
ABOUT
Twitter - @hrishikesh_ai
Developer Advocate @TwelveLabs
Prev. AI Engineer @Shaga
Kaggle 2x Expert
Applied Gen AI Researcher
Member @SuperTeamDao
Thank You
