Yujian Tang | Zilliz
Beyond RAG: Vector Databases
Speaker
Yujian Tang
Senior Developer Advocate, Zilliz
yujian@zilliz.com
https://www.linkedin.com/in/yujiantang
https://www.twitter.com/yujian_tang
CONTENTS
01 Why Vector Databases?
02 How Do Vector Databases Work?
03 Use Cases
04 Vector Database Architecture
01 Why Vector Databases?
Compare data that you couldn’t compare before
Unstructured Data is Everywhere
Unstructured data is any data that does not conform to a predefined data model.
By 2025, IDC estimates there will be 175 zettabytes of data globally
(that's 175 with 21 zeros), with 80% of that data being unstructured.
Currently, 90% of unstructured data is never analyzed.
Text Images Video and more!
Vector Databases
Unstructured Data + ML = Vector Magic
Find Semantically Similar Data
Apple made profits of $97 Billion in 2023
I like to eat apple pie for profit in 2023
Apple’s bottom line increased by record numbers in 2023
But wait! There’s more!
Use math to quantify relationships between
entities
02 How Do Vector Databases Work?
Vector similarity is a mathematical measure of
how close two vectors are
Semantic Similarity
Image from Sutor et al
Woman = [0.3, 0.4]
Queen = [0.3, 0.9]
King = [0.5, 0.7]
Man = [0.5, 0.2]
Queen - Woman + Man = King
  Queen = [0.3, 0.9]
- Woman = [0.3, 0.4]
        = [0.0, 0.5]
+ Man   = [0.5, 0.2]
  King  = [0.5, 0.7]
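The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch that uses the toy 2-D vectors from the slide; the `embeddings` dictionary and the helper names are illustrative, and real embeddings typically have hundreds of dimensions.

```python
# Toy 2-D embeddings copied from the slide (illustrative, not real model output).
embeddings = {
    "woman": [0.3, 0.4],
    "queen": [0.3, 0.9],
    "king": [0.5, 0.7],
    "man": [0.5, 0.2],
}

def subtract(a, b):
    """Element-wise a - b."""
    return [x - y for x, y in zip(a, b)]

def add(a, b):
    """Element-wise a + b."""
    return [x + y for x, y in zip(a, b)]

# Queen - Woman + Man
result = add(subtract(embeddings["queen"], embeddings["woman"]), embeddings["man"])

# Round to absorb floating-point noise before comparing with King.
result = [round(x, 6) for x in result]
print(result == embeddings["king"])
```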
Similarity metrics are ways to measure distance in
vector space
Vector Similarity Metric: L2 (Euclidean)
Queen = [0.3, 0.9]
King = [0.5, 0.7]
d(Queen, King) = √((0.3 - 0.5)² + (0.9 - 0.7)²)
= √((-0.2)² + (0.2)²)
= √(0.04 + 0.04)
= √0.08 ≅ 0.28
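The same L2 calculation as a short Python sketch. The function name `l2_distance` is an illustrative choice; the vectors are the ones from the slide.

```python
import math

queen = [0.3, 0.9]
king = [0.5, 0.7]

def l2_distance(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

distance = l2_distance(queen, king)
print(round(distance, 2))  # 0.28
```

A smaller L2 distance means the two vectors are closer together in space.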
Vector Similarity Metric: Inner Product (IP)
Queen = [0.3, 0.9]
King = [0.5, 0.7]
Queen · King = (0.3*0.5) + (0.9*0.7)
= 0.15 + 0.63 = 0.78
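The inner product is a multiply-and-sum over matching dimensions. A minimal sketch with the slide's vectors (the function name is illustrative):

```python
queen = [0.3, 0.9]
king = [0.5, 0.7]

def inner_product(a, b):
    """Sum of the pairwise products of the two vectors' components."""
    return sum(x * y for x, y in zip(a, b))

ip = inner_product(queen, king)
print(round(ip, 2))  # 0.78
```

Unlike L2, a larger inner product means the vectors are more similar.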
Vector Similarity Metric: Cosine
Queen = [0.3, 0.9]
King = [0.5, 0.7]
θ = angle between Queen and King
cos(Queen, King) = ((0.3*0.5) + (0.9*0.7)) / (√(0.3² + 0.9²) * √(0.5² + 0.7²))
= (0.15 + 0.63) / (√0.9 * √0.74)
= 0.78 / √0.666
≅ 0.96
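Computed numerically, the cosine similarity of these two vectors comes out to about 0.96. The sketch below divides the inner product by the product of the two vector lengths; the function name is illustrative.

```python
import math

queen = [0.3, 0.9]
king = [0.5, 0.7]

def cosine_similarity(a, b):
    """Inner product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

similarity = cosine_similarity(queen, king)
print(round(similarity, 2))  # 0.96
```

A value near 1 means the two vectors point in nearly the same direction, regardless of their lengths.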
Vector Similarity Metrics
Euclidean - Spatial distance
Cosine - Orientational distance
Inner Product - Both
With normalized vectors, IP = Cosine
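To see why IP equals cosine on normalized vectors, scale each vector to unit length first: the denominator of the cosine formula becomes 1, leaving only the inner product. A small sketch with illustrative helper names:

```python
import math

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    """Length (L2 norm) of a vector."""
    return math.sqrt(inner_product(v, v))

def normalize(v):
    """Scale a vector to unit length."""
    n = norm(v)
    return [x / n for x in v]

def cosine(a, b):
    return inner_product(a, b) / (norm(a) * norm(b))

queen = [0.3, 0.9]
king = [0.5, 0.7]

ip_normalized = inner_product(normalize(queen), normalize(king))
cos_raw = cosine(queen, king)
print(math.isclose(ip_normalized, cos_raw))  # True
```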
Indexes organize the way we access our data
Inverted File Index
Source:
https://towardsdatascience.com/similarity-search-with-ivfpq-9c6348fd4db3
Hierarchical Navigable Small Worlds (HNSW)
Source:
https://arxiv.org/ftp/arxiv/papers/1603/1603.09320.pdf
Scalar Quantization (SQ)
Product Quantization
Source:
https://towardsdatascience.com/product-quantization-for-similarity-search-2f1f67c5fddd
Indexes Overview
- IVF = Intuitive, medium memory, performant
- HNSW = Graph based, high memory, highly performant
- Flat = brute force
- SQ = bucketize across one dimension, accuracy x memory tradeoff
- PQ = bucketize across two dimensions, more accuracy x memory tradeoff
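The difference between a Flat index and an IVF index can be sketched in plain Python. This is a toy illustration, not how Milvus implements either index: the centroids are hand-picked here (an assumption; real IVF indexes learn them with clustering such as k-means), and `nprobe` is used as the name for the number of buckets searched.

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def flat_search(vectors, query, k):
    """Flat: brute force, compare the query against every stored vector."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2(vectors[i], query))
    return ranked[:k]

def build_ivf(vectors, centroids):
    """IVF: file each vector id under its nearest centroid."""
    buckets = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        buckets[nearest].append(i)
    return buckets

def ivf_search(vectors, centroids, buckets, query, k, nprobe=1):
    """Search only the nprobe buckets whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in buckets[c]]
    candidates.sort(key=lambda i: l2(vectors[i], query))
    return candidates[:k]

# Ids 0..3 are woman, queen, king, man from the earlier slides.
vectors = [[0.3, 0.4], [0.3, 0.9], [0.5, 0.7], [0.5, 0.2]]
centroids = [[0.4, 0.8], [0.4, 0.3]]  # hand-picked for illustration
buckets = build_ivf(vectors, centroids)

query = [0.45, 0.75]
flat_result = flat_search(vectors, query, k=2)
ivf_result = ivf_search(vectors, centroids, buckets, query, k=2)
print(flat_result, ivf_result)
```

Here IVF inspects only half of the vectors and still returns the same neighbors. With a low `nprobe`, a true neighbor sitting in an unprobed bucket would be missed, which is the accuracy cost of the speedup.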
Vector databases efficiently store, index, and
relate entities by a quantitative value
03 Use Cases
What Does Vector Data Look Like?
RAG
Inject your data via a vector
database like Milvus/Zilliz
Query LLM
Milvus
Your Data
Primary Use Case
● Factual Recall
● Forced Data Injection
● Cost Optimization
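The RAG flow on this slide (embed the query, retrieve similar data, inject it into the LLM prompt) can be sketched end to end. Everything here is a stand-in: `embed` is a hypothetical bag-of-words substitute for a real embedding model, the in-memory list replaces a vector database such as Milvus, and the final LLM call is omitted.

```python
import math
import re
from collections import Counter

def embed(text):
    """Hypothetical stand-in for an embedding model: word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = embed(question)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(question, documents, k=1):
    """Inject the retrieved documents into the prompt sent to the LLM."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

documents = [
    "Milvus is an open source vector database.",
    "Apple pie is a dessert made with apples.",
]
prompt = build_prompt("What is Milvus?", documents)
print(prompt)
```

Only the retrieved text reaches the prompt, which is what forces the model to work from your data.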
Common AI Use Cases
LLM Augmented Retrieval
Expand LLMs' knowledge by incorporating external data sources into LLMs and your AI applications.
Recommender System
Match user behavior or content features with other similar behaviors or features to make effective recommendations.
Text/Semantic Search
Search for semantically similar texts across vast amounts of natural language documents.
Image Similarity Search
Identify and search for visually similar images or objects from a vast collection of image libraries.
Video Similarity Search
Search for similar videos, scenes, or objects from extensive collections of video libraries.
Audio Similarity Search
Find similar audio in massive amounts of audio data to perform tasks such as genre classification or speech recognition.
Molecular Similarity Search
Search for similar substructures, superstructures, and other structures for a specific molecule.
Question Answering System
Interactive QA chatbot that automatically answers user questions.
Multimodal Similarity Search
Search over multiple types of data simultaneously, e.g. text and images.
Example Use Case
04 Vector Database Architecture
Why Not Use a SQL/NoSQL Database?
● Inefficiency in High-dimensional spaces
● Suboptimal Indexing
● Inadequate query support
● Lack of scalability
● Limited analytics capabilities
● Data conversion issues
TL;DR: Vector operations are too computationally intensive for traditional
database infrastructures
Why Not Use a Vector Search Library?
● Have to manually implement filtering
● Not optimized to take advantage of the latest hardware
● Unable to handle large scale data
● Lack of lifecycle management
● Inefficient indexing capabilities
● No built in safety mechanisms
TL;DR: Vector search libraries lack the infrastructure to help you scale,
deploy, and manage your apps in production.
What is Milvus/Zilliz ideal for?
○ Advanced filtering
○ Hybrid search
○ Durability and backups
○ Replications/High Availability
○ Sharding
○ Aggregations
○ Lifecycle management
○ Multi-tenancy
○ High query load
○ High insertion/deletion
○ Full precision/recall
○ Accelerator support (GPU, FPGA)
○ Billion-scale storage
Purpose-built to store, index and query vector embeddings from unstructured data at scale.
[Figure: High-level overview of Milvus' architecture. Labeled components: SDK; Load Balancer; Access Layer (Proxy); Coordinator Service (Root, Query, Data, Index); Worker Nodes (Query Node, Data Node, Index Node); Meta Storage (etcd); Message Storage (Log Broker); Object Storage (Minio / S3 / AzureBlob) holding Log Snapshot, Delta File, and Index File. Labeled flows: DDL/DCL, DML, Notification, Control Signal.]
Start building
with Zilliz Cloud today!
zilliz.com/cloud
© Copyright 9/25/23 Zilliz
Appendix
Important Notes
- For normalized vectors, Cosine, IP, and L2 all produce the SAME rank order.
- They differ in use case
- L2 for when you need magnitude
- Cosine for orientation
- IP for magnitude and orientation
- OR
- Cosine = IP for normalized vectors
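The rank-order agreement between the three metrics depends on normalization. The sketch below uses made-up vectors chosen so that, on raw data, L2 ranks the candidates differently from IP and cosine; after scaling every vector to unit length, all three metrics agree. Helper names are illustrative.

```python
import math

def ip(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    return ip(a, b) / (math.sqrt(ip(a, a)) * math.sqrt(ip(b, b)))

def normalize(v):
    n = math.sqrt(ip(v, v))
    return [x / n for x in v]

def rank(query, vectors, score, larger_is_closer):
    """Vector ids ordered from most to least similar under the given metric."""
    return sorted(range(len(vectors)),
                  key=lambda i: score(query, vectors[i]),
                  reverse=larger_is_closer)

query = [1.0, 0.0]
vectors = [[0.9, 0.1], [10.0, 0.0], [0.0, 1.0]]  # made-up example data

# Raw vectors: L2 penalizes the long vector, IP and cosine do not.
raw_l2 = rank(query, vectors, l2, larger_is_closer=False)
raw_ip = rank(query, vectors, ip, larger_is_closer=True)

# Unit-length vectors: all three metrics give the same order.
unit = [normalize(v) for v in vectors]
unit_l2 = rank(query, unit, l2, larger_is_closer=False)
unit_ip = rank(query, unit, ip, larger_is_closer=True)
unit_cos = rank(query, unit, cosine, larger_is_closer=True)

print(raw_l2, raw_ip)
print(unit_l2, unit_ip, unit_cos)
```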
Embeddings Models
Basic Idea
You want to use your data with a large language
model
RAG vs Fine Tuning
LLM
Fine Tuning
Augment an LLM by training it on
your data
Your Data
“New” LLM
Query
Primary Use Case
● Style transfer
Takeaway
Use RAG to force the LLM to work with your data by injecting it via a
vector database like Milvus or Zilliz
Chunking Considerations
Chunk Size
Chunk Overlap
Character Splitters
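Chunk size and chunk overlap interact as a sliding window over the text. A minimal character-based splitter, assuming sizes are measured in characters (many real splitters count tokens or split on separators instead); `chunk_text` is an illustrative name, not a library function.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters, where each window
    repeats the last chunk_overlap characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Larger overlap preserves more context across chunk boundaries at the cost of storing and embedding more text.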
How Does Your Data Look?
Conversation Data
Documentation Data
Lecture or Q/A Data
Examples
Takeaway:
Your chunking strategy depends on what your data looks like and what you need from it.
Examining Embeddings
Picking a model
What to embed
Metadata
Embeddings Strategies
Level 1: Embedding Chunks Directly
Level 2: Embedding Sub and Super Chunks
Level 3: Incorporating Chunking and Non-Chunking Metadata
Metadata Examples
Chunking
● Paragraph position
● Section header
● Larger paragraph
● Sentence Number
● …
Non-Chunking
● Author
● Publisher
● Organization
● Role Based Access Control
● …
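One way to picture a stored chunk is as a record that carries the embedding alongside both kinds of metadata. The field names below are hypothetical and simply mirror the bullets above; the role filter is a rough illustration of how role-based access control metadata might be applied at query time.

```python
from dataclasses import dataclass, field

@dataclass
class ChunkRecord:
    text: str
    embedding: list
    # Chunking metadata: where the chunk sits in the source document.
    paragraph_position: int
    section_header: str
    sentence_number: int
    # Non-chunking metadata: facts about the document as a whole.
    author: str
    publisher: str
    allowed_roles: list = field(default_factory=list)

def visible_to(records, role):
    """Keep only the records the given role is allowed to read."""
    return [r for r in records if role in r.allowed_roles]

# Placeholder values for illustration only.
records = [
    ChunkRecord("Public overview text.", [0.3, 0.9], 0, "Overview", 1,
                "Example Author", "Example Publisher", ["reader", "admin"]),
    ChunkRecord("Internal notes.", [0.5, 0.2], 3, "Internal", 7,
                "Example Author", "Example Publisher", ["admin"]),
]

reader_view = visible_to(records, "reader")
print([r.section_header for r in reader_view])
```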
Takeaway:
Your embeddings strategy depends on your accuracy, cost, and use case needs.
Basic Idea
Vector Databases provide the ability to inject your data via
semantic similarity
Considerations include: scale, performance, and flexibility
Milvus Architecture: Differentiation
1. Cloud Native, Distributed System Architecture
2. True Separation of Concerns
3. Scalable Index Creation Strategy with 512 MB Segments
Takeaway:
Vector Databases are purpose-built to handle indexing, storing, and querying vector data.
Milvus & Zilliz are specifically designed for high performance and billion+ scale use cases.
Vector Database Resources
Give Milvus a Star! Chat with me on Discord!
Get Started Free
Got questions? Stop by our booth!
Milvus
Open Source
Self-Managed
github.com/milvus-io/milvus
Zilliz Cloud
SaaS
Fully-Managed
zilliz.com/cloud
