1 | © Copyright 8/16/23 Zilliz
Tim Spann | Zilliz
Discussion on Vector Databases, Unstructured Data and AI
Speaker
Tim Spann
Principal Developer Advocate, Zilliz
tim.spann@zilliz.com
https://www.linkedin.com/in/timothyspann/
https://x.com/paasdev
https://github.com/tspannhw
Unstructured Data Meetup
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working with unstructured data. Speakers present on related topics such as vector
databases, LLMs, and managing data at scale. The intended audience includes machine learning engineers,
data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup and is
sponsored by Zilliz, maintainers of Milvus.
27.5K+ GitHub Stars
25M+ Downloads
250+ Contributors
2,700+ Forks
Milvus is an open-source vector database for GenAI projects. pip install on your
laptop, plug into popular AI dev tools, and push to production with a single line of
code.
Easy Setup
Pip-install to start coding in a notebook within seconds.
Reusable Code
Write once, and deploy with one line of code into the production environment.
Integration
Plug into OpenAI, LangChain, LlamaIndex, and many more.
Feature-rich
Dense & sparse embeddings, filtering, reranking and beyond.
The evolution of AI made the semantic search of
unstructured data possible
Search by Probability
Statistical analyses of common
datasets established the foundation for
processing unstructured data, e.g. NLP,
and image classification
AI Model Breakthrough
The advancements in BERT, ViT, CBT
etc. have revolutionized semantic
analysis across unstructured data
Vectorization
Word2Vec, CNNs, and Deep Speech pioneered
embeddings for unstructured data, mapping
words, images, and videos into high-dimensional
vectors
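As a rough illustration of why embeddings enable semantic search, the sketch below compares toy vectors by cosine similarity. The three-dimensional vectors and the names `embeddings`, `cosine_similarity`, and `most_similar` are made up for this example and do not come from the slides; real embedding models emit hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical hand-written embeddings, used only for illustration.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.3],
}


def most_similar(word):
    """Return the other word whose embedding is closest to `word`."""
    query = embeddings[word]
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine_similarity(query, embeddings[w]))
```

With these toy values, `most_similar("cat")` picks `"kitten"` because the two vectors point in nearly the same direction, which is the property a vector database exploits at scale.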
What your data looks like
This new AI breakthrough requires new databases to
fully unleash its potential
Support multiple
use case types
Accommodate diverse data
requirements, enhancing
flexibility and effectiveness in
varied operational contexts
Scale as needed
Enable robust handling of
expanding data volumes and
search demands
Highly performant
Ensure swift and accurate
query responses, crucial for
optimal user experience
Vector databases are a core component of Retrieval
Augmented Generation (RAG)
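The retrieval step of RAG can be sketched in a few lines, with heavy simplifications. This is not the Milvus API: `embed` is a bag-of-words stand-in for a real embedding model, the in-memory list `docs` stands in for a vector database, and every function name here is hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=2):
    """Augment the question with retrieved context before it goes to an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical knowledge base for the example.
docs = [
    "Milvus is an open-source vector database",
    "etcd stores metadata",
    "Object storage holds index files",
]
```

Calling `build_prompt("what is a vector database", docs, k=1)` yields a prompt whose context is the Milvus sentence. In a real system the vector database replaces `retrieve`, and the prompt is then passed to an LLM.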
…different types of data and schemas need to be
thoroughly planned ahead of time
…powers searches across various types of unstructured data
Retrieval Augmented Generation (RAG)
Expand LLMs' knowledge by incorporating external data sources into LLMs and your AI applications.
Recommender System
Match user behavior or content features with other similar ones to make effective recommendations.
Text/Semantic Search
Search for semantically similar texts across vast amounts of natural language documents.
Image Similarity Search
Identify and search for visually similar images or objects from a vast collection of image libraries.
Video Similarity Search
Search for similar videos, scenes, or objects from extensive collections of video libraries.
Audio Similarity Search
Find similar audio clips in large datasets for tasks like genre classification or speech recognition.
Molecular Similarity Search
Search for similar substructures, superstructures, and other structures for a specific molecule.
Anomaly Detection
Detect data points, events, and observations that deviate significantly from the usual pattern.
Multimodal Similarity Search
Search over multiple types of data simultaneously, e.g. text and images.
We’ve built technologies for various types of use
cases
Compute Types
Support different types of
compute, such as AVX512
and Neon for SIMD
execution, quantization &
cache-aware optimization,
and GPU
Leverage the specific strengths
of each hardware type
efficiently, ensuring
high-speed processing and
cost-effective scalability for
diverse application needs
Search Types
Provide diverse search
types such as top-K ANN,
Range ANN, hybrid ANN
and metadata filtering
Enable flexible and accurate
queries, allowing developers
to tailor data retrieval
to their needs
Multi-tenancy
Enable multi-tenancy
through collection and
partition management
Allow for efficient resource
utilization and customizable
data segregation, ensuring
secure and isolated data
handling for each tenant
Index Types
Offer a diverse range of 11+
index types, including
popular ones like HNSW,
IVF, PQ, and GPU index
Empower developers with
tailored search
optimizations, catering to
specific performance and
accuracy needs
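To make the search types named above concrete, here is a minimal sketch of top-K search, range search, and metadata filtering. It scans every record exhaustively, so it is exact rather than approximate (an ANN index such as HNSW or IVF exists to avoid this full scan). The `records` data, the `color` field, and the function names are assumptions for illustration, not Milvus APIs.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical records: an id, a 2-D vector, and one metadata field.
records = [
    {"id": 1, "vector": [0.0, 0.0], "color": "red"},
    {"id": 2, "vector": [1.0, 0.0], "color": "blue"},
    {"id": 3, "vector": [0.0, 2.0], "color": "red"},
    {"id": 4, "vector": [3.0, 3.0], "color": "blue"},
]


def top_k(query, k, predicate=None):
    """IDs of the k nearest records, optionally restricted by a metadata predicate."""
    candidates = [r for r in records if predicate is None or predicate(r)]
    candidates.sort(key=lambda r: l2(query, r["vector"]))
    return [r["id"] for r in candidates[:k]]


def range_search(query, radius):
    """IDs of every record within `radius` of the query, nearest first."""
    hits = [(l2(query, r["vector"]), r["id"]) for r in records]
    return [rid for dist, rid in sorted(hits) if dist <= radius]
```

For the query `[0.1, 0.0]`, `top_k(query, 2)` returns the two nearest ids, `range_search(query, 1.0)` returns every id within distance 1.0, and passing `predicate=lambda r: r["color"] == "red"` restricts the candidates before ranking.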
Milvus’ fully distributed architecture is designed for
scalability and performance
[Architecture diagram]
SDK, Load Balancer
Access Layer: Proxy, Proxy
Coordinator Service: Root, Query, Data, Index
Worker Node: Query Node, Data Node, Index Node
Meta Storage: etcd
Message Storage: Log Broker
Object Storage (Minio / S3 / AzureBlob): Log Snapshot, Delta File, Index File
Flow labels: DDL/DCL, DML, NOTIFICATION, CONTROL SIGNAL, QUERY, DATA
Tests show consistent query performance when
scaled from 65 million to 1 billion vectors
ANN Benchmark has recognized Milvus as the
performance leader among vector database players
We provide deployment flexibility for different
operational, security and compliance requirements
BRING YOUR OWN CLOUD
Zilliz BYOC
Enterprise-ready Milvus for
Private VPCs
Deploy in your virtual private cloud
Zilliz Cloud
Milvus Re-engineered for the
Cloud
Available on the leading public
clouds
FULLY MANAGED SERVICE
Coming Soon! Coming Soon!
Milvus
Most widely-adopted open
source vector database
Self hosted on any machine with
community support
SELF MANAGED SOFTWARE
Local Docker K8s
Milvus Lite
pip install pymilvus
Embedding Models
Questions?
Give Milvus a Star! Code on GitHub
github.com/tspannhw
github.com/milvus-io/


06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
