1 | © Copyright 2024 Zilliz
Presented by: New York Unstructured Data Meetup
Tim Spann
Principal Developer Advocate, Zilliz
tim.spann@zilliz.com
https://www.linkedin.com/in/timothyspann/
https://x.com/PaaSDev
Unstructured Data Meetup | Host
Code of Conduct
Be respectful and kind
Be considerate when communicating with all event participants, speakers, and hosts.
All ideas are welcome
Be present and participate actively in discussions. Ask questions and reach out for help when needed.
Report inappropriate behavior
Inappropriate behavior of any kind will not be tolerated at this event. Inform a Zilliz team member immediately if you see any behavior you consider inappropriate.
Getting Started with Vector Databases
Milvus: Open Source, Self-Managed (github.com/milvus-io/milvus)
Zilliz Cloud: SaaS, Fully-Managed (zilliz.com/cloud)
Zilliz is Hiring! Join our Team
zilliz.com/careers
• Developer Advocate
• Senior Software Engineer
• Staff Software Engineer
• Solutions Architect
Join the Milvus Discord!
Become a Speaker!
Interested in speaking at and/or sponsoring a Zilliz Unstructured Data Meetup? Fill out this form!
🎤🎤🎤
Have you built something cool using Milvus or Zilliz? We want to hear all about it.
Share Your Story
Star Milvus for a chance to win a prize tonight!
Share your photos!
#ZillizUnstructuredData
@zilliz_universe, @milvusio
Zilliz, Milvus
Welcome Speakers
TECH TALK 1: How Inkeep and Zilliz Built an AI Assistant
Robert Tran, Founder & CTO, Inkeep
TECH TALK 2: Introduction to the Data Prep Kit
Santosh Borse, Senior Engineer, watsonx Data Engineering at IBM Research
TECH TALK 3: RGBX Model Development: Exploring Four-Channel ML Workflows
Daniel Gural, Machine Learning and DevRel, Voxel51
Join us at our next meetup!
lu.ma/unstructured-data-meetup
Quick Intro to Unstructured Data, Edge AI and Milvus
Tim Spann
Principal Developer Advocate, Zilliz
Welcome to New York!
Tim Spann @ Zilliz
These Slides
01 Introduction
Three Pillars of GenAI & the Opportunities They Bring
Models, AI Hardware, Data
Vector Database
● Data Encryption
● Data ETL
● Data Security
● Data Pipeline
● Data Observability
● Data Compliance
https://milvus.io/milvus-demos/reverse-image-search
Show Me
https://multimodal-demo.milvus.io/
https://zilliz-semantic-search-example.vercel.app/
Show Me Another Demo
About Milvus
Milvus is an open-source vector database for GenAI projects. pip install it on your laptop, plug it into popular AI dev tools, and push to production with a single line of code.
29K GitHub Stars
25M Downloads
250 Contributors
2,600 Forks
Easy Setup
Pip-install to start coding in a notebook within seconds
Integration
Plug into OpenAI, LangChain, LlamaIndex, and many more
Reusable Code
Write once, and deploy to production with one line of code
Feature-rich
Dense & sparse embeddings, filtering, reranking, and beyond
2024
Higher scalability
● 10B vectors of 1536 dimensions in a single Milvus/Zilliz Cloud instance
● 100B vectors in one of the largest deployments
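For a sense of scale, here is a back-of-the-envelope estimate of the raw storage behind the 10B-vector figure. It assumes each dimension is stored as a 4-byte float32 and ignores index structures, replicas, and scalar fields, so a real deployment needs more than this:

```python
# Back-of-the-envelope raw storage for dense float32 vectors.
# Assumption: 4 bytes per dimension (float32); index overhead,
# replicas, and scalar fields are deliberately ignored.
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_vectors: int, dim: int) -> int:
    """Return the raw size in bytes of num_vectors float32 vectors of length dim."""
    return num_vectors * dim * BYTES_PER_FLOAT32


total = raw_vector_bytes(10_000_000_000, 1536)
print(f"{total:,} bytes")          # 61,440,000,000,000 bytes
print(f"{total / 1e12:.2f} TB")    # 61.44 TB (decimal terabytes)
print(f"{total / 2**40:.2f} TiB")  # 55.88 TiB (binary tebibytes)
```

Roughly 61 TB of raw vectors alone is well beyond a single machine's memory, which is why numbers like these depend on a distributed deployment.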
https://zilliz.com/learn/large-language-models-and-search
02 Hybrid Search
Hybrid Search
● https://zilliz.com/blog/metadata-filtering-hybrid-search-or-agent-in-rag-applications
Hybrid Search
● Support the fusion of vector search and full-text search.
● Support the fusion of multimodal vectors from various unstructured data types such as images, videos, audio, and text files.
● Utilize various types of vector embeddings. This includes dense embeddings from models like BERT and Transformers and sparse embeddings from methods like BM25, BGE-M3, and SPLADE.
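To make the dense/sparse distinction concrete: a dense embedding stores a value for every dimension, while a sparse embedding (for example, BM25-style term weights) stores only its non-zero dimensions. The minimal sketch below assumes sparse vectors are represented as `{dimension_index: weight}` dicts, a common convention used here purely for illustration, and scores both kinds with inner product. The toy vectors are hypothetical, not output from a real model:

```python
from typing import Dict, List

DenseVector = List[float]
SparseVector = Dict[int, float]  # dimension index -> non-zero weight


def dense_ip(a: DenseVector, b: DenseVector) -> float:
    """Inner product over every dimension of two equal-length dense vectors."""
    if len(a) != len(b):
        raise ValueError("dense vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def sparse_ip(a: SparseVector, b: SparseVector) -> float:
    """Inner product that only visits dimensions present in both sparse vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b[i] for i, w in a.items() if i in b)


# Hypothetical toy vectors.
query_dense = [0.1, 0.7, 0.2]
doc_dense = [0.3, 0.6, 0.1]
query_sparse = {17: 1.2, 4031: 0.8}          # e.g. two query terms
doc_sparse = {17: 0.5, 980: 1.1, 4031: 0.4}  # term 980 is not in the query

print(round(dense_ip(query_dense, doc_dense), 2))     # 0.47
print(round(sparse_ip(query_sparse, doc_sparse), 2))  # 0.92
```

The sparse version touches only the dimensions the two vectors share, which is what makes vocabulary-sized sparse vectors practical to store and score.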
Hybrid Search
● Milvus supports up to 10 vector fields for the same dataset within a single collection. Building on this, hybrid search lets users search across multiple vector fields simultaneously, which makes it possible to combine multimodal search, hybrid sparse and dense search, and hybrid dense and full-text search.
● The vectors in different fields represent different facets of the data, produced by different embedding models or processing methods. The per-field results are merged into a single ranking using a re-ranking strategy.
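One widely used re-ranking strategy for merging per-field results is Reciprocal Rank Fusion (RRF), which Milvus offers as `RRFRanker`. RRF ignores raw scores and rewards documents that rank highly in several result lists: score(d) = sum over lists of 1 / (k + rank(d)). The sketch below implements that formula in plain Python rather than calling Milvus. k = 60 is the constant commonly used with RRF and is an assumption here; the document IDs are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(result_lists: List[List[str]], k: int = 60, limit: int = 10) -> List[str]:
    """Merge ranked ID lists with Reciprocal Rank Fusion.

    Each list is ordered best-first and ranks are 1-based. A document that is
    missing from a list simply receives no contribution from that list.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranked_ids in result_lists:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:limit]]


# Hypothetical top-3 results from a dense field and a sparse field.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]
print(rrf_fuse([dense_hits, sparse_hits], limit=3))  # ['doc_b', 'doc_a', 'doc_d']
```

Because RRF only uses ranks, it needs no score normalization, which makes it a reasonable default when fields use different metrics or embedding models.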
Hybrid Search
This feature enables different columns to:
● Represent multiple perspectives of information. For instance, in e-commerce, product images
include front, side, and top views. Different views can be represented with different types or
dimensions of vectors.
● Utilize various types of vector embeddings. This includes dense embeddings from models like BERT
and Transformers and sparse embeddings from algorithms like BM25, BGE-M3, and SPLADE.
● Support the fusion of multimodal vectors from various unstructured data types such as images,
videos, audio, and text files. For example, in criminal investigations, suspects can be represented
through biometric modalities such as fingerprints, voiceprints, and facial recognition, aiding in
identifying individuals across different modalities.
● Support the fusion of vector search and full-text search.
https://milvus.io/docs/multi-vector-search.md
When is Hybrid Search Recommended?
Hybrid search is ideal for complex situations demanding high
accuracy, especially when an entity can be represented by multiple,
diverse vectors. This applies to cases where the same data, such as
a sentence, is processed through different embedding models or
when multimodal information (like images, fingerprints, and
voiceprints of an individual) is converted into various vector formats.
By assigning weights to these vectors, you control how much each one influences the final ranking, which can improve both recall and the relevance of search results.
Hybrid Search - FAQ 2
● How does a weighted ranker normalize distances between different vector fields?
A weighted ranker first normalizes the scores returned by each vector field to a common range, then combines them as a weighted sum using the weight assigned to each field, so fields with higher weights have a greater influence on the overall ranking. It's advised to use the same metric type across ANN search requests so that the normalized scores remain comparable.
● Is it possible to conduct multiple hybrid search operations at the same time?
Yes, simultaneous execution of multiple hybrid search operations is supported.
● Can I use the same vector field in multiple AnnSearchRequest objects to perform hybrid searches?
Technically, it is possible to use the same vector field in multiple AnnSearchRequest objects for hybrid
searches. It is not necessary to have multiple vector fields for a hybrid search.
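The normalize-then-weight idea from the first answer can be sketched as follows. This is not Milvus's internal code: the arctan-based mapping to (0, 1) is an assumption chosen for illustration (it keeps "higher is better" for inner-product scores and flips the direction for L2 distances), and the field names, weights, and scores are hypothetical. It also shows why mixing metric types is discouraged: raw L2 and IP values sit on different scales and point in opposite directions until they are normalized.

```python
import math
from typing import Dict, List, Tuple

Hits = List[Tuple[str, float]]  # (document ID, raw score or distance)


def normalize(value: float, metric: str) -> float:
    """Map a raw score into (0, 1] so that larger always means more similar.

    Assumption for illustration: arctan squashing. IP/COSINE scores grow with
    similarity; L2 distances shrink with similarity, so they are inverted.
    """
    if metric in ("IP", "COSINE"):
        return 0.5 + math.atan(value) / math.pi
    if metric == "L2":
        return 1.0 - 2.0 * math.atan(value) / math.pi
    raise ValueError(f"unsupported metric: {metric}")


def weighted_fuse(results: List[Tuple[Hits, str, float]], limit: int = 10) -> List[Tuple[str, float]]:
    """Combine per-field hits as a weighted sum of normalized scores."""
    fused: Dict[str, float] = {}
    for hits, metric, weight in results:
        for doc_id, raw in hits:
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * normalize(raw, metric)
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))[:limit]


# Hypothetical e-commerce example: image field (L2, weight 0.7), text field (IP, weight 0.3).
image_hits = [("sku_1", 0.0), ("sku_2", 1.0)]  # L2 distance: smaller is closer
text_hits = [("sku_2", 1.0), ("sku_3", 0.0)]   # inner product: larger is closer
for doc_id, score in weighted_fuse([(image_hits, "L2", 0.7), (text_hits, "IP", 0.3)]):
    print(doc_id, round(score, 3))
# sku_1 0.7
# sku_2 0.575
# sku_3 0.15
```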
Choosing Vector Embedding Types
RESOURCES
09-18-2024 NYC Meetup Vector Databases 102
Well-connected in LLM infrastructure to enable RAG use cases
Framework, Hardware Infrastructure, Embedding Models, LLMs, Software Infrastructure, Vector Database
https://medium.com/@tspann/unstructured-data-processing-with-a-raspberry-pi-ai-kit-c959dd7fff47
Raspberry Pi AI Kit Hailo
Edge AI
https://medium.com/@tspann/edgeai-edge-vector-database-6a9b5238bffb
https://github.com/tspannhw/AIM-XavierEdgeAI
Vector Database Resources
Give Milvus a Star! Chat with me on Discord!
https://github.com/milvus-io/milvus
https://zilliz.com/learn/generative-ai
Unstructured Data Meetup
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working with unstructured data. Speakers present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes machine learning engineers, data scientists, data engineers, software engineers, and PMs.
This meetup was formerly the Milvus Meetup and is sponsored by Zilliz, the maintainers of Milvus.
https://medium.com/@tspann/unstructured-street-data-in-new-york-8d3cde0a1e5b
https://medium.com/@tspann/not-every-field-is-just-text-numbers-or-vectors-976231e90e4d
https://medium.com/@tspann/shining-some-light-on-the-new-milvus-lite-5a0565eb5dd9
Extracting Value from Unstructured Data
Example
• A company has hundreds of thousands of pages of proprietary documentation that its staff rely on to service customers.
Problem
• Searching can be slow, inefficient, or lacking in context.
Solution
• Create an internal chatbot with ChatGPT and a vector database enriched with company documentation to provide direction and support to employees and customers.
https://osschat.io/chat
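The retrieval half of that chatbot can be sketched in a few lines. This toy version uses hypothetical three-dimensional embeddings and made-up documentation snippets in place of a real embedding model and vector database: it ranks documentation chunks by cosine similarity to the question and packs the best ones into a prompt for the LLM. In a real system, the embedding and search steps would be handled by an embedding model and a vector database such as Milvus.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Vector, corpus: List[Tuple[str, Vector]], top_k: int = 2) -> List[str]:
    """Return the text of the top_k chunks most similar to the query vector."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question: str, contexts: List[str]) -> str:
    """Ground the LLM's answer in the retrieved company documentation."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the documentation below.\n"
        f"Documentation:\n{context_block}\n"
        f"Question: {question}"
    )


# Hypothetical documentation chunks with made-up embeddings.
corpus = [
    ("Refunds are processed within 5 business days.", [0.9, 0.1, 0.0]),
    ("Reset a customer password from the admin console.", [0.1, 0.9, 0.1]),
    ("Shipping to Canada takes 7-10 days.", [0.2, 0.1, 0.9]),
]
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of the question below
print(build_prompt("How long do refunds take?", retrieve(query_vec, corpus, top_k=1)))
```

The resulting prompt would then be sent to the LLM; keeping retrieval separate from generation is what lets the chatbot answer from company documentation the model was never trained on.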
This week in Milvus, Towhee, Attu, GPT
Cache, Gen AI, LLM, Apache NiFi, Apache
Flink, Apache Kafka, ML, AI, Apache Spark,
Apache Iceberg, Python, Java, Vector DB
and Open Source friends.
https://bit.ly/32dAJft
https://github.com/milvus-io/milvus
AIM Weekly by Tim Spann
milvus.io
github.com/milvus-io/
@milvusio
@paasDev
/in/timothyspann
Connect with me!
Thank you!
These capabilities make us the perfect partner for uplifting your vector search and AI/ML initiatives
Operational burden & resources
● Milvus (self-managed): Tedious configuration. Manual, resource-intensive day-to-day operations to deploy, manage, and scale clusters.
● Zilliz Cloud: Instant cluster provisioning and scaling. Automated capacity management and upgrades. Improved performance for compute-intensive use cases.
Security & availability
● Milvus (self-managed): Custom-built security tools and integrations that create technical and operational debt. Resource-intensive failover design and operations.
● Zilliz Cloud: Battle-tested, enterprise-grade security tools, compliance-ready out of the box. Highly available and consistent access to data across all of your environments.
Data processing & connectivity
● Milvus (self-managed): Siloed solutions and custom integrations, with escalating complexity and cost to manage and maintain the platform as it scales.
● Zilliz Cloud: Well-integrated into AI and data ecosystems. Out-of-the-box pipeline builders efficiently transform unstructured data into searchable vectors.
Cool AI News
OpenAI and Thrive Capital recently backed Chai Discovery, a six-month-old AI biology startup founded by ex-OpenAI and Meta
researchers that raised $30 million to develop AI models for drug discovery.
The details:
● Chai’s AI model, Chai-1, predicts biochemical molecule structures, potentially speeding up drug development.
● The company claims Chai-1 outperforms Google DeepMind’s AlphaFold on certain benchmarks.
● Chai-1 can work with proteins, small molecules, DNA, and RNA, making it versatile for various applications.
● Chai is making its first model free and open-source for non-commercial use.
https://github.com/chaidiscovery/chai-lab
Join us at our next meetup!
meetup.com/unstructured-data-meetup-new-york/
THANK YOU
