FULL TEXT SEARCH, VECTOR SEARCH OR BOTH?
INTRODUCTION
Bartosz Sypytkowski
▪ @horusiath@fosstodon.org
▪ b.sypytkowski@gmail.com
▪ bartoszsypytkowski.com
AGENDA
▪ Full Text Search
▪ Vector Search
▪ Tips & Tricks
▪ Hybrid Search
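FULL TEXT INDEX
▪ a lexically ordered list of terms
▪ each term points to the list of “pages” (documents) containing it
▪ entries are “terms”, not raw words (words are normalized, e.g. stemmed)
▪ “stop words” (e.g. “the”, “a”) are not included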
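A minimal sketch of such an index, assuming simple word tokenization, a tiny hard-coded stop-word list and a toy plural-stripping rule standing in for a real stemmer (all hypothetical, for illustration only):

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (hypothetical; real lists are language-specific)
STOP_WORDS = {"a", "an", "and", "in", "of", "or", "the", "to"}


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def to_term(word):
    """Toy stand-in for a stemmer: strips a trailing plural 's' (English only)."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def build_index(pages):
    """pages: {page_id: text} -> {term: sorted page ids}, terms in lexical order."""
    postings = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            if word not in STOP_WORDS:
                postings[to_term(word)].add(page_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def search(index, query):
    """Return ids of pages containing every query term."""
    terms = [to_term(w) for w in tokenize(query) if w not in STOP_WORDS]
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


pages = {1: "The cats and the dog", 2: "A dog in the park", 3: "Cats sleep"}
index = build_index(pages)
print(index)                   # {'cat': [1, 3], 'dog': [1, 2], 'park': [2], 'sleep': [3]}
print(search(index, "dogs"))   # [1, 2]
```

Because both the indexed words and the query go through the same normalization, the query “dogs” finds pages that only contain “dog”.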
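PROBLEMS WITH FULL TEXT SEARCH
1. Stemming is language-specific: detect the document language first (in .NET, e.g. with NTextCat)
2. Typographic errors: use n-gram similarity
3. Different words, same meaning: synonyms don't match each other
4. Text only: images, audio etc. can't be searched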
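To illustrate point 2, a sketch of n-gram (here trigram) similarity, similar in spirit to PostgreSQL's pg_trgm: each word is padded with spaces, cut into 3-character grams, and two words are compared by the share of grams they have in common (Jaccard similarity). The padding scheme and the function names are assumptions for illustration:

```python
def trigrams(word):
    """Set of 3-character grams of the word, padded as '  word ' so edges count too."""
    padded = f"  {word.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Jaccard similarity of the trigram sets: 1.0 = identical, 0.0 = nothing shared."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def closest_term(word, vocabulary):
    """Pick the indexed term most similar to a (possibly misspelled) query word."""
    return max(vocabulary, key=lambda term: trigram_similarity(word, term))


print(closest_term("serach", ["search", "vector", "index"]))  # search
```

A typo such as “serach” still shares the edge grams with “search”, so it scores above unrelated terms even though an exact lookup in the index would miss it.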
WHAT VECTOR SEARCH IS NOT
REFERENCES
VECTOR REPRESENTATION
INTUITIVE INTRODUCTION
VECTOR REPRESENTATION #1
AVERAGE COLOUR (RGB)
R = 99, G = 99, B = 93
Number of dimensions: 3
Dimension size: 8 bit (0-255)
VECTOR REPRESENTATION #2
AVERAGE COLOUR (CMYK)
C = 0, M = 0, Y = 6, K = 61
Number of dimensions: 4
Dimension size: 0-100
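Both slides describe the same colour with vectors of different dimensionality. A sketch of the naive (profile-less) RGB to CMYK conversion, which reproduces the numbers above; real colour management uses ICC profiles, so treat this only as an illustration:

```python
def rgb_to_cmyk(r, g, b):
    """Convert 8-bit RGB (0-255) to CMYK percentages (0-100) with the naive formula."""
    r_, g_, b_ = r / 255, g / 255, b / 255
    k = 1 - max(r_, g_, b_)
    if k == 1:  # pure black: avoid division by zero
        return (0, 0, 0, 100)
    c = (1 - r_ - k) / (1 - k)
    m = (1 - g_ - k) / (1 - k)
    y = (1 - b_ - k) / (1 - k)
    return tuple(round(v * 100) for v in (c, m, y, k))


print(rgb_to_cmyk(99, 99, 93))  # (0, 0, 6, 61)
```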
VECTOR REPRESENTATION #3
COLOUR PALETTE
#A8B6BF, #191A17, #393836, #637B97, #7E6742
Number of dimensions: 5
Dimension size: 32 bit
EMBEDDING MODELS
Provider  Model                    Dimensions (float32)  MTEB score
OpenAI    text-embedding-3-small   512                   61.6
OpenAI    text-embedding-3-small   1536                  62.3
OpenAI    text-embedding-3-large   256                   62.0
OpenAI    text-embedding-3-large   1024                  64.1
OpenAI    text-embedding-3-large   3072                  64.6
OpenAI    text-embedding-ada-002   1536                  61.0
Ollama    all-minilm-l6-v2         384                   56.26
Ollama    mxbai-embed-large        1024                  64.68
Gemini    text-embedding-004       768                   68.32
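The smaller dimension counts for text-embedding-3-small and text-embedding-3-large come from shortening the same embedding. OpenAI's documentation describes this via the API's `dimensions` parameter, or manually by keeping the first N values and re-normalizing. A sketch of the manual variant, assuming the model was trained to support it (this is not guaranteed to work for other models such as text-embedding-ada-002); the function name is hypothetical:

```python
import math


def shorten_embedding(embedding, dimensions):
    """Keep the first `dimensions` values and rescale them back to unit length."""
    if not 0 < dimensions <= len(embedding):
        raise ValueError("dimensions must be between 1 and len(embedding)")
    head = embedding[:dimensions]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


print(shorten_embedding([3.0, 4.0, 12.0], 2))  # [0.6, 0.8]
```

For scale: 1536 float32 values take 6 KiB per vector, 256 values take 1 KiB.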
VECTOR SIMILARITY
RIGHT TOOL FOR THE JOB
EUCLIDEAN DISTANCE
• Values: [0, +Inf) (smaller is better)
• Fast
• Anomaly and fraud detection
$$d(a, b) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \dots + (a_n - b_n)^2}$$
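A direct translation of the formula (Python 3.8+ also ships `math.dist`, which computes the same value):

```python
import math


def euclidean_distance(a, b):
    """d(a, b) = sqrt((a1 - b1)^2 + ... + (an - bn)^2); 0.0 means identical vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```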
DOT PRODUCT SIMILARITY
• Value: (-Inf, +Inf) (bigger is better)
• Fast
• Image retrieval and matching
• Music recommendation
$$a \cdot b = \lVert a \rVert \, \lVert b \rVert \cos\theta$$
where θ is the angle between a and b
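In code the dot product is usually computed from components, a · b = a₁b₁ + a₂b₂ + … + aₙbₙ, which equals the geometric form above. A sketch that also shows why the result depends on vector length, not just direction:

```python
def dot_product(a, b):
    """a . b = a1*b1 + a2*b2 + ... + an*bn (equal to |a| |b| cos(theta))."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


print(dot_product([1, 2, 3], [4, 5, 6]))  # 32
# Same direction, different length: the longer vector scores higher.
print(dot_product([1, 0], [1, 0]), dot_product([1, 0], [10, 0]))  # 1 10
```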
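COSINE SIMILARITY
• Values: [-1.0, 1.0] (bigger is better); cosine distance = 1 - cosine similarity, values [0, 2] (smaller is better)
• Slower than dot product: needs both vector norms (for unit-length vectors it equals the dot product)
• Text document similarity
• Recommendation systems
$$\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$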
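A sketch of cosine similarity and cosine distance, plus normalization: once vectors are scaled to unit length, a plain dot product gives the same result, which is why embeddings are often stored pre-normalized. Function names are illustrative:

```python
import math


def norm(v):
    """Euclidean length |v|."""
    return math.sqrt(sum(x * x for x in v))


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| |b|), in [-1.0, 1.0]; bigger is more similar."""
    denominator = norm(a) * norm(b)
    if denominator == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / denominator


def cosine_distance(a, b):
    """1 - cosine similarity, in [0.0, 2.0]; smaller is more similar."""
    return 1.0 - cosine_similarity(a, b)


def normalize(v):
    """Scale v to unit length so that a plain dot product equals cosine similarity."""
    n = norm(v)
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


print(cosine_similarity([1, 0], [10, 0]))  # 1.0 (length is ignored)
print(cosine_distance([1, 0], [-1, 0]))    # 2.0 (opposite directions)
```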
HNSW
INDEXING VECTORS
WHY DO WE NEED TO INDEX?
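Without an index, exact k-nearest-neighbour search has to compare the query with every stored vector, so each query costs O(n · d) for n vectors of d dimensions. A sketch of that baseline, assuming Euclidean distance (the function name is hypothetical):

```python
import heapq
import math


def brute_force_knn(query, vectors, k):
    """Exact k nearest neighbours: measures the distance to every stored vector."""
    return heapq.nsmallest(k, range(len(vectors)),
                           key=lambda i: math.dist(query, vectors[i]))


vectors = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (10.0, 10.0)]
print(brute_force_knn((0.9, 0.9), vectors, 2))  # [2, 0]
```

This is exact but scales linearly with the collection size; an index such as HNSW trades a little accuracy for visiting only a small part of the data.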
BUILDING HNSW INDEX
1. Layer 0: join each vector with its N closest neighbours.
2. Promote M% of the points to construct the upper layer (Layer 1).
3. Layer 1: join each vector with its N closest neighbours within that layer.
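A sketch of these build steps under simplifying assumptions: Euclidean distance, brute-force neighbour selection within each layer, links stored in both directions, and random promotion repeated until no point is promoted. Real HNSW inserts points one at a time and draws each point's top layer from an exponentially decaying distribution, so this is an illustration of the layered structure, not the reference algorithm; the names `knn_graph`, `build_layers`, `n`, `promote` and `seed` are hypothetical:

```python
import math
import random


def knn_graph(ids, vectors, n):
    """Join each id with its n closest ids in the same layer (links kept both ways)."""
    graph = {i: set() for i in ids}
    for i in ids:
        closest = sorted((j for j in ids if j != i),
                         key=lambda j: math.dist(vectors[i], vectors[j]))[:n]
        for j in closest:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def build_layers(vectors, n=3, promote=0.25, seed=42):
    """Return adjacency dicts, bottom first: layers[0] holds every vector."""
    rng = random.Random(seed)
    ids = list(range(len(vectors)))
    layers = [knn_graph(ids, vectors, n)]
    while len(ids) > 1:
        # Promote roughly `promote` of the current layer's points to the layer above.
        promoted = [i for i in ids if rng.random() < promote]
        if not promoted or len(promoted) == len(ids):
            break
        layers.append(knn_graph(promoted, vectors, n))
        ids = promoted
    return layers


vectors = [(float(i), 0.0) for i in range(20)]
layers = build_layers(vectors)
print([len(layer) for layer in layers])  # number of points per layer, bottom first
```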
QUERYING HNSW INDEX
1. Pick the node closest to the query in the top layer.
2. Move to the next layer below and continue searching from that node.
3. Once at layer 0, pick the k results closest to the original query.
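A sketch of the query steps in Python: a greedy walk in each upper layer, then a best-first search in layer 0 that keeps the `ef` closest nodes seen (ef is HNSW's name for the size of that candidate list). It assumes Euclidean distance and, to stay self-contained, a hand-built two-layer graph in which every 4th point is promoted; all names are hypothetical:

```python
import heapq
import math


def knn_graph(ids, vectors, n):
    """Join each id with its n closest ids in the same layer (links kept both ways)."""
    graph = {i: set() for i in ids}
    for i in ids:
        closest = sorted((j for j in ids if j != i),
                         key=lambda j: math.dist(vectors[i], vectors[j]))[:n]
        for j in closest:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def greedy_closest(graph, vectors, query, start):
    """Hop to a closer neighbour until no neighbour improves on the current node."""
    current = start
    while True:
        best = min(graph[current] | {current}, key=lambda j: math.dist(vectors[j], query))
        if best == current:
            return current
        current = best


def search(layers, vectors, query, k=3, ef=10):
    """layers[0] is the bottom layer; returns the ids of the k closest nodes found."""
    entry = next(iter(layers[-1]))        # arbitrary entry point in the top layer
    for graph in reversed(layers[1:]):    # steps 1-2: walk greedily, then go one layer down
        entry = greedy_closest(graph, vectors, query, entry)
    # Step 3: best-first search in layer 0.
    d0 = math.dist(vectors[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]            # min-heap of nodes still to expand
    results = [(-d0, entry)]              # max-heap (negated) of the best ef nodes
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # closest unexpanded node is worse than the furthest kept result
        for nb in layers[0][node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(vectors[nb], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return [node for _, node in sorted((-d, node) for d, node in results)][:k]


vectors = [(float(i), 0.0) for i in range(20)]
ids = list(range(20))
layers = [knn_graph(ids, vectors, 3), knn_graph(ids[::4], vectors, 3)]
print(search(layers, vectors, (7.2, 0.0), k=3, ef=20))  # [7, 8, 6]
```

A larger `ef` visits more of layer 0 and improves recall at the cost of speed (pgvector exposes the same trade-off as `hnsw.ef_search`).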
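VECTOR SEARCH IN SQLITE (sqlite-vec)
-- note: sqlite-vec's vec0 tables do an exact (brute-force) KNN scan, not HNSW
.load ./vec0
CREATE VIRTUAL TABLE document_embeddings USING vec0(
  embedding FLOAT[768]
);
-- query: a KNN search needs a LIMIT (or a "k = ..." constraint)
SELECT
  rowid,
  distance
FROM document_embeddings
WHERE embedding MATCH '[0.83443, 0.15224, …]'
ORDER BY distance
LIMIT 10;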
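HNSW IN POSTGRESQL (pgvector)
CREATE EXTENSION vector;
CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  embedding VECTOR(768)
);
-- create index (cosine distance; m = max connections per layer)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- query: <=> returns cosine DISTANCE (0 = identical), so sort ascending;
-- distance < 0.3 keeps matches with cosine similarity > 0.7
SELECT * FROM (
  SELECT
    id,
    embedding <=> '[0.83443, 0.15224, …]' AS distance
  FROM documents
  ORDER BY distance
  LIMIT 10
) AS nearest
WHERE distance < 0.3;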
PROBLEM #1
EMBEDDING MODELS HAVE FIXED CONTEXT WINDOW
MULTI VECTOR INDEX
One document is split into chunks, each embedded into its own vector: V1, V2, V3
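One way to query a multi-vector index, sketched under the assumption that a document's score is the score of its best-matching chunk (max aggregation; averaging is another option). The `(document_id, chunk_vector)` layout and the function names are hypothetical:

```python
import math


def cosine_similarity(a, b):
    """(a . b) / (|a| |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search_documents(query, chunk_index, top=3):
    """chunk_index: list of (document_id, chunk_vector); a document scores as its best chunk."""
    best = {}
    for doc_id, vector in chunk_index:
        score = cosine_similarity(query, vector)
        if score > best.get(doc_id, -math.inf):
            best[doc_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:top]


chunk_index = [("doc1", [1.0, 0.0]), ("doc1", [0.0, 1.0]), ("doc2", [0.6, 0.8])]
print(search_documents([1.0, 0.0], chunk_index))
```

Here doc1 ranks first because one of its chunks matches the query exactly, even though its other chunk is unrelated.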
PROBLEM #2
HOW TO INDEX DOCUMENTS EDITED IN REAL TIME?
MULTI VECTOR INDEX
1. Cut content into paragraphs.
2. Group paragraphs into blocks up to a max allowed size (e.g. 8000 chars).
3. Use content-address hashing to identify blocks (e.g. 0xae12e7, 0x3902a1, 0xef7312, 0x06cd01).
4. Reindex a block only when a sufficient change (e.g. >15%) was made: 0xef7312 becomes 0x43bf01, the other blocks keep their hashes and vectors.
5. If a block went under the min allowed size (e.g. 4000 chars), stitch it to the smallest adjacent block: 0x43bf01 and 0x06cd01 become 0x9e721c.
6. If a block went over the max allowed size, split it by paragraphs into two halves: 0x9e721c becomes 0x12e757 and 0x49bcd1.
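A sketch of these steps, assuming paragraphs are separated by blank lines, a 6-hex-digit SHA-256 prefix as the block id, and `difflib` similarity as the change measure. These choices and all function names are hypothetical; only the thresholds are the example values above:

```python
import difflib
import hashlib

MAX_SIZE = 8000      # max block size in characters (example value)
MIN_SIZE = 4000      # min block size in characters (example value)
SEPARATOR = "\n\n"   # assumption: paragraphs are separated by blank lines


def split_paragraphs(content):
    """Step 1: cut content into non-empty paragraphs."""
    return [p.strip() for p in content.split(SEPARATOR) if p.strip()]


def block_text(block):
    return SEPARATOR.join(block)


def block_id(block):
    """Step 3: content address of a block (short SHA-256 prefix)."""
    return "0x" + hashlib.sha256(block_text(block).encode("utf-8")).hexdigest()[:6]


def group_blocks(paragraphs, max_size=MAX_SIZE):
    """Step 2: greedily pack consecutive paragraphs into blocks of at most max_size chars."""
    blocks, current = [], []
    for p in paragraphs:
        if current and len(block_text(current + [p])) > max_size:
            blocks.append(current)
            current = []
        current.append(p)
    if current:
        blocks.append(current)
    return blocks


def needs_reindex(old_block, new_block, threshold=0.15):
    """Step 4: re-embed only if more than `threshold` of the block's text changed."""
    matcher = difflib.SequenceMatcher(None, block_text(old_block),
                                      block_text(new_block), autojunk=False)
    return 1.0 - matcher.ratio() > threshold


def stitch_small(blocks, min_size=MIN_SIZE):
    """Step 5: merge every block below min_size into its smaller adjacent block."""
    blocks = [list(b) for b in blocks]
    i = 0
    while len(blocks) > 1 and i < len(blocks):
        if len(block_text(blocks[i])) >= min_size:
            i += 1
            continue
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(blocks)]
        j = min(neighbours, key=lambda n: len(block_text(blocks[n])))
        lo, hi = min(i, j), max(i, j)
        blocks[lo:hi + 1] = [blocks[lo] + blocks[hi]]
        i = lo  # the merged block is re-checked on the next iteration
    return blocks


def split_large(blocks, max_size=MAX_SIZE):
    """Step 6: split every block above max_size by paragraphs into two halves."""
    result = []
    for b in blocks:
        if len(block_text(b)) > max_size and len(b) > 1:
            half = len(b) // 2
            result.extend([b[:half], b[half:]])
        else:
            result.append(b)
    return result


blocks = group_blocks(["x" * 100, "y" * 100, "z" * 100], max_size=250)
print([len(b) for b in blocks])  # [2, 1]
```

Merging and splitting run as separate passes, so a merge followed by a split cannot loop forever; the price is that a split half may end up below the minimum size.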
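PROBLEMS WITH VECTOR SEARCH
1. Weak at keyword search (exact terms, names, identifiers)
2. Always produces results: the nearest neighbours are returned even when none of them is relevant
3. CPU/memory/disk heavy
4. Complicated to maintain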
HYBRID SEARCH
RECIPROCAL RANK FUSION
The query is sent to both full text search and vector search; each returns its own ranking:
Rank  Full text search  Vector search
1     A                 C
2     B                 B
3     C                 F
4     D                 A
5     E                 D
Each document's fused score sums its reciprocal ranks (a document missing from a list gets 0 from that list):
$$\text{score}(d) = \frac{1}{1 + \text{rank}_{FTS}(d)} + \frac{1}{1 + \text{rank}_{VS}(d)}$$
Fused ranking:
0.75 C
0.70 A
0.67 B
0.37 D
0.25 F
0.17 E
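A sketch of reciprocal rank fusion over any number of rankings. The slide uses 1/(1 + rank); the general form is 1/(k + rank), and the original RRF paper used k = 60, so `k` is a parameter here with a default of 1 to match the example:

```python
from collections import defaultdict


def reciprocal_rank_fusion(*rankings, k=1):
    """Fuse ranked lists (best first, 1-based ranks); missing documents add nothing."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


fts = ["A", "B", "C", "D", "E"]
vs = ["C", "B", "F", "A", "D"]
for doc, score in reciprocal_rank_fusion(fts, vs):
    print(f"{score:.2f} {doc}")
# prints 0.75 C, 0.70 A, 0.67 B, 0.37 D, 0.25 F, 0.17 E (one per line)
```

Only ranks are used, so the very different score scales of full text search and vector search never need to be reconciled.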
SUMMARY
REFERENCES
▪ Azure AI Search: https://learn.microsoft.com/en-us/azure/search/vector-search-overview
▪ Hybrid search in Postgres: https://supabase.com/docs/guides/ai/hybrid-search
▪ SQLite vector search extension: https://github.com/asg017/sqlite-vec
▪ HNSW index explained: https://youtu.be/77QH0Y2PYKg
THANK YOU
