Retrieval 101
A practical guide to implementing semantic search for technical content
The Challenge of Finding Relevant Content
Have you ever spent hours writing a blog post only to have it disappear into the void? You're not alone. I spend a lot of time making detailed tutorials, explanations, and code examples, yet much of this content is hard to find.
90% of users never venture past the first page of search results, and most scan only the top 3-5 entries before reformulating their query or abandoning the search altogether. For technical blogs, this problem is even more acute - the specialized vocabulary and conceptual relationships between topics make keyword matching particularly ineffective.
Search solutions that rely solely on finding exact words will often have huge misses. For example, my post about custom tags for FastHTML may be highly relevant to someone looking to use web components, but "web components" is a term never mentioned directly in the blog post!
This disconnect creates a frustrating experience for both content creators and consumers. Content creators have to spend time trying to jam in keywords to make things more discoverable, and readers struggle to find the solutions they're looking for.
The question becomes: how can we make technical content discoverable by meaning rather than just keywords?
What This Article Will Deliver
In this tutorial, you'll learn how to implement a semantic search system using LanceDB. Instead of relying on exact keyword matches, you'll be able to find content based on meaning and conceptual relationships. We will go beyond the hello-world of vector search, build a hybrid vector + keyword search approach, and then re-rank the final results with a cross-encoder. By the end of this tutorial, you'll know what that means and how to implement it.
By the end of this post, you'll have a complete solution that can:
Find conceptually related blog posts even when they use different terminology
Surface relevant technical content based on the intent behind a search query
Provide more accurate and helpful search results to your readers
Here's a quick preview of what we'll build:
This isn't just theory - you'll implement this system step-by-step using real blog posts, and I'll show you how to adapt it to your own content. Whether you have a personal tech blog, manage documentation for a larger project, or have anything else you want people to be able to search, this approach will help your valuable content reach the right audience.
This is the foundation of a modern retrieval system, and I am hard-pressed to think of an example where you would want good semantic search but would not want this as the foundation.
Vector Embeddings: The Key to Semantic Search
At the heart of semantic search is the concept of vector embeddings. AI models cannot understand words directly, so we have to convert everything to numbers to compare them programmatically. Let's take a simple example, where we give four words vector embeddings:
Cat = [1.5, 2.5, 2.2]
Dog = [1.8, 2.6, 2.6]
Animal = [1.6, 2.4, 2.4]
Bog = [0.1, 0.3, 5.9]
You can see how, using just the numbers, Cat and Dog are more similar to Animal than they are to Bog. What does each specific number mean? I don't know - they don't map to human language! Similar concepts end up close to each other in this space, even if they use different words. That's what semantic search is.
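To make this concrete, here's a tiny sketch that computes cosine similarity (a standard way to compare vectors) on the made-up numbers above:

```python
import numpy as np

def cosine_similarity(a, b):
    """1.0 means the vectors point in the same direction; lower means less similar."""
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cat, dog = [1.5, 2.5, 2.2], [1.8, 2.6, 2.6]
animal, bog = [1.6, 2.4, 2.4], [0.1, 0.3, 5.9]

print(cosine_similarity(cat, animal))  # ~0.998 - Cat is very close to Animal
print(cosine_similarity(dog, animal))  # also very high
print(cosine_similarity(bog, animal))  # ~0.68 - Bog points in a different direction
```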
In a more targeted example, the phrases "how to test Python code" and "writing unit tests for Python functions" might use different words, but their vector embeddings would be very similar because they represent the same concept.
This leads to the question: "How do you pick the numbers for each word, sentence, or article?".
Modern language models like BERT, GPT, and their derivatives can generate these embeddings by processing vast amounts of text and learning the relationships between words and concepts. We can leverage those pre-existing models to generate these embeddings (training a new model to do this is out of the scope of this tutorial).
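As a quick sketch, here's what that looks like with the sentence-transformers library. I'm assuming the all-MiniLM-L6-v2 model here (a small model that produces 384-dimensional embeddings, which matches the output shapes later in this post); any embedding model would work the same way:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model: small, fast, 384-dim vectors

a = model.encode("how to test Python code")
b = model.encode("writing unit tests for Python functions")

# Almost no words overlap, but the embeddings are close, so cosine similarity is high
print(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
```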
Initial Solution Attempt: Setting Up LanceDB
We'll prepare our blog posts. For simplicity, we'll extract titles and content from our sample posts:
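Here's a minimal sketch of that preparation step. I'm assuming the posts live as markdown files in a posts/ directory and that a title can be derived from the filename; adapt the paths and parsing to however your content is stored:

```python
from pathlib import Path

posts = []
for path in Path("posts").glob("*.md"):                # assumed location of the sample posts
    posts.append({
        "title": path.stem.replace("-", " ").title(),  # derive a readable title from the filename
        "content": path.read_text(),
    })

print(len(posts), "posts loaded")
```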
Credit: This post was inspired by Ben Clavié's excellent work on semantic search implementations. His talk on RAG systems at a conference provided many of the core concepts and techniques explored in this tutorial. I highly recommend checking out his original presentation for additional context and perspectives. You will be glad you did - it is well worth the time.
The next thing we need is some sort of vector embedding for each post. As mentioned before, we can use a pretrained model for this.
💡 One place to look for embedding models to use is the MTEB Leaderboard . Though be careful because many of the models are overfit to the leaderboard.
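Continuing the sketch from above with the assumed all-MiniLM-L6-v2 model, we can compute one vector per post (embedding the title and content together is a choice, not a requirement):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

for post in posts:
    # One vector per post; LanceDB looks for a column named "vector" by default
    post["vector"] = model.encode(f"{post['title']}\n\n{post['content']}").tolist()
```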
We can load all that into our vector database:
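For example (the database path and table name here are arbitrary):

```python
import lancedb

db = lancedb.connect("./lancedb")  # local, file-based database
table = db.create_table("blog_posts", data=posts, mode="overwrite")
```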
Now, we're ready to use LanceDB to search. There are three main steps for this:
1. Create an embedding of your query or question
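Something along these lines, reusing the embedding model from before (the query text is just an example):

```python
query = "How do I make web components?"
query_embedding = model.encode(query)

# Sanity-check: same dimensionality as the document vectors, plus a peek at the first values
query_embedding.shape, query_embedding[:3]
```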
((384,), array([-0.01018321, -0.05370776, -0.08562952], dtype=float32))
2. Compare Similarity
Now we can search the table to find the most similar embeddings.
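With LanceDB that looks roughly like this, matching the cosine metric noted below:

```python
results = (
    table.search(query_embedding)
    .metric("cosine")   # match the metric the embedding model was trained with
    .limit(4)
    .to_pandas()
)
results["_distance"]    # smaller distance = more similar
```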
0 0.585938 1 0.810537 2 0.951037 3 0.976564 Name: _distance, dtype: float32
💡 Different models use different distance metrics. This was trained with cosine similarity per the huggingface model card so I matched to that.
3. See Results
Now we can see which post is the most similar to the query based on those embeddings
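For example, pulling the titles out of the same results dataframe (rows come back sorted by distance, so the first row is the best match):

```python
results[["title", "_distance"]]
```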
This basic implementation allows us to search our blog posts semantically. When we run a query like "How do I make web components?", it will find the relevant post even though that phrase is never mentioned in the post directly.
However, this approach has a lot of limitations.
Why the Initial Approach Isn't Enough
Our simple semantic search implementation works, but it has several limitations that make it inadequate for serious technical content:
Full document embeddings lose detail: When we embed entire blog posts, we're compressing thousands of words into a single 384-dimensional vector. This means specific technical details get lost.
Code blocks get mixed with text: Technical blogs contain a mix of explanatory text and code examples. Our current approach treats both the same way, diluting the search quality.
No re-ranking: We're using a simple bi-encoder approach with cosine similarity, but as Ben Clavié explains, this misses the nuance that cross-encoders can provide.
Chunking
To address the limitations of our initial approach, we need to implement a more sophisticated solution. Following Ben Clavié's recommendations, we'll improve our system by:
Chunking our documents by markdown sections
Adding keyword search capabilities (BM25)
Implementing a re-ranking step
Let's start with the chunking strategy, which will help us preserve more context and detail:
There is a plethora of chunking strategies you can test, but let's use the most obvious one: the chunks defined by the author of the article, split on markdown headers.
Note: There's lots of discussion about optimal chunk length, overlap, and different ways to split. I recommend starting with what makes sense, and then trying out different approaches.
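Here's a minimal version of such a splitter. The implementation details are mine, not necessarily the original post's, but the tiny sample document at the bottom reproduces the output shown below:

```python
def chunk_markdown(text):
    """Split a markdown document into one chunk per header section."""
    chunks, title, lines = [], None, []
    for line in text.splitlines():
        if line.startswith("#"):
            if title is not None:                 # close out the previous section
                chunks.append({"title": title, "content": "\n".join(lines)})
            title, lines = line.lstrip("#").strip(), [line]
        elif title is not None:
            lines.append(line)
    if title is not None:
        chunks.append({"title": title, "content": "\n".join(lines)})
    return chunks

sample = "# Markdown\n# Chunked\n## Based on markdown\n## Headers for RAG"
chunk_markdown(sample)
```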
[{'title': 'Markdown', 'content': '# Markdown'}, {'title': 'Chunked', 'content': '# Chunked'}, {'title': 'Based on markdown', 'content': '## Based on markdown'}, {'title': 'Headers for RAG', 'content': '## Headers for RAG'}]
Now let's apply this chunking function to our blog posts:
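Something like this, keeping track of which post each chunk came from:

```python
chunks = []
for post in posts:
    for i, chunk in enumerate(chunk_markdown(post["content"])):
        chunk["post_title"] = post["title"]   # remember the source post
        chunk["chunk_index"] = i
        chunks.append(chunk)

# Print a few chunks and actually read them
for chunk in chunks[:3]:
    print(chunk["post_title"], "->", chunk["title"])
```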
The most important thing to learn from this entire guide is that when you do things, you should look at your data. Don't assume things were right. Don't assume your idea made sense. Print it out and look!
Now we can create a new LanceDB table with our chunked content:
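Same pattern as before, just with chunk-level rows (again using the assumed model and embedding title plus content together):

```python
for chunk in chunks:
    chunk["vector"] = model.encode(f"{chunk['title']}\n\n{chunk['content']}").tolist()

chunk_table = db.create_table("blog_chunks", data=chunks, mode="overwrite")
```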
We can then query to find the most relevant chunks instead of entire documents.
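The query looks the same as before, but now each hit is a focused section rather than a whole post:

```python
chunk_results = (
    chunk_table.search(model.encode("How do I make web components?"))
    .metric("cosine")
    .limit(5)
    .to_pandas()
)
chunk_results[["post_title", "title", "_distance"]]
```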
This chunking approach gives us several advantages:
Each vector now represents a more focused piece of content
We preserve the hierarchical structure of the document
We can return specific sections rather than entire posts
We stay within the token limits of our embedding model
In the next section, we'll enhance our retrieval system by implementing a hybrid search approach that combines vector similarity with keyword matching for more accurate results.
Hybrid Search
Now that we've improved our system with chunking, let's implement the next key improvements from Ben Clavié's recommendations:
Adding keyword search (BM25) alongside vector search
Implementing a re-ranking step
Let's start with implementing keyword search to complement our vector search:
BM25 is a powerful keyword-based search algorithm that works by analyzing term frequency and document length. We'll use it to complement our vector search by capturing exact keyword matches that semantic search might miss.
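Here's a sketch using the bm25s library; the tokenization options and corpus construction are my assumptions:

```python
import bm25s

# Build a keyword index over the same chunks we embedded earlier
corpus = [f"{c['title']}\n{c['content']}" for c in chunks]
corpus_tokens = bm25s.tokenize(corpus, stopwords="en")

retriever = bm25s.BM25()
retriever.index(corpus_tokens)

# Keyword search for the same example query
query_tokens = bm25s.tokenize("How do I make web components?", stopwords="en")
docs, scores = retriever.retrieve(query_tokens, corpus=corpus, k=5)
print(scores)
```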
Re-ranking
So far, with vector search, a model takes in a piece of text and creates a vector representation of it. This is really fast in practice because the vectors for all of the documents you want to search against can be pre-calculated. Their vector representations don't change regardless of the user query, so we calculated them up front and stored them in LanceDB.
Cross-encoders work differently: instead of producing a standalone embedding, they take the query and a candidate document together and output a relevance score. Because each query-document pair is examined in context, a cross-encoder can capture nuanced relationships that the initial retrieval step might miss. For example, it can better recognize when a document answers a question even if it uses different terminology.
The process works in two stages:
We use our hybrid search (vector + BM25) to efficiently retrieve a set of candidate chunks
We then apply the more computationally expensive cross-encoder to re-rank these candidates
This approach gives us the best of both worlds - the efficiency of bi-encoders for initial retrieval and the accuracy of cross-encoders for final ranking.
💡 A good safe bet (today) is to use Cohere Rerank as an API for reranking. I'm not using it for this blog post because it requires an API key and it's not necessary for this intro tutorial, but it is something you should look into!
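Here's a local sketch of the two-stage process using a cross-encoder from sentence-transformers (the Rerankers library linked in the resources below is another option). It continues from the chunk_results and docs variables in the earlier sketches, and the model name and candidate merging are my assumptions:

```python
from sentence_transformers import CrossEncoder

query = "How do I make web components?"

# Stage 1: pool candidates from both retrievers (vector search + BM25), de-duplicating by title
candidates = {row["title"]: row["content"] for _, row in chunk_results.iterrows()}
for doc in docs[0]:
    candidates.setdefault(doc.splitlines()[0], doc)   # first line of each BM25 hit is its title

# Stage 2: score every (query, candidate) pair with a cross-encoder and sort by score
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, content) for content in candidates.values()])

ranked = sorted(zip(candidates.keys(), scores), key=lambda x: x[1], reverse=True)
for title, score in ranked[:3]:
    print(f"{score:.3f}  {title}")
```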
While our hybrid search with re-ranking is already a significant improvement, there are still lots of improvements that can be made.
Key Takeaways and Principles
Semantic search isn't magic - It's about transforming text into numbers that capture meaning. These numerical representations are flawed and cannot be relied on exclusively for every type of query.
Domain Knowledge is king - Understanding your specific domain and content type allows you to make intelligent decisions. Be a user of your own system and actually query it and look at the responses. Do this A LOT. This lets you start with a simple approach, identify limitations, and systematically address them with targeted improvements.
Hybrid approaches outperform single methods - There is no magic answer that fixes all problems. A combination of vector search, keyword matching, and re-ranking provides significantly better results than any single approach alone. But this is just the basics and there are many more things to add based on use-case.
Chunking matters - How you divide your content has a huge impact on retrieval quality and user experience. The chunks should be meaningful units that preserve context while remaining focused enough to be useful. After chunking, look at them to see if they make sense!
The bi-encoder/cross-encoder pattern is widely applicable - The pattern of using a fast but less accurate method for initial retrieval, followed by a slower but more accurate method for refinement, is super powerful.
Evaluation is essential - Though we didn't cover it in this tutorial, having a way to measure search quality is critical for ongoing improvement. What gets measured can be improved.
Next Steps
If you've followed along to this point, you now have a powerful semantic search system for your blog posts. But there's massive room for growth and experimentation:
Extend Your Implementation
Add Metadata Filtering: Enable users to narrow results by programming language, difficulty level, or post type (e.g., "Show me only Python tutorials for beginners").
Add Multi-Modal Search: Incorporate images, diagrams, and code snippets in your search index to find visual explanations alongside text (e.g., "Find me posts with architecture diagrams for microservices").
Add Evaluation Framework: Build a systematic way to measure search quality with metrics like precision and recall to continuously improve your system. Create implicit evaluation (link tracking) or user feedback systems.
Add Query Pre-processing: Identify and prioritize technical terms and entities in user queries to better match domain-specific content (e.g., recognizing "React hooks" as a specific technical concept).
Add Query Classification: Detect the intent behind searches to provide more tailored results (e.g., distinguishing between "how to" tutorials vs conceptual explanations).
Add Query Expansion: Automatically add related technical terms to queries to improve recall (e.g., expanding "web components" to include "custom elements" and "shadow DOM").
Experiment with different chunking strategies: Test different chunking strategies and sizes to find the optimal balance between context preservation and specificity for your content.
"More Like This" Functionality: Allow users to find similar content to a post they're already reading, creating a natural exploration path through your technical content.
Resources for Further Learning
LanceDB Documentation - Dive deeper into vector database capabilities
Sentence Transformers Library - Explore more embedding models and fine-tuning
Ben Clavié's RAG Talk - The inspiration for many techniques in this post
MTEB Leaderboard - Compare embedding model performance
Rerankers Library - Explore more re-ranking options
Join the Conversation
I'd love to hear about your experiences implementing semantic search:
What chunking strategies worked best for your content?
Which embedding models performed best in your domain?
What unexpected challenges did you encounter?
Share your implementation, ask questions, or suggest improvements in the comments below or reach out on Twitter/X @isaac_flath.
Remember, the field of semantic search and retrieval is evolving rapidly. The techniques we've covered provide a solid foundation, but staying curious and experimental will keep your system at the cutting edge.