Haystack Framework: A Beginner's Guide and My Advent of Haystack Journey
Towards the end of 2024, I stumbled upon Haystack, an open-source framework by deepset designed for building production-ready LLM applications, retrieval-augmented generation (RAG) pipelines, and state-of-the-art search systems capable of intelligently navigating vast document collections. This exciting discovery came through their then-ongoing event, the "Advent of Haystack".
This article shares my insights and a brief overview of what I learned during the event. If you’re a beginner or someone curious about Retrieval-Augmented Generation (RAG) and how Large Language Models (LLMs) are utilized in production, you’ve come to the right place.
The Advent of Haystack was structured as a 10-day journey, with each day introducing a new problem or concept to explore. In this article, I’ll walk you through my experience, day by day.
All the code referred to in this article can be found in my GitHub repository, while the official solutions can be found here.
Day 1: Understanding Haystack Pipelines and Modularity
My biggest challenge on Day 1 was understanding how Haystack works, as this was my first time exploring the framework.
After spending some time navigating through the documentation and experimenting, I realized that Haystack is built around modular components called Haystack components. Each component acts as a sort of black box—taking a specific input, performing its operation, and providing an output. The beauty of Haystack is that while these components might initially feel like black boxes, their inner workings are thoroughly documented.
The real magic lies in connecting these components, whether they're pre-built or custom ones. By linking them together, you can create your own Haystack Pipeline, a robust framework for solving complex problems. And just like that, the idea of creating intelligent search and retrieval systems felt a little less daunting—and a lot more exciting!
Okay, enough talk. How does some of this look in code? Let's walk through the Day 1 pipeline, which answers questions based on the contents of a URL.
Setting up Components
To build a functional pipeline in Haystack, the first step was setting up the necessary components. Each component serves a specific purpose, and luckily, Haystack provides a variety of pre-built modules to get started quickly. Here's a breakdown of the components I used:
The LinkContentFetcher component fetches content from URLs for us. Haystack conveniently includes this out of the box:
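A minimal sketch of its setup (standard Haystack 2.x import path):

```python
from haystack.components.fetchers import LinkContentFetcher

# Fetches the raw content of one or more URLs as byte streams
fetcher = LinkContentFetcher()
```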
To process HTML files, I used the HTMLToDocument converter, which transforms them into Haystack-compatible documents:
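Again, a minimal sketch using the built-in converter:

```python
from haystack.components.converters import HTMLToDocument

# Converts the fetched HTML into Haystack Document objects
converter = HTMLToDocument()
```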
For splitting the documents into manageable chunks, the DocumentSplitter came in handy. I configured it to split text by sentences:
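Something along these lines; the split length and overlap values are illustrative rather than the exact ones from my notebook:

```python
from haystack.components.preprocessors import DocumentSplitter

# Split each document into chunks of a few sentences with a small overlap
splitter = DocumentSplitter(split_by="sentence", split_length=5, split_overlap=1)
```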
To rank documents against the query while keeping the results diverse, I used the SentenceTransformersDiversityRanker:
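Roughly as follows; the model name and top_k here are illustrative defaults:

```python
from haystack.components.rankers import SentenceTransformersDiversityRanker

# Re-ranks the split documents against the query, balancing relevance and diversity
ranker = SentenceTransformersDiversityRanker(
    model="sentence-transformers/all-MiniLM-L6-v2",
    top_k=5,
)
```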
The PromptBuilder component is responsible for setting up the prompt that will be sent to the LLM:
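A sketch of the builder with a Jinja2 template; the template wording is my own paraphrase, not the exact prompt from the challenge:

```python
from haystack.components.builders import PromptBuilder

# The ranked documents and the user's question are rendered into one prompt string
template = """
Answer the question using only the context below.

Context:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}

Question: {{ query }}
Answer:
"""
prompt_builder = PromptBuilder(template=template)
```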
The Generator is the core LLM responsible for creating responses. While Haystack offers several built-in generators, I chose to experiment with local LLMs served via Ollama:
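A sketch assuming the ollama-haystack integration, a locally running Ollama server, and an already-pulled model; note that older versions of the integration expected the full /api/generate URL:

```python
from haystack_integrations.components.generators.ollama import OllamaGenerator

# Model name and URL are illustrative; run `ollama pull llama3.2` beforehand
generator = OllamaGenerator(model="llama3.2", url="http://localhost:11434")
```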
Of course, you can use any generator available in Haystack's library, depending on your needs.
Here's the OpenAIGenerator implementation:
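A sketch using the built-in OpenAI generator; the model name is illustrative and the API key is read from the OPENAI_API_KEY environment variable:

```python
from haystack.components.generators import OpenAIGenerator

generator = OpenAIGenerator(model="gpt-4o-mini")
```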
Observe that OllamaGenerator can easily replace OpenAIGenerator (or any other Generator) and the rest of the pipeline is unaffected. This modular behavior is a key feature of Haystack.
Setting up Pipeline and Connections
With these components configured, the foundation of my Haystack pipeline was ready. But we still needed to initialize the pipeline and make the connections between components.
To do this, I created the pipeline and added each component:
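Roughly as follows; the names passed to add_component are my own choices:

```python
from haystack import Pipeline

pipeline = Pipeline()
pipeline.add_component("fetcher", fetcher)
pipeline.add_component("converter", converter)
pipeline.add_component("splitter", splitter)
pipeline.add_component("ranker", ranker)
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("llm", generator)
```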
Now that the pipeline was initialized, the next step was to make connections between components. Understanding the input and output of each component is crucial here. For pre-built components, Haystack's documentation provides detailed information about their behavior and expected parameters, which helped streamline this process.
For example:
The LinkContentFetcher outputs raw HTML fetched from a URL.
This HTML is passed to the HTMLToDocument converter, which generates structured documents.
The documents are split into smaller chunks by the DocumentSplitter before being ranked by the Ranker based on their relevance to the query.
The ranked documents are formatted into a query-specific prompt by the PromptBuilder, which is finally sent to the Generator to produce an answer.
Each connection ensures a seamless flow of data, transforming raw inputs into meaningful outputs step by step.
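In code, the wiring (plus a purely hypothetical run, with a placeholder URL and question rather than the actual challenge queries) looks roughly like this:

```python
# Wire outputs to inputs; the socket names come from each component's docs
pipeline.connect("fetcher.streams", "converter.sources")
pipeline.connect("converter.documents", "splitter.documents")
pipeline.connect("splitter.documents", "ranker.documents")
pipeline.connect("ranker.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm.prompt")

# Hypothetical run, just to show which components take runtime inputs
question = "What is Haystack?"
result = pipeline.run({
    "fetcher": {"urls": ["https://haystack.deepset.ai/overview/intro"]},
    "ranker": {"query": question},
    "prompt_builder": {"query": question},
})
print(result["llm"]["replies"][0])
```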
The Big Picture
This stage was a mix of technical and conceptual challenges, as we had to ensure each component's configuration matched its role in the pipeline. Once the connections were complete, the Haystack pipeline started feeling less like a collection of independent components and more like a cohesive system—ready to be queried.
Here are the example queries associated with the problem, for your understanding.
Now that we have successfully executed our pipeline, let's check the results.
Day 2: Weaviate Integration
One of Haystack's standout features is how seamlessly it integrates with existing solutions. On Day 2, this was demonstrated by integrating our pipeline with Weaviate—an open-source vector database that combines vector search with structured filtering.
The task provided access to an existing WeaviateDocumentStore that contained pre-embedded vectors. Here's how I approached the integration:
Setting Up the Weaviate Document Store
The WeaviateDocumentStore served as the foundation for querying pre-embedded vectors stored in the database.
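A rough sketch, assuming the weaviate-haystack integration and a Weaviate Cloud instance whose URL and API key were provided by the challenge (placeholders below):

```python
from haystack_integrations.document_stores.weaviate import WeaviateDocumentStore
from haystack_integrations.document_stores.weaviate.auth import AuthApiKey

# AuthApiKey reads WEAVIATE_API_KEY from the environment by default
document_store = WeaviateDocumentStore(
    url="https://<your-cluster>.weaviate.network",  # placeholder cluster URL
    auth_client_secret=AuthApiKey(),
)
```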
With the document store in place, I implemented a simple Retrieval-Augmented Generation (RAG) pipeline. Following the same principles as Day 1, the components were added step by step:
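A sketch of that wiring; the embedding model must match whatever was used to pre-embed the documents in the challenge store, so treat the embedder, prompt, and generator choices here as placeholders:

```python
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.generators import OpenAIGenerator
from haystack_integrations.components.retrievers.weaviate import WeaviateEmbeddingRetriever

template = """
Use the context below to answer the question.
{% for doc in documents %}{{ doc.content }}{% endfor %}
Question: {{ query }}
"""

rag = Pipeline()
rag.add_component("embedder", SentenceTransformersTextEmbedder())
rag.add_component("retriever", WeaviateEmbeddingRetriever(document_store=document_store))
rag.add_component("prompt_builder", PromptBuilder(template=template))
rag.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))

rag.connect("embedder.embedding", "retriever.query_embedding")
rag.connect("retriever.documents", "prompt_builder.documents")
rag.connect("prompt_builder.prompt", "llm.prompt")
```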
Querying and Filtering in Weaviate
After connecting the components and querying the pipeline, the result included the name of a collection in Weaviate. The task was to retrieve this collection and filter its data to identify items that were not marked as decoys.
Connecting to the Weaviate Client
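This part uses the weaviate-client (v4) SDK directly; a sketch with placeholder credentials read from the environment:

```python
import os
import weaviate
from weaviate.classes.init import Auth

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.environ["WEAVIATE_URL"],
    auth_credentials=Auth.api_key(os.environ["WEAVIATE_API_KEY"]),
)
```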
Retrieving the Collection
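The collection name itself came out of the pipeline's answer in the previous step; with a placeholder name:

```python
# "Santa_List" is a placeholder; the real name was the pipeline's answer
collection = client.collections.get("Santa_List")
```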
Filtering Non-Decoy Items
With the collection in hand, I used Weaviate’s filtering capabilities to isolate items where the property decoy was marked as False:
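A sketch of the filter query (the property name decoy comes from the task; the limit is arbitrary):

```python
from weaviate.classes.query import Filter

# Fetch only the objects whose `decoy` property is False
response = collection.query.fetch_objects(
    filters=Filter.by_property("decoy").equal(False),
    limit=50,
)
for obj in response.objects:
    print(obj.properties)
```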
The seamless connection between the pipeline and Weaviate reaffirmed how well Haystack adapts to real-world scenarios, enabling the creation of intelligent, production-ready pipelines.
Day 3: Custom Components for Multi-Querying
A fundamental concept in prompt engineering is multi-querying, which involves converting a single user query into multiple related queries. This approach has significant benefits in Retrieval-Augmented Generation (RAG) solutions as it increases the diversity of ranked documents retrieved, often leading to richer and more comprehensive results than a single query could achieve.
For Day 3, the challenge was to implement this concept by creating a custom Haystack component for multi-query generation and integrating it into the pipeline.
Creating the Custom Component
Here’s the implementation of the custom MultiQueryGenerator component:
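I won't paste my exact code here, but a sketch along these lines captures the idea: an LLM is asked to produce a few paraphrases of the user's query, and the component returns them as a list (the prompt wording, model, and n_variations default are illustrative):

```python
from haystack import component
from haystack.components.generators import OpenAIGenerator


@component
class MultiQueryGenerator:
    """Expands a single user query into several related queries using an LLM."""

    def __init__(self, n_variations: int = 3):
        self.n_variations = n_variations
        self.generator = OpenAIGenerator(model="gpt-4o-mini")

    @component.output_types(queries=list[str])
    def run(self, query: str):
        prompt = (
            f"Generate {self.n_variations} different rephrasings of the "
            f"following question, one per line:\n{query}"
        )
        reply = self.generator.run(prompt=prompt)["replies"][0]
        variations = [line.strip() for line in reply.split("\n") if line.strip()]
        # Keep the original query as well, so nothing is lost
        return {"queries": [query] + variations[: self.n_variations]}
```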
And its associated MultiQueryHandler:
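Again, a rough sketch of the idea rather than my exact code: the handler embeds each expanded query, retrieves documents for each, and merges the hits into one de-duplicated list (the in-memory store, embedder, and top_k are assumptions):

```python
from haystack import component
from haystack.dataclasses import Document
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore


@component
class MultiQueryHandler:
    """Retrieves documents for each generated query and merges the results."""

    def __init__(self, document_store: InMemoryDocumentStore, top_k: int = 3):
        self.embedder = SentenceTransformersTextEmbedder()
        self.retriever = InMemoryEmbeddingRetriever(document_store=document_store, top_k=top_k)

    def warm_up(self):
        self.embedder.warm_up()

    @component.output_types(documents=list[Document])
    def run(self, queries: list[str]):
        seen, merged = set(), []
        for q in queries:
            embedding = self.embedder.run(text=q)["embedding"]
            for doc in self.retriever.run(query_embedding=embedding)["documents"]:
                if doc.id not in seen:  # de-duplicate across queries
                    seen.add(doc.id)
                    merged.append(doc)
        return {"documents": merged}
```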
Once the custom components were created, the next step was to seamlessly integrate them into the Haystack pipeline. The beauty of Haystack’s modular architecture is that adding custom components feels like working with pre-built ones. The main task was to understand the input and output types of each component to ensure smooth interactions between them.
I'll be honest: I was rather impressed with how seamless the integration was. Once created, the custom components behaved the same way as pre-built components during pipeline setup.
Day 4: Transcription with AssemblyAI
Haystack’s flexibility extends to integrating third-party services, and one of the standout integrations is with AssemblyAI, a platform for understanding human speech and voice data. On Day 4, we were tasked with leveraging AssemblyAI's transcription and summarization models to process audio data and integrate these outputs into a Haystack pipeline.
Transcribing Audio with AssemblyAI
The first step in this task was transcribing audio into text. Haystack provides a simple and effective way to convert audio files into transcribed text using their AssemblyAITranscriber component.
Here’s how the transcription process was carried out:
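Roughly, assuming the assemblyai-haystack package and an ASSEMBLYAI_API_KEY environment variable (the file path is a placeholder):

```python
import os
from assemblyai_haystack.transcriber import AssemblyAITranscriber

transcriber = AssemblyAITranscriber(api_key=os.environ["ASSEMBLYAI_API_KEY"])

# Transcribe an audio file into Haystack documents
result = transcriber.run(file_path="path/to/audio.mp3")
print(result["transcription"][0].content)
```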
Summarizing the Transcription
In addition to transcription, a summarization feature is also provided. This is useful when dealing with long audio content where we only need key insights or a condensed version of the information. The summarization can be easily triggered by passing the summarization=True flag.
Here’s how I utilized it:
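Roughly, reusing the same transcriber with the flag enabled:

```python
# The summary comes back as its own document under the "summarization" key
result = transcriber.run(file_path="path/to/audio.mp3", summarization=True)
print(result["summarization"][0].content)
```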
Creating the Pipeline for Processed Text
After obtaining the transcribed and summarized text from the audio file, I proceeded to create a Haystack pipeline to process and generate insights from the text. This part of the task was similar to how we had set up pipelines in previous days, where we connected components to perform a series of actions on the data.
You can find detailed code in the attached notebook for creating the pipeline that processes the transcribed text.
Takeaways
With transcription, summarization, and further text processing seamlessly integrated into the pipeline, I was able to build an end-to-end solution for handling audio inputs. This is a great example of how Haystack's extensibility allows for building powerful multimodal AI applications.
Day 5: Drag and Drop RAG creation with Deepset Cloud
The highlight of Day 5 was working with Deepset Cloud Studio, a powerful tool for simplifying the creation and management of Retrieval-Augmented Generation (RAG) pipelines. The task involved using Deepset Cloud Studio's user-friendly interface to quickly build and test a RAG pipeline, without needing to write code from scratch.
About Deepset Cloud
Deepset Cloud is a SaaS designed to make the process of building and deploying AI applications easy and intuitive. For this task, we were provided with sample AI files, which we had to upload into the Deepset Cloud workspace. From there, the goal was to set up a RAG pipeline to query the uploaded documents in the Playground tab.
Building the RAG Pipeline
The process of building the RAG pipeline was straightforward thanks to the predefined templates available in Deepset Studio. The template I utilized was the RAG Question Answering GPT-4o template, which provided a ready-to-go pipeline for handling question-answering tasks based on retrieval-augmented generation.
Drag and Drop: Deepset Studio allowed me to drag and drop components in the workspace. This made it incredibly easy to connect the various components and build a fully functional RAG pipeline.
Predefined Templates: With the RAG Question Answering GPT-4o template, much of the configuration was already handled for us, allowing me to focus on customizing the pipeline for specific queries and documents.
Testing in the Playground
After setting up the pipeline, I moved to the Playground tab in Deepset Studio to test the pipeline.
Takeaways
Day 5 demonstrated the power of Deepset Studio as a tool for quickly prototyping and testing RAG pipelines. The predefined templates and drag-and-drop interface allowed me to focus on the logic and structure of the pipeline.
Day 6: NVIDIA NIM microservices for Optimal Task Assignment & Delivery Organiser
Day 6 introduced the use of NVIDIA NIM (NVIDIA Inference Microservices), which are pre-trained models designed for various AI tasks. These models can be seamlessly integrated into Haystack pipelines, significantly enhancing the functionality and performance of AI applications. There were two key tasks that I worked on using NVIDIA NIM microservices in Haystack pipelines.
Task 1: Elf Task Assignment with NVIDIA Ranker
The first task involved assigning tasks to the correct Elf based on their profiles. This was achieved by using NVIDIA Ranking Models. The goal was to match the best Elf to each task based on the Elf’s profile and the requirements of the task.
While resolving an issue I faced during this task, I made a small contribution to the Haystack Core Integrations (view the contribution here).
NVIDIA Ranker Model Integration
To rank the Elves and assign the tasks, I used the NvidiaRanker component from the Haystack NVIDIA integration:
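A sketch of the setup, assuming the nvidia-haystack integration and an NVIDIA_API_KEY in the environment; the model name, top_k, and the toy Elf profiles are illustrative:

```python
from haystack.dataclasses import Document
from haystack.utils import Secret
from haystack_integrations.components.rankers.nvidia import NvidiaRanker

# Toy stand-ins for the Elf profile documents from the task
elf_profiles = [
    Document(content="Elf A: expert at delicate gift wrapping."),
    Document(content="Elf B: fastest sled loader in the workshop."),
]

ranker = NvidiaRanker(
    model="nvidia/nv-rerankqa-mistral-4b-v3",
    api_key=Secret.from_env_var("NVIDIA_API_KEY"),
    top_k=3,
)
ranker.warm_up()

# The task description acts as the query against the Elf profiles
ranked = ranker.run(
    query="Wrap presents that need delicate handling",
    documents=elf_profiles,
)["documents"]
```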
In addition to the first ranker model, we had to explore an alternative ranker model:
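Swapping in the alternative is just a different model string; for example (the model id here is an assumption based on the NVIDIA API catalog):

```python
alt_ranker = NvidiaRanker(
    model="nvidia/llama-3.2-nv-rerankqa-1b-v2",  # illustrative alternative model
    api_key=Secret.from_env_var("NVIDIA_API_KEY"),
    top_k=3,
)
alt_ranker.warm_up()
```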
Evaluating the Rankers with MRR
To evaluate the ranking quality of the models, I used the DocumentMRREvaluator, which computes the Mean Reciprocal Rank (MRR) score. MRR measures how well the first relevant document ranks in the output.
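A minimal sketch of how the evaluator is fed; the toy documents below stand in for the Elf profiles and the real ground truth from the task:

```python
from haystack.components.evaluators import DocumentMRREvaluator
from haystack.dataclasses import Document

ground_truth = [[Document(content="Elf A: expert at delicate gift wrapping.")]]
retrieved = [[
    Document(content="Elf B: fastest sled loader in the workshop."),
    Document(content="Elf A: expert at delicate gift wrapping."),
]]

mrr = DocumentMRREvaluator()
mrr_result = mrr.run(ground_truth_documents=ground_truth, retrieved_documents=retrieved)
print(mrr_result["score"])  # 0.5 here: the relevant document is ranked second
```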
The task helped me evaluate the efficiency of different ranking models in matching Elves with tasks based on their profiles.
Task 2: Multilingual Embedding with NVIDIA API for Santa's Gift Delivery
The second task involved using multilingual embedders from the NVIDIA API catalog to find the correct delivery location for Santa’s gifts on Christmas. This was a classic retrieval task where we had to map notes to their correct locations based on semantic similarity.
NVIDIA Text Embedder Integration
I used the NvidiaTextEmbedder component from the Haystack integrations to embed both the notes and the location options:
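A sketch, again assuming NVIDIA_API_KEY; the model id, API URL, and the sample note are illustrative:

```python
from haystack.utils import Secret
from haystack_integrations.components.embedders.nvidia import NvidiaTextEmbedder

embedder = NvidiaTextEmbedder(
    model="nvidia/nv-embedqa-mistral-7b-v2",  # a multilingual embedding model
    api_url="https://integrate.api.nvidia.com/v1",
    api_key=Secret.from_env_var("NVIDIA_API_KEY"),
)
embedder.warm_up()

# Embed one of the (multilingual) notes
note_embedding = embedder.run(text="Joyeux Noël ! Livraison au chalet près du lac.")["embedding"]
```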
Evaluating
Evaluation was done using cosine similarity, which helped identify which location best matched the notes based on semantic meaning, ensuring the gifts were delivered to the correct place. For the code, check this notebook.
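Cosine similarity is just the normalized dot product, so a small numpy helper is enough to pick the best-matching location; a sketch building on the embedder above (the location names are placeholders):

```python
import numpy as np

def cosine_similarity(a: list[float], b: list[float]) -> float:
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Embed each candidate location with the same embedder as the notes
location_embeddings = {
    name: embedder.run(text=name)["embedding"]
    for name in ["Chalet by the lake", "Cabin on the hill"]  # placeholder options
}

best_location = max(
    location_embeddings,
    key=lambda name: cosine_similarity(note_embedding, location_embeddings[name]),
)
print(best_location)
```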
Day 7: LLM as a Judge and Arize-Phoenix for Tracing
On Day 7, the focus was on leveraging Arize Phoenix to trace and evaluate pipelines effectively, particularly when dealing with LLMs (Large Language Models). This day revolved around ensuring production pipeline reliability, given the non-deterministic nature of LLM outputs, and utilizing LLM as a Judge for qualitative evaluation.
Why Tracing is Essential
LLMs can produce variable outputs even when given the same input multiple times. This variability makes it crucial to:
Trace pipeline input-output interactions.
Debug and evaluate performance across all components.
Arize Phoenix enables tracing at scale, offering a detailed view of pipeline operations via a UI-based dashboard.
Enabling Arize Phoenix
The Phoenix app was activated locally to enable tracing and visualization:
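A single call is enough to start the local dashboard:

```python
import phoenix as px

# Launches a local Phoenix instance and prints the dashboard URL
session = px.launch_app()
```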
Tracing setup involved registering and instrumenting the pipeline:
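Registering a tracer and instrumenting Haystack looked roughly like this, using the openinference Haystack instrumentor (the project name is my own choice):

```python
from phoenix.otel import register
from openinference.instrumentation.haystack import HaystackInstrumentor

tracer_provider = register(project_name="advent-of-haystack")
HaystackInstrumentor().instrument(tracer_provider=tracer_provider)
```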
Automatic Tracing
Once enabled, Arize Phoenix automatically traced the input-output data of every pipeline component.
This tracing provided a seamless mechanism to:
Analyze bottlenecks.
Evaluate component-level performance.
LLM as a Judge
The task utilized LLM-based evaluation to assess the pipeline’s outputs. This involved classifying outputs as correct or incorrect based on a predefined prompt.
Steps to Set Up LLM Evaluation
Extracting Spans for Evaluation
Using Phoenix to download spans related to LLM outputs:
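A sketch using the Phoenix client; the filter expression is an assumption on my part:

```python
import phoenix as px

# Pull the LLM spans captured so far into a pandas DataFrame
spans_df = px.Client().get_spans_dataframe("span_kind == 'LLM'")
```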
Creating an evaluation dataframe:
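From those spans I kept just the prompt and response; the column names below follow openinference conventions, so adjust them to whatever your spans actually contain:

```python
# Keep only the model input/output and rename them for the judge template
eval_df = spans_df[["attributes.input.value", "attributes.output.value"]].rename(
    columns={
        "attributes.input.value": "input",
        "attributes.output.value": "output",
    }
)
```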
Classification with LLM as a Judge
Defined evaluation rails (categories):
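Mine were simply two categories (illustrative):

```python
# Allowed output categories for the judge
rails = ["correct", "incorrect"]
```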
Configured the classification template:
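A paraphrased version of the idea, not the exact challenge prompt:

```python
# llm_classify fills {input} and {output} from the matching dataframe columns
CLASSIFY_TEMPLATE = """
You are evaluating a question-answering system.
Given the question and the system's answer, reply with exactly one word:
"correct" if the answer is right, "incorrect" otherwise.

Question: {input}
Answer: {output}
"""
```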
Running the LLM Judge for evaluation:
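And running the judge itself with phoenix.evals (the judge model is an illustrative choice):

```python
from phoenix.evals import OpenAIModel, llm_classify

eval_results = llm_classify(
    dataframe=eval_df,                       # spans dataframe prepared above
    model=OpenAIModel(model="gpt-4o-mini"),  # judge model
    template=CLASSIFY_TEMPLATE,
    rails=rails,
)
```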
Scoring the Outputs
Added a scoring mechanism to the evaluation results:
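For example, mapping the judge's label onto a numeric score:

```python
# llm_classify returns a "label" column; turn it into a 0/1 score
eval_results["score"] = (eval_results["label"] == "correct").astype(int)
```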
Uploading Results to Phoenix Dashboard
The evaluation results were uploaded to the Phoenix app for visualization and analysis:
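Roughly, using SpanEvaluations; the evaluation name is my own choice, and the dataframe keeps the span ids it was built from, which is what Phoenix uses to attach the scores:

```python
from phoenix.trace import SpanEvaluations

px.Client().log_evaluations(
    SpanEvaluations(eval_name="LLM Judge", dataframe=eval_results)
)
```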
Day 8: Inventory Agent Using Haystack Experimental
On Day 8, the goal was to create a specialized Inventory Management Agent using Haystack Experimental and Tool Invoker. The task introduced a chatbot capable of managing an inventory. The Agent leveraged various tools to execute inventory-related tasks effectively.
Key Features and Tools
Lookup Tool:
Searches the inventory for a specific item and returns its details.
Add Item Tool:
Adds new items to the inventory, accommodating origin tracking (e.g., handmade or Amazon).
Inventory Summary Tool:
Generates a summary of all items in the inventory.
Take from Inventory Tool:
Removes a specific number of units of an item from the inventory.
Price from Amazon Tool:
Uses web search and LLMs to retrieve the price of items on Amazon.
For the pipeline, see the notebook.
Buy from Amazon Tool:
Simulates buying an item from Amazon and adding it to the inventory.
Agent Setup
Configuring the Chat Agent
The chatbot was configured to manage Santa's inventory with a whimsical Christmas tone. It utilized the tools above through Haystack's Tool Invoker and OpenAI's Chat Generator.
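To give a flavour of the setup, here is a heavily simplified sketch with a single tool. The import paths follow the haystack-experimental package as it existed in late 2024 and may have moved since, so treat them (and the tool schema) as assumptions:

```python
from haystack_experimental.dataclasses import ChatMessage, Tool
from haystack_experimental.components.tools import ToolInvoker
from haystack_experimental.components.generators.chat import OpenAIChatGenerator

inventory = {"teddy bear": 12}

def lookup_item(item_name: str) -> str:
    """Return the stock count for an item, or a not-found message."""
    count = inventory.get(item_name.lower())
    return f"{item_name}: {count} in stock" if count is not None else f"{item_name} not found"

lookup_tool = Tool(
    name="lookup_item",
    description="Look up an item in Santa's inventory and return its stock count.",
    parameters={
        "type": "object",
        "properties": {"item_name": {"type": "string"}},
        "required": ["item_name"],
    },
    function=lookup_item,
)

chat_generator = OpenAIChatGenerator(model="gpt-4o-mini", tools=[lookup_tool])
tool_invoker = ToolInvoker(tools=[lookup_tool])
```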
System Prompt
The system prompt set the tone and guidelines for the chatbot:
The Agent was then executed using a while True: loop.
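The loop itself is the classic tool-calling cycle: send the conversation to the chat model, run any tool calls it makes via the ToolInvoker, append the tool results, and repeat. A rough sketch building on the objects above, with exit handling simplified:

```python
messages = [ChatMessage.from_system("You are Santa's cheerful inventory assistant.")]

while True:
    user_input = input("You: ")
    if user_input.lower() in {"quit", "exit"}:
        break
    messages.append(ChatMessage.from_user(user_input))

    reply = chat_generator.run(messages=messages)["replies"][0]
    messages.append(reply)

    # If the model asked for a tool, execute it and let the model see the result
    while reply.tool_calls:
        tool_messages = tool_invoker.run(messages=[reply])["tool_messages"]
        messages.extend(tool_messages)
        reply = chat_generator.run(messages=messages)["replies"][0]
        messages.append(reply)

    print("Agent:", reply.text)
```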
Day 9: Self Reflecting Components and MongoDB Atlas Integration
The challenge for Day 9 involved building a Self-Reflecting Gift Selection Agent using Haystack and MongoDB Atlas. This agent was designed to optimize gift selections for children based on their wishlists and budget constraints. Key highlights included integrating MongoDB Atlas for semantic matching and implementing a self-reflecting mechanism to refine gift combinations.
Document Store Setup:
Data, including children's wishlists, gift metadata, and budget information, was stored in MongoDB Atlas, enabling scalable and efficient queries. (See notebook.)
Self-Reflecting Custom Component
The GiftChecker component was at the heart of the self-reflection process. Its job was to evaluate the gift selection and decide whether further optimization was required.
Functionality:
If the LLM marks the selection as "DONE," the gifts are finalized.
Otherwise, the gifts are flagged for further refinement.
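My version followed the same pattern as Haystack's self-reflecting agent examples; a stripped-down sketch of the idea (socket names are my own):

```python
from haystack import component


@component
class GiftChecker:
    """Routes the LLM's reply: finished selections go out, others loop back."""

    @component.output_types(gifts_to_check=str, gifts=str)
    def run(self, replies: list[str]):
        if "DONE" in replies[0]:
            # Selection approved: strip the marker and emit the final gifts
            return {"gifts": replies[0].replace("DONE", "").strip()}
        # Otherwise send the current selection back for another round
        return {"gifts_to_check": replies[0]}
```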
MongoDB Atlas Integration
The agent leveraged MongoDB Atlas Vector Search for semantic matching. This allowed for efficient retrieval of wishlist-related gift options based on embeddings.
Retriever Configuration:
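A sketch using the mongodb-atlas-haystack integration; the connection string is read from the MONGO_CONNECTION_STRING environment variable, and the database, collection, and index names below are placeholders:

```python
from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore
from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasEmbeddingRetriever

document_store = MongoDBAtlasDocumentStore(
    database_name="santas_workshop",
    collection_name="gifts",
    vector_search_index="gift_embeddings_index",
)
retriever = MongoDBAtlasEmbeddingRetriever(document_store=document_store, top_k=5)
```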
Pipeline Creation
The project involved constructing a pipeline to process gift selection queries and integrate the self-reflection mechanism.
Key Features of the Pipeline:
Embedding Creation: Converts input text (e.g., wishlists) into embeddings.
Semantic Matching: Retrieves relevant gifts using MongoDB Atlas vector search.
LLM Generation: Generates responses or gift suggestions.
Reflection Loop: Optimizes suggestions until the "DONE" status is achieved.
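The interesting part of the wiring is the cycle: the checker's gifts_to_check output feeds back into the prompt builder, so the pipeline keeps looping until the checker emits the final gifts. A rough sketch of the connections; the component names, the simplified prompt, and the loop-limit argument (which has changed names across Haystack versions) are my own choices:

```python
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.generators import OpenAIGenerator

template = """
Select gifts for {{ query }} within budget.
{% if gifts_to_check %}Previous selection to improve: {{ gifts_to_check }}{% endif %}
Gift options:
{% for doc in documents %}{{ doc.content }}{% endfor %}
Answer with the gift list, prefixed with DONE when you are satisfied.
"""

pipeline = Pipeline(max_runs_per_component=5)  # safety cap on the reflection loop
pipeline.add_component("embedder", SentenceTransformersTextEmbedder())
pipeline.add_component("retriever", retriever)  # MongoDB Atlas retriever from above
pipeline.add_component("prompt_builder", PromptBuilder(template=template))
pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))
pipeline.add_component("checker", GiftChecker())

pipeline.connect("embedder.embedding", "retriever.query_embedding")
pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm.prompt")
pipeline.connect("llm.replies", "checker.replies")
# The reflection loop: unfinished selections go back into the prompt
pipeline.connect("checker.gifts_to_check", "prompt_builder.gifts_to_check")
```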
Execution Flow
Input: A child's wishlist and budget are provided.
Embedding & Retrieval: The wishlist is embedded, and relevant gift options are retrieved from MongoDB Atlas.
LLM Evaluation: The gift suggestions are assessed for relevance and budget compliance.
Reflection Mechanism:
If the selection is suboptimal, the pipeline loops back to refine it.
If the selection is satisfactory, the process is marked "DONE," and the final gifts are returned.
Day 10: Evaluation Harness
The final day of the Advent of Haystack journey was all about addressing one of the most critical stages before taking any system to production: Evaluation. Our task was to evaluate a simple Retrieval-Augmented Generation (RAG) pipeline using Haystack's RAGEvaluationHarness and generate a score report. This exercise was the perfect culmination of all the skills gained throughout the event.
The Importance of Evaluation
Evaluation is a cornerstone of any AI or NLP system, ensuring that the pipeline not only functions but also performs optimally. For this task, I focused on measuring:
Semantic Answer Similarity: How closely the answers match the ground truth.
Faithfulness: Whether the answers are supported by retrieved documents.
Context Relevance: How relevant the retrieved context is to the input question.
Document Recall (Single Hit): The percentage of queries for which at least one relevant document was retrieved.
Document MRR (Mean Reciprocal Rank): A measure of how well the retrieved documents are ranked.
Setting the Stage
Preparing Evaluation Data
The first step was to select a subset of evaluation samples—questions, their ground truth answers, and the associated documents.
This ensured a diverse, yet manageable dataset for evaluation.
Configuring the Evaluation Harness
Haystack’s EvaluationHarness executes a pipeline with a given set of inputs and evaluates its outputs with an evaluation pipeline using Haystack's built-in Evaluators. The RAGEvaluationHarness class, derived from EvaluationHarness, simplifies the evaluation process specifically for RAG pipelines.
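At the time, the harness lived in haystack-experimental, and the setup looked roughly like this; the class, convenience constructor, and metric enum names are from the experimental API of that period and may have changed since:

```python
from haystack_experimental.evaluation.harness.rag import (
    RAGEvaluationHarness,
    RAGEvaluationMetric,
)

harness = RAGEvaluationHarness.default_with_embedding_retriever(
    rag_pipeline,  # the simple RAG pipeline under evaluation (built as in earlier days)
    metrics={
        RAGEvaluationMetric.SEMANTIC_ANSWER_SIMILARITY,
        RAGEvaluationMetric.ANSWER_FAITHFULNESS,
        RAGEvaluationMetric.CONTEXT_RELEVANCE,
        RAGEvaluationMetric.DOCUMENT_RECALL_SINGLE_HIT,
        RAGEvaluationMetric.DOCUMENT_MRR,
    },
)
```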
This setup ensured the evaluation covered all the critical aspects of a RAG pipeline’s performance.
Providing Inputs
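The harness takes a structured input object; a rough sketch with placeholder samples (field names again per the experimental API at the time, so treat them as assumptions):

```python
from haystack.dataclasses import Document
from haystack_experimental.evaluation.harness.rag import RAGEvaluationInput

# Placeholder evaluation samples; the real ones came from the challenge dataset
questions = ["What does Rudolph's nose do?"]
ground_truth_answers = ["It lights the way for Santa's sleigh."]
ground_truth_documents = [[Document(content="Rudolph's glowing red nose lights the way.")]]

eval_input = RAGEvaluationInput(
    queries=questions,
    ground_truth_answers=ground_truth_answers,
    ground_truth_documents=ground_truth_documents,
)
```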
Running the Evaluation
With the harness and inputs ready, the pipeline was executed, and the results were generated.
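Running the harness and printing the score report looked roughly like this (method names from the experimental API of that period):

```python
eval_run = harness.run(inputs=eval_input, run_name="baseline_rag")

# Aggregated scores for every metric configured on the harness
print(eval_run.results.score_report())

# Per-query scores, useful for spotting individual failures
print(eval_run.results.to_pandas())
```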
Exploring Variations: Adjusting top_k and Model Selection
The final challenge involved tweaking the retriever’s top_k value and experimenting with an alternative model.
top_k defines the number of documents retrieved for answer generation. Increasing or decreasing this value impacts metrics like Document Recall and Semantic Answer Similarity.
Switching to a different model highlighted trade-offs between computational efficiency and answer quality.
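Both variations can be expressed as overrides on the same harness instead of rebuilding the pipeline; a sketch, where the RAGEvaluationOverrides field and the component names are assumptions based on the experimental API of that period:

```python
from haystack_experimental.evaluation.harness.rag import RAGEvaluationOverrides

# Retrieve more documents per query
overrides_topk = RAGEvaluationOverrides(rag_pipeline={"retriever": {"top_k": 5}})
eval_run_topk = harness.run(inputs=eval_input, overrides=overrides_topk, run_name="top_k_5")
print(eval_run_topk.results.score_report())

# Swap the generator model
overrides_model = RAGEvaluationOverrides(rag_pipeline={"generator": {"model": "gpt-4o-mini"}})
eval_run_model = harness.run(inputs=eval_input, overrides=overrides_model, run_name="alt_model")
print(eval_run_model.results.score_report())
```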
By comparing the score reports from these variations, I gained a deeper understanding of how different configurations influence pipeline performance.
Reflections and Closing Thoughts
For me personally, participating in the Advent of Haystack has been an unforgettable experience, since I knew nothing about the framework before this. Over the course of two weeks, I delved into the intricacies of Haystack, exploring its components, integrations, and evaluation methodologies. Each task helped me understand the different aspects to consider when building and refining LLM/NLP/RAG pipelines.
I hope that through this article, a few readers get the same sense of learning that I did during this event.
Thank you for reading.
~Vedant