CONTENT MANAGEMENT FOR RAG SYSTEMS
SOFIA KONCHAKOVA
DSC EUROPE, NOVEMBER 2024
ABOUT ME
Topics of Interest:
ML for IoT
RAG Systems
MLOps
AI Consulting
AGENDA
1. RAG 101: REMIND THE BASICS, RAGOPS
2. SHORT ABOUT DATA
3. RETRIEVER MANAGEMENT
WHAT IS RAG?

RETRIEVAL AUGMENTED GENERATION (RAG)
[Diagram: USER QUERY → RETRIEVER → RELEVANT CONTENT EXTRACTION → RETRIEVED CONTENT → PROMPT AUGMENTATION → FULL PROMPT → RESPONSE]
WHY DO WE NEED RAG?

Cost Reduction
RAG reduces training costs by enabling smaller models to retrieve information on demand, eliminating the need to store all knowledge within the model.

Higher Accuracy
It increases accuracy by retrieving relevant, up-to-date information during generation, reducing inaccuracies and "hallucinations" often seen in generative AI.

Flexibility
RAG frameworks are versatile, enabling retrieval from various data sources and adapting across domains, making them ideal for custom applications in different industries.
TRADITIONAL MLOPS
Business Problem Definition → Data Management → Machine Learning → Deployment & Monitoring
RAGOPS
Data Management → Model Management → Prompt Management → System Management & Deployment
SHORT ABOUT DATA
1. DATA TYPES
2. DATA REQUIREMENTS

WHAT DATA CAN WE USE?
[Diagram of supported data formats, e.g., PDF documents]
THOUGH RAG SUPPORTS VARIOUS
DATA FORMATS, IT MAY NOT
ALWAYS BE THE BEST OPTION.
DASHBOARDS CONNECTED TO A
DATA WAREHOUSE MIGHT BE
MORE EFFICIENT FOR SOME
CASES.
REQUIREMENTS FOR THE
DATA MANAGEMENT PIPELINE
Scalability and Performance Efficiency: Efficient handling of increasing volumes of data and
complex processing tasks without performance degradation or manual intervention.
Monitoring and Observability: Continuous tracking of the performance and health of the data
pipeline using metrics, logs, and alerts.
Data Transformation and Enrichment: The ability to clean, normalize, and enrich data as it
moves through the pipeline.
Automation and Orchestration: Automating repetitive tasks, including timely data updates,
and coordinating complex workflows between different pipeline stages.
GARBAGE IN -
GARBAGE OUT!
WE WANT OUR
SYSTEM TO HAVE
ACCESS TO RELEVANT
DATA.
RETRIEVER MANAGEMENT
1. RETRIEVER & GENERATOR
2. CHUNKING STRATEGIES
3. RETRIEVAL STRATEGIES
4. RERANKING STRATEGIES
RETRIEVAL AUGMENTED GENERATION (RAG)

Retriever
Responsible for fetching relevant information from an external knowledge base based on the input query. It acts as the system's non-parametric "memory," allowing it to access and utilize up-to-date or domain-specific information.

Generator
Language model that produces the final response by conditioning on both the input query and the retrieved documents. It generates coherent and contextually appropriate text as an answer to the query.

[Diagram: RAG pipeline annotated with its tuning points: Optimal Chunk Size, Embedding Model, Sparse vs. Dense Retriever, Similarity Metrics, Named Entity Recognition, Reranker]
TO CHUNK
OR
NOT TO CHUNK?
OPTIMAL CHUNK SIZE
Content Coherence vs. Granularity:
Balance between semantic preservation and retrieval precision
Large chunks: risk relevance dilution
Small chunks: may break context flow
Model Context Window Limitations:
Consider LLM token capacity constraints
Reserve space for context and response
Optimize for efficient token utilization
Query-Content Alignment:
Match chunk size to content type
Technical docs: smaller, focused chunks
Narrative content: larger, context-preserving chunks
And remember there's no one-size-fits-all approach!
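To make the context-window point concrete, here is a minimal token-budget sketch; the context window, prompt sizes, and top-k value are illustrative assumptions, not recommendations:

```python
# Minimal token-budget sketch for choosing a chunk size (illustrative numbers only).
CONTEXT_WINDOW = 8_192      # assumed model context window, in tokens
SYSTEM_PROMPT_TOKENS = 300  # instructions, few-shot examples, etc.
QUERY_TOKENS = 100          # typical user query
RESPONSE_BUDGET = 1_000     # tokens reserved for the generated answer
TOP_K = 5                   # number of retrieved chunks placed in the prompt

available_for_chunks = (CONTEXT_WINDOW - SYSTEM_PROMPT_TOKENS
                        - QUERY_TOKENS - RESPONSE_BUDGET)
max_chunk_size = available_for_chunks // TOP_K
print(f"Room for retrieved content: {available_for_chunks} tokens")
print(f"Max chunk size at top-{TOP_K}: {max_chunk_size} tokens")
# -> Room for retrieved content: 6792 tokens
# -> Max chunk size at top-5: 1358 tokens
```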
CHUNKING ISN’T
ALWAYS NECESSARY!
FOR SHORTER
DOCUMENTS, IT’S
OFTEN BEST TO KEEP
THEM INTACT.
CHUNKING ALGORITHMS

FIXED-SIZE CHUNKING
Chunks are created by setting a fixed number of tokens per chunk, with optional overlap to retain semantic context.
Pros: Simple and efficient. Consistent chunk sizes.
Cons: Potential loss of context. Inflexible with content length.

RECURSIVE CHUNKING
Splits text by trying larger separators (e.g., paragraphs) first, then progressively uses smaller separators until reaching the desired chunk size.
Pros: Preserves natural document structure. Reduces context fragmentation.
Cons: Higher computational overhead. More complex implementation.
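As a rough illustration of these two approaches, here is a minimal pure-Python sketch (no particular library assumed; production pipelines often use tokenizer-aware splitters such as those shipped with LangChain or LlamaIndex):

```python
def fixed_size_chunks(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Fixed-size chunking: split on whitespace into word windows with overlap."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

def recursive_chunks(text: str, max_len: int = 800,
                     separators: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Recursive chunking: try the largest separator first, then fall back to smaller ones."""
    if len(text) <= max_len:
        return [text]
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks, buffer = [], ""
            for part in parts:
                candidate = (buffer + sep + part) if buffer else part
                if len(candidate) <= max_len:
                    buffer = candidate
                else:
                    if buffer:
                        chunks.append(buffer)
                    # the part itself may still be too long -> recurse with smaller separators
                    chunks.extend(recursive_chunks(part, max_len, separators))
                    buffer = ""
            if buffer:
                chunks.append(buffer)
            return chunks
    # no separator worked: hard-split as a last resort
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```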
CHUNKING ALGORITHMS

SEMANTIC CHUNKING
Splits text based on meaningful units and topical boundaries rather than length limits, using embedding similarity to identify and maintain coherent content blocks while preserving context.
Pros: Better preservation of context. Reduced information loss.
Cons: Requires sophisticated NLP models. Produces chunks of varying sizes.

AGENTIC CHUNKING
Uses LLMs as active agents to split text intelligently based on content, context, and user needs, rather than fixed rules or similarity metrics.
Pros: More intelligent splits. Purpose-aware chunking.
Cons: High computational cost and latency. Limited standardization and maturity as a new technology.
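A minimal sketch of the semantic-chunking idea, assuming a hypothetical `embed()` helper that returns one embedding vector per sentence (any sentence-embedding model could supply it); consecutive sentences are merged while their similarity to the running chunk stays above a threshold:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_chunks(sentences: list[str], embed, threshold: float = 0.7) -> list[str]:
    """Group consecutive sentences while they stay semantically close to the current chunk.

    `embed(texts)` is assumed to return a 2D array of sentence embeddings (hypothetical helper).
    """
    vectors = embed(sentences)
    chunks, current, current_vec = [], [sentences[0]], vectors[0]
    for sent, vec in zip(sentences[1:], vectors[1:]):
        if cosine(current_vec, vec) >= threshold:
            current.append(sent)
            # running mean keeps the chunk representation up to date
            current_vec = (current_vec * (len(current) - 1) + vec) / len(current)
        else:
            chunks.append(" ".join(current))
            current, current_vec = [sent], vec
    chunks.append(" ".join(current))
    return chunks
```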
RETRIEVAL STRATEGIES

SPARSE RETRIEVER
A sparse retriever uses keyword-based matching, creating vectors where only present words get values while all other positions are zero.
Example text: "I am speaking at DSC Europe. I speak about RAG."
Pros: Fast and efficient. Good for exact matches. Easy to interpret results. Low computational cost.
Cons: Misses semantic relationships. Sensitive to wording variations. Poor with synonyms. No understanding of context.
Examples: BM25, TF-IDF
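A minimal sparse-retrieval sketch using the `rank_bm25` package (one common BM25 implementation); the corpus and query are toy examples:

```python
# pip install rank_bm25
from rank_bm25 import BM25Okapi

corpus = [
    "I am speaking at DSC Europe. I speak about RAG.",
    "Dense retrievers use embeddings to capture semantic meaning.",
    "BM25 is a classic keyword-based ranking function.",
]
tokenized_corpus = [doc.lower().split() for doc in corpus]

bm25 = BM25Okapi(tokenized_corpus)
query_tokens = "who speaks about RAG at DSC Europe".lower().split()

scores = bm25.get_scores(query_tokens)           # one relevance score per document
best = bm25.get_top_n(query_tokens, corpus, n=1)  # highest-scoring document(s)
print(scores, best)
```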
DENSE RETRIEVER
A dense retriever uses embeddings to convert text into meaningful vectors that capture semantic meaning.
Example text: "I am speaking at DSC Europe. I speak about RAG."
Pros: Better semantic understanding. Handles synonyms well. Good with paraphrasing. Context-aware.
Cons: Computationally expensive. Requires more storage. Harder to interpret. May miss exact matches.
Model benchmarks: MTEB Leaderboard on HuggingFace
Examples: ChromaDB, FAISS
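A minimal dense-retrieval sketch with Sentence-Transformers; the model name is just one commonly used example, and any model from the MTEB leaderboard could be swapped in:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # example embedding model

docs = [
    "I am speaking at DSC Europe. I speak about RAG.",
    "Dense retrievers use embeddings to capture semantic meaning.",
    "BM25 is a classic keyword-based ranking function.",
]
doc_embeddings = model.encode(docs, convert_to_tensor=True)

query = "Who gives a talk on retrieval augmented generation?"
query_embedding = model.encode(query, convert_to_tensor=True)

# cosine similarity between the query and every document
similarities = util.cos_sim(query_embedding, doc_embeddings)[0]
best_idx = int(similarities.argmax())
print(docs[best_idx], float(similarities[best_idx]))
```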
HYBRID RETRIEVER
A hybrid retriever merges keyword-based (sparse) search with embedding-based (dense) search to leverage the strengths of both approaches.
Pros: Better overall accuracy. Combines benefits of both methods.
Cons: Most complex to implement. More maintenance overhead.
Examples: Weaviate, Pinecone, LangChain's hybrid retrievers

Typical weighting configurations (dense / sparse / metadata):
BALANCED: 50% dense, 50% sparse
SEMANTIC-FOCUSED: 70% dense, 30% sparse
KEYWORD-FOCUSED: 30% dense, 70% sparse
WITH METADATA: 40% dense, 20% sparse, 40% metadata
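A minimal sketch of weighted score fusion, assuming sparse and dense scores have already been computed for the same candidate chunks; min-max normalisation and the weights are illustrative choices (reciprocal rank fusion is a common alternative):

```python
import numpy as np

def min_max(scores: np.ndarray) -> np.ndarray:
    """Scale scores to [0, 1] so sparse and dense scores are comparable."""
    span = scores.max() - scores.min()
    return (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)

def hybrid_scores(sparse: np.ndarray, dense: np.ndarray,
                  w_sparse: float = 0.5, w_dense: float = 0.5) -> np.ndarray:
    return w_sparse * min_max(sparse) + w_dense * min_max(dense)

# toy scores for 4 candidate chunks
sparse = np.array([2.1, 0.4, 1.7, 0.0])     # e.g., BM25 scores
dense = np.array([0.62, 0.81, 0.55, 0.40])  # e.g., cosine similarities

# semantic-focused weighting: 70% dense, 30% sparse
ranking = np.argsort(-hybrid_scores(sparse, dense, w_sparse=0.3, w_dense=0.7))
print(ranking)  # chunk indices, best first
```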
NAMED ENTITY RECOGNITION
(NER)
Technique that identifies and classifies named entities (like names,
organizations, locations, dates) in text into predefined categories. These
extracted entities can be stored as metadata in databases, enabling
enhanced search capabilities through metadata filtering.
GLiNER, Flair, StanfordNER
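A sketch of entity extraction for metadata filtering based on GLiNER's published usage; the checkpoint name, label set, and exact return fields are assumptions and may differ across versions:

```python
# pip install gliner  -- sketch based on GLiNER's published usage; exact API may differ by version
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_base")  # example checkpoint

text = "Sofia Konchakova is speaking at DSC Europe in November 2024 about RAG systems."
labels = ["person", "organization", "location", "date"]

entities = model.predict_entities(text, labels, threshold=0.5)

# store the extracted entities as chunk metadata for later filtering
metadata = {label: [e["text"] for e in entities if e["label"] == label] for label in labels}
print(metadata)
```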
RERANKING STRATEGIES

TIME-BASED WEIGHTING
Method that adjusts document relevance scores based on their age, applying a decay factor to give more weight to recent documents while gradually reducing the importance of older ones.
Pros: Easy to implement. Prioritises fresh content. Configurable decay rates.
Cons: Can over-penalise “classics”. Domain-specific tuning required. Not suitable for evergreen content.
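A minimal sketch of exponential time decay applied on top of a retrieval score; the half-life is an arbitrary example and would need the domain-specific tuning mentioned above:

```python
import math
from datetime import datetime, timezone

def time_decayed_score(score: float, published: datetime,
                       half_life_days: float = 180.0,
                       now: datetime | None = None) -> float:
    """Down-weight older documents: the score halves every `half_life_days`."""
    now = now or datetime.now(timezone.utc)
    age_days = max((now - published).total_seconds() / 86_400, 0.0)
    decay = 0.5 ** (age_days / half_life_days)
    return score * decay

fresh = time_decayed_score(0.9, datetime(2024, 11, 1, tzinfo=timezone.utc))
old = time_decayed_score(0.9, datetime(2020, 11, 1, tzinfo=timezone.utc))
print(fresh, old)  # the older document's score is heavily discounted
```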
SOURCE AUTHORITY WEIGHTING
Method that modifies document relevance scores based on the credibility and reliability of their sources, considering factors like domain expertise, publication authority, and so on.
Pros: Promotes reliable sources. Supports quality control. Reduces misinformation risk.
Cons: May miss emerging experts. Subjective authority tuning. Needs regular updates of authority scores.
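A minimal sketch of authority weighting, with a hand-curated (and necessarily subjective) authority table of hypothetical source types multiplying into the relevance score:

```python
# Hypothetical authority table; scores must be curated and updated regularly.
AUTHORITY = {
    "internal_wiki": 1.0,
    "peer_reviewed": 0.95,
    "vendor_blog": 0.7,
    "forum_post": 0.4,
}

def authority_weighted_score(score: float, source_type: str,
                             default_authority: float = 0.5) -> float:
    """Multiply the retrieval score by the source's authority weight."""
    return score * AUTHORITY.get(source_type, default_authority)

print(authority_weighted_score(0.8, "peer_reviewed"))  # 0.76
print(authority_weighted_score(0.8, "forum_post"))     # 0.32
```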
CROSS-ENCODER RERANKING
A cross-encoder is a neural architecture that processes a query and document together as a single input to determine their relevance. Think of it as having two people watching and discussing a movie together, rather than separately writing reviews and comparing them later.

Bi-encoder retrieval only: USER QUERY → RETRIEVER (BI-ENCODER) → TOP-5 CHUNKS
With reranking: USER QUERY → RETRIEVER → TOP-50 CHUNKS → CROSS-ENCODER RERANKER → TOP-5 CHUNKS

Pros: Higher accuracy. Better understanding of query-passage relationships. More nuanced ranking decisions. Can capture complex relevance patterns.
Cons: Computationally expensive. Slower inference time. Higher memory requirements. Not suitable for initial retrieval.
Tools: Sentence-Transformers, HuggingFace Transformers, PyTerrier
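A minimal reranking sketch with the Sentence-Transformers `CrossEncoder` class; the model name is one publicly available MS MARCO reranker, used purely as an example:

```python
# pip install sentence-transformers
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # example reranker

query = "What does a retriever do in a RAG system?"
candidate_chunks = [
    "The retriever fetches relevant information from an external knowledge base.",
    "Recursive chunking splits text by trying larger separators first.",
    "The generator conditions on the query and the retrieved documents.",
]

# the cross-encoder scores each (query, chunk) pair jointly
scores = reranker.predict([(query, chunk) for chunk in candidate_chunks])
reranked = sorted(zip(candidate_chunks, scores), key=lambda x: x[1], reverse=True)
top_chunk, top_score = reranked[0]
print(top_chunk, top_score)
```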
EVALUATION
1. METRICS
2. APPROACH

EVALUATION METRICS
Components to evaluate: RETRIEVER, CHUNKER, RERANKER
Metrics: RECALL@K, PRECISION@K, COST, LATENCY, SEMANTIC COHERENCE
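Precision@k and Recall@k for the retriever can be computed directly from the IDs of retrieved versus ground-truth relevant chunks; a minimal sketch:

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved chunks that are actually relevant."""
    top_k = retrieved[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant) / k

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant chunks that appear in the top-k results."""
    top_k = retrieved[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant) / len(relevant)

retrieved = ["c7", "c2", "c9", "c1", "c5"]   # retriever output, best first
relevant = {"c2", "c5", "c8"}                 # ground-truth relevant chunks

print(precision_at_k(retrieved, relevant, k=5))  # 0.4  (2 of 5 retrieved are relevant)
print(recall_at_k(retrieved, relevant, k=5))     # 0.67 (2 of 3 relevant chunks found)
```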
EVALUATION DATASET
Extract a representative slice of the real data.
Define the evaluation metrics.
Create a QA data frame by generating questions for the real data with an LLM.
Cross-check the outputs with another LLM and a human evaluator for clarity, specificity, and realism.
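A sketch of the question-generation step, using a hypothetical `llm_complete()` helper to stand in for whichever LLM client is actually used; the cross-check by a second LLM and a human reviewer would follow the same pattern:

```python
import json

def llm_complete(prompt: str) -> str:
    """Hypothetical wrapper around the chosen LLM API (OpenAI, local model, ...)."""
    raise NotImplementedError

QUESTION_PROMPT = """You are building an evaluation set for a RAG system.
Given the passage below, write 3 clear, specific, realistic questions
a user could answer from this passage alone. Return a JSON list of strings.

Passage:
{passage}"""

def build_qa_dataset(passages: list[dict]) -> list[dict]:
    """passages: [{"id": ..., "text": ...}] drawn from a representative slice of real data."""
    rows = []
    for passage in passages:
        questions = json.loads(llm_complete(QUESTION_PROMPT.format(passage=passage["text"])))
        for q in questions:
            rows.append({"question": q, "source_id": passage["id"], "context": passage["text"]})
    return rows
```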
TESTING & EVALUATION

Iteration A (Baseline, Chunking Choice): Create different chunks using different chunk sizes and chunking methods, store them in a local store, and conduct experiments.

Iteration B (Update Embeddings): Keep the characteristics of the best previous iteration (chunk size and chunking approach) and embed the best performers of iteration A with the new embedding model. Store them in the local store and conduct experiments.

Iteration C (Create Sparse Vectors): Keep the characteristics of the best previous iterations (chunk size, chunking approach, embedding method) and implement sparse vectors and hybrid search for the best performers of iteration B. Store them in the local store and conduct experiments.
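The A/B/C flow can be expressed as a simple experiment loop; `build_index`, `evaluate`, and the configuration values below are hypothetical placeholders for the real ingestion and evaluation code:

```python
from itertools import product

# Hypothetical helpers: build_index() ingests chunks into a local store,
# evaluate() runs the QA dataset against it and returns a metric such as recall@k.
def build_index(docs, chunk_size, chunk_method, embedding_model, hybrid=False): ...
def evaluate(index, qa_dataset) -> float: ...

def iteration_a(docs, qa_dataset):
    """Iteration A: sweep chunk sizes and chunking methods against a baseline embedding model."""
    results = {}
    for size, method in product([256, 512, 1024], ["fixed", "recursive", "semantic"]):
        index = build_index(docs, size, method, embedding_model="baseline-model")
        results[(size, method)] = evaluate(index, qa_dataset)
    return max(results, key=results.get)  # best (chunk_size, chunk_method)

def iteration_b(docs, qa_dataset, best_chunking):
    """Iteration B: keep the best chunking, sweep embedding models."""
    size, method = best_chunking
    results = {m: evaluate(build_index(docs, size, method, embedding_model=m), qa_dataset)
               for m in ["baseline-model", "stronger-model"]}
    return max(results, key=results.get)  # best embedding model

def iteration_c(docs, qa_dataset, best_chunking, best_embedding):
    """Iteration C: add sparse vectors / hybrid search on top of the best B configuration."""
    size, method = best_chunking
    index = build_index(docs, size, method, best_embedding, hybrid=True)
    return evaluate(index, qa_dataset)
```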
THANK
YOU!
SOFIA KONCHAKOVA DSC EUROPE
NOVEMBER 2024
