CONTENT MANAGEMENT FOR RAG SYSTEMS
SOFIA KONCHAKOVA
DSC EUROPE, NOVEMBER 2024
ABOUT ME
Topics of Interest:
ML for IoT
RAG Systems
MLOps
AI Consulting
AGENDA
1. RAG 101: REMIND THE BASICS, RAGOPS
2. SHORT ABOUT DATA
3. RETRIEVER MANAGEMENT
WHAT IS RAG?

RETRIEVAL AUGMENTED GENERATION (RAG)
[Diagram: USER QUERY → RETRIEVER → RELEVANT CONTENT EXTRACTION → RETRIEVED CONTENT → PROMPT AUGMENTATION → FULL PROMPT → RESPONSE]
WHY DO WE NEED RAG?

Cost Reduction
RAG reduces training costs by enabling smaller models to retrieve information on demand, eliminating the need to store all knowledge within the model.

Higher Accuracy
It increases accuracy by retrieving relevant, up-to-date information during generation, reducing inaccuracies and "hallucinations" often seen in generative AI.

Flexibility
RAG frameworks are versatile, enabling retrieval from various data sources and adapting across domains, making them ideal for custom applications in different industries.
TRADITIONAL MLOPS
Business Problem Definition → Data Management → Machine Learning → Deployment & Monitoring
RAGOPS
Data Management → Model Management → Prompt Management → System Management & Deployment
SHORT ABOUT DATA
1. DATA TYPES
2. DATA REQUIREMENTS

WHAT DATA CAN WE USE?
[Diagram of supported data formats, e.g., PDF documents]
THOUGH RAG SUPPORTS VARIOUS
DATA FORMATS, IT MAY NOT
ALWAYS BE THE BEST OPTION.
DASHBOARDS CONNECTED TO A
DATA WAREHOUSE MIGHT BE
MORE EFFICIENT FOR SOME
CASES.
REQUIREMENTS FOR THE
DATA MANAGEMENT PIPELINE
Scalability and Performance Efficiency: Efficient handling of increasing volumes of data and
complex processing tasks without performance degradation or manual intervention.
Monitoring and Observability: Continuous tracking of the performance and health of the data
pipeline using metrics, logs, and alerts.
Data Transformation and Enrichment: The ability to clean, normalize, and enrich data as it
moves through the pipeline.
Automation and Orchestration: Automating repetitive tasks, including timely data updates,
and coordinating complex workflows between different pipeline stages.
GARBAGE IN -
GARBAGE OUT!
WE WANT OUR
SYSTEM TO HAVE
ACCESS TO RELEVANT
DATA.
RETRIEVER MANAGEMENT
1. RETRIEVER & GENERATOR
2. CHUNKING STRATEGIES
3. RETRIEVAL STRATEGIES
4. RERANKING STRATEGIES
RETRIEVAL AUGMENTED GENERATION (RAG)

Retriever
Responsible for fetching relevant information from an external knowledge base based on the input query. It acts as the system's non-parametric "memory," allowing it to access and utilize up-to-date or domain-specific information.

Generator
Language model that produces the final response by conditioning on both the input query and the retrieved documents. It generates coherent and contextually appropriate text as an answer to the query.

[Diagram: RAG pipeline annotated with its tuning points: Optimal Chunk Size, Embedding Model, Sparse vs. Dense Retriever, Similarity Metrics, Named Entity Recognition, Reranker]
TO CHUNK
OR
NOT TO CHUNK?
OPTIMAL CHUNK SIZE
Content Coherence vs. Granularity:
Balance between semantic preservation and retrieval precision
Large chunks: risk relevance dilution
Small chunks: may break context flow
Model Context Window Limitations:
Consider LLM token capacity constraints
Reserve space for context and response
Optimize for efficient token utilization
Query-Content Alignment:
Match chunk size to content type
Technical docs: smaller, focused chunks
Narrative content: larger, context-preserving chunks
And remember there's no one-size-fits-all approach!
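To make the context-window point concrete, here is a minimal token-budget sketch; the context window, prompt sizes, and top-k value are illustrative assumptions, not recommendations:

```python
# Minimal token-budget sketch for choosing a chunk size (illustrative numbers only).
CONTEXT_WINDOW = 8_192      # assumed model context window, in tokens
SYSTEM_PROMPT_TOKENS = 300  # instructions, few-shot examples, etc.
QUERY_TOKENS = 100          # typical user query
RESPONSE_BUDGET = 1_000     # tokens reserved for the generated answer
TOP_K = 5                   # number of retrieved chunks placed in the prompt

available_for_chunks = (CONTEXT_WINDOW - SYSTEM_PROMPT_TOKENS
                        - QUERY_TOKENS - RESPONSE_BUDGET)
max_chunk_size = available_for_chunks // TOP_K
print(f"Room for retrieved content: {available_for_chunks} tokens")
print(f"Max chunk size at top-{TOP_K}: {max_chunk_size} tokens")
# -> Room for retrieved content: 6792 tokens
# -> Max chunk size at top-5: 1358 tokens
```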
CHUNKING ISN’T
ALWAYS NECESSARY!
FOR SHORTER
DOCUMENTS, IT’S
OFTEN BEST TO KEEP
THEM INTACT.
CHUNKING ALGORITHMS

FIXED-SIZE CHUNKING
Chunks are created by setting a fixed number of tokens per chunk, with optional overlap to retain semantic context.
Pros: Simple and efficient. Consistent chunk sizes.
Cons: Potential loss of context. Inflexible with content length.

RECURSIVE CHUNKING
Splits text by trying larger separators (e.g., paragraphs) first, then progressively uses smaller separators until reaching the desired chunk size.
Pros: Preserves natural document structure. Reduces context fragmentation.
Cons: Higher computational overhead. More complex implementation.
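As a rough illustration of these two approaches, here is a minimal pure-Python sketch (no particular library assumed; production pipelines often use tokenizer-aware splitters such as those shipped with LangChain or LlamaIndex):

```python
def fixed_size_chunks(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Fixed-size chunking: split on whitespace into word windows with overlap."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

def recursive_chunks(text: str, max_len: int = 800,
                     separators: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Recursive chunking: try the largest separator first, then fall back to smaller ones."""
    if len(text) <= max_len:
        return [text]
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks, buffer = [], ""
            for part in parts:
                candidate = (buffer + sep + part) if buffer else part
                if len(candidate) <= max_len:
                    buffer = candidate
                else:
                    if buffer:
                        chunks.append(buffer)
                    # the part itself may still be too long -> recurse with smaller separators
                    chunks.extend(recursive_chunks(part, max_len, separators))
                    buffer = ""
            if buffer:
                chunks.append(buffer)
            return chunks
    # no separator worked: hard-split as a last resort
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```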
CHUNKING ALGORITHMS

SEMANTIC CHUNKING
Splits text based on meaningful units and topical boundaries rather than length limits, using embedding similarity to identify and maintain coherent content blocks while preserving context.
Pros: Better preservation of context. Reduced information loss.
Cons: Requires sophisticated NLP models. Produces chunks of varying sizes.

AGENTIC CHUNKING
Uses LLMs as active agents to split text intelligently based on content, context, and user needs, rather than fixed rules or similarity metrics.
Pros: More intelligent splits. Purpose-aware chunking.
Cons: High computational cost and latency. Limited standardization and maturity as a new technology.
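A minimal sketch of the semantic-chunking idea, assuming a hypothetical `embed()` helper that returns one embedding vector per sentence (any sentence-embedding model could supply it); consecutive sentences are merged while their similarity to the running chunk stays above a threshold:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_chunks(sentences: list[str], embed, threshold: float = 0.7) -> list[str]:
    """Group consecutive sentences while they stay semantically close to the current chunk.

    `embed(texts)` is assumed to return a 2D array of sentence embeddings (hypothetical helper).
    """
    vectors = embed(sentences)
    chunks, current, current_vec = [], [sentences[0]], vectors[0]
    for sent, vec in zip(sentences[1:], vectors[1:]):
        if cosine(current_vec, vec) >= threshold:
            current.append(sent)
            # running mean keeps the chunk representation up to date
            current_vec = (current_vec * (len(current) - 1) + vec) / len(current)
        else:
            chunks.append(" ".join(current))
            current, current_vec = [sent], vec
    chunks.append(" ".join(current))
    return chunks
```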
RETRIEVAL STRATEGIES

SPARSE RETRIEVER
A sparse retriever uses keyword-based matching, creating vectors where only present words get values while all other positions are zero.
Example text: "I am speaking at DSC Europe. I speak about RAG."
Pros: Fast and efficient. Good for exact matches. Easy to interpret results. Low computational cost.
Cons: Misses semantic relationships. Sensitive to wording variations. Poor with synonyms. No understanding of context.
Examples: BM25, TF-IDF
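A minimal sparse-retrieval sketch using the `rank_bm25` package (one common BM25 implementation); the corpus and query are toy examples:

```python
# pip install rank_bm25
from rank_bm25 import BM25Okapi

corpus = [
    "I am speaking at DSC Europe. I speak about RAG.",
    "Dense retrievers use embeddings to capture semantic meaning.",
    "BM25 is a classic keyword-based ranking function.",
]
tokenized_corpus = [doc.lower().split() for doc in corpus]

bm25 = BM25Okapi(tokenized_corpus)
query_tokens = "who speaks about RAG at DSC Europe".lower().split()

scores = bm25.get_scores(query_tokens)           # one relevance score per document
best = bm25.get_top_n(query_tokens, corpus, n=1)  # highest-scoring document(s)
print(scores, best)
```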
DENSE RETRIEVER
A dense retriever uses embeddings to convert text into meaningful vectors that capture semantic meaning.
Example text: "I am speaking at DSC Europe. I speak about RAG."
Pros: Better semantic understanding. Handles synonyms well. Good with paraphrasing. Context-aware.
Cons: Computationally expensive. Requires more storage. Harder to interpret. May miss exact matches.
Model benchmarks: MTEB Leaderboard on HuggingFace
Examples: ChromaDB, FAISS
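A minimal dense-retrieval sketch with Sentence-Transformers; the model name is just one commonly used example, and any model from the MTEB leaderboard could be swapped in:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # example embedding model

docs = [
    "I am speaking at DSC Europe. I speak about RAG.",
    "Dense retrievers use embeddings to capture semantic meaning.",
    "BM25 is a classic keyword-based ranking function.",
]
doc_embeddings = model.encode(docs, convert_to_tensor=True)

query = "Who gives a talk on retrieval augmented generation?"
query_embedding = model.encode(query, convert_to_tensor=True)

# cosine similarity between the query and every document
similarities = util.cos_sim(query_embedding, doc_embeddings)[0]
best_idx = int(similarities.argmax())
print(docs[best_idx], float(similarities[best_idx]))
```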
HYBRID RETRIEVER
A hybrid retriever merges keyword-based (sparse) search with embedding-based (dense) search to leverage the strengths of both approaches.
Pros: Better overall accuracy. Combines benefits of both methods.
Cons: Most complex to implement. More maintenance overhead.
Examples: Weaviate, Pinecone, LangChain's hybrid retrievers

Typical weighting configurations (dense / sparse / metadata):
BALANCED: 50% dense, 50% sparse
SEMANTIC-FOCUSED: 70% dense, 30% sparse
KEYWORD-FOCUSED: 30% dense, 70% sparse
WITH METADATA: 40% dense, 20% sparse, 40% metadata
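A minimal sketch of weighted score fusion, assuming sparse and dense scores have already been computed for the same candidate chunks; min-max normalisation and the weights are illustrative choices (reciprocal rank fusion is a common alternative):

```python
import numpy as np

def min_max(scores: np.ndarray) -> np.ndarray:
    """Scale scores to [0, 1] so sparse and dense scores are comparable."""
    span = scores.max() - scores.min()
    return (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)

def hybrid_scores(sparse: np.ndarray, dense: np.ndarray,
                  w_sparse: float = 0.5, w_dense: float = 0.5) -> np.ndarray:
    return w_sparse * min_max(sparse) + w_dense * min_max(dense)

# toy scores for 4 candidate chunks
sparse = np.array([2.1, 0.4, 1.7, 0.0])     # e.g., BM25 scores
dense = np.array([0.62, 0.81, 0.55, 0.40])  # e.g., cosine similarities

# semantic-focused weighting: 70% dense, 30% sparse
ranking = np.argsort(-hybrid_scores(sparse, dense, w_sparse=0.3, w_dense=0.7))
print(ranking)  # chunk indices, best first
```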
NAMED ENTITY RECOGNITION
(NER)
Technique that identifies and classifies named entities (like names,
organizations, locations, dates) in text into predefined categories. These
extracted entities can be stored as metadata in databases, enabling
enhanced search capabilities through metadata filtering.
GLiNER, Flair, StanfordNER
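A sketch of entity extraction for metadata filtering based on GLiNER's published usage; the checkpoint name, label set, and exact return fields are assumptions and may differ across versions:

```python
# pip install gliner  -- sketch based on GLiNER's published usage; exact API may differ by version
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_base")  # example checkpoint

text = "Sofia Konchakova is speaking at DSC Europe in November 2024 about RAG systems."
labels = ["person", "organization", "location", "date"]

entities = model.predict_entities(text, labels, threshold=0.5)

# store the extracted entities as chunk metadata for later filtering
metadata = {label: [e["text"] for e in entities if e["label"] == label] for label in labels}
print(metadata)
```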
RERANKING STRATEGIES

TIME-BASED WEIGHTING
Method that adjusts document relevance scores based on their age, applying a decay factor to give more weight to recent documents while gradually reducing the importance of older ones.
Pros: Easy to implement. Prioritises fresh content. Configurable decay rates.
Cons: Can over-penalise “classics”. Domain-specific tuning required. Not suitable for evergreen content.
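A minimal sketch of exponential time decay applied on top of a retrieval score; the half-life is an arbitrary example and would need the domain-specific tuning mentioned above:

```python
import math
from datetime import datetime, timezone

def time_decayed_score(score: float, published: datetime,
                       half_life_days: float = 180.0,
                       now: datetime | None = None) -> float:
    """Down-weight older documents: the score halves every `half_life_days`."""
    now = now or datetime.now(timezone.utc)
    age_days = max((now - published).total_seconds() / 86_400, 0.0)
    decay = 0.5 ** (age_days / half_life_days)
    return score * decay

fresh = time_decayed_score(0.9, datetime(2024, 11, 1, tzinfo=timezone.utc))
old = time_decayed_score(0.9, datetime(2020, 11, 1, tzinfo=timezone.utc))
print(fresh, old)  # the older document's score is heavily discounted
```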
SOURCE AUTHORITY WEIGHTING
Method that modifies document relevance scores based on the credibility and reliability of their sources, considering factors like domain expertise, publication authority, and so on.
Pros: Promotes reliable sources. Supports quality control. Reduces misinformation risk.
Cons: May miss emerging experts. Subjective authority tuning. Needs regular updates of authority scores.
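A minimal sketch of authority weighting, with a hand-curated (and necessarily subjective) authority table of hypothetical source types multiplying into the relevance score:

```python
# Hypothetical authority table; scores must be curated and updated regularly.
AUTHORITY = {
    "internal_wiki": 1.0,
    "peer_reviewed": 0.95,
    "vendor_blog": 0.7,
    "forum_post": 0.4,
}

def authority_weighted_score(score: float, source_type: str,
                             default_authority: float = 0.5) -> float:
    """Multiply the retrieval score by the source's authority weight."""
    return score * AUTHORITY.get(source_type, default_authority)

print(authority_weighted_score(0.8, "peer_reviewed"))  # 0.76
print(authority_weighted_score(0.8, "forum_post"))     # 0.32
```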
CROSS-ENCODER RERANKING
A cross-encoder is a neural architecture that processes a query and document together as a single input to determine their relevance. Think of it as having two people watching and discussing a movie together, rather than separately writing reviews and comparing them later.

Bi-encoder retrieval only: USER QUERY → RETRIEVER (BI-ENCODER) → TOP-5 CHUNKS
With reranking: USER QUERY → RETRIEVER → TOP-50 CHUNKS → CROSS-ENCODER RERANKER → TOP-5 CHUNKS

Pros: Higher accuracy. Better understanding of query-passage relationships. More nuanced ranking decisions. Can capture complex relevance patterns.
Cons: Computationally expensive. Slower inference time. Higher memory requirements. Not suitable for initial retrieval.
Tools: Sentence-Transformers, HuggingFace Transformers, PyTerrier
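A minimal reranking sketch with the Sentence-Transformers `CrossEncoder` class; the model name is one publicly available MS MARCO reranker, used purely as an example:

```python
# pip install sentence-transformers
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # example reranker

query = "What does a retriever do in a RAG system?"
candidate_chunks = [
    "The retriever fetches relevant information from an external knowledge base.",
    "Recursive chunking splits text by trying larger separators first.",
    "The generator conditions on the query and the retrieved documents.",
]

# the cross-encoder scores each (query, chunk) pair jointly
scores = reranker.predict([(query, chunk) for chunk in candidate_chunks])
reranked = sorted(zip(candidate_chunks, scores), key=lambda x: x[1], reverse=True)
top_chunk, top_score = reranked[0]
print(top_chunk, top_score)
```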
EVALUATION
1. METRICS
2. APPROACH

EVALUATION METRICS
Components to evaluate: RETRIEVER, CHUNKER, RERANKER
Metrics: RECALL@K, PRECISION@K, COST, LATENCY, SEMANTIC COHERENCE
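Precision@k and Recall@k for the retriever can be computed directly from the IDs of retrieved versus ground-truth relevant chunks; a minimal sketch:

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved chunks that are actually relevant."""
    top_k = retrieved[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant) / k

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant chunks that appear in the top-k results."""
    top_k = retrieved[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant) / len(relevant)

retrieved = ["c7", "c2", "c9", "c1", "c5"]   # retriever output, best first
relevant = {"c2", "c5", "c8"}                 # ground-truth relevant chunks

print(precision_at_k(retrieved, relevant, k=5))  # 0.4  (2 of 5 retrieved are relevant)
print(recall_at_k(retrieved, relevant, k=5))     # 0.67 (2 of 3 relevant chunks found)
```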
EVALUATION DATASET
Extract a representative slice of the real data.
Define the evaluation metrics.
Create a QA data frame by generating questions for the real data with an LLM.
Cross-check the outputs with another LLM and a human evaluator for clarity, specificity, and realism.
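A sketch of the question-generation step, using a hypothetical `llm_complete()` helper to stand in for whichever LLM client is actually used; the cross-check by a second LLM and a human reviewer would follow the same pattern:

```python
import json

def llm_complete(prompt: str) -> str:
    """Hypothetical wrapper around the chosen LLM API (OpenAI, local model, ...)."""
    raise NotImplementedError

QUESTION_PROMPT = """You are building an evaluation set for a RAG system.
Given the passage below, write 3 clear, specific, realistic questions
a user could answer from this passage alone. Return a JSON list of strings.

Passage:
{passage}"""

def build_qa_dataset(passages: list[dict]) -> list[dict]:
    """passages: [{"id": ..., "text": ...}] drawn from a representative slice of real data."""
    rows = []
    for passage in passages:
        questions = json.loads(llm_complete(QUESTION_PROMPT.format(passage=passage["text"])))
        for q in questions:
            rows.append({"question": q, "source_id": passage["id"], "context": passage["text"]})
    return rows
```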
TESTING & EVALUATION

Iteration A (Baseline, Chunking Choice): Create different chunks using different chunk sizes and chunking methods, store them in a local store, and conduct experiments.

Iteration B (Update Embeddings): Keep the characteristics of the best previous iteration (chunk size and chunking approach) and embed the best performers of iteration A with the new embedding model. Store them in the local store and conduct experiments.

Iteration C (Create Sparse Vectors): Keep the characteristics of the best previous iterations (chunk size, chunking approach, embedding method) and implement sparse vectors and hybrid search for the best performers of iteration B. Store them in the local store and conduct experiments.
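The A/B/C flow can be expressed as a simple experiment loop; `build_index`, `evaluate`, and the configuration values below are hypothetical placeholders for the real ingestion and evaluation code:

```python
from itertools import product

# Hypothetical helpers: build_index() ingests chunks into a local store,
# evaluate() runs the QA dataset against it and returns a metric such as recall@k.
def build_index(docs, chunk_size, chunk_method, embedding_model, hybrid=False): ...
def evaluate(index, qa_dataset) -> float: ...

def iteration_a(docs, qa_dataset):
    """Iteration A: sweep chunk sizes and chunking methods against a baseline embedding model."""
    results = {}
    for size, method in product([256, 512, 1024], ["fixed", "recursive", "semantic"]):
        index = build_index(docs, size, method, embedding_model="baseline-model")
        results[(size, method)] = evaluate(index, qa_dataset)
    return max(results, key=results.get)  # best (chunk_size, chunk_method)

def iteration_b(docs, qa_dataset, best_chunking):
    """Iteration B: keep the best chunking, sweep embedding models."""
    size, method = best_chunking
    results = {m: evaluate(build_index(docs, size, method, embedding_model=m), qa_dataset)
               for m in ["baseline-model", "stronger-model"]}
    return max(results, key=results.get)  # best embedding model

def iteration_c(docs, qa_dataset, best_chunking, best_embedding):
    """Iteration C: add sparse vectors / hybrid search on top of the best B configuration."""
    size, method = best_chunking
    index = build_index(docs, size, method, best_embedding, hybrid=True)
    return evaluate(index, qa_dataset)
```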
THANK
YOU!
SOFIA KONCHAKOVA DSC EUROPE
NOVEMBER 2024
