1 | © Copyright 11/17/23 Zilliz
Speaker
Christy Bergman
Developer Advocate, Zilliz
https://www.linkedin.com/in/christybergman/
https://github.com/milvus-io/milvus
discord: https://discord.gg/FjCMmaJng6
2 | © Copyright 11/17/23 Zilliz
Image source: https://thedataquarry.com/posts/vector-db-1/
3 | © Copyright 11/17/23 Zilliz
3 Pillars of Generative AI
4 | © Copyright 11/17/23 Zilliz
3 Pillars of Generative AI
5 | © Copyright 11/17/23 Zilliz
Opportunities in Unstructured Data
6 | © Copyright 11/17/23 Zilliz
7 | © Copyright 11/17/23 Zilliz
THANK YOU
We need your stars!
https://github.com/milvus-io/milvus
💬 Join our discord: https://discord.gg/FjCMmaJng6
8 | © Copyright 11/17/23 Zilliz
AGENDA
01 AI Hallucinations and RAG
02 4 Challenges
03 Demo
04 RAG Evaluation Methods
9 | © Copyright 11/17/23 Zilliz
01 AI Hallucinations and RAG
10 | © Copyright 11/17/23 Zilliz
Example AI Hallucination
gemini
wikipedia
11 | © Copyright 11/17/23 Zilliz
Example AI Hallucination
gemini
wikipedia
hallucinated answer
12 | © Copyright 11/17/23 Zilliz
Why do models hallucinate?
• LLMs hallucinate because …
• They are trained on sequences of words (tokens)
Sample Data:
The hamster cabinet …
!!@#%# …
Monkey eats shark …
trees in the moons…
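The point that models are trained on token sequences can be illustrated with a toy next-token model. This is only a minimal sketch: the bigram counting, the function names `train_bigram` and `generate`, and the four training sentences are all made up for illustration and are far simpler than a real LLM. It shows how a model that has only learned which token tends to follow which can emit a fluent sentence that never appeared in its training data.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each token, which tokens follow it in the training text."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for current, nxt in zip(tokens, tokens[1:]):
            follows[current][nxt] += 1
    return follows

def generate(follows, start, max_tokens=5):
    """Greedily append the most frequent next token; truth is never checked."""
    out = [start]
    for _ in range(max_tokens):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # no known continuation for the last token
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

# Hypothetical toy training data (not from the talk).
corpus = [
    "the hamster eats seeds",
    "the monkey eats bananas",
    "the monkey eats bananas daily",
    "the shark eats fish",
]
model = train_bigram(corpus)
generated = generate(model, "hamster")
print(generated)
```

In this toy corpus the hamster only ever eats seeds, yet the most frequent continuation after "eats" is "bananas", so the generated sentence is statistically plausible and factually unsupported.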
13 | © Copyright 11/17/23 Zilliz
Where do Vectors Come From?
[Diagram: Unstructured Data → Pre-trained Deep Learning Models → Vectors (embeddings) → Vector Database]
14 | © Copyright 11/17/23 Zilliz
Where do Vectors Come From?
Unstructured Data Vectors
15 | © Copyright 11/17/23 Zilliz
Semantic Similarity
Image from Sutor et al
Woman = [0.3, 0.4]
Queen = [0.3, 0.9]
King = [0.5, 0.7]
Man = [0.5, 0.2]
Queen - Woman + Man = King
  Queen = [0.3, 0.9]
- Woman = [0.3, 0.4]
        = [0.0, 0.5]
+ Man   = [0.5, 0.2]
  King  = [0.5, 0.7]
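The vector arithmetic on this slide can be checked in a few lines. The sketch below uses the slide's four 2-D vectors; the helper names (`add`, `sub`, `nearest`) and the choice of Euclidean distance for the nearest-word lookup are assumptions made for illustration, since the slide does not say which distance it has in mind.

```python
import math

# The 2-D toy vectors from the slide.
vectors = {
    "woman": [0.3, 0.4],
    "queen": [0.3, 0.9],
    "king": [0.5, 0.7],
    "man": [0.5, 0.2],
}

def add(a, b):
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    """Element-wise difference of two vectors."""
    return [x - y for x, y in zip(a, b)]

def nearest(target, table):
    """Return the word whose vector is closest to target (Euclidean distance)."""
    return min(table, key=lambda word: math.dist(table[word], target))

# Queen - Woman + Man
result = add(sub(vectors["queen"], vectors["woman"]), vectors["man"])
closest = nearest(result, vectors)
print([round(x, 2) for x in result], closest)
```

The values are rounded before printing because floating-point subtraction can leave tiny residues such as 0.5000000000000001.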
16 | © Copyright 11/17/23 Zilliz
Retrieval Augmented Generation (RAG)
[Diagram: Your Data → Embedding Model → Vector Database; Question → Search → Question + Context → Gen AI Model → Reliable Answers]
Question: What is the default AUTOINDEX distance metric in Milvus?
Answer: The default AUTOINDEX distance metric in Milvus is L2.
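The RAG flow on this slide (embed the data, search with the question, send question plus context to the model) can be sketched without any external service. Everything here is a stand-in chosen for illustration: `embed` is a bag-of-words counter rather than a real embedding model, the in-memory list replaces a vector database such as Milvus, the function names are hypothetical, and the final call to a generative model is left out. The first document reuses the slide's example answer; the other two are filler.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words token counts (stand-in for an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, docs, top_k=1):
    """Rank documents by similarity to the question and keep the top_k."""
    q = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, context):
    """Combine question and retrieved context into one prompt string."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

docs = [
    "The default AUTOINDEX distance metric in Milvus is L2.",
    "Milvus is an open-source vector database.",
    "Ragas is a library for evaluating RAG pipelines.",
]
question = "What is the default AUTOINDEX distance metric in Milvus?"
context = retrieve(question, docs)
prompt = build_prompt(question, context)
print(prompt)
```

In a real system the prompt would then be passed to the generative model, which answers from the retrieved context instead of from memory alone.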
17 | © Copyright 11/17/23 Zilliz
Pain Point #3: Chunking
Conversation Data
Documentation Data
Lecture or Q/A Data
18 | © Copyright 11/17/23 Zilliz
Pain Point #3: Chunking
Conversation Data: add conversation memory
Documentation Data
Question Answer Data: use Q&A tuple formatting
19 | © Copyright 11/17/23 Zilliz
Pain Point #3: Chunks need more context
Naive Chunks
Tesla Roadster
Chunk #1: 2018 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem
Chunk #2: 2023 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem
20 | © Copyright 11/17/23 Zilliz
Pain Point #3: Chunks need more context
Naive Chunks
Tesla Roadster
2018 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem
2023 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem
Better Chunks
Tesla Roadster 2018 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem
Tesla Roadster 2023 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem
Title 2-levels above
Title 1-level above
HTMLHeaderTextSplitter
ParentDocumentRetriever
HierarchicalNodeParser
AutoMergingRetriever
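The difference between naive and better chunks in the Tesla Roadster example can be sketched as header-aware chunking: each chunk carries the titles one and two levels above it. The code below is a simplified stand-in written for illustration, not the API of HTMLHeaderTextSplitter or HierarchicalNodeParser. It assumes a hypothetical input format where header lines start with `#` characters, and the function name `chunk_with_headers` is made up.

```python
def chunk_with_headers(lines):
    """Split on header lines ('#', '##', ...) and prefix each chunk with its header path."""
    path = []    # current titles, one entry per header level
    body = []    # text lines collected under the current header
    chunks = []

    def flush():
        # Emit the collected text, prefixed with every title above it.
        if body:
            chunks.append(" ".join(path + body))
            body.clear()

    for line in lines:
        stripped = line.strip()
        if stripped.startswith("#"):
            flush()
            level = len(stripped) - len(stripped.lstrip("#"))
            title = stripped.lstrip("#").strip()
            del path[level - 1:]   # drop titles at this level and deeper
            path.append(title)
        elif stripped:
            body.append(stripped)
    flush()
    return chunks

# Hypothetical document mirroring the slide's example.
doc = [
    "# Tesla Roadster",
    "## 2018",
    "Lorem ipsum dolor sit amet",
    "## 2023",
    "Lorem ipsum dolor sit amet",
]
chunks = chunk_with_headers(doc)
for chunk in chunks:
    print(chunk)
```

With the titles attached, a query about the 2023 Roadster can match the right chunk even though the body text of both chunks is identical.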
21 | © Copyright 11/17/23 Zilliz
Pain Point #3: Chunks need more context
Naive Chunks
Better Chunks
22 | © Copyright 11/17/23 Zilliz
04 RAG Evaluation Methods
23 | © Copyright 11/17/23 Zilliz
Foundation Model Evals vs Production System Evals
Your RAG system
Arena Elo score
24 | © Copyright 11/17/23 Zilliz
RAG Evaluation Methods
https://arxiv.org/pdf/2306.05685.pdf
GPT-4 favors itself with a 10% higher win rate; Claude-v1 favors itself with a 25% higher win rate.
Open weight Prometheus-eval aligns with human judgments up to 85% as of May 2024.
25 | © Copyright 11/17/23 Zilliz
Known Problems with LLM-as-Judge
https://www.databricks.com/blog/LLM-auto-eval-best-practices-RAG
GPT-4 is not a good judge of comprehensiveness
GPT-4 matches human judgements on Correctness & Readability
26 | © Copyright 11/17/23 Zilliz
Known Problems with LLM-as-Judge
https://arxiv.org/pdf/2305.17926
AI scores max/min higher
Humans score medians higher
27 | © Copyright 11/17/23 Zilliz
RAG Evaluation Methods
https://github.com/explodinggradients/ragas
[Diagram: Query, Context, Response, Ground Truth Answer]
Metrics: faithfulness, context_precision, context_recall, answer_relevancy, answer_correctness, answer_similarity
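To make one of these metrics concrete, here is a rough sketch of the idea behind context_precision: retrieved chunks that are relevant should be ranked near the top. The function below follows the precision-at-k averaging described in the Ragas documentation as I understand it, but it is a simplification and not a call into the Ragas library. In particular, the relevance flags are supplied by hand here, whereas Ragas derives them with an LLM, and the function name is reused only for readability.

```python
def context_precision(relevance):
    """Average precision@k over the ranks that hold a relevant chunk.

    relevance: list of 0/1 flags for the retrieved chunks, in ranked order.
    """
    hits = 0
    total = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / k   # precision@k, counted only at relevant ranks
    return total / hits if hits else 0.0

# Same two relevant chunks, ranked differently (hypothetical labels).
good_ranking = context_precision([1, 1, 0])
poor_ranking = context_precision([0, 1, 1])
print(round(good_ranking, 3), round(poor_ranking, 3))
```

The score drops when relevant chunks appear lower in the ranking, which is why a better chunking strategy can raise it without changing the generator at all.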
28 | © Copyright 11/17/23 Zilliz
05 Demo RAG Eval
29 | © Copyright 11/17/23 Zilliz
RETRIEVAL +46%, GENERATION +6%
####################################################
# Avg Context Precision htmlsplitter score = 0.67 (46% improvement)
# Avg Context Precision simple score = 0.46
####################################################
####################################################
# Avg mistralai mixtral_8x7b_instruct score = 0.7031 (6% improvement over gpt-3.5-turbo)
# Avg llama3_70b_anyscale_chat score = 0.6888
# Avg llama3_70b_groq_instruct score = 0.6867
# Avg llama_3_70b_octoai_instruct score = 0.6863
# Avg llama_3_8b_ollama_instruct score = 0.6783
# Avg openai gpt-3.5-turbo score = 0.665
####################################################
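The headline figures appear to be relative improvements over the baseline scores printed above. A short check, assuming that reading (the helper name `relative_improvement` is made up for illustration):

```python
def relative_improvement(new, baseline):
    """Percentage gain of new over baseline."""
    return (new - baseline) / baseline * 100

# Scores taken from the demo output above.
retrieval = relative_improvement(0.67, 0.46)      # htmlsplitter vs simple chunking
generation = relative_improvement(0.7031, 0.665)  # mixtral_8x7b_instruct vs gpt-3.5-turbo
print(f"retrieval +{retrieval:.0f}%, generation +{generation:.0f}%")
```

Under this reading both rounded values agree with the slide's +46% and +6%.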


Retrieval Augmented Generation Evaluation with Ragas
