Efficient Inference and Information Retrieval for Agents: SambaNova + Milvus
Rachel Bakke
Product Manager, SambaNova Systems
v 1.0
Copyright © 2024 SambaNova Systems Inc. | Confidential & Proprietary | Internal Use Only
Agenda
1. Introduction to Agentic AI
2. What is SambaNova?
3. SambaNova Cloud
4. Zilliz Integration
5. Setting up an Agent
What is Coming Next…
New Era of AI Automation
Agentic systems are AI systems designed to autonomously pursue complex goals and workflows with limited direct human supervision.
These new AI use cases will have:
● Multiple large LLMs for complex reasoning and decomposition
● Many expert models for task-specific execution
● Non-deterministic & complex chains of model requests
These systems need:
✓ Blazing fast interactivity
✓ Ability to gracefully handle very large inputs
✓ High-speed model switching
Commonly Deployed Chatbot
Customer: I have a vacation package booked with Premier Airlines for tomorrow. My wife, however, just got a cold, so I need to cancel the booking.
Virtual Agent: Sorry to hear that. Our cancelation policy states that you must cancel at least 48 hours in advance.
Customer: No, I bought insurance. Can I speak to an agent to help address this?

Agent Service Assistant
Customer: I have a vacation package booked with Premier Airlines for tomorrow. My wife, however, just got a cold, so I need to cancel the booking.
Agent steps: Planning, Function Calling, Generate Response, Reflection
Virtual Agent: Sorry to hear that. It looks like you bought insurance for your package, and I can process the cancelation for you if you would like to proceed.
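The four stages of the Agent Service Assistant (Planning, Function Calling, Generate Response, Reflection) can be sketched as a small pipeline. This is an illustration only, not SambaNova's implementation: the `BOOKINGS` record, the `lookup_booking` tool, and the rule that insurance waives the 48-hour window are all hypothetical, and the plan is hard-coded where a real agent would ask an LLM to produce it.

```python
from dataclasses import dataclass

# Hypothetical booking store; a real agent would call a reservation API.
BOOKINGS = {"premier-123": {"has_insurance": True, "hours_to_departure": 24}}


@dataclass
class Decision:
    can_cancel: bool
    reason: str


def plan(request: str) -> list[str]:
    """Planning: break the request into steps (hard-coded for this sketch)."""
    return ["lookup_booking", "apply_policy", "respond"]


def lookup_booking(booking_id: str) -> dict:
    """Function calling: fetch the facts the policy decision depends on."""
    return BOOKINGS[booking_id]


def apply_policy(booking: dict) -> Decision:
    """Assumed rule: insurance waives the 48-hour cancellation window."""
    if booking["has_insurance"]:
        return Decision(True, "insurance purchased")
    if booking["hours_to_departure"] >= 48:
        return Decision(True, "outside 48-hour window")
    return Decision(False, "inside 48-hour window, no insurance")


def respond(decision: Decision) -> str:
    """Generate response: turn the decision into a customer-facing reply."""
    if decision.can_cancel:
        return f"I can process the cancelation for you ({decision.reason})."
    return f"I'm unable to cancel this booking ({decision.reason})."


def reflect(reply: str, decision: Decision) -> bool:
    """Reflection: check that the reply agrees with the policy decision."""
    return ("can process" in reply) == decision.can_cancel


def handle(request: str, booking_id: str) -> str:
    state = {}
    for step in plan(request):
        if step == "lookup_booking":
            state["booking"] = lookup_booking(booking_id)
        elif step == "apply_policy":
            state["decision"] = apply_policy(state["booking"])
        elif step == "respond":
            state["reply"] = respond(state["decision"])
    if not reflect(state["reply"], state["decision"]):
        raise ValueError("reply contradicts the policy decision")
    return state["reply"]


print(handle("I need to cancel my booking", "premier-123"))
```

The chatbot in the first transcript stops at the policy text; the agent differs only in that it looks up the booking before answering and checks its reply afterwards.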
Speed matters: where GPT4 falls apart
Single large LLMs running on GPUs introduce unacceptable latency
[Diagram: the client sends a user request plus a document to the server. LLM Call #1 (90s); the application processes the request and generates a second request; LLM Call #2 (90s); the application processes the request and generates a third request; … LLM Call #n (90s). Complex agentic flows: 110 mins]
10xGPU latency * # of LLM requests
5 Requests = 50 seconds vs. 5
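The arithmetic behind this slide is that the calls in an agentic chain run one after another, so end-to-end latency grows linearly with the number of LLM requests. A minimal sketch of that calculation follows, assuming strictly sequential calls and ignoring the application's own processing time between them. The 10-second per-call figure is an illustrative value chosen to match the "5 requests = 50 seconds" example, and the 10x speedup is taken from the slide's label; neither is a measurement.

```python
def chain_latency(per_call_seconds: float, num_calls: int) -> float:
    """Total latency of a chain of sequential LLM calls, in seconds."""
    return per_call_seconds * num_calls


NUM_CALLS = 5
BASELINE_PER_CALL = 10.0   # assumed per-call latency, seconds
SPEEDUP = 10.0             # assumed per-call speedup

baseline = chain_latency(BASELINE_PER_CALL, NUM_CALLS)
faster = chain_latency(BASELINE_PER_CALL / SPEEDUP, NUM_CALLS)
print(f"baseline: {baseline:.0f} s, with {SPEEDUP:.0f}x faster calls: {faster:.0f} s")
```

Because the per-call cost is multiplied by the chain length, any per-call speedup carries through unchanged to the whole flow, which is why long agentic chains are more sensitive to inference speed than single-turn chat.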
AI Is Becoming a Commodity
405B is the Best Open-Source Model
[Chart: bar values 164, 85, 57, 56; axis and series labels were not captured]
SambaNova: Who We Are
Snapshot
▪ Founded in 2017 by industry luminaries and originated at Stanford University
▪ Fully integrated generative AI platform, from 4th generation hardware to pre-trained models
▪ $1B+ funding raised
Founded by pioneers in AI
▪ Rodrigo Liang, Co-founder & CEO
▪ Kunle Olukotun, Co-founder, Chief Technologist & Stanford Professor
▪ Christopher Ré, Co-founder & Stanford Professor
▪ Lip-Bu Tan, Executive Chairman
Sophisticated, long-term investors
SN40L: The Best Chip Designed for AI
Cerulean SN40L RDU: “Cerulean” Architecture-based Reconfigurable Dataflow Unit
Generative AI Training and Inference
▪ 5nm TSMC
▪ 102B Transistors
▪ 1,040 RDU Cores
▪ 638 TFLOPS (bf16)
▪ 3-tier Dataflow Memory: 520 MB On-Chip Memory, 64 GB High Bandwidth Memory, 1.5 TB High Capacity Memory
The Fastest AI Inference on the Best Models
Over 10X Faster Tokens/Second/User
Model            SambaNova   Nvidia
Llama 3.2 1B     2477        304
Llama 3.1 8B     1066        93
Llama 3.1 70B    460         32
Llama 3.1 405B   200         14
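The per-model speedup implied by these figures can be checked directly by dividing the SambaNova column by the Nvidia column. The sketch below uses only the numbers from the slide; the dictionary layout and function name are illustrative choices.

```python
# Tokens/second/user as (SambaNova, Nvidia), copied from the slide.
THROUGHPUT = {
    "Llama 3.2 1B": (2477, 304),
    "Llama 3.1 8B": (1066, 93),
    "Llama 3.1 70B": (460, 32),
    "Llama 3.1 405B": (200, 14),
}


def speedups(table: dict) -> dict:
    """Ratio of SambaNova to Nvidia throughput for each model."""
    return {model: samba / nvidia for model, (samba, nvidia) in table.items()}


for model, ratio in speedups(THROUGHPUT).items():
    print(f"{model}: {ratio:.1f}x")
```

On these figures the ratio is roughly 8x for the 1B model and between about 11x and 14x for the 8B, 70B, and 405B models, so the "over 10X" headline holds for the three larger models.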
No Number of GPUs Can Achieve RDU Performance
Building Agents
Andrew Ng’s 4 Design Patterns of Agent Systems
Reflection: Asking the same or a different LLM to check the work
Tool Use: Interacting with external sources like web browsers, calendars, vector databases, etc.
Planning: Decomposing complex user questions into a collection of tasks for better and more dynamic execution
Multi-Agent Collaboration: Agents working together to solve more complex problems
“Fast token generation is important. Generating more tokens, even from a lower quality LLM can give good results”
- Andrew Ng
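Two of the four patterns listed above, Planning and Tool Use, can be illustrated with a toy dispatcher. This is a sketch under stated assumptions: the `TOOLS` registry, the two tool functions, and the fixed plan are hypothetical stand-ins, and a real agent would have an LLM produce the plan and pick the tools.

```python
from typing import Callable


# Hypothetical tools; real ones would wrap a browser, a calendar, a vector DB, etc.
def search_docs(query: str) -> str:
    return f"top passage for '{query}'"


def add_numbers(expr: str) -> str:
    return str(sum(int(part) for part in expr.split("+")))


TOOLS: dict[str, Callable[[str], str]] = {
    "search_docs": search_docs,
    "add_numbers": add_numbers,
}


def make_plan(question: str) -> list[tuple[str, str]]:
    """Planning: decompose the question into (tool name, argument) tasks.

    Hard-coded for this sketch; an LLM would normally produce this list.
    """
    return [("search_docs", question), ("add_numbers", "2+3")]


def run_plan(plan: list[tuple[str, str]]) -> list[str]:
    """Tool use: dispatch each task to the matching tool, in order."""
    results = []
    for tool_name, arg in plan:
        tool = TOOLS.get(tool_name)
        if tool is None:
            results.append(f"unknown tool: {tool_name}")
        else:
            results.append(tool(arg))
    return results


print(run_plan(make_plan("refund policy")))
```

Each planned task is a separate model or tool invocation, which is where the emphasis on fast token generation in the quote comes from: more steps in the plan means more sequential calls.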
Menlo Ventures: What’s Really Agentic?
Hello World from SambaNova Cloud
Better Together with Zilliz
Recently integrated SambaNova’s Cloud and Milvus for developers to:
1. Get started in <10 minutes using these two powerful tools
2. Run a full app utilizing RAG and powered by Milvus
https://github.com/sambanova/ai-starter-kit/tree/main/integrations
Retrieval Augmented Generation
[Diagram, reconstructed from its labels; the order of stages is inferred]
Analysts Upload Knowledge Documents: Analyst, PDF Upload, Document Parse and Chunking, Chunks, Vector Embeddings, Store, Milvus
Consumers Retrieve Knowledge: Consumer, Query, Retrieval, Similar Vectors, Re-Ranker, Sorted Chunks, Response
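The two flows in the diagram (ingest: parse, chunk, embed, store; query: embed, retrieve similar vectors, re-rank, respond) can be sketched end to end in plain Python. Everything here is a stand-in chosen so the example runs without dependencies: the in-memory `VectorStore` plays the role of Milvus, the bag-of-words `embed` function replaces a real embedding model, and the keyword-overlap `rerank` replaces a real re-ranker. None of it is the Milvus or SambaNova API, and the sample document is made up.

```python
import math
import re
from collections import Counter


def chunk(text: str) -> list[str]:
    """Parse and chunk: split a document into sentence-sized chunks."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """In-memory stand-in for a vector database such as Milvus."""

    def __init__(self) -> None:
        self.rows: list[tuple[Counter, str]] = []

    def insert(self, chunks: list[str]) -> None:
        self.rows.extend((embed(c), c) for c in chunks)

    def search(self, query: str, top_k: int = 3) -> list[str]:
        q = embed(query)
        scored = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [text for _, text in scored[:top_k]]


def rerank(query: str, chunks: list[str]) -> list[str]:
    """Toy re-ranker: order chunks by how many distinct query words they contain."""
    q_words = set(embed(query))
    return sorted(chunks, key=lambda c: len(q_words & set(embed(c))), reverse=True)


# Ingest side: an analyst uploads a (made-up) knowledge document.
store = VectorStore()
store.insert(chunk(
    "Cancellations require 48 hours notice. "
    "Travel insurance waives the notice period. "
    "Baggage fees are refundable within 7 days."
))

# Query side: a consumer asks a question; the top chunk would go into the LLM prompt.
question = "Does insurance waive the notice period?"
hits = rerank(question, store.search(question, top_k=2))
print(hits[0])
```

In a real deployment the `insert` and `search` calls would go to the vector database, and the sorted chunks would be placed in the prompt of the model that generates the response.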
Run the AI Starter Kit on our Cloud or locally to customize.
Example of Agent with RAG from our Hackathon
https://sambanova.devpost.com/submissions/586146-pokecompanion
Questions? Join the Community
Community.SambaNova.ai
Upcoming Events
Feedback? Come talk to us!
Try It Today
cloud.sambanova.ai
THANK YOU!