MemGPT
why we need memory-augmented LLMs
👋 Charles Packer
● PhD candidate @ Sky / BAIR, focus in AI
● Author of MemGPT
○ First paper demonstrating how to give GPT-4 self-editing memory (AI that can learn over time)
● Working on agents since 2017
○ “the dark ages”
○ 5 BC = Before ChatGPT
📧 cpacker@berkeley.edu
🐦 @charlespacker
Agents in 2017 🙈
For LLMs, “memory” is everything
memory is context
context includes long-term memory, tool use, ICL, RAG, …
MemGPT - giving LLMs real “memory”
GPT
Why is this the “best” AI product?
What about this?
Search engine AI assistant
MemGPT: Introduction to Memory Augmented Chat
tl;dr
LLMs doing constrained Q/A
🤩
tl;dr
LLMs doing long-range, open-ended tasks
🤨
90%+ of questions are
related to one project
No shared context! Why?
We don’t know how to do it…
How to get an LLM to use
● hundreds of chats
● + code base (1M+ LoC)
● + …
● …RAG?
● Lots of retrieval?
● Multi-step retrieval?
● Retrieval that works?
● What about writing?
…long-context LLMs?
Cost + latency
Context pollution
Search engine AI assistant
state management
MemGPT -> giving LLMs real “memory”
MemGPT -> memory via tools
[Diagram: LLM, tools, memory]
Standard LLM setup
e.g., ChatGPT UI + GPT-4 model
[Diagram: text (user message) goes into ChatGPT (GPT-4, context window with an 8k max token limit), which returns text (agent reply)]
MemGPT LLM OS setup
Event loop + functions + memory hierarchy
[Diagram: events (user message, document upload, system alert) are parsed into the LLM; the LLM output is parsed into functions (send message, query database, pause interrupts). The LLM works over a virtual context: main context (max token limit) plus external context (∞ tokens)]
Fixed-context LLM
e.g., GPT-4 with 8k max tokens
[Diagram: the same MemGPT setup, with the main context capped at the 8k max token limit]
LLM inputs are “events” (JSON)
System alerts help the LLM manage memory
[Diagram: events (user message, document upload, system alert) feed the LLM and its virtual context]
{ "type": "user_message",
  "content": "how to undo git commit -am?" }
{ "type": "document_upload",
  "info": "9 page PDF",
  "summary": "MemGPT research paper" }
{ "type": "system_alert",
  "content": "Memory warning: 75% of context used." }
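A minimal sketch of how such events might be produced, assuming a plain JSON envelope with the field names shown on the slide. The helper names (`make_event`, `memory_warning`) and the 75% threshold parameter are illustrative assumptions, not MemGPT's actual API.

```python
import json

def make_event(event_type, **fields):
    """Serialize one input event as a JSON string (field names follow the slide)."""
    return json.dumps({"type": event_type, **fields})

def memory_warning(tokens_used, max_tokens, threshold=0.75):
    """Return a system_alert event once context usage reaches the threshold, else None.

    The threshold value is an assumption taken from the example alert text.
    """
    usage = tokens_used / max_tokens
    if usage < threshold:
        return None
    return make_event(
        "system_alert",
        content=f"Memory warning: {usage:.0%} of context used.",
    )

user_event = make_event("user_message", content="how to undo git commit -am?")
alert_event = memory_warning(tokens_used=6144, max_tokens=8192)

print(user_event)
print(alert_event)
```

The point of the alert is that the model sees memory pressure as just another input event, so it can decide for itself what to move out of context.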
LLM outputs are functions (JSON)
Event loop + functions that allow editing memory
[Diagram: the LLM output is parsed into functions (send message, query database, pause interrupts)]
Agent can query out-of-context information with functions
{
  "function": "archival_memory_search",
  "params": {
    "query": "Berkeley LLM Meetup",
    "page": "0"
  }
}
Pages into (finite) LLM context
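The paging idea can be sketched as follows. This is a toy illustration, not MemGPT's implementation: the in-memory list, the substring match, `PAGE_SIZE`, and the `dispatch` helper are all assumptions made for the example; a real system would likely query a database or vector store.

```python
PAGE_SIZE = 2  # hypothetical; kept small so that paging is visible

# Stand-in for external context (placeholder records, invented for this example).
ARCHIVAL_MEMORY = [
    "Berkeley LLM Meetup: talk on MemGPT",
    "Grocery list: eggs, milk",
    "Berkeley LLM Meetup: venue details",
    "Berkeley LLM Meetup: speaker list",
]

def archival_memory_search(query, page=0):
    """Return one page of matches plus the total, so the agent can ask for more."""
    matches = [m for m in ARCHIVAL_MEMORY if query.lower() in m.lower()]
    start = page * PAGE_SIZE
    return {
        "results": matches[start:start + PAGE_SIZE],
        "total": len(matches),
        "page": page,
    }

FUNCTIONS = {"archival_memory_search": archival_memory_search}

def dispatch(call):
    """Execute a function call given in the JSON shape shown on the slide."""
    params = dict(call["params"])
    # The slide passes the page as a string, so convert it before use.
    params["page"] = int(params.get("page", 0))
    return FUNCTIONS[call["function"]](**params)

call = {
    "function": "archival_memory_search",
    "params": {"query": "Berkeley LLM Meetup", "page": "0"},
}
first_page = dispatch(call)
second_page = dispatch({**call, "params": {**call["params"], "page": "1"}})
print(first_page)
print(second_page)
```

Only one page of results enters the finite context at a time; the agent requests the next page only if it needs it.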
Agent can edit their own memory including their own context
{
  "function": "core_memory_replace",
  "params": {
    "old_content": "OAI Assistants API",
    "new_content": "MemGPT API"
  }
}
Core memory is a reserved block
[Diagram: context window layout: system prompt, in-context memory block, working context queue]
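A rough sketch of a reserved, self-editable memory block, assuming the three-part context layout from the slide (system prompt, in-context memory block, queue). The `CoreMemory` class, the character limit, the prompt section markers, and the sample memory text are assumptions for illustration only.

```python
class CoreMemory:
    """A reserved, size-limited text block that is always included in the prompt."""

    def __init__(self, text, char_limit=200):
        self.text = text
        self.char_limit = char_limit  # hypothetical budget for the reserved block

    def replace(self, old_content, new_content):
        """Swap one occurrence of old_content for new_content, within the size limit."""
        if old_content not in self.text:
            raise ValueError(f"{old_content!r} not found in core memory")
        updated = self.text.replace(old_content, new_content, 1)
        if len(updated) > self.char_limit:
            raise ValueError("edit would exceed the core memory limit")
        self.text = updated

def build_prompt(system_prompt, core, queue):
    """Assemble the context: system prompt, then core memory, then the message queue."""
    return "\n".join([system_prompt, "[core memory]", core.text, "[messages]", *queue])

core = CoreMemory("User is building on the OAI Assistants API.")
core.replace("OAI Assistants API", "MemGPT API")
prompt = build_prompt("You are a helpful agent.", core, ["user: hi"])
print(prompt)
```

Because the block is rebuilt into every prompt, an edit made in one turn is visible to the model in all later turns.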
{
  "function": "send_message",
  "params": {
    "message": "How may I assist you?"
  }
}
Messages to the user are a function (send_message)
Allows the agent to interact with the system autonomously, without user input
{ "type": "user_message",
  "content": "what’s happening on may 21 2024?" }
{
  "function": "archival_memory_search",
  "params": {
    "query": "may 21 2024"
  }
}
{
  "function": "send_message",
  "params": {
    "message": "Have you heard about Milvus?"
  }
}
(User’s POV)
🧑 what’s happening on may 21 2024?
🤖 Have you heard about Milvus?
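The exchange above can be sketched as a small event loop: the model keeps emitting function calls, and only `send_message` produces text the user sees. This is a simplified illustration; `scripted_llm` replays the two calls from the slide in place of a real model, and the `function_result` event type and the archival record are assumptions.

```python
import json

def scripted_llm(context):
    """Stand-in for the LLM: replays the two function calls from the slide."""
    last = json.loads(context[-1])
    if last.get("type") == "user_message":
        return {"function": "archival_memory_search", "params": {"query": "may 21 2024"}}
    return {"function": "send_message", "params": {"message": "Have you heard about Milvus?"}}

def archival_memory_search(query):
    """Placeholder search that returns one invented record."""
    return [f"placeholder archival record for {query!r}"]

def run_agent_step(event, llm=scripted_llm, max_calls=5):
    """Feed one event to the agent and loop until it messages the user."""
    context = [json.dumps(event)]
    visible = []  # only send_message output is shown to the user
    for _ in range(max_calls):
        call = llm(context)
        if call["function"] == "send_message":
            visible.append(call["params"]["message"])
            break
        result = archival_memory_search(**call["params"])
        # Function results are fed back as events (the type name is an assumption).
        context.append(json.dumps({"type": "function_result", "content": result}))
    return visible, context

visible, context = run_agent_step(
    {"type": "user_message", "content": "what’s happening on may 21 2024?"}
)
print(visible)
```

The memory search never appears in the chat; from the user's point of view there is only the question and the reply.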
MemGPT -> Building LLM Agents
● Calling & executing custom tools
● Long-term memory management
● Loading external data sources (RAG)
MemGPT = the OSS platform for building 🛠 and hosting 🏠 LLM agents
[Diagram: MemGPT platform architecture]
● Developer: MemGPT Dev Portal, MemGPT CLI ($ memgpt run)
● User: user-facing application
● MemGPT server: REST API, webhooks
● Server state: users, agents, tools, sources
● Example agent: “Personal Assistant” (user_id: …, agent_id: …) with state, memories, documents
MemGPT
may 21 developer update 🎉
Docker integration - the fastest way to create a MemGPT server
Step 1: docker compose up
Step 2: create/edit/message agents using the MemGPT API
MemGPT streaming API - token streaming
CLI: memgpt run --stream
REST API: use the stream_tokens flag [PR #1280 - staging]
MemGPT API works with both non-streaming + streaming endpoints
If the underlying LLM backend doesn’t support streaming, MemGPT uses “fake streaming”
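The slides do not say how “fake streaming” is implemented. One plausible reading, sketched here as an assumption, is that the server waits for the complete reply and then re-emits it in small chunks so that clients can use the same streaming interface. The function names below are hypothetical.

```python
import re

def non_streaming_backend(prompt):
    """Stand-in for an LLM backend that only returns complete replies."""
    return "How may I assist you?"

def fake_stream(full_reply):
    """Yield a finished reply in word-sized chunks, mimicking a token stream."""
    # Each chunk is a word plus its trailing whitespace, so the chunks rejoin exactly.
    for chunk in re.findall(r"\S+\s*|\s+", full_reply):
        yield chunk

chunks = list(fake_stream(non_streaming_backend("hi")))
print(chunks)
```

This gives no latency benefit, since the full reply must exist before the first chunk is sent; it only keeps the client-side code path uniform.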
MemGPT /chat/completions proxy API
Connect your MemGPT server to any /chat/completions service!
For example - 📞 voice call your MemGPT agents using VAPI!
