Forget RAG, Introducing FACT: Fast Augmented Context Tools (3.2x faster, 90% cost reduction vs RAG)

TL;DR

FACT (Fast Augmented Context Tools) is a production-ready retrieval framework that merges prompt caching with deterministic tool execution, engineered for agentic systems. It is 3.2x faster, 91.5% cheaper, and optimized for structured, real-time, feedback-driven workflows.

What is FACT?

Last week, I left Claude Opus 4 running without prompt caching. I didn't notice until the bill hit $300. I'd assumed caching was working. It wasn't. That moment snapped me out of the RAG-era mindset and forced me to rethink what retrieval should look like in 2025.

RAG (Retrieval-Augmented Generation) made sense when vector search was all we had. But in dynamic, tool-driven systems? Vectors are slow, fuzzy, and fundamentally unreliable. They're optimized for static knowledge that rarely changes, not for real-time data, recursive agent flows, or moment-by-moment precision.

What I wanted instead was something explicit. Deterministic. Fast. Cheap. Built for agentic engineering, where tools, feedback loops, and memory systems all work together. That led to FACT: Fast Augmented Context Tools.

FACT is built for systems that think and act.

It combines intelligent prompt caching with structured tool execution using MCP. Instead of guessing with vector similarity, FACT routes queries to tools and caches the exact results that matter, while letting transient data expire.

I'm using Arcade.dev to power the secure, scalable tool execution layer, giving FACT the ability to route tasks locally using MCPs and agents or through remote edge agentic deployments.

Instead of “Find me something like this,” it becomes: “Run this tool. Get a live API result. Use this schema. Cache the output if relevant.”

The result?

  • 3.2x faster response times

  • 91%+ cost reduction

  • Deterministic, auditable outputs

  • No embedding drift. No tuning thresholds. No guesswork.

FACT systems decide what to cache, what to re-fetch, and what to ignore, automatically. It’s not just about saving tokens. It’s about building systems that can reflect, recurse, and adapt.

RAG brought retrieval. FACT brings understanding.


Introduction to FACT

FACT (Fast Augmented Context Tools) introduces a new paradigm for language model–powered data retrieval by replacing vector-based retrieval with a prompt-and-tool approach under the Model Context Protocol (MCP). Instead of relying on embeddings and similarity searches, FACT combines intelligent prompt caching with deterministic tool invocation to deliver fresh, precise, and auditable results.

Key Differences from RAG

FACT represents a fundamental shift from traditional RAG (Retrieval-Augmented Generation) approaches:

Retrieval Mechanism

  • RAG: Embeddings → Vector search → LLM completion

  • FACT: Prompt cache → MCP tool calls → LLM refinement

Data Freshness

  • RAG: Periodic re-indexing required

  • FACT: Live data via on-demand tool execution

Accuracy

  • RAG: Probabilistic, fuzzy matches

  • FACT: Exact outputs from SQL, API, or custom tools

Cost & Latency

  • RAG: Embedding + lookup + token costs

  • FACT: Cache hits eliminate tokens; cache misses trigger fast tool calls

Core Architectural Innovation

Agentic Engineering & Intelligent Caching

FACT enables agentic workflows where AI systems make intelligent decisions about data retrieval, caching, and tool execution in complex, multi-step processes. Unlike static vector databases that treat all data equally, FACT implements intelligent caching that understands the dynamic nature of different data types.

The Vector Problem with Dynamic Data

Vectors excel at static content that changes infrequently, but they're fundamentally ill-suited for:

  • Real-time data that changes moment-by-moment

  • Request-specific context that varies per user or session

  • Dynamic calculations that depend on current parameters

  • Time-sensitive information with specific TTL requirements

When data needs to change request-by-request with precise time-to-live characteristics, vectors are the worst possible choice.

Intelligent Cache Decision-Making

FACT's caching system makes sophisticated decisions about what to cache and when:
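
A minimal sketch of this idea, assuming illustrative data types and TTL values (these policies are not FACT's actual configuration): the cache decision is keyed on data volatility rather than treating all retrievals equally.

```python
# Hypothetical cache-policy sketch: TTL varies by data volatility.
# Data-type names and TTL values are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class CachePolicy:
    cacheable: bool
    ttl_seconds: int  # time-to-live; 0 means never cache

def decide_cache_policy(data_type: str) -> CachePolicy:
    """Map a tool result's data type to a caching policy."""
    policies = {
        "schema": CachePolicy(True, 86_400),          # static: cache for a day
        "reference_data": CachePolicy(True, 3_600),   # slow-moving: one hour
        "quarterly_report": CachePolicy(True, 600),   # semi-static: ten minutes
        "live_price": CachePolicy(False, 0),          # real-time: never cache
    }
    # Unknown data types default to no caching, the safe choice.
    return policies.get(data_type, CachePolicy(False, 0))

print(decide_cache_policy("schema").ttl_seconds)    # long TTL for static data
print(decide_cache_policy("live_price").cacheable)  # real-time data bypasses cache
```

The point of the sketch is the shape of the decision: static content earns long TTLs, while moment-by-moment data is never cached at all.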

What Makes FACT Different

1. Intelligent Cache-First Design Philosophy

FACT leverages Claude's native caching with intelligent decision-making to store and reuse responses automatically, eliminating the need for complex vector databases or RAG systems:

  • Context-Aware Caching: System determines optimal cache duration based on data type

  • Adaptive TTL Management: Cache expiration varies by content volatility

  • Smart Invalidation: Proactive cache updates based on data change patterns

  • Multi-Tier Strategy: Different caching approaches for static vs. dynamic content


2. Natural Language Interface

Powered by Claude Sonnet-4, FACT understands complex queries in natural language:

Agentic Workflow Example

FACT's agentic system makes nuanced decisions about what to cache and for how long, something static vector approaches cannot do. A natural-language question is automatically transformed into optimized tool execution and returns formatted results in milliseconds.
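
As a hedged illustration of that transformation (the planner below, the example question, and its keyword heuristics are assumptions, not FACT's real routing logic), a question might resolve to a tool choice plus a cache TTL in a single pass:

```python
# Illustrative sketch, not FACT's actual API: resolve a natural-language
# question to one of the named tools plus a cache decision.
def plan_query(question: str) -> dict:
    """Return a hypothetical execution plan for a natural-language question."""
    q = question.lower()
    if "schema" in q:
        # Schema metadata is static, so it earns a long TTL.
        return {"tool": "SQL.GetSchema", "cache_ttl": 86_400}
    if "revenue" in q or "companies" in q:
        # Financial lookups go to the read-only SQL tool with a short TTL.
        return {"tool": "SQL.QueryReadonly", "cache_ttl": 600}
    # Fall back to example queries for open-ended exploration.
    return {"tool": "SQL.GetSampleQueries", "cache_ttl": 3_600}

plan = plan_query("What was quarterly revenue for technology companies?")
print(plan["tool"])  # routed to the read-only SQL tool
```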

3. MCP Tool-Based Architecture

FACT employs the Model Context Protocol for secure, standardized tool execution:

  • Read-Only Data Access: Prevents data modification

  • Input Validation: Comprehensive query validation

  • Audit Trail: Complete logging of all operations

  • Security Patterns: Advanced injection protection
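
A minimal sketch of the read-only, validated access pattern these bullets describe, using SQLite as a stand-in backend (the function name, keyword list, and validation rules are simplified assumptions, not FACT's real tool code):

```python
# Hedged sketch of a read-only, validated SQL tool.
import re
import sqlite3

# Reject statements containing write or DDL keywords.
FORBIDDEN = re.compile(r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE)\b", re.I)

def query_readonly(db_path: str, sql: str) -> list:
    """Validate and execute a SELECT; reject anything that could modify data."""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")
    if FORBIDDEN.search(sql):
        raise ValueError("statement contains a write or DDL keyword")
    # Open the database in read-only mode as a second layer of defense.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

Note the defense-in-depth: even if keyword validation were bypassed, the read-only connection mode refuses writes at the database layer.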

4. Hybrid Execution Model

Integration with cloud services enables intelligent routing between local and remote execution:

  • Local Execution: Speed-optimized for simple queries

  • Cloud Execution: Feature-rich for complex analytics

  • Automatic Failover: Seamless degradation handling

  • Performance Optimization: Real-time execution path selection

Core Concepts

Three-Tier Architecture

Tool-Based Data Retrieval

FACT employs secure, containerized tools for data access:

Available Tools:

  • SQL.QueryReadonly: Execute SELECT queries on financial databases

  • SQL.GetSchema: Retrieve database schema information

  • SQL.GetSampleQueries: Get example queries for exploration

  • System.GetMetrics: Access performance and system metrics
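
A registry-and-dispatch sketch of how tools like these might be invoked by name (the handler bodies and return shapes are invented stand-ins; only the tool names come from the list above):

```python
# Hypothetical tool registry keyed by the tool names listed above.
def get_schema() -> dict:
    # Stand-in for SQL.GetSchema: returns table/column metadata.
    return {"companies": ["name", "sector", "revenue"]}

def get_sample_queries() -> list:
    # Stand-in for SQL.GetSampleQueries.
    return ["SELECT name FROM companies WHERE sector = 'Technology'"]

TOOLS = {
    "SQL.GetSchema": get_schema,
    "SQL.GetSampleQueries": get_sample_queries,
}

def call_tool(name: str):
    """Dispatch a named tool, failing loudly on unknown names."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name]()

print(call_tool("SQL.GetSchema"))
```

Dispatching by explicit name is what makes the system deterministic: the same query plan always hits the same tool, with no similarity threshold in the loop.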

Cache Hierarchy and Optimization

FACT implements a sophisticated multi-level caching system:

  1. Memory Cache: Immediate access to frequently used queries

  2. Persistent Cache: Long-term storage for common patterns

  3. Distributed Cache: Shared cache across multiple instances

  4. Strategy-Based Selection: Intelligent cache tier selection
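
The tier traversal above can be sketched as follows. This is a simplified assumption (plain dicts stand in for the persistent and distributed stores): check the fastest tier first, fall through on misses, and promote hits upward.

```python
# Illustrative two-tier cache: memory first, then a persistent stand-in.
class TieredCache:
    def __init__(self):
        self.memory = {}      # tier 1: in-process dict, fastest
        self.persistent = {}  # tier 2: stand-in for a disk/DB-backed store

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        if key in self.persistent:
            value = self.persistent[key]
            self.memory[key] = value  # promote the hit to the faster tier
            return value
        return None  # full miss: caller falls through to a live tool call

    def put(self, key, value, durable=False):
        self.memory[key] = value
        if durable:
            self.persistent[key] = value

cache = TieredCache()
cache.put("q1", [("TechCorp", 1_000_000)], durable=True)
cache.memory.clear()       # simulate a process restart wiping tier 1
print(cache.get("q1"))     # still served, from the persistent tier
```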


Benefits of FACT

Revolutionary Performance Improvements

Speed Transformation

FACT delivers order-of-magnitude improvements over traditional financial data systems:

  • Cache Hits: Sub-50ms response times (vs. 2-5 seconds traditional)

  • Cache Misses: Under 140ms average response time

  • Complex Analytics: 85% faster than traditional RAG systems

  • Concurrent Processing: 1000+ queries per minute throughput

Cost Optimization Breakthrough

The intelligent caching architecture delivers unprecedented cost efficiency:

  • 90% Cost Reduction: Through automated query result caching

  • Token Efficiency: Automatic optimization of API token usage

  • Resource Minimization: No vector databases or complex indexing required

  • Scalability Economics: Linear cost scaling with exponential performance gains

Operational Excellence

FACT transforms operational characteristics of financial analytics:

  • 99%+ Uptime: Robust error handling and graceful degradation

  • Zero SQL Knowledge Required: Complete natural language interface

  • Enterprise Security: Comprehensive audit and compliance features

  • Automated Optimization: Self-tuning performance characteristics

FACT's Enterprise-Ready Results

With FACT, your system becomes intelligent enough to decide what to cache, when to execute tools, and how to route requests in real time, without guessing. RAG brought retrieval to language models. But FACT makes retrieval intentional, structured, and enterprise-ready.

Smart systems don't just retrieve. They know what to retrieve, how to get it, and when to remember it.

Getting Started

Prerequisites

  • Python 3.8+ (Python 3.11+ recommended)

  • API Keys: Anthropic API key, Arcade API key (optional)

  • System Requirements: 2GB RAM minimum, 4GB recommended

Quick Installation

First Query

Technical Advantages

Minimal Infrastructure Requirements

Unlike traditional systems, FACT needs no vector database, no embedding pipeline, and no periodic re-indexing jobs.

Intelligent Query Processing

FACT's query understanding surpasses traditional keyword-based systems:

Natural Language Understanding:

Security-First Design

Comprehensive security framework addresses enterprise requirements:

  • Multi-Layer Validation: Input → Processing → Output security checks

  • Principle of Least Privilege: Read-only database access

  • Comprehensive Auditing: Every query logged with full context

  • Injection Prevention: Advanced SQL injection detection and blocking
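
These layers can be sketched together in a few lines. This is a simplified assumption, not FACT's real security code: input validation rejects multi-statement queries, and every operation, successful or not, leaves an audit record with full context.

```python
# Hedged sketch of input validation plus per-query audit logging.
import time

AUDIT_LOG = []

def validate_input(query: str) -> str:
    """Reject stacked statements, a common SQL injection vector."""
    if ";" in query.strip().rstrip(";"):
        raise ValueError("multiple statements are not allowed")
    return query.strip()

def audited_query(user: str, query: str, run):
    """Validate, execute via `run`, and always append an audit record."""
    record = {"ts": time.time(), "user": user, "query": query, "ok": False}
    try:
        result = run(validate_input(query))
        record["ok"] = True
        return result
    finally:
        AUDIT_LOG.append(record)  # logged whether the query succeeded or not

audited_query("analyst", "SELECT 1", run=lambda q: [(1,)])
print(AUDIT_LOG[-1]["ok"])  # the audit trail records the outcome
```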

Use Case Benefits

Financial Analysts

Transform data exploration and reporting efficiency:

Data Scientists

Accelerate financial model development:

  • Rapid Data Exploration: Natural language data discovery

  • API Integration: Programmatic access for model training

  • Performance Benchmarking: Built-in performance validation tools

  • Automated Feature Engineering: Intelligent data transformation suggestions

System Administrators

Simplified monitoring and maintenance:

  • Real-Time Dashboards: Performance and health monitoring

  • Automated Alerts: Proactive issue detection

  • Security Monitoring: Comprehensive audit trail analysis

  • Resource Optimization: Automatic performance tuning

Business Stakeholders

Direct access to financial insights:

  • No Technical Barriers: Pure natural language interface

  • Instant Answers: Sub-second response times

  • Consistent Results: Cached responses ensure data consistency

  • Mobile Accessibility: Cross-platform compatibility

Performance Benchmarks

Production Performance Validation

FACT consistently exceeds production benchmarks across all critical metrics:

Real-World Performance Analysis

Query Response Time Distribution

Simple Queries (e.g., "Show technology companies"):

Complex Queries (e.g., "Compare quarterly revenue growth across sectors"):

Concurrent User Scalability

Performance under increasing user load:

Cost Analysis Comparison

Traditional RAG System vs. FACT:

Benchmark Test Results

Cache Performance Analysis

Comparative Performance Study

FACT vs. Traditional Systems:

Usage Examples

Natural Language Financial Queries

FACT transforms complex financial analysis into intuitive conversations:

Basic Financial Data Access

Advanced Financial Analysis

API Integration Examples

Python SDK Integration
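
As a self-contained sketch of what SDK usage could look like (the `FactClient` class, its `query` method, and the simulated tool latency are assumptions, not the real SDK), the cache-first call pattern is the essential shape:

```python
# Hypothetical SDK-style usage: cache-first query with tool-call fallback.
import asyncio
import time

class FactClient:
    def __init__(self):
        self._cache = {}

    async def _run_tool(self, question: str) -> str:
        await asyncio.sleep(0.01)  # stand-in for a live MCP tool call
        return f"result for: {question}"

    async def query(self, question: str) -> str:
        if question in self._cache:
            return self._cache[question]  # cache hit: no tokens spent
        answer = await self._run_tool(question)
        self._cache[question] = answer
        return answer

async def main():
    client = FactClient()
    t0 = time.perf_counter()
    await client.query("Show technology companies")  # miss: tool call runs
    miss_ms = (time.perf_counter() - t0) * 1000
    t0 = time.perf_counter()
    await client.query("Show technology companies")  # hit: served from cache
    hit_ms = (time.perf_counter() - t0) * 1000
    print(f"miss {miss_ms:.1f} ms, hit {hit_ms:.2f} ms")

asyncio.run(main())
```

The second call returns without touching the tool layer at all, which is where the cache-hit latency and cost numbers claimed above would come from.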


Arcade.dev Integration

FACT's integration with Arcade.dev represents a breakthrough in hybrid AI tool execution, seamlessly blending local performance with enterprise-scale cloud capabilities.

Why Arcade.dev Integration Transforms FACT

Enterprise-Scale Capabilities

Arcade.dev provides enterprise-grade infrastructure that complements FACT's intelligent caching:

  • Advanced Security: Enterprise authentication, encryption, and compliance

  • Scalable Execution: Cloud-native scalability for complex analytical workloads

  • Advanced Monitoring: Comprehensive observability and performance analytics

  • Compliance Ready: SOC2, GDPR, HIPAA compliance out of the box

Hybrid Intelligence Architecture

The integration enables intelligent decision-making about where to execute each query:

Integration Features

Intelligent Routing Engine

The system analyzes each query to determine optimal execution:
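
A hedged sketch of one such routing heuristic (the complexity markers and the local/cloud split are assumptions for illustration, not the actual engine): simple lookups stay on the speed-optimized local path, complex analytics go to the feature-rich cloud path, and the local path doubles as the failover target.

```python
# Hypothetical routing heuristic for the hybrid execution model.
def route(query: str, cloud_available: bool = True) -> str:
    """Pick an execution path for a query."""
    complex_markers = ("compare", "trend", "across", "correlat")
    is_complex = any(m in query.lower() for m in complex_markers)
    if is_complex and cloud_available:
        return "cloud"  # feature-rich path for complex analytics
    return "local"      # speed-optimized path, also the failover target

print(route("Show technology companies"))                        # local
print(route("Compare quarterly revenue growth across sectors"))  # cloud
print(route("Compare sectors", cloud_available=False))           # failover to local
```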

Multi-Level Caching

Advanced caching strategies span local and cloud environments.

Production Benefits

Performance Optimization Results

Real-world performance data from hybrid execution:


Additional Capabilities

Advanced Security Framework

Comprehensive Security Architecture

FACT implements defense-in-depth security across multiple layers:

Monitoring and Observability

Real-Time Performance Dashboard

Comprehensive Metrics Collection

  • System Metrics: CPU, memory, disk, network usage

  • Application Metrics: Query performance, cache efficiency

  • Business Metrics: Cost savings, user satisfaction

  • Security Metrics: Failed authentication, suspicious queries

Error Handling and Resilience

Graceful Degradation

FACT implements sophisticated error handling strategies:
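
One such strategy can be sketched as retry-with-backoff followed by degradation to a stale cached answer (the function, retry counts, and delays are simplified assumptions, not FACT's real error-handling code):

```python
# Sketch of graceful degradation: retry, then fall back to stale cache.
import time

def always_down():
    raise ConnectionError("tool endpoint unreachable")

def resilient_fetch(fetch, stale_cache=None, retries=3, base_delay=0.01):
    """Try `fetch` with exponential backoff; degrade to stale data on failure."""
    for attempt in range(retries):
        try:
            return fetch(), "fresh"
        except ConnectionError:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    if stale_cache is not None:
        return stale_cache, "stale"  # degraded but still answering
    raise RuntimeError("no fresh result and no cached fallback")

value, status = resilient_fetch(always_down, stale_cache=["cached answer"])
print(status)  # "stale": served from cache after retries were exhausted
```

Serving a clearly-labeled stale answer instead of an error is what the uptime figures above depend on: the system degrades rather than fails.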

Comments

Walter Schärer:

Will you migrate FACT to Rust?

Cali LaFollett:

This looks SPICY! 🔥 😁 As one who has come from traditional backend systems and lives DBs and caching concepts, this makes complete sense! Game changer potential 😁 Great work Reuven Cohen!

Vasiliy Bondarenko:

Reuven Cohen, this is a lot to digest... Can you make some real-world examples showcasing the difference? You have the context to write and understand it easily; readers, not really :) It seems fascinating, but I don't get it yet.


I do agree that vector databases are often not the best solution for RAG and other GenAI integration patterns. The transition from vanilla RAG towards agents, tools and MCP is easy to agree on. Still, I have several critical comments about the article:

  • RAG is not all about Chunky RAG, the most popular pattern of retrieving document chunks from a vector database. Converting NL questions (NLQ) to SQL is also a sort of RAG. And so is GraphRAG.

  • Comparing the principal advantages of NLQ-to-SQL vs Chunky RAG doesn't make much sense; their sweet spots are so different. With all the use cases that Chunky RAG doesn't do well, there are document-centric ones where NLQ-to-SQL will be helpless.

  • Translating NL questions to SQL queries is simple only when mapping simple questions against a simple data schema. Beyond that, it requires a lot of guesswork, too.

  • Finally, caching is efficient mostly for static content. I would not hope for big savings when caching SQL results in the general case.

Alan Zhao:

Great insights on optimizing context and speed, Reuven!
