Forget RAG, Introducing FACT: Fast Augmented Context Tools (3.2x faster, 90% cost reduction vs RAG)

TL;DR

FACT (Fast Augmented Context Tools) is a production-ready retrieval framework that merges prompt caching with deterministic tool execution, engineered for agentic systems. It is 3.2x faster, 91.5% cheaper, and optimized for structured, real-time, feedback-driven workflows.

What is FACT?

Last week, I left Claude Opus 4 running without prompt caching. I didn't notice until the bill hit $300. I'd assumed caching was working. It wasn't. That moment snapped me out of the RAG-era mindset and forced me to rethink what retrieval should look like in 2025.

RAG (Retrieval-Augmented Generation) made sense when vector search was all we had. But in dynamic, tool-driven systems? Vectors are slow, fuzzy, and fundamentally unreliable. They're optimized for static knowledge that rarely changes, not for real-time data, recursive agent flows, or moment-by-moment precision.

What I wanted instead was something explicit. Deterministic. Fast. Cheap. Built for agentic engineering, where tools, feedback loops, and memory systems all work together. That led to FACT: Fast Augmented Context Tools.

FACT is built for systems that think and act.

It combines intelligent prompt caching with structured tool execution using MCP. Instead of guessing with vector similarity, FACT routes queries to tools and caches the exact results that matter, while letting transient data expire.

I'm using Arcade.dev to power the secure, scalable tool execution layer, giving FACT the ability to route tasks locally using MCPs and agents or through remote edge agentic deployments.

Instead of “Find me something like this,” it becomes: “Run this tool. Get a live API result. Use this schema. Cache the output if relevant.”

The result?

  • 3.2x faster response times

  • 91%+ cost reduction

  • Deterministic, auditable outputs

  • No embedding drift. No tuning thresholds. No guesswork.

FACT systems decide what to cache, what to re-fetch, and what to ignore, automatically. It’s not just about saving tokens. It’s about building systems that can reflect, recurse, and adapt.

RAG brought retrieval. FACT brings understanding.


Introduction to FACT

FACT (Fast Augmented Context Tools) introduces a new paradigm for language model–powered data retrieval by replacing vector-based retrieval with a prompt-and-tool approach under the Model Context Protocol (MCP). Instead of relying on embeddings and similarity searches, FACT combines intelligent prompt caching with deterministic tool invocation to deliver fresh, precise, and auditable results.

Key Differences from RAG

FACT represents a fundamental shift from traditional RAG (Retrieval-Augmented Generation) approaches:

Retrieval Mechanism

  • RAG: Embeddings → Vector search → LLM completion

  • FACT: Prompt cache → MCP tool calls → LLM refinement

Data Freshness

  • RAG: Periodic re-indexing required

  • FACT: Live data via on-demand tool execution

Accuracy

  • RAG: Probabilistic, fuzzy matches

  • FACT: Exact outputs from SQL, API, or custom tools

Cost & Latency

  • RAG: Embedding + lookup + token costs

  • FACT: Cache hits eliminate tokens; cache misses trigger fast tool calls

Core Architectural Innovation

Agentic Engineering & Intelligent Caching

FACT enables agentic workflows where AI systems make intelligent decisions about data retrieval, caching, and tool execution in complex, multi-step processes. Unlike static vector databases that treat all data equally, FACT implements intelligent caching that understands the dynamic nature of different data types.

The Vector Problem with Dynamic Data

Vectors excel at static content that changes infrequently, but they're fundamentally ill-suited for:

  • Real-time data that changes moment-by-moment

  • Request-specific context that varies per user or session

  • Dynamic calculations that depend on current parameters

  • Time-sensitive information with specific TTL requirements

When data needs to change request-by-request with precise time-to-live characteristics, vectors are the worst possible choice.

Intelligent Cache Decision-Making

FACT's caching system makes sophisticated decisions about what to cache and when:
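
A minimal sketch of this idea, assuming illustrative data types and TTL values (these policies are not FACT's actual configuration): the cache decision is keyed on data volatility rather than treating all retrievals equally.

```python
# Hypothetical cache-policy sketch: TTL varies by data volatility.
# Data-type names and TTL values are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class CachePolicy:
    cacheable: bool
    ttl_seconds: int  # time-to-live; 0 means never cache

def decide_cache_policy(data_type: str) -> CachePolicy:
    """Map a tool result's data type to a caching policy."""
    policies = {
        "schema": CachePolicy(True, 86_400),          # static: cache for a day
        "reference_data": CachePolicy(True, 3_600),   # slow-moving: one hour
        "quarterly_report": CachePolicy(True, 600),   # semi-static: ten minutes
        "live_price": CachePolicy(False, 0),          # real-time: never cache
    }
    # Unknown data types default to no caching, the safe choice.
    return policies.get(data_type, CachePolicy(False, 0))

print(decide_cache_policy("schema").ttl_seconds)    # long TTL for static data
print(decide_cache_policy("live_price").cacheable)  # real-time data bypasses cache
```

The point of the sketch is the shape of the decision: static content earns long TTLs, while moment-by-moment data is never cached at all.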

What Makes FACT Different

1. Intelligent Cache-First Design Philosophy

FACT leverages Claude's native caching with intelligent decision-making to store and reuse responses automatically, eliminating the need for complex vector databases or RAG systems:

  • Context-Aware Caching: System determines optimal cache duration based on data type

  • Adaptive TTL Management: Cache expiration varies by content volatility

  • Smart Invalidation: Proactive cache updates based on data change patterns

  • Multi-Tier Strategy: Different caching approaches for static vs. dynamic content


2. Natural Language Interface

Powered by Claude Sonnet-4, FACT understands complex queries in natural language:

Agentic Workflow Example

FACT's agentic system makes nuanced decisions about what to cache and for how long, something static vector approaches cannot do. A natural-language question is automatically transformed into optimized tool execution and returns formatted results in milliseconds.
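
As a hedged illustration of that transformation (the planner below, the example question, and its keyword heuristics are assumptions, not FACT's real routing logic), a question might resolve to a tool choice plus a cache TTL in a single pass:

```python
# Illustrative sketch, not FACT's actual API: resolve a natural-language
# question to one of the named tools plus a cache decision.
def plan_query(question: str) -> dict:
    """Return a hypothetical execution plan for a natural-language question."""
    q = question.lower()
    if "schema" in q:
        # Schema metadata is static, so it earns a long TTL.
        return {"tool": "SQL.GetSchema", "cache_ttl": 86_400}
    if "revenue" in q or "companies" in q:
        # Financial lookups go to the read-only SQL tool with a short TTL.
        return {"tool": "SQL.QueryReadonly", "cache_ttl": 600}
    # Fall back to example queries for open-ended exploration.
    return {"tool": "SQL.GetSampleQueries", "cache_ttl": 3_600}

plan = plan_query("What was quarterly revenue for technology companies?")
print(plan["tool"])  # routed to the read-only SQL tool
```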

3. MCP Tool-Based Architecture

FACT employs the Model Context Protocol for secure, standardized tool execution:

  • Read-Only Data Access: Prevents data modification

  • Input Validation: Comprehensive query validation

  • Audit Trail: Complete logging of all operations

  • Security Patterns: Advanced injection protection
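
A minimal sketch of the read-only, validated access pattern these bullets describe, using SQLite as a stand-in backend (the function name, keyword list, and validation rules are simplified assumptions, not FACT's real tool code):

```python
# Hedged sketch of a read-only, validated SQL tool.
import re
import sqlite3

# Reject statements containing write or DDL keywords.
FORBIDDEN = re.compile(r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE)\b", re.I)

def query_readonly(db_path: str, sql: str) -> list:
    """Validate and execute a SELECT; reject anything that could modify data."""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")
    if FORBIDDEN.search(sql):
        raise ValueError("statement contains a write or DDL keyword")
    # Open the database in read-only mode as a second layer of defense.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

Note the defense-in-depth: even if keyword validation were bypassed, the read-only connection mode refuses writes at the database layer.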

4. Hybrid Execution Model

Integration with cloud services enables intelligent routing between local and remote execution:

  • Local Execution: Speed-optimized for simple queries

  • Cloud Execution: Feature-rich for complex analytics

  • Automatic Failover: Seamless degradation handling

  • Performance Optimization: Real-time execution path selection

Core Concepts

Three-Tier Architecture

Tool-Based Data Retrieval

FACT employs secure, containerized tools for data access:

Available Tools:

  • SQL.QueryReadonly: Execute SELECT queries on financial databases

  • SQL.GetSchema: Retrieve database schema information

  • SQL.GetSampleQueries: Get example queries for exploration

  • System.GetMetrics: Access performance and system metrics
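
A registry-and-dispatch sketch of how tools like these might be invoked by name (the handler bodies and return shapes are invented stand-ins; only the tool names come from the list above):

```python
# Hypothetical tool registry keyed by the tool names listed above.
def get_schema() -> dict:
    # Stand-in for SQL.GetSchema: returns table/column metadata.
    return {"companies": ["name", "sector", "revenue"]}

def get_sample_queries() -> list:
    # Stand-in for SQL.GetSampleQueries.
    return ["SELECT name FROM companies WHERE sector = 'Technology'"]

TOOLS = {
    "SQL.GetSchema": get_schema,
    "SQL.GetSampleQueries": get_sample_queries,
}

def call_tool(name: str):
    """Dispatch a named tool, failing loudly on unknown names."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name]()

print(call_tool("SQL.GetSchema"))
```

Dispatching by explicit name is what makes the system deterministic: the same query plan always hits the same tool, with no similarity threshold in the loop.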

Cache Hierarchy and Optimization

FACT implements a sophisticated multi-level caching system:

  1. Memory Cache: Immediate access to frequently used queries

  2. Persistent Cache: Long-term storage for common patterns

  3. Distributed Cache: Shared cache across multiple instances

  4. Strategy-Based Selection: Intelligent cache tier selection
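
The tier traversal above can be sketched as follows. This is a simplified assumption (plain dicts stand in for the persistent and distributed stores): check the fastest tier first, fall through on misses, and promote hits upward.

```python
# Illustrative two-tier cache: memory first, then a persistent stand-in.
class TieredCache:
    def __init__(self):
        self.memory = {}      # tier 1: in-process dict, fastest
        self.persistent = {}  # tier 2: stand-in for a disk/DB-backed store

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        if key in self.persistent:
            value = self.persistent[key]
            self.memory[key] = value  # promote the hit to the faster tier
            return value
        return None  # full miss: caller falls through to a live tool call

    def put(self, key, value, durable=False):
        self.memory[key] = value
        if durable:
            self.persistent[key] = value

cache = TieredCache()
cache.put("q1", [("TechCorp", 1_000_000)], durable=True)
cache.memory.clear()       # simulate a process restart wiping tier 1
print(cache.get("q1"))     # still served, from the persistent tier
```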


Benefits of FACT

Revolutionary Performance Improvements

Speed Transformation

FACT delivers order-of-magnitude improvements over traditional financial data systems:

  • Cache Hits: Sub-50ms response times (vs. 2-5 seconds traditional)

  • Cache Misses: Under 140ms average response time

  • Complex Analytics: 85% faster than traditional RAG systems

  • Concurrent Processing: 1000+ queries per minute throughput

Cost Optimization Breakthrough

The intelligent caching architecture delivers unprecedented cost efficiency:

  • 90% Cost Reduction: Through automated query result caching

  • Token Efficiency: Automatic optimization of API token usage

  • Resource Minimization: No vector databases or complex indexing required

  • Scalability Economics: Linear cost scaling with exponential performance gains

Operational Excellence

FACT transforms operational characteristics of financial analytics:

  • 99%+ Uptime: Robust error handling and graceful degradation

  • Zero SQL Knowledge Required: Complete natural language interface

  • Enterprise Security: Comprehensive audit and compliance features

  • Automated Optimization: Self-tuning performance characteristics

FACT's Enterprise-Ready Results

With FACT, your system becomes intelligent enough to decide what to cache, when to execute tools, and how to route requests in real time, without guessing. RAG brought retrieval to language models. But FACT makes retrieval intentional, structured, and enterprise-ready.

Smart systems don't just retrieve. They know what to retrieve, how to get it, and when to remember it.

Getting Started

Prerequisites

  • Python 3.8+ (Python 3.11+ recommended)

  • API Keys: Anthropic API key, Arcade API key (optional)

  • System Requirements: 2GB RAM minimum, 4GB recommended

Quick Installation

First Query

Technical Advantages

Minimal Infrastructure Requirements

Unlike traditional systems, FACT needs no vector database, no embedding pipeline, and no periodic re-indexing jobs.

Intelligent Query Processing

FACT's query understanding surpasses traditional keyword-based systems:

Natural Language Understanding:

Security-First Design

Comprehensive security framework addresses enterprise requirements:

  • Multi-Layer Validation: Input → Processing → Output security checks

  • Principle of Least Privilege: Read-only database access

  • Comprehensive Auditing: Every query logged with full context

  • Injection Prevention: Advanced SQL injection detection and blocking
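
These layers can be sketched together in a few lines. This is a simplified assumption, not FACT's real security code: input validation rejects multi-statement queries, and every operation, successful or not, leaves an audit record with full context.

```python
# Hedged sketch of input validation plus per-query audit logging.
import time

AUDIT_LOG = []

def validate_input(query: str) -> str:
    """Reject stacked statements, a common SQL injection vector."""
    if ";" in query.strip().rstrip(";"):
        raise ValueError("multiple statements are not allowed")
    return query.strip()

def audited_query(user: str, query: str, run):
    """Validate, execute via `run`, and always append an audit record."""
    record = {"ts": time.time(), "user": user, "query": query, "ok": False}
    try:
        result = run(validate_input(query))
        record["ok"] = True
        return result
    finally:
        AUDIT_LOG.append(record)  # logged whether the query succeeded or not

audited_query("analyst", "SELECT 1", run=lambda q: [(1,)])
print(AUDIT_LOG[-1]["ok"])  # the audit trail records the outcome
```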

Use Case Benefits

Financial Analysts

Transform data exploration and reporting efficiency:

Data Scientists

Accelerate financial model development:

  • Rapid Data Exploration: Natural language data discovery

  • API Integration: Programmatic access for model training

  • Performance Benchmarking: Built-in performance validation tools

  • Automated Feature Engineering: Intelligent data transformation suggestions

System Administrators

Simplified monitoring and maintenance:

  • Real-Time Dashboards: Performance and health monitoring

  • Automated Alerts: Proactive issue detection

  • Security Monitoring: Comprehensive audit trail analysis

  • Resource Optimization: Automatic performance tuning

Business Stakeholders

Direct access to financial insights:

  • No Technical Barriers: Pure natural language interface

  • Instant Answers: Sub-second response times

  • Consistent Results: Cached responses ensure data consistency

  • Mobile Accessibility: Cross-platform compatibility

Performance Benchmarks

Production Performance Validation

FACT consistently exceeds production benchmarks across all critical metrics:

Real-World Performance Analysis

Query Response Time Distribution

Simple Queries (e.g., "Show technology companies"):

Complex Queries (e.g., "Compare quarterly revenue growth across sectors"):

Concurrent User Scalability

Performance under increasing user load:

Cost Analysis Comparison

Traditional RAG System vs. FACT:

Benchmark Test Results

Cache Performance Analysis

Comparative Performance Study

FACT vs. Traditional Systems:

Usage Examples

Natural Language Financial Queries

FACT transforms complex financial analysis into intuitive conversations:

Basic Financial Data Access

Advanced Financial Analysis

API Integration Examples

Python SDK Integration
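
As a self-contained sketch of what SDK usage could look like (the `FactClient` class, its `query` method, and the simulated tool latency are assumptions, not the real SDK), the cache-first call pattern is the essential shape:

```python
# Hypothetical SDK-style usage: cache-first query with tool-call fallback.
import asyncio
import time

class FactClient:
    def __init__(self):
        self._cache = {}

    async def _run_tool(self, question: str) -> str:
        await asyncio.sleep(0.01)  # stand-in for a live MCP tool call
        return f"result for: {question}"

    async def query(self, question: str) -> str:
        if question in self._cache:
            return self._cache[question]  # cache hit: no tokens spent
        answer = await self._run_tool(question)
        self._cache[question] = answer
        return answer

async def main():
    client = FactClient()
    t0 = time.perf_counter()
    await client.query("Show technology companies")  # miss: tool call runs
    miss_ms = (time.perf_counter() - t0) * 1000
    t0 = time.perf_counter()
    await client.query("Show technology companies")  # hit: served from cache
    hit_ms = (time.perf_counter() - t0) * 1000
    print(f"miss {miss_ms:.1f} ms, hit {hit_ms:.2f} ms")

asyncio.run(main())
```

The second call returns without touching the tool layer at all, which is where the cache-hit latency and cost numbers claimed above would come from.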


Arcade.dev Integration

FACT's integration with Arcade.dev represents a breakthrough in hybrid AI tool execution, seamlessly blending local performance with enterprise-scale cloud capabilities.

Why Arcade.dev Integration Transforms FACT

Enterprise-Scale Capabilities

Arcade.dev provides enterprise-grade infrastructure that complements FACT's intelligent caching:

  • Advanced Security: Enterprise authentication, encryption, and compliance

  • Scalable Execution: Cloud-native scalability for complex analytical workloads

  • Advanced Monitoring: Comprehensive observability and performance analytics

  • Compliance Ready: SOC2, GDPR, HIPAA compliance out of the box

Hybrid Intelligence Architecture

The integration enables intelligent decision-making about where to execute each query:

Integration Features

Intelligent Routing Engine

The system analyzes each query to determine optimal execution:
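
A hedged sketch of one such routing heuristic (the complexity markers and the local/cloud split are assumptions for illustration, not the actual engine): simple lookups stay on the speed-optimized local path, complex analytics go to the feature-rich cloud path, and the local path doubles as the failover target.

```python
# Hypothetical routing heuristic for the hybrid execution model.
def route(query: str, cloud_available: bool = True) -> str:
    """Pick an execution path for a query."""
    complex_markers = ("compare", "trend", "across", "correlat")
    is_complex = any(m in query.lower() for m in complex_markers)
    if is_complex and cloud_available:
        return "cloud"  # feature-rich path for complex analytics
    return "local"      # speed-optimized path, also the failover target

print(route("Show technology companies"))                        # local
print(route("Compare quarterly revenue growth across sectors"))  # cloud
print(route("Compare sectors", cloud_available=False))           # failover to local
```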

Multi-Level Caching

Advanced caching strategies span local and cloud environments.

Production Benefits

Performance Optimization Results

Real-world performance data from hybrid execution:


Additional Capabilities

Advanced Security Framework

Comprehensive Security Architecture

FACT implements defense-in-depth security across multiple layers:

Monitoring and Observability

Real-Time Performance Dashboard

Comprehensive Metrics Collection

  • System Metrics: CPU, memory, disk, network usage

  • Application Metrics: Query performance, cache efficiency

  • Business Metrics: Cost savings, user satisfaction

  • Security Metrics: Failed authentication, suspicious queries

Error Handling and Resilience

Graceful Degradation

FACT implements sophisticated error handling strategies:
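
One such strategy can be sketched as retry-with-backoff followed by degradation to a stale cached answer (the function, retry counts, and delays are simplified assumptions, not FACT's real error-handling code):

```python
# Sketch of graceful degradation: retry, then fall back to stale cache.
import time

def always_down():
    raise ConnectionError("tool endpoint unreachable")

def resilient_fetch(fetch, stale_cache=None, retries=3, base_delay=0.01):
    """Try `fetch` with exponential backoff; degrade to stale data on failure."""
    for attempt in range(retries):
        try:
            return fetch(), "fresh"
        except ConnectionError:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    if stale_cache is not None:
        return stale_cache, "stale"  # degraded but still answering
    raise RuntimeError("no fresh result and no cached fallback")

value, status = resilient_fetch(always_down, stale_cache=["cached answer"])
print(status)  # "stale": served from cache after retries were exhausted
```

Serving a clearly-labeled stale answer instead of an error is what the uptime figures above depend on: the system degrades rather than fails.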

Comments

Walter Schärer:

Will you migrate FACT to Rust?

Cali LaFollett:

This looks SPICY! 🔥 😁 As one who has come from traditional backend systems and lives DBs and caching concepts, this makes complete sense! Game changer potential 😁 Great work Reuven Cohen!

Vasiliy Bondarenko:

Reuven Cohen, this is a lot to digest... Can you make some real-world examples showcasing the difference? You have the context to write and understand it easily; readers, not really :) It seems fascinating, but I don't get it yet.


I do agree that vector databases are often not the best solution for RAG and other GenAI integration patterns. The transition from vanilla RAG towards agents, tools and MCP is easy to agree on. Still, I have several critical comments about the article:

  • RAG is not all about Chunky RAG, the most popular pattern of retrieving document chunks from a vector database. Converting NL questions (NLQ) to SQL is also a sort of RAG. And so is GraphRAG.

  • Comparing the principal advantages of NLQ-to-SQL vs Chunky RAG doesn't make much sense; their sweet spots are so different. With all the use cases that Chunky RAG doesn't do well, there are document-centric ones where NLQ-to-SQL will be helpless.

  • Translating NL questions to SQL queries is simple only when mapping simple questions against a simple data schema. Beyond that, it requires a lot of guesswork, too.

  • Finally, caching is efficient mostly for static content. I would not hope for big savings when caching SQL results in the general case.

Alan Zhao:

Great insights on optimizing context and speed, Reuven!
