Inside AI Gateway Architecture: Ultra-Low Latency, Smart Routing & Observability
Hey Folks!
We’re excited to bring you a fresh round of product updates in this edition of the newsletter.
We’ll start with a look at the AI Gateway’s architecture, followed by a rundown of some key feature launches.
Our roadmap continues to gain momentum — with a strong focus on the AI Gateway, agent infrastructure, the new MCP Server, and A2A communication.
Explore how our AI Gateway delivers ultra-low latency at scale
Zero External Calls in Hot Path - All request handling, from client to LLM, is self-contained, so no external network call adds latency or failure risk during live inference.
In-Memory Decision Engine - Rate limiting, load balancing, authentication, and authorization are executed entirely in memory for ultra-low latency.
Asynchronous Logging & Metrics - Logs and metrics are pushed to a queue asynchronously, ensuring the request path remains non-blocking and fast.
Resilient to Queue Failures - The Gateway is fault-tolerant: requests are never dropped, even if downstream logging infrastructure is temporarily unavailable.
Separation of Proxy and Control Plane - Enables globally distributed gateways across regions with centralized config management for seamless scalability.
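To make the asynchronous-logging idea concrete, here is a minimal Python sketch, illustrative only and not TrueFoundry's actual implementation: the hot path enqueues log records without blocking, a background worker drains them, and a full queue or failing sink sheds the log record rather than the request.

```python
import queue
import threading

# Bounded in-memory buffer between the request path and the logging sink.
log_queue = queue.Queue(maxsize=10_000)

def emit_log(record: dict) -> None:
    """Called on the hot path; must never block or raise."""
    try:
        log_queue.put_nowait(record)
    except queue.Full:
        pass  # shed the log record, never the request

def drain_logs(sink) -> None:
    """Background worker: ships records off the request path."""
    while True:
        record = log_queue.get()
        try:
            sink(record)  # e.g. write to a metrics pipeline
        except Exception:
            pass  # a downstream outage must not propagate back

records = []  # stand-in for a real logging sink
threading.Thread(target=drain_logs, args=(records.append,), daemon=True).start()
emit_log({"route": "/chat", "tokens": 42})
```

The key property is that `emit_log` has no failure mode visible to the caller: the worst case under backpressure is a lost log line, not a slow or failed inference request.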
Latency-Based Routing in the AI Gateway
TrueFoundry’s AI Gateway supports intelligent latency-based routing that automatically directs requests to the fastest available model — no manual weight tuning required.
In-memory latency tracking: Uses time-per-token metrics from recent requests (the last 20 minutes or the last 100 calls per model).
Self-adaptive: The system continuously monitors model performance and dynamically routes to the lowest-latency model.
Fairness threshold: Models within 1.2x of the fastest response time are treated as “fast” to prevent excessive switching.
Zero config overhead: You don’t need to define weights — just enable the strategy and the Gateway does the rest.
Robust to low traffic: A model with fewer than 3 recent requests remains eligible for routing so it can build latency history.
This strategy ensures requests are always handled with minimal latency while maintaining system stability.
Metadata-Based Observability in the AI Gateway
Metadata Observability in AI Gateway enables tagging each request with custom context — such as user, team, environment, or feature. You can use this metadata not just for filtering logs and analyzing usage patterns, but also to configure rate limits, load balancing rules, and fallback behaviors with precision — all tailored to your business context.
We Are Hiring! Come build with us.
The success of TrueFoundry is driven by the people behind it. We are looking for talented people who want to work on high-impact, cutting-edge AI infrastructure challenges.
Explore open roles: https://truefoundry.com/careers
Reach out to us: careers@truefoundry.com
Thank you to our customers, investors, and the entire TrueFoundry team for believing in this vision. We are just getting started.
Get in Touch!
Scaling AI shouldn’t be complex. TrueFoundry helps enterprises deploy and manage AI seamlessly, optimizing infrastructure for speed and efficiency. NVIDIA and Siemens Healthineers already trust us—let’s explore how we can support your AI journey.