Building a Smart HTTP Client in Java
🚀 Dynamic Thread Management with Rate Limiting & a Circuit Breaker
📌 By Kashan Asim | Aug 2025 🔗 GitHub Repository
In my recent work with a bulk API calling system, I had to design an HTTP client that could hit downstream services at a desired input TPS (transactions per second), dynamically adapt to runtime latency or congestion, and avoid overwhelming downstream services.
Initially, the system either flooded the service with requests or hung when the server started lagging. So I decided to go beyond traditional approaches, such as RestTemplate or WebClient backed by fixed thread pools, and implement a self-aware HTTP client.
In this article, I’ll walk you through:
The problem with fixed threads in bulk processing
The rate-limited, circuit-breaker-enabled HTTP client
A smart thread adjustment algorithm
A realistic delay test service
Real-world benefits, and a known limitation we’ll explore in a future article.
💡 The Problem: Threads Gone Wild
When firing thousands of requests per minute, using a static thread pool creates two extremes:
If the thread pool is too small, you won’t meet your TPS targets.
If the thread pool is too big, you'll saturate downstream systems or, worse, overload the JVM with blocked threads.
What we needed was an adaptive mechanism that starts with a baseline thread count and dynamically scales up/down based on how quickly requests complete.
🧠 The Solution: Dynamic Threaded HTTP Client
Let’s break it down into its major components:
1️⃣ Circuit Breaker & Rate Limiter
This makes sure:
We never exceed our desired TPS
We skip sending requests when the system is unstable
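The article doesn't inline the implementation, so here is a minimal sketch of the two guards using only the JDK. The class and method names (`ThrottledGate`, `tryAcquire`, `recordFailure`) are hypothetical stand-ins for whatever the repo actually uses; a production client could equally lean on a library such as Resilience4j.

```java
// Hypothetical sketch: a per-second token-bucket rate limiter combined with a
// consecutive-failure circuit breaker. Not the repo's code; JDK-only.
class ThrottledGate {
    private final int permitsPerSecond;   // desired TPS cap
    private final int failureThreshold;   // consecutive failures that open the breaker
    private int permits;
    private long windowStart = System.nanoTime();
    private int consecutiveFailures;

    ThrottledGate(int permitsPerSecond, int failureThreshold) {
        this.permitsPerSecond = permitsPerSecond;
        this.failureThreshold = failureThreshold;
        this.permits = permitsPerSecond;
    }

    /** Returns true only if the breaker is closed and a permit remains this second. */
    synchronized boolean tryAcquire() {
        if (consecutiveFailures >= failureThreshold) return false; // breaker open: skip call
        long now = System.nanoTime();
        if (now - windowStart >= 1_000_000_000L) {                 // new one-second window
            windowStart = now;
            permits = permitsPerSecond;
        }
        if (permits == 0) return false;                            // TPS budget spent
        permits--;
        return true;
    }

    synchronized void recordSuccess() { consecutiveFailures = 0; }
    synchronized void recordFailure() { consecutiveFailures++; }
}
```

Every outgoing request first calls `tryAcquire()`; a `false` means either the TPS budget for the current second is spent or the breaker has tripped, and the request is skipped rather than queued.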
2️⃣ Submitting Requests with Monitoring
We track latency and adjust thread counts accordingly.
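One way to do this (a sketch, not the repo's code; `MonitoredSubmitter` and `callRemote` are hypothetical names) is to wrap every submitted task so that latency and completion counts are recorded as a side effect, giving the tuner the numbers it needs:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: submit HTTP calls through a pool while recording
// per-request latency, so a scheduler can later compute observed TPS.
class MonitoredSubmitter {
    private final ExecutorService pool;
    final LongAdder completed = new LongAdder();      // requests finished so far
    final LongAdder totalLatencyMs = new LongAdder(); // summed wall-clock latency

    MonitoredSubmitter(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** callRemote stands in for the actual HTTP call. */
    Future<String> submit(Callable<String> callRemote) {
        return pool.submit(() -> {
            long start = System.nanoTime();
            try {
                return callRemote.call();
            } finally {
                totalLatencyMs.add((System.nanoTime() - start) / 1_000_000);
                completed.increment();
            }
        });
    }

    double averageLatencyMs() {
        long n = completed.sum();
        return n == 0 ? 0 : (double) totalLatencyMs.sum() / n;
    }

    void shutdown() { pool.shutdown(); }
}
```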
3️⃣ Dynamic Thread Tuner
We run a scheduler every 15 seconds to check current TPS and adjust thread count.
This ensures:
We scale up threads if we’re not hitting target TPS
We scale down if we’re overshooting and overloading downstream
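The tuner loop above can be sketched as follows. This assumes a proportional adjustment rule (scale the thread count by the target/observed TPS ratio, clamped to bounds); the repo's exact formula may differ, and `ThreadTuner`, `nextPoolSize`, and `resize` are hypothetical names:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the 15-second tuner: compare observed TPS against
// the target and resize the worker pool proportionally, within [min, max].
class ThreadTuner {
    static int nextPoolSize(int current, double observedTps, double targetTps,
                            int min, int max) {
        if (observedTps <= 0) return current;               // no data yet: hold steady
        int proposed = (int) Math.ceil(current * targetTps / observedTps);
        return Math.max(min, Math.min(max, proposed));
    }

    /** Grow max before core, shrink core before max, to satisfy core <= max. */
    static void resize(ThreadPoolExecutor pool, int next) {
        if (next >= pool.getMaximumPoolSize()) {
            pool.setMaximumPoolSize(next);
            pool.setCorePoolSize(next);
        } else {
            pool.setCorePoolSize(next);
            pool.setMaximumPoolSize(next);
        }
    }

    /** windowCount is incremented once per completed request by the submitter. */
    static void schedule(ThreadPoolExecutor pool, AtomicLong windowCount,
                         double targetTps, int min, int max) {
        ScheduledExecutorService tick = Executors.newSingleThreadScheduledExecutor();
        tick.scheduleAtFixedRate(() -> {
            double observed = windowCount.getAndSet(0) / 15.0; // requests per second
            resize(pool, nextPoolSize(pool.getMaximumPoolSize(),
                                      observed, targetTps, min, max));
        }, 15, 15, TimeUnit.SECONDS);
    }
}
```

Note the ordering in `resize`: `ThreadPoolExecutor` rejects a core size above the maximum (and vice versa), so growth and shrinkage must update the two bounds in opposite order.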
🧪 The Test: Simulated Delay Service
I created a quick Spring Boot controller to simulate downstream latency and spike scenarios.
✅ This helped verify how well the HTTP client adjusts to changing server responsiveness.
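The repo's version is a Spring Boot controller; as a stand-in that runs without Spring, the same idea can be sketched with the JDK's built-in `com.sun.net.httpserver.HttpServer` (class name `DelayService` and the `/delay` path are hypothetical):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical delay service: each request sleeps for a random duration in
// [baseMs, baseMs + jitterMs] before answering, simulating a lagging or
// spiking downstream. JDK-only stand-in for the article's Spring controller.
class DelayService {
    static HttpServer start(int port, long baseMs, long jitterMs) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/delay", exchange -> {
            try {
                Thread.sleep(baseMs + ThreadLocalRandom.current().nextLong(jitterMs + 1));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = "ok".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        server.start();
        return server;
    }
}
```

Pointing the smart client at `/delay` and raising `baseMs` mid-run is enough to watch the tuner scale threads down as latency climbs.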
📈 Example Scenario
Let’s say we want to achieve 100 TPS:
We start with, say, 20 threads.
Our scheduler observes we’re only achieving 65 TPS.
Based on this delta (35 short), we scale threads up to 40.
On the next run, we hit 102 TPS. Great.
But if the downstream becomes slow, we scale back to avoid queuing delays or JVM lock-ups.
This loop continues, always aiming to hover around target TPS, avoiding saturation.
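As a sanity check on these numbers (my own sketch, not from the repo): by Little's law, achievable TPS ≈ threads / average latency in seconds. Twenty threads delivering 65 TPS implies roughly 0.31 s per call, so about 31 threads would suffice for 100 TPS at that latency; jumping to 40 simply leaves headroom for latency creep.

```java
// Little's-law helper: relates thread count, average latency, and throughput.
// Assumes each thread handles one blocking request at a time.
class LittlesLaw {
    /** Best-case throughput for a fixed pool of blocking threads. */
    static double maxTps(int threads, double avgLatencySeconds) {
        return threads / avgLatencySeconds;
    }

    /** Minimum threads needed to sustain targetTps at the given latency. */
    static int threadsForTps(double targetTps, double avgLatencySeconds) {
        return (int) Math.ceil(targetTps * avgLatencySeconds);
    }
}
```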
🌍 Real-World Benefits
SLA Adherence: Meet your client TPS/SLA requirements more reliably.
Resource Efficient: Prevent over-provisioning of threads and CPU usage.
Backpressure-Aware: Reacts to latency spikes instead of pushing harder.
Auto-Tuning: Minimal human intervention or redeploys for tuning thread count.
⚠️ A Drawback to Address
One limitation of the current implementation is its blind ramp-up/ramp-down logic: it assumes every request failure is performance-related, which isn't always the case, and it doesn't differentiate between client-side timeouts, server-side issues, and network glitches.
💡 In a future article, I’ll improve this model by adding latency buckets, error classification, and a sliding window TPS calculator for smarter decisions. Stay tuned!
🧩 Final Thoughts
This smart HTTP client isn't just a tool; it's a strategy for making high-volume services reliable, predictable, and efficient.
Feel free to contribute, fork, or follow the repo here: 🔗 GitHub Repository