How to send prompts in bulk with Spring AI and Virtual Threads
TL;DR: You’re building an AI-powered app that needs to send lots of prompts to OpenAI. Instead of sending them one by one, you want to do it in bulk — efficiently and safely. This is how you can use Spring AI with Java Virtual Threads to process hundreds of prompts in parallel.
When calling LLM APIs like OpenAI, you’re dealing with a high-latency, network-bound task. Doing it sequentially in a loop is slow, and parallelizing with platform threads ties up scarce OS resources. But with Spring AI and Java 21 Virtual Threads, you can fire off hundreds of requests in parallel without overwhelming your app.
This is particularly useful when you want the LLM to perform actions such as summarizing or extracting information from lots of documents.
Here’s the flow:
Get your list of text inputs.
Filter the ones that need processing.
Split them into batches.
For each batch:
Use Virtual Threads to make OpenAI calls in parallel
Wait for all calls to finish (using CompletableFuture)
Save the results
Virtual Threads for Massive Parallelism
Java Virtual Threads are perfect for this. They’re lightweight threads scheduled by the JVM onto a small pool of carrier (OS) threads: when a virtual thread blocks on I/O, it unmounts and frees its carrier for other work. That makes them ideal for I/O-bound tasks like talking to APIs.
Each OpenAI request runs in its own virtual thread, without the memory and scheduling overhead of platform threads.
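A minimal sketch of the executor this relies on, using the standard Java 21 API (the class name and printed message are just for illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadDemo {
    public static void main(String[] args) {
        // One virtual thread per submitted task: cheap enough to create
        // in the thousands, ideal for I/O-bound work like HTTP calls.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 100; i++) {
                executor.submit(() -> System.out.println("running on " + Thread.currentThread()));
            }
        } // close() waits for all submitted tasks to finish
    }
}
```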
Spring AI Prompt Call
You create a Prompt, then send it to the model:
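A minimal sketch, assuming the auto-configured ChatModel bean (exact package names vary slightly across Spring AI versions; inputText is a placeholder):

```java
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;

// chatModel is injected by Spring (e.g. backed by OpenAiChatModel)
Prompt prompt = new Prompt("Summarize the following text:\n" + inputText);
ChatResponse response = chatModel.call(prompt);
```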
You get back a structured response. From there, you just extract the output:
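Something like this, with the caveat that the final accessor is getText() in Spring AI 1.0 and getContent() in earlier milestones:

```java
// Drill down from the structured response to the model's text output
String summary = response.getResult().getOutput().getText();
```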
Processing in Batches
Sending all prompts at once isn’t a good idea (rate limits, reliability, memory). Instead, chunk them into smaller batches (e.g., 300 items):
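One straightforward way to do the chunking (the helper name partition is illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Split the inputs into batches of at most batchSize items each
static List<List<String>> partition(List<String> inputs, int batchSize) {
    List<List<String>> batches = new ArrayList<>();
    for (int i = 0; i < inputs.size(); i += batchSize) {
        // subList returns a view; copy it if the source list may change
        batches.add(inputs.subList(i, Math.min(i + batchSize, inputs.size())));
    }
    return batches;
}
```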
For each batch (sketched below):
Launch a CompletableFuture for every input
Wait for all with CompletableFuture.allOf(…).join()
Collect the results
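Here’s how that fan-out can look, assuming executor is the virtual-thread executor from above and summarize is a hypothetical method wrapping the Spring AI call:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

List<CompletableFuture<String>> futures = batch.stream()
        .map(text -> CompletableFuture.supplyAsync(() -> summarize(text), executor))
        .toList();

// Block until every call in this batch has completed
CompletableFuture.allOf(futures.toArray(CompletableFuture[]::new)).join();

// All futures are done by now, so join() returns immediately
List<String> results = futures.stream()
        .map(CompletableFuture::join)
        .toList();
```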
Handling Errors Gracefully
Each task is wrapped in a try/catch block, so if one OpenAI call fails it doesn’t crash the batch; you just skip that result.
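A sketch of that wrapper, where returning null marks a failed input (log is an assumed SLF4J logger):

```java
private String summarize(String text) {
    try {
        Prompt prompt = new Prompt("Summarize the following text:\n" + text);
        return chatModel.call(prompt).getResult().getOutput().getText();
    } catch (Exception e) {
        // One failed call shouldn't sink the whole batch:
        // log it and signal "no result" for this input
        log.warn("OpenAI call failed, skipping this input", e);
        return null;
    }
}
```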
Process Results in Bulk
After processing each batch:
Filter out the failed ones
Process the valid results
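With the null-on-failure convention above, that boils down to (summaryRepository is a hypothetical persistence layer):

```java
import java.util.List;
import java.util.Objects;

List<String> summaries = results.stream()
        .filter(Objects::nonNull) // drop the failed entries
        .toList();

summaryRepository.saveAll(summaries); // e.g. persist, index, or forward them
```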
Full Implementation
In this example, we take a list of texts and send them to OpenAI in batches to get a summary of each. The calls run in parallel, which makes the process much faster. Once the summaries come back, we save the results. Everything runs in a way that handles errors gracefully and avoids overloading the system.
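Putting the pieces together into one self-contained sketch. Names like BulkSummarizer, SummaryRepository, and the prompt wording are illustrative, not part of Spring AI:

```java
import java.util.List;
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.stereotype.Service;

@Service
public class BulkSummarizer {

    private static final Logger log = LoggerFactory.getLogger(BulkSummarizer.class);
    private static final int BATCH_SIZE = 300;

    private final ChatModel chatModel;
    private final SummaryRepository summaryRepository; // hypothetical persistence layer

    public BulkSummarizer(ChatModel chatModel, SummaryRepository summaryRepository) {
        this.chatModel = chatModel;
        this.summaryRepository = summaryRepository;
    }

    public void summarizeAll(List<String> inputs) {
        // 1. Filter the ones that need processing (placeholder predicate)
        List<String> pending = inputs.stream()
                .filter(text -> text != null && !text.isBlank())
                .toList();

        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            // 2. Split into batches and process each one
            for (int i = 0; i < pending.size(); i += BATCH_SIZE) {
                List<String> batch = pending.subList(i, Math.min(i + BATCH_SIZE, pending.size()));

                // 3. One virtual thread per OpenAI call
                List<CompletableFuture<String>> futures = batch.stream()
                        .map(text -> CompletableFuture.supplyAsync(() -> summarize(text), executor))
                        .toList();

                // 4. Wait for the whole batch to finish
                CompletableFuture.allOf(futures.toArray(CompletableFuture[]::new)).join();

                // 5. Keep only successful results and save them in bulk
                List<String> summaries = futures.stream()
                        .map(CompletableFuture::join)
                        .filter(Objects::nonNull)
                        .toList();
                summaryRepository.saveAll(summaries);
            }
        }
    }

    private String summarize(String text) {
        try {
            Prompt prompt = new Prompt("Summarize the following text:\n" + text);
            // getContent() instead of getText() on older Spring AI versions
            return chatModel.call(prompt).getResult().getOutput().getText();
        } catch (Exception e) {
            log.warn("OpenAI call failed, skipping this input", e);
            return null; // one failure doesn't crash the batch
        }
    }
}
```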
And that’s it! You now have a fully async, high-throughput pipeline that can send hundreds of prompts to OpenAI — safely and efficiently — using nothing but Spring AI, Java Virtual Threads, and good batching.