How to send prompts in bulk with Spring AI and Virtual Threads
TL;DR: You’re building an AI-powered app that needs to send lots of prompts to OpenAI. Instead of sending them one by one, you want to do it in bulk — efficiently and safely. This is how you can use Spring AI with Java Virtual Threads to process hundreds of prompts in parallel.
When calling LLM APIs like OpenAI, you’re dealing with a high-latency, network-bound task. Doing it sequentially in a loop is slow, and parallelizing with platform threads ties up scarce OS resources. But with Spring AI and Java 21 Virtual Threads, you can fire off hundreds of requests in parallel without overwhelming your app.
This is particularly useful when you want the LLM to perform actions such as summarizing or extracting information from lots of documents.
Here’s the flow:
Get your list of text inputs.
Filter the ones that need processing.
Split them into batches.
For each batch:
Use Virtual Threads to make OpenAI calls in parallel
Wait for all calls to finish (using CompletableFuture)
Save the results
Virtual Threads for Massive Parallelism
Java Virtual Threads are perfect for this. They’re lightweight threads scheduled by the JVM onto a small pool of carrier (OS) threads: when a virtual thread blocks on I/O, it unmounts and frees its carrier for other work. That makes them ideal for I/O-bound tasks like talking to APIs.
Each OpenAI request runs in its own virtual thread, without the memory and scheduling overhead of platform threads.
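A minimal sketch of the executor this relies on, using the standard Java 21 API (the class name and printed message are just for illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadDemo {
    public static void main(String[] args) {
        // One virtual thread per submitted task: cheap enough to create
        // in the thousands, ideal for I/O-bound work like HTTP calls.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 100; i++) {
                executor.submit(() -> System.out.println("running on " + Thread.currentThread()));
            }
        } // close() waits for all submitted tasks to finish
    }
}
```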
Spring AI Prompt Call
You create a Prompt, then send it to the model:
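A minimal sketch, assuming the auto-configured ChatModel bean (exact package names vary slightly across Spring AI versions; inputText is a placeholder):

```java
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;

// chatModel is injected by Spring (e.g. backed by OpenAiChatModel)
Prompt prompt = new Prompt("Summarize the following text:\n" + inputText);
ChatResponse response = chatModel.call(prompt);
```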
You get back a structured response. From there, you just extract the output:
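Something like this, with the caveat that the final accessor is getText() in Spring AI 1.0 and getContent() in earlier milestones:

```java
// Drill down from the structured response to the model's text output
String summary = response.getResult().getOutput().getText();
```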
Processing in Batches
Sending all prompts at once isn’t a good idea (rate limits, reliability, memory). Instead, chunk them into smaller batches (e.g., 300 items):
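One straightforward way to do the chunking (the helper name partition is illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Split the inputs into batches of at most batchSize items each
static List<List<String>> partition(List<String> inputs, int batchSize) {
    List<List<String>> batches = new ArrayList<>();
    for (int i = 0; i < inputs.size(); i += batchSize) {
        // subList returns a view; copy it if the source list may change
        batches.add(inputs.subList(i, Math.min(i + batchSize, inputs.size())));
    }
    return batches;
}
```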
For each batch (sketched below):
Launch a CompletableFuture for every input
Wait for all with CompletableFuture.allOf(…).join()
Collect the results
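Here’s how that fan-out can look, assuming executor is the virtual-thread executor from above and summarize is a hypothetical method wrapping the Spring AI call:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

List<CompletableFuture<String>> futures = batch.stream()
        .map(text -> CompletableFuture.supplyAsync(() -> summarize(text), executor))
        .toList();

// Block until every call in this batch has completed
CompletableFuture.allOf(futures.toArray(CompletableFuture[]::new)).join();

// All futures are done by now, so join() returns immediately
List<String> results = futures.stream()
        .map(CompletableFuture::join)
        .toList();
```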
Handling Errors Gracefully
Each task is wrapped in a try/catch block, so if one OpenAI call fails it doesn’t crash the batch; you just skip that result.
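A sketch of that wrapper, where returning null marks a failed input (log is an assumed SLF4J logger):

```java
private String summarize(String text) {
    try {
        Prompt prompt = new Prompt("Summarize the following text:\n" + text);
        return chatModel.call(prompt).getResult().getOutput().getText();
    } catch (Exception e) {
        // One failed call shouldn't sink the whole batch:
        // log it and signal "no result" for this input
        log.warn("OpenAI call failed, skipping this input", e);
        return null;
    }
}
```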
Process Results in Bulk
After processing each batch:
Filter out the failed ones
Process the valid results
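With the null-on-failure convention above, that boils down to (summaryRepository is a hypothetical persistence layer):

```java
import java.util.List;
import java.util.Objects;

List<String> summaries = results.stream()
        .filter(Objects::nonNull) // drop the failed entries
        .toList();

summaryRepository.saveAll(summaries); // e.g. persist, index, or forward them
```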
Full Implementation
In this example, we take a list of texts and send them to OpenAI in batches to get a summary of each. The calls run in parallel, which makes the process much faster. Once the summaries come back, we save the results. Everything runs in a way that handles errors gracefully and avoids overloading the system.
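Putting the pieces together into one self-contained sketch. Names like BulkSummarizer, SummaryRepository, and the prompt wording are illustrative, not part of Spring AI:

```java
import java.util.List;
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.stereotype.Service;

@Service
public class BulkSummarizer {

    private static final Logger log = LoggerFactory.getLogger(BulkSummarizer.class);
    private static final int BATCH_SIZE = 300;

    private final ChatModel chatModel;
    private final SummaryRepository summaryRepository; // hypothetical persistence layer

    public BulkSummarizer(ChatModel chatModel, SummaryRepository summaryRepository) {
        this.chatModel = chatModel;
        this.summaryRepository = summaryRepository;
    }

    public void summarizeAll(List<String> inputs) {
        // 1. Filter the ones that need processing (placeholder predicate)
        List<String> pending = inputs.stream()
                .filter(text -> text != null && !text.isBlank())
                .toList();

        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            // 2. Split into batches and process each one
            for (int i = 0; i < pending.size(); i += BATCH_SIZE) {
                List<String> batch = pending.subList(i, Math.min(i + BATCH_SIZE, pending.size()));

                // 3. One virtual thread per OpenAI call
                List<CompletableFuture<String>> futures = batch.stream()
                        .map(text -> CompletableFuture.supplyAsync(() -> summarize(text), executor))
                        .toList();

                // 4. Wait for the whole batch to finish
                CompletableFuture.allOf(futures.toArray(CompletableFuture[]::new)).join();

                // 5. Keep only successful results and save them in bulk
                List<String> summaries = futures.stream()
                        .map(CompletableFuture::join)
                        .filter(Objects::nonNull)
                        .toList();
                summaryRepository.saveAll(summaries);
            }
        }
    }

    private String summarize(String text) {
        try {
            Prompt prompt = new Prompt("Summarize the following text:\n" + text);
            // getContent() instead of getText() on older Spring AI versions
            return chatModel.call(prompt).getResult().getOutput().getText();
        } catch (Exception e) {
            log.warn("OpenAI call failed, skipping this input", e);
            return null; // one failure doesn't crash the batch
        }
    }
}
```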
And that’s it! You now have a fully async, high-throughput pipeline that can send hundreds of prompts to OpenAI — safely and efficiently — using nothing but Spring AI, Java Virtual Threads, and good batching.