When GenAI meets with Java with Quarkus and langchain4j

JF James, 2024
When Java
meets GenAI
at JChateau

Context
I'm neither a data scientist
nor an AI specialist
Just a Java Dev and
Software Architect
Wondering how to leverage
LLMs' impressive
capabilities in our Apps

Experimentation
LET’S
EXPERIMENT
QUARKUS-
LANGCHAIN4J
EXTENSION
WITH A SIMPLE
CAR BOOKING
APP
FOCUS ON
RAG AND
FUNCTION CALLS
USING AZURE GPT 3.5 & 4

How to
• Basic REST/HTTP
• Specific SDK: OpenAI
• Framework: langchain
• Low/No Code: FlowiseAI
• Orchestration tool: RAGNA

LangChain
• A popular framework for developing applications powered
by language models
• Assemblages of components for accomplishing higher-
level tasks
• Connect various building blocks: large language models,
document loaders, text splitters, output parsers, vector
stores to store text embeddings, tools, and prompts
• Supports Python and JavaScript
• Launched end of 2022 (just after ChatGPT release)

langchain4j
• The “Java version” of langchain
• Simplify the integration of AI/LLM capabilities into your
Java application
• Launched in 2023
• Latest release: 0.27.1 (6 March 2024)

Quarkus-langchain4j
• Seamless integration between Quarkus and LangChain4j
• Easy incorporation of LLMs into your Quarkus applications
• Launched end of 2023
• Latest release: 0.9.0 (6 March 2024), based on langchain4j 0.27.1

A fast pace of change
• 2017: Transformer
• 2018: GPT-1
• 2022: langchain
• 2022: ChatGPT
• 2023: langchain4j
• 2023: quarkus-langchain4j

Defining an AI interface
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService
public interface CustomerSupportAgent {

    // Free chat method, unstructured user message
    @SystemMessage("You are a customer support agent of a car rental company …")
    String chat(String userMessage);

    // Structured fraud detection method with parameters
    @SystemMessage("You are a car booking fraud detection AI…")
    @UserMessage("Your task is to detect if a fraud was committed for the customer {{name}} {{surname}} …")
    String detectFraudForCustomer(String name, String surname);
}
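The {{name}} and {{surname}} placeholders in @UserMessage are filled with the method parameters before the request is sent. The sketch below only illustrates that idea in plain Java; PromptTemplateSketch and render are hypothetical names, and the extension's real template handling may differ.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Conceptual sketch only: not the code used by quarkus-langchain4j.
class PromptTemplateSketch {

    // Matches placeholders such as {{name}}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{(\\w+)\\}\\}");

    // Replaces every {{key}} with its value; unknown keys are left untouched.
    static String render(String template, Map<String, String> variables) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String value = variables.getOrDefault(matcher.group(1), matcher.group());
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "Detect if a fraud was committed for the customer {{name}} {{surname}}";
        // prints: Detect if a fraud was committed for the customer James Bond
        System.out.println(render(template, Map.of("name", "James", "surname", "Bond")));
    }
}
```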

LLM configuration
# Connection configuration to Azure OpenAI instance
quarkus.langchain4j.azure-openai.api-key=…
quarkus.langchain4j.azure-openai.resource-name=…
quarkus.langchain4j.azure-openai.deployment-name=…
quarkus.langchain4j.azure-openai.endpoint=…
# Warning: function calls support depends on the api-version
quarkus.langchain4j.azure-openai.api-version=2023-12-01-preview
quarkus.langchain4j.azure-openai.max-retries=2
quarkus.langchain4j.azure-openai.timeout=60S
# Set the model temperature for deterministic (non-creative) behavior (between 0 and 2)
quarkus.langchain4j.azure-openai.chat-model.temperature=0.1
# An alternative (or a complement?) to temperature: 0.1 means only the tokens comprising the top 10% probability mass are considered
quarkus.langchain4j.azure-openai.chat-model.top-p=0.1
# Logging requests and responses in dev mode
%dev.quarkus.langchain4j.azure-openai.log-requests=true
%dev.quarkus.langchain4j.azure-openai.log-responses=true
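To make the top-p setting more concrete, here is a small sketch of nucleus filtering as it is usually described: keep the most probable tokens until their cumulative probability reaches top-p. The token probabilities are made up, TopPSketch is a hypothetical name, and the real filtering happens on the model side, not in the application.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Conceptual sketch of top-p (nucleus) filtering with made-up probabilities.
class TopPSketch {

    // Keeps the most probable tokens until their cumulative probability reaches topP.
    static List<String> nucleus(Map<String, Double> probabilities, double topP) {
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(probabilities.entrySet());
        sorted.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> kept = new ArrayList<>();
        double cumulative = 0.0;
        for (Map.Entry<String, Double> entry : sorted) {
            kept.add(entry.getKey());
            cumulative += entry.getValue();
            if (cumulative >= topP) {
                break;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical next-token distribution
        Map<String, Double> probs = Map.of("car", 0.6, "vehicle", 0.25, "bike", 0.1, "boat", 0.05);
        System.out.println(nucleus(probs, 0.1)); // prints [car]
        System.out.println(nucleus(probs, 0.9)); // prints [car, vehicle, bike]
    }
}
```

With top-p at 0.1, only the single most likely token survives here, which is why a low value pushes the model toward repeatable answers.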

Retrieval
Augmented
Generation

Principles
• Augment the LLM with specific knowledge
• From different data sources and formats: text, PDF, CSV …
• First, the input text is turned into a vector representation (an embedding)
• Each request is then completed with relevant selected data
• Vector databases: InMemory, PgVector, Redis, Chroma …
• In-process embedding models: all-minilm-l6-v2-q, bge-small-en, bge-small-zh …
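Relevance between a request and a stored text segment is typically measured by comparing their vectors, for instance with cosine similarity. The sketch below uses tiny made-up vectors; real embedding models produce vectors with hundreds of dimensions, and CosineSketch is a hypothetical name.

```java
// Conceptual sketch: cosine similarity between two embedding vectors.
class CosineSketch {

    // Returns a value in [-1, 1]; values close to 1 mean the vectors point the same way.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // convention chosen for this sketch: a zero vector matches nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] query = {0.9, 0.1, 0.0};         // made-up embedding of the user request
        double[] cancellation = {0.8, 0.2, 0.1};  // made-up embedding of a relevant segment
        double[] opening = {0.0, 0.1, 0.9};       // made-up embedding of an unrelated segment
        System.out.println(cosine(query, cancellation));
        System.out.println(cosine(query, opening));
    }
}
```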

Ingesting documents
public void ingest(@Observes StartupEvent evt) throws Exception {
    // Segments of at most 500 characters, no overlap between segments
    DocumentSplitter splitter = DocumentSplitters.recursive(500, 0);
    EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor
            .builder()
            .embeddingStore(embeddingStore)
            .embeddingModel(embeddingModel)
            .documentSplitter(splitter)
            .build();
    List<Document> docs = loadDocs();
    ingestor.ingest(docs);
}
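The two splitter arguments are a maximum segment size and an overlap. The sketch below shows what those two numbers mean with a deliberately naive fixed-size splitter. It is not the recursive algorithm used by langchain4j, which (as far as I understand) tries to cut on paragraph, line, sentence and word boundaries first. ChunkingSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Naive fixed-size splitter, only to illustrate "max size" and "overlap".
class ChunkingSketch {

    static List<String> split(String text, int maxSize, int overlap) {
        if (maxSize <= 0 || overlap < 0 || overlap >= maxSize) {
            throw new IllegalArgumentException("Expected maxSize > overlap >= 0");
        }
        List<String> segments = new ArrayList<>();
        int step = maxSize - overlap; // how far the window moves between two segments
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxSize, text.length());
            segments.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last segment reached the end of the text
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(split("abcdefghij", 4, 0)); // prints [abcd, efgh, ij]
        System.out.println(split("abcdefghij", 4, 2)); // prints [abcd, cdef, efgh, ghij]
    }
}
```

An overlap of 0, as in the slide, keeps segments disjoint; a positive overlap repeats the end of one segment at the start of the next, which can help when a sentence straddles a boundary.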

Retrieving relevant contents
public class DocRetriever implements ContentRetriever {
    …
    // From 0 (low selectivity) to 1 (high selectivity)
    private static final double MIN_SCORE = 0.7;

    @Inject
    public DocRetriever(EmbeddingStore<TextSegment> store, EmbeddingModel model) {
        this.retriever = EmbeddingStoreContentRetriever
                .builder()
                .embeddingModel(model)
                .embeddingStore(store)
                .maxResults(MAX_RESULTS)
                .minScore(MIN_SCORE)
                .build();
    }

    @Override
    public List<Content> retrieve(Query query) {
        return retriever.retrieve(query);
    }
}
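The effect of minScore and maxResults, and the way selected segments end up in the user message, can be sketched in plain Java. This is an illustration under assumptions: RetrievalSketch, ScoredSegment and the wording of the augmented prompt are hypothetical, and the scores are made up rather than computed by an embedding store.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Conceptual sketch: filter scored segments, then append them to the user message.
class RetrievalSketch {

    // A text segment with its (made-up) similarity score against the query
    record ScoredSegment(String text, double score) {}

    // Keeps segments scoring at least minScore, best first, at most maxResults of them.
    static List<ScoredSegment> select(List<ScoredSegment> candidates, double minScore, int maxResults) {
        return candidates.stream()
                .filter(segment -> segment.score() >= minScore)
                .sorted(Comparator.comparingDouble(ScoredSegment::score).reversed())
                .limit(maxResults)
                .collect(Collectors.toList());
    }

    // Appends the selected segments to the user message (hypothetical prompt wording).
    static String augment(String userMessage, List<ScoredSegment> selected) {
        if (selected.isEmpty()) {
            return userMessage;
        }
        String context = selected.stream()
                .map(ScoredSegment::text)
                .collect(Collectors.joining("\n"));
        return userMessage + "\n\nAnswer using the following information:\n" + context;
    }

    public static void main(String[] args) {
        List<ScoredSegment> candidates = List.of(
                new ScoredSegment("Bookings can be cancelled up to 7 days before start.", 0.82),
                new ScoredSegment("Our offices are open from 9 to 5.", 0.41),
                new ScoredSegment("Cancellation is not possible for short bookings.", 0.76));
        List<ScoredSegment> selected = select(candidates, 0.7, 2);
        System.out.println(augment("Can I cancel my booking?", selected));
    }
}
```

Raising the minimum score drops weakly related segments, which also keeps the request shorter.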

Binding an AI service to a document retriever
// Binding is defined with the RegisterAiService annotation
@RegisterAiService(retrievalAugmentor = DocRagAugmentor.class)
public interface CustomerSupportAgent { … }

// DocRagAugmentor is an intermediate class supplying the retriever
public class DocRagAugmentor implements Supplier<RetrievalAugmentor> {
    @Override
    public RetrievalAugmentor get() { … }
}

RAG configuration
# Local Embedding Model for RAG
quarkus.langchain4j.embedding-model.provider=dev.langchain4j…AllMiniLmL6V2EmbeddingModel
# Local directory for RAG documents
app.local-data-for-rag.dir=data-for-rag

Stephan Pirson, 2023
Basic principles
1. Instruct the LLM to call App functions
2. A function is a Java method annotated with @Tool
3. Function descriptors are sent with the requests to the LLM
4. The LLM decides whether it’s relevant to call a function
5. A description of the function call is provided in the response
6. quarkus-langchain4j automatically calls the @Tool method
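The steps above amount to a loop: send the conversation, execute the function the model asks for, send the result back, and repeat until a final answer arrives. The sketch below simulates that loop with a scripted stand-in for the LLM. All names (FunctionCallLoopSketch, ModelReply, fakeLlm) are hypothetical, and the message format is simplified; quarkus-langchain4j runs the real loop for you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Conceptual simulation of the function-calling loop, with a scripted fake LLM.
class FunctionCallLoopSketch {

    // Either a function call (functionName set) or a final answer (finalText set)
    record ModelReply(String functionName, String argument, String finalText) {
        boolean isFunctionCall() {
            return functionName != null;
        }
    }

    // Stand-in for the remote model: stateless, it decides only from the history it receives.
    static ModelReply fakeLlm(List<String> messages) {
        long functionResults = messages.stream().filter(m -> m.startsWith("function:")).count();
        if (functionResults == 0) {
            return new ModelReply("getBookingDetails", "456-789", null);
        }
        if (functionResults == 1) {
            return new ModelReply("cancelBooking", "456-789", null);
        }
        return new ModelReply(null, null, "Your booking 456-789 has been successfully cancelled.");
    }

    // Runs the loop; every requested function name is appended to trace.
    static String chat(String userMessage, Map<String, Function<String, String>> tools, List<String> trace) {
        List<String> messages = new ArrayList<>();
        messages.add("user:" + userMessage);
        for (int round = 0; round < 10; round++) {       // guard against endless loops
            ModelReply reply = fakeLlm(messages);        // one round trip, full history resent
            if (!reply.isFunctionCall()) {
                return reply.finalText();
            }
            trace.add(reply.functionName());
            String result = tools.get(reply.functionName()).apply(reply.argument()); // local execution
            messages.add("function:" + result);
        }
        throw new IllegalStateException("Too many function calls");
    }

    public static void main(String[] args) {
        Map<String, Function<String, String>> tools = Map.of(
                "getBookingDetails", number -> "booking " + number + " found",
                "cancelBooking", number -> "booking " + number + " cancelled");
        List<String> trace = new ArrayList<>();
        System.out.println(chat("I'm James Bond, can you cancel my booking 456-789", tools, trace));
        System.out.println(trace); // prints [getBookingDetails, cancelBooking]
    }
}
```

Two function calls mean three requests to the model, which is where the round-trip cost of function calling comes from.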

Perspective
Use the LLM as a “workflow
engine”
The LLM is entrusted with the
decision to call business logic
Both powerful and dangerous
Trustable? Reliable?

Defining a function
@Tool("Get booking details for booking number {bookingNumber} and customer {name} {surname}")
public Booking getBookingDetails(String bookingNumber, String name, String surname) {
    Log.info("DEMO: Calling Tool-getBookingDetails: " + bookingNumber + " and customer: "
            + name + " " + surname);
    return checkBookingExists(bookingNumber, name, surname);
}

Binding the functions to an AI interface
@RegisterAiService(tools = BookingService.class)
public interface CustomerSupportAgent { … }

LLM initial request
"functions":[
{
"name":"getBookingDetails",
"description":"Get booking details for {bookingNumber} and customer {firstName} {lastName}",
"parameters":{
"type":"object",
"properties":{
"firstName":{
"type":"string"
},
"lastName":{
"type":"string"
},
"bookingNumber":{
"type":"string"
}
},
"required":[
"bookingNumber",
"firstName",
"lastName"
]
}
}, …]

LLM intermediate response
"choices":[
{
"finish_reason":"function_call",
"index":0,
"message":{
"role":"assistant",
"function_call":{
"arguments":"{"firstName":"James","lastName":"Bond","bookingNumber":"456-789"}"
}
},
…
}
]

LLM intermediate request
{
"role":"function",
"content":"{"bookingNumber" : "456-789",
"customer" : { "firstName" : "James", "lastName" : "Bond" },
"startDate" : "2024-03-01",
"endDate" : "2024-03-09",
"carModel" : "Volvo",
"cancelled" : false}"
}

Example of a booking cancelation
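Participants: User, Application, LLM
1. User → Application: prompt “I'm James Bond, can you cancel my booking 456-789”
2. Application → LLM: initial request (POST), stateless request processing
3. LLM → Application: call getBookingDetails
4. Application: local execution
5. Application → LLM: second request (POST getBookingDetails result), stateless request processing
6. LLM → Application: call cancelBooking
7. Application: local execution
8. Application → LLM: third request (POST cancelBooking result), stateless request processing
9. LLM → Application: final response (finish_reason=stop)
10. Application → User: response “Your booking 456-789 has been successfully cancelled, Mr. Bond.”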

Lessons learned
• Overall interesting results:
  • quarkus-langchain4j makes GenAI really easy!
  • Even a generic LLM such as GPT proves helpful in a specific domain context
  • GPT-4 is more precise but significantly slower in this example:
    • GPT-4: >= 5 sec
    • GPT-3.5: >= 2 sec
• RAG:
  • Be selective: set min_score appropriately for your context when retrieving text segments
  • Request messages can be verbose: selected text segments are added to the user message
• Function calls:
  • Not supported by all LLMs
  • Powerful and dangerous
  • Hard to debug
  • Potentially verbose: 1 round trip per function call
  • Many requests under the hood, similar to the JPA N+1 queries problem
  • Non-deterministic behavior, but acceptable with temperature and seed set to minimum
  • To be used with care on critical functions: payment, cancelation

Next steps
• Testability
• Auditability
• Observability
• Security
• Production readiness
• Real use cases beyond the fun

Code available on GitHub
https://github.com/jefrajames/car-booking
