A Comprehensive Guide
to Agentic AI
Debmalya Biswas, PhD
Introduction to AI Agents
Reference Architecture
Agents Discovery & Marketplace
Personalizing UX for Agentic AI
Agent Observability & Memory Management
Agentic AI Scenarios: Agentic RAGs, Reinforcement Learning Agents
Responsible AI Agents
Introduction to
Agentic AI
AI Agents
In the Generative AI context, Agents refer to Autonomous Agents that can execute complex tasks, e.g.:
- make a sale,
- plan a trip,
- make a flight booking,
- book a contractor to do a house job,
- order a pizza.
Agentic AI in the News
Agentic AI Evolution
Agentic AI capabilities – Task Decomposition
Agentic AI capabilities – Memory Management
Agentic AI capabilities – Reflect & Adapt
Agentic AI Use-case: Funds Email Marketing Campaign
Agentic AI
Reference
Architecture
Generative AI Lifecycle
Gen AI Architecture Patterns – APIs & Embedded Gen AI
While Enterprise LLM Apps have the potential to accelerate LLM adoption by providing an enterprise-ready solution, the same caution needs to be exercised as you would before using a 3rd-party ML model — validate LLM/training data ownership, IP, and liability clauses.
Black-box LLM APIs: This is the classic ChatGPT example, where we have black-box access to an LLM API/UI. Prompts are the primary interaction mechanism for such scenarios.
* D. Biswas. Generative AI – LLMOps Architecture Patterns. Data Driven Investor, 2023 (link)
Gen AI Architecture Patterns – Fine-tuning
LLMs are generic in nature. To
realize the full potential of LLMs for
Enterprises, they need to be
contextualized with enterprise
knowledge captured in terms of
documents, wikis, business
processes, etc.
This is achieved by fine-tuning an LLM with enterprise knowledge / embeddings to develop a context-specific LLM.
Gen AI Architecture Patterns – Retrieval-Augmented-
Generation (RAG)
Fine-tuning is a computationally intensive process. RAG provides a viable alternative by providing additional context with the prompts — grounding the retrieval / responses in the given context.
Given a user query, a RAG pipeline consists of the three phases below (see the sketch that follows):
- Retrieve: transform the user query into an embedding and compare its similarity score with other content.
- Augment: with search results / context retrieved from a vector store that is kept current and in sync with the underlying document repository.
- Generate: contextualized responses by making the retrieved chunks part of the prompt template, which provides additional context to the LLM on how to answer the query.
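The three phases can be expressed as a minimal pipeline. The sketch below is illustrative only: the `embed`, `vector_store.search`, and `llm.generate` helpers are assumptions standing in for whatever encoder, vector store, and LLM client are used.

```python
# Minimal RAG sketch (hypothetical embed / vector_store / llm helpers).
def rag_answer(query: str, vector_store, llm, embed, top_k: int = 5) -> str:
    # Retrieve: embed the query and fetch the most similar chunks.
    query_vec = embed(query)
    chunks = vector_store.search(query_vec, top_k=top_k)

    # Augment: place the retrieved chunks into the prompt template.
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

    # Generate: the LLM produces a response grounded in the retrieved context.
    return llm.generate(prompt)
```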
Agentic AI Platform Reference Architecture
* D. Biswas. Stateful Monitoring and Responsible Deployment of AI Agents. 17th
International Conference on Agents and Artificial Intelligence (ICAART), 2025 (link)
We envision a future where enterprises will be able to develop new Enterprise AI Apps by orchestrating / composing multiple existing AI Agents.
AI Agents Marketplace
& Discovery for Multi-
agent Systems
(Complex) Agentic AI Task Decomposition
A high-level approach to solving complex tasks:
- decomposition of the given complex task into a (hierarchy or workflow of) simple tasks, followed by
- composition of agents able to execute the simpler tasks.
This can be achieved in a dynamic or static manner (see the sketch below):
- Dynamic: given a complex user task, the system comes up with a plan to fulfil the request depending on the capabilities of available agents at run-time.
- Static: given a set of agents, composite agents are defined manually at design-time, combining their capabilities.
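A minimal sketch of the dynamic variant, assuming a hypothetical planner LLM and agent registry; the static variant would simply replace the run-time planning call with a manually defined workflow.

```python
# Dynamic decomposition sketch: plan at run-time against available agents.
# `planner_llm` and `registry` are hypothetical components, not a specific API.
def solve(task: str, planner_llm, registry) -> list:
    capabilities = registry.list_capabilities()        # what the available agents can do
    plan = planner_llm.plan(task, capabilities)         # ordered list of simpler sub-tasks
    results = []
    for step in plan:
        agent = registry.find(step.capability)          # discovery (see the next slides)
        results.append(agent.execute(step, context=results))
    return results
```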
Agent Marketplace & Discovery of AI Agents
Agent decomposition
and planning (be it static
or dynamic) requires a
discovery module to
identify the agent(s)
capable of executing a
given task.
This implies that there
exists a marketplace
with a registry of agents,
with a well-defined
description of the agent
capabilities and
constraints.
Hierarchical Agent
Composition
In LangGraph (for example), hierarchical agents are captured as agent nodes that can be LangGraph objects (sub-graphs) themselves, connected by supervisor nodes.
• LangGraph: Multi-Agent Workflows,
https://blog.langchain.dev/langgraph-multi-agent-workflows/
Hierarchical Finite State Machine (FSM)
representation of a Travel Funds Service
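A minimal LangGraph-style sketch of a supervisor routing between two worker agents. Exact class and method names may differ across LangGraph versions, and the supervisor / worker functions are placeholders (a real supervisor would call an LLM, and each worker node could itself be a compiled sub-graph).

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    task: str
    result: str
    next: str

def supervisor(state: State) -> State:
    # Placeholder routing logic: a real supervisor would call an LLM here.
    return {**state, "next": "flight_agent" if "flight" in state["task"] else "hotel_agent"}

def flight_agent(state: State) -> State:
    return {**state, "result": "flight booked", "next": "end"}

def hotel_agent(state: State) -> State:
    return {**state, "result": "hotel booked", "next": "end"}

graph = StateGraph(State)
graph.add_node("supervisor", supervisor)
graph.add_node("flight_agent", flight_agent)   # could itself be a compiled sub-graph
graph.add_node("hotel_agent", hotel_agent)
graph.set_entry_point("supervisor")
graph.add_conditional_edges("supervisor", lambda s: s["next"],
                            {"flight_agent": "flight_agent", "hotel_agent": "hotel_agent"})
graph.add_edge("flight_agent", END)
graph.add_edge("hotel_agent", END)
app = graph.compile()
# app.invoke({"task": "book a flight to Zurich", "result": "", "next": ""})
```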
Limitations of LLMs as execution engines for Agentic AI
Current Agentic AI platforms leverage LLMs for both task decomposition and execution
of the identified tasks / agents.
- The overall execution occurs within the context of a single LLM, or each task can be routed to a different LLM.
- In short, each task execution corresponds to an LLM invocation at run-time.
- Unfortunately, this approach is neither scalable nor practical for complex tasks.
LLMs cannot be expected to come up with the most efficient (agent) execution approach for a given task at run-time every time, esp. those requiring integration with enterprise systems.
Agentic AI platforms need to learn
over multiple execution runs (meta-
learning): involving a combination of
user prompts, agents, and their
relevant skills (capabilities).
Non-determinism in Agentic AI Systems
There are two non-deterministic
operators in the execution plan:
‘Check Credit’ and ‘Delivery Mode’.
The choice ‘Delivery Mode’ indicates that the user can either pick up the order directly from the store or have it shipped to their address.
Given this, shipping is a non-deterministic choice and may not be invoked during the actual execution.
L2R for Agent Discovery based on Natural
Language Descriptions
Learning-to-rank (L2R) algorithm
to select top-k agents given a user
prompt:
- We first convert agent (class)
descriptions to semantic
embeddings offline and use them to
train the L2R model.
- The user prompts and the agents
use the same generic embedding
model.
- The inference results, including the agent description embeddings used during training and inference, are cached to enable the meta-learning process for the L2R algorithm.
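A simplified sketch of the retrieval step: agent descriptions are embedded offline and a user prompt is matched to the top-k agents by cosine similarity. The `embed` function is an assumption, and the final learned re-ranking step of the trained L2R model is only indicated in a comment.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def top_k_agents(prompt: str, agent_descriptions: dict, embed, k: int = 3) -> list:
    """Rank agents by similarity between the prompt and their (cached) description embeddings."""
    # Offline step (cached in practice): embed each agent description once.
    agent_vecs = {name: embed(desc) for name, desc in agent_descriptions.items()}
    prompt_vec = embed(prompt)                      # same generic embedding model
    scores = {name: cosine(prompt_vec, vec) for name, vec in agent_vecs.items()}
    # A trained L2R model would re-rank these candidates using logged feedback.
    return sorted(scores, key=scores.get, reverse=True)[:k]
```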
Agent Discovery based on a Constraints Model
The constraints are specified as logic
predicates in the service description of
the corresponding service published by
its agent.
An agent P provides a set of services
{S1,S2, … , Sn}. Each service S in turn has
a set of associated constraints {C1,C2, …
,Cm}. For each constraint C of a service
S, the constraint values may be
- a single value (e.g., price of a service),
- a list of values (e.g., list of destinations served by an airline), or
- a range of values (e.g., minimum, maximum).
Capability: connects City A to B
Constraint: Flies only on certain
days a week; Needs payment by
Credit Card
* D. Biswas. Constraints Enabled Autonomous Agent Marketplace:
Discovery and Matchmaking. 16th International Conference on Agents
and Artificial Intelligence (ICAART), 2024 (link)
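A sketch of how such constraints could be represented and matched against a request. The value handling follows the single-value / list / range cases above; the class and the airline example values are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Constraint:
    name: str
    value: Any            # single value, list/set of values, or (min, max) range

def satisfies(constraint: Constraint, requested: Any) -> bool:
    v = constraint.value
    if isinstance(v, tuple) and len(v) == 2:      # range, e.g., (min_price, max_price)
        return v[0] <= requested <= v[1]
    if isinstance(v, (list, set)):                # list, e.g., destinations served
        return requested in v
    return requested == v                         # single value

# Example: an airline agent's service constraints (illustrative values).
flight_constraints = [
    Constraint("destination", {"ZRH", "GVA", "BSL"}),
    Constraint("days_of_week", {"Mon", "Wed", "Fri"}),
    Constraint("payment", "Credit Card"),
]
request = {"destination": "ZRH", "days_of_week": "Mon", "payment": "Credit Card"}
matched = all(satisfies(c, request[c.name]) for c in flight_constraints)
```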
Personalizing UX
for Agentic AI
AI Agent Personalization
Analogous to the fine-tuning of large language models (LLMs) into domain-specific LLMs / SLMs, we argue that personalization / fine-tuning of (marketplace) AI agents will be needed with respect to enterprise-specific context (of applicable user personas and use-cases) to drive their enterprise adoption.
Key benefits of AI agent personalization include:
- Personalized interaction: The AI agent adapts its language, tone, and
complexity based on user preferences and interaction history. This ensures that
the conversation is more aligned with the user’s expectations and
communication style.
- Use-case context: The AI agent is aware of the underlying enterprise use-case
processes, so that it can prioritize or highlight process features, relevant pieces
of content, etc. — optimizing the interaction to achieve the use-case goal more
efficiently.
- Proactive Assistance: The AI agent anticipates the needs of different users and
offers proactive suggestions, resources, or reminders tailored to their specific
profiles or tasks.
AI Agent Personalization Architecture
In this talk, we highlight that UI/UX for AI agents is critical as the last mile to enterprise adoption.
User Persona based Agent Personalization
Enterprise AI agent personalization remains challenging due to scale, performance, and privacy challenges.
* D. Biswas. Personalizing UX for Agentic AI. AI Advances, 2024 (link)
User persona-based agent personalization segments the end-users of a service into a manageable set of user categories, which represent the demographics and preferences of the majority of users.
The fine-tuning process consists of
first parameterizing (aggregated) user data and
conversation history and storing it as memory in
the LLM via adapters, followed by fine-tuning
the LLM for personalized response generation.
The agent — user persona router helps in
performing user segmentation (scoring) and
routing the tasks / prompts to the most
relevant agent persona.
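A minimal sketch of the user persona router: score the user against each persona and route prompts to the closest persona-specific agent. The embedding helper and persona profiles are assumptions; in practice the scoring would use aggregated user data and the fine-tuned adapters described above.

```python
import numpy as np

def route_to_persona(user_profile_text: str, persona_agents: dict, embed):
    """persona_agents maps persona description -> persona-specific agent (adapter)."""
    user_vec = embed(user_profile_text)
    best_agent, best_score = None, -1.0
    for persona_desc, agent in persona_agents.items():
        vec = embed(persona_desc)
        score = float(user_vec @ vec /
                      (np.linalg.norm(user_vec) * np.linalg.norm(vec) + 1e-9))
        if score > best_score:
            best_agent, best_score = agent, score
    return best_agent   # prompts are then served by this persona's fine-tuned agent
```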
User Data Embeddings
Fine-tuning AI agents on raw user data is often too complex, even if it is at
the (aggregated) persona level.
This is primarily due to the following reasons:
- Agent interaction data usually spans multiple journeys with sparse data points, various interaction types (multimodal), and potential noise or inconsistencies with incomplete queries / responses.
- Moreover, effective personalization often requires a deep understanding
of the latent intent / sentiment behind user actions, which can pose
difficulties for generic (pre-trained) LLMs.
- Finally, fine-tuning is computationally intensive. Agent-user interaction
data can be lengthy. Processing and modeling such long sequences (e.g.,
multi-years’ worth of interaction history) with LLMs can be practically
infeasible.
User Data Embeddings (USER-LLM)
USER-LLM distills compressed
representations from diverse and noisy
user interactions, effectively capturing the
essence of a user’s behavioral patterns
and preferences across various interaction
modalities.
* L. Liu, L. Ning. USER-LLM: Efficient LLM Contextualization with User Embeddings. Google Research, 2024 (link)
Reinforcement Learning based Personalization
We show how LLM generated responses can be personalized based on a Reinforcement Learning
(RL) enabled Recommendation Engine (RE).
At a high level, the RL-based LLM response / action RE works as follows:
- The (current) user sentiment and agent
interaction history are combined to quantify the
user sentiment curve and discount any sudden
changes in user sentiment;
- leading to the aggregate reward value
corresponding to the last LLM response provided
to the user.
- This reward value is then provided as feedback
to the RL agent — to choose the next optimal
LLM generated response / action to be provided
to the user.
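A sketch of the reward computation described above: the current sentiment is smoothed against the interaction history to discount sudden swings, and the resulting reward is attributed to the last LLM response. The smoothing factor and the [-1, 1] sentiment scale are illustrative assumptions.

```python
def aggregate_reward(current_sentiment: float, sentiment_history: list,
                     alpha: float = 0.3) -> float:
    """Sentiments in [-1, 1]; exponential smoothing discounts sudden changes."""
    smoothed = sentiment_history[0] if sentiment_history else current_sentiment
    for s in sentiment_history[1:]:
        smoothed = alpha * s + (1 - alpha) * smoothed
    # Blend the (noisy) current sentiment with the smoothed sentiment curve.
    return alpha * current_sentiment + (1 - alpha) * smoothed

# The reward is fed back to the RL agent (e.g., a bandit or Q-learning policy)
# to choose the next optimal LLM-generated response / action for the user.
```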
D. Biswas. Delayed Rewards in the Context of Reinforcement Learning based Recommender Systems. AAI4H@ECAI 2020: 49-53, (link)
E. Ricciardelli, D. Biswas. Self-improving Chatbots based on Reinforcement Learning. RLDM 2019 (link)
Agent Observability
& Memory
Management
Observability Challenges for Agentic AI
Observability for AI Agents is
challenging:
- No global observer: Due to their
distributed nature, we cannot assume
the existence of an entity having
visibility over the entire execution. In
fact, due to their privacy and
autonomy requirements, even the
composite agent may not have
visibility over the internal processing
of its component agents.
- Parallelism: AI agents allow parallel
composition of processes.
- Dynamic configuration: The agents
are selected incrementally as the
execution progresses (dynamic
binding). Thus, the “components” of
the distributed system may not be
known in advance.
Stateful execution for AI Agents
AgentOps monitoring is critical given the
complexity and long running nature of AI
agents. We define observability as the
ability to find out where in the process the
execution is and whether any
unanticipated glitches have appeared.
- Local queries: Queries which can be
answered based on the local state
information of an agent.
- Composite queries: Queries expressed
over the states of several agents.
- Historical queries: Queries related to the
execution history of the composition.
- Relationship queries: Queries based on
the relationship between states.
* D. Biswas. Stateful Monitoring and Responsible Deployment of AI Agents. 17th
International Conference on Agents and Artificial Intelligence (ICAART), 2025 (link)
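A sketch of how the four query types could be expressed over agent execution states; the state record and query functions are illustrative, not part of a specific AgentOps product.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    agent: str
    step: str                       # where in the process the execution is
    status: str                     # e.g., "running", "failed", "done"
    history: list = field(default_factory=list)

def local_query(state: AgentState) -> str:
    # Answered from the local state information of a single agent.
    return f"{state.agent} is at step '{state.step}' ({state.status})"

def composite_query(states: list) -> bool:
    # Expressed over the states of several agents, e.g., "has any component agent failed?"
    return any(s.status == "failed" for s in states)

def historical_query(state: AgentState) -> list:
    # Execution history of the composition.
    return state.history

def relationship_query(a: AgentState, b: AgentState) -> bool:
    # e.g., "did agent a complete 'payment' before agent b started 'shipping'?"
    return "payment" in a.history and b.step == "shipping"
```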
Conversational Memory Management using Vector DBs
Vector DBs are currently the primary
medium to store and retrieve data
(memory) corresponding to
conversational agents.
- This involves selecting an encoder
model that performs offline data
encoding as a separate process,
converting various forms of raw data,
such as text, audio, and video, into
vectors.
- During a chat, the conversational agent can query the long-term memory system by encoding the query and searching for relevant information within the Vector DB. The retrieved information is then used to answer the query.
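A sketch of the described flow with a toy in-memory vector store; a production setup would use a managed Vector DB and a dedicated encoder model (the `embed` callable here is an assumption).

```python
import numpy as np

class SimpleVectorMemory:
    """Toy long-term memory: offline encoding + similarity search at chat time."""
    def __init__(self, embed):
        self.embed = embed          # encoder model (raw data -> vector)
        self.items = []             # list of (vector, raw_text)

    def add(self, text: str):
        # Offline encoding step, run as a separate process in practice.
        self.items.append((self.embed(text), text))

    def query(self, question: str, top_k: int = 3) -> list:
        # Encode the query and return the most similar stored items.
        q = self.embed(question)
        scored = [(float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9)), t)
                  for v, t in self.items]
        return [t for _, t in sorted(scored, reverse=True)[:top_k]]
```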
Human Memory Understanding
We need to consider the following memory types.
- Semantic memory: general knowledge with facts, concepts,
meanings, etc.
- Episodic memory: personal memory with respect to specific
events and situations from the past.
- Procedural memory: motor skills like driving a car, with the
corresponding procedures to achieve the task.
- Emotional memory: feelings associated with experiences.
Agentic Memory Management
By default, the memory router always routes to the long-term memory (LTM) module first, to check whether an existing pattern can respond to the given user prompt. If so, it retrieves the pattern and responds immediately, personalizing it as needed.
* D. Biswas. Long-term Memory for AI Agents. AI Advances, 2024 (link)
If the LTM lookup fails, the memory router routes the prompt to the short-term memory (STM) module, which then uses its retrieval processes (APIs, etc.) to get the relevant context into the STM (working memory) — leveraging applicable data services.
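A sketch of the routing logic; the `ltm` / `stm` modules, the `personalize` step, and the confidence threshold are placeholders for the components described above.

```python
def route_prompt(prompt: str, ltm, stm, personalize, threshold: float = 0.8) -> str:
    """Try long-term memory first; fall back to short-term (working) memory retrieval."""
    pattern, confidence = ltm.lookup(prompt)          # existing response pattern?
    if pattern is not None and confidence >= threshold:
        return personalize(pattern)                   # respond immediately from LTM
    # LTM miss: pull fresh context into working memory via retrieval APIs / data services.
    context = stm.retrieve(prompt)
    return stm.respond(prompt, context)
```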
Agentic Memory Management (2)
The STM — LTM transformer module is always active: it continuously processes the retrieved context, extracts recipes from it (e.g., see the concepts of teachable agents and recipes in AutoGen), and stores them in a semantic layer (implemented via a Vector DB).
* D. Biswas. Long-term Memory for AI Agents. AI Advances, 2024 (link)
At the same time, it also collects other associated properties (e.g., no. of tokens, cost of executing the response, state of the system, etc.) and
- creates an episode, which is then stored in a knowledge graph,
- with the underlying procedure stored in a finite state machine (FSM).
Agentic AI Scenarios:
- Agentic RAGs
- Reinforcement
Learning Agents
Agentic RAGs: extending RAGs to SQL Databases
We present an Agentic AI framework to build RAG pipelines that work seamlessly over both structured and unstructured data stored in Snowflake.
* D. Biswas. Agentic RAGs: extending RAGs to SQL Databases. AI Advances, 2024 (link)
The SQL & Document query agents
leverage the respective Snowflake
Cortex Analyst and Search
components detailed earlier to
query the underlying SQL and
Document repositories.
Finally, to complete the RAG pipeline, the retrieved data is added to the original prompt — leading to the generation of a contextualized response (see the sketch below).
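A sketch of the routing between the SQL and Document query agents. The Snowflake Cortex Analyst / Search calls are represented by hypothetical `sql_agent` / `doc_agent` objects rather than the actual Cortex APIs, and the router LLM interface is likewise an assumption.

```python
def agentic_rag(query: str, router_llm, sql_agent, doc_agent, llm) -> str:
    # Route: decide whether the question targets structured or unstructured data.
    target = router_llm.classify(query, choices=["sql", "documents"])
    retrieved = (sql_agent.query(query) if target == "sql"
                 else doc_agent.search(query))
    # Complete the RAG pipeline: add the retrieved data to the original prompt.
    prompt = f"Context:\n{retrieved}\n\nQuestion: {query}\nAnswer:"
    return llm.generate(prompt)
```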
Reinforcement Learning Agents
When we talk about AI agents today, we mostly talk about LLM agents, which loosely translates to invoking (prompting) an LLM to perform natural language processing (NLP) tasks.
Some agentic tasks might be
better suited to other ML
techniques, e.g., Reinforcement
Learning (RL), predictive
analytics, etc. — depending on
the use-case objectives.
* D. Biswas. LLM based fine-tuning of Reinforcement Learning Agents. AI Advances, 2024 (link)
LLM based fine-tuning of Reinforcement Learning Agents
* D. Biswas. LLM based fine-tuning of Reinforcement Learning Agents. AI Advances, 2024 (link)
We focus on RL agents, and
show how LLMs can be used
to fine-tune the RL agent
reward / policy functions.
Reinforcement Learning Agents applied to HVAC
Optimization
* D. Biswas. Reinforcement Learning based Energy Optimization in Factories, in proc. of the 11th ACM Conference on Future
Energy Systems (e-Energy), 2020. (link)
We show a concrete
example of applying
the fine-tuning
methodology to a real-
life industrial control
system — designing
the RL based controller
for HVAC optimization
in a building setting.
Responsible AI
Agents
Data Quality Issues with respect to LLMs, esp.
Vector DBs
From a data quality point of view,
we see the following challenges
w.r.t. LLMs, esp. Vector DBs:
- Accuracy of the encodings in vector stores, measured in terms of the correctness and groundedness of the generated LLM responses.
- Incorrect and/or inconsistent vectors: due to issues in the embedding process, some vectors may end up corrupted, incomplete, or generated with a different dimensionality.
- Missing data can be in the form of
missing vectors or metadata.
- Timeliness issues w.r.t. outdated
documents impacting the vector
store.
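A sketch of basic data-quality checks matching the issues above (dimensionality, missing vectors / metadata, staleness); the record layout and freshness threshold are assumptions.

```python
from datetime import datetime, timedelta

def check_vector_record(record: dict, expected_dim: int, max_age_days: int = 180) -> list:
    """Return a list of data-quality issues for one vector-store record."""
    issues = []
    vec, meta = record.get("vector"), record.get("metadata")
    if vec is None:
        issues.append("missing vector")
    elif len(vec) != expected_dim:
        issues.append(f"inconsistent dimensionality: {len(vec)} != {expected_dim}")
    if not meta:
        issues.append("missing metadata")
    elif "updated_at" in meta and \
            datetime.utcnow() - meta["updated_at"] > timedelta(days=max_age_days):
        issues.append("stale source document")     # timeliness issue
    return issues
```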
* D. Biswas. Long-term Memory for AI Agents. AI Advances, 2024 (link)
Explainability
Explainable AI is an umbrella term for a range of tools, algorithms, and methods that accompany AI model predictions with explanations.
- Explainability of AI models ranks
high among the list of ‘non-
functional’ AI features to be
considered by enterprises.
- For example, this implies having
to explain why an ML model
profiled a user to be in a specific
segment — which led him/her to
receiving an advertisement.
[Diagram: (Labeled) Data → Train ML Model → Predictions → Explanation Model → Explainable Predictions]
Fairness & Bias
Bias creeps into AI models, primarily
due to the inherent bias already
present in the training data.
So the ‘data’ part of AI model
development is key to addressing
bias.
- Historical Bias: arises due to
historical inequality of human
decisions captured in the training
data
- Representation Bias: arises due to
training data that is not
representative of the actual
population.
*H. Suresh, J. V. Guttag. A Framework for Understanding Unintended Consequences of Machine Learning,
2020 (link)
ML Privacy Risks
Two broad categories of
privacy inference attacks:
• Membership inference (if a
specific user data item was
present in the training
dataset) and
• Property inference
(reconstruct properties of a
participant’s dataset)
attacks.
Black-box attacks are still possible when the attacker only has access to the APIs: they invoke the model and observe the relationships between inputs and outputs.
[Diagram: the Attacker has access to the Inference API of the ML Model (Classification, Prediction) and wants access to its Training dataset]
* D. Biswas. Privacy Preserving Chatbot Conversations. IEEE AIKE 2020: 179-182 (link)
*D. Biswas, K. Vidyasankar. A Privacy Framework for Hierarchical Federated Learning. CIKM Workshops 2021 (link)
Gen AI Privacy Risks – novel challenges
From a privacy point of view, we
need to consider the following
additional / different LLM privacy
risks:
- Membership and property
leakage from pre-training data
- Model features leakage from
pre-trained LLM
- Privacy leakage from
conversations (history) with
LLMs
- Compliance with privacy intent
of users
* D. Biswas. Privacy Risks of Large Language Models. AI Advances, 2024 (link)
Responsible deployment of AI Agents
* D. Biswas. Stateful Monitoring and Responsible Deployment of AI Agents. 17th International Conference on Agents and Artificial Intelligence (ICAART), 2025 (link)
Use-case specific Evaluation of LLMs
Need for a comprehensive LLM evaluation strategy with targeted
success metrics specific to the use-cases.
* D. Biswas. Use Case-Based Evaluation Strategy for LLMs. AI Advances, 2024 (link)
LLM Safety Leaderboard
*Hugging Face LLM Safety Leaderboard (link)
* B. Wang, et al. DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models, 2024 (link)
Thanks
&
Questions
Debmalya Biswas
https://www.linkedin.com/in/debmalya-
biswas-3975261/
https://medium.com/@debmalyabiswas