Building generative AI applications with Amazon Bedrock Agents – Part 2
Amazon Bedrock Agents use cases (cover image generated with OpenAI DALL-E 3)


In the rapidly advancing landscape of artificial intelligence, one of the most significant transformations has been the evolution of AI agents. What began as simple rule-based systems has now evolved into sophisticated autonomous entities capable of learning, adapting, and making complex decisions in real-time.

From deterministic agents with predictable behaviors to non-deterministic agents that can handle uncertainty, these AI agents can now perceive their environments and take independent action with unprecedented capability.

Today's AI agent systems represent an intricate integration of components working in harmony, incorporating large language models and multi-agent coordination frameworks to achieve increasingly complex objectives.

Understanding the different types of AI agents will help us trace their technological progression and capabilities. These agents have developed significantly over the years, with various types designed to handle increasingly complex tasks.

The Evolution of AI Agents: From Simple Rules to Advanced Learning Systems

The evolution of artificial intelligence has progressed from basic rule-based systems in the 1950s to today's sophisticated autonomous agents capable of learning and adaptation. This transformation occurred through several key phases:

  • Early AI relied on deterministic rules with predictable but inflexible behavior.

  • Advances in handling larger datasets introduced probabilistic outcomes and non-deterministic decision-making.

  • The 1990s marked a critical shift toward machine learning and neural networks, laying the groundwork for deep learning.

Key theoretical foundations were established by pioneers like Hewitt (Actor model), Minsky ("Society of Mind"), and Russell & Norvig (Artificial Intelligence: A Modern Approach). The post-2012 era brought revolutionary changes with deep neural networks and back-propagation, enabling more sophisticated agent behaviors. Reinforcement Learning emerged as a significant paradigm, demonstrated powerfully in 2013 when DeepMind's systems outperformed humans in Atari games.

Despite advances in Deep Reinforcement Learning (DRL), challenges persist in generalization, computational intensity, and designing effective reward functions. The introduction of Large Language Models since 2017, particularly GPT-3 in 2020, has transformed AI's capabilities in language understanding and generation.

Today's AI systems employ diverse techniques including reinforcement learning and transfer learning, with researchers now developing hybrid models that combine LLMs' generalization abilities with reinforcement learning methodologies, creating more robust and adaptable intelligent agents.

LLM vs. Traditional Agents

Traditional agents operate on predetermined algorithms and rule sets, making them effective for specific tasks but limited in their ability to generalize or reason beyond their programmed scope.

In contrast, Large Language Models (LLMs) represent a significant advancement in AI agent design. Trained on vast text corpora, LLMs excel not only in natural language processing but also demonstrate strong generalization capabilities, allowing them to integrate with various tools.

Additionally, their emergent reasoning abilities enable LLMs to learn from mistakes, further enhancing their adaptability compared to traditional agents.

Figure 1 presents a comparison between AI agents and traditional generative AI.

Figure 1: AI agents vs. traditional generative AI. Image credit: Maximilian Vogel

Intelligent Agents

An intelligent agent is fundamentally defined as any entity capable of perceiving its environment and taking actions to achieve specific objectives. What distinguishes agents from mere computer programs is their autonomy and ability to operate within diverse environments, using past experiences and knowledge to make decisions aligned with their goals.

Four key characteristics define intelligent agents:

  • Autonomy: Agents operate independently, making decisions and taking actions without constant external guidance.

  • Perception: Through various sensory mechanisms, agents gather information about their surroundings.

  • Decision-making: Based on perceived information, agents select appropriate actions to accomplish their goals.

  • Action: Agents perform actions that change their environment's state to achieve desired outcomes.
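The four characteristics above map naturally onto a sense-decide-act loop. The sketch below is a minimal illustration in Python; the class and method names are placeholders rather than any particular framework's API.

```python
from abc import ABC, abstractmethod

class Agent(ABC):
    """Skeleton of an intelligent agent: perceive -> decide -> act."""

    @abstractmethod
    def perceive(self, environment) -> dict:
        """Perception: gather information about the surroundings."""

    @abstractmethod
    def decide(self, percept: dict):
        """Decision-making: select an action that advances the agent's goals."""

    @abstractmethod
    def act(self, action, environment) -> None:
        """Action: change the environment's state."""

    def run(self, environment, max_steps: int = 10) -> None:
        # Autonomy: once started, the loop proceeds without external guidance.
        for _ in range(max_steps):
            percept = self.perceive(environment)
            action = self.decide(percept)
            self.act(action, environment)
```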

Intelligent agents can be classified into five categories: Simple Reflex agents, Model-based Reflex agents, Goal-based agents, Utility-based agents, and Learning agents. Among these, Reinforcement Learning (RL) agents and Large Language Model (LLM) agents represent the most advanced Learning agents.

Figure 2 presents a comparison table of Intelligent Agent Types.

Figure 2: Comparison Table of Intelligent Agent Types

AI agent system

Large Language Model (LLM)-based agent systems can be categorized into two fundamental categories: Single-Agent and Multi-Agent systems.

Single-Agent systems feature individual LLM-based intelligent agents capable of handling diverse tasks across multiple domains through their extensive language understanding, task generalization abilities, and action capabilities. These systems combine the linguistic capabilities of LLMs with structured components that enable them to plan, remember, reflect, interact with environments, and take actions.

The five fundamental components that constitute an LLM-based single agent system are: Planning, Memory, Rethinking, Environment, and Action.

Figure 3 describes the structure of the LLM-based single agent system.

Figure 3: Overview of LLM-based agents

--> Planning

Planning represents the cognitive foundation of LLM-based agents—their ability to strategize, sequence actions, and achieve goals within environmental constraints. Unlike traditional agents that rely on algorithmic planning methods such as Dijkstra's algorithm and POMDPs (Partially Observable Markov Decision Processes), LLM-based agents derive their planning capabilities primarily from the language model itself.

Three main planning methodologies have emerged:

1. In-Context Learning (ICL) Methods

In-Context Learning (ICL) uses natural language prompts to guide language models in problem-solving. Several methodologies have been developed to enhance planning capabilities:

  • Chain of Thought (CoT) methodologies encourage systematic breakdown of complex tasks into manageable components. Various implementations include:

- Complex CoT: For addressing particularly intricate problems

- Auto CoT: For automated reasoning path generation

- Zero-shot CoT: For reasoning without prior examples

  • Self-consistency techniques generate multiple reasoning pathways using an LLM and integrate the resulting answers, often selecting the most consistent response through a voting system among the different pathways (a minimal voting sketch follows this list). Examples include: Mathematical problem solving, Commonsense reasoning, and Symbolic reasoning.

  • Tree of Thought (ToT) creates a tree-like structure by segmenting problems into distinct thinking stages, with multiple concepts generated at each stage. The search process employs either breadth-first or depth-first exploration and evaluates states using classifiers or majority voting. Examples include: 24 Game, Creative Writing, and Mini Crosswords.

  • Least-to-Most approaches break down complex problems into sub-problems that are addressed sequentially, enhancing CoT's generalization abilities. Examples include: Educational and assistive technology apps such as the Avaz AAC app, and habit-building and wellness apps.

  • Skeleton of Thought (SoT) directs the LLM to first generate a framework for an answer before completing each skeleton point through API calls or batch decoding, significantly expediting answer generation. Examples include: Business report generation, scientific paper drafting, customer service chatbots, and software documentation.

  • Graph of Thought (GoT) represents LLM-produced information as an arbitrary graph, with information units (thoughts) as vertices and dependencies between these vertices as edges. Examples include: Sorting, set operations, keyword counting and document merging.

  • Progressive Hint Prompting (PHP) accelerates guidance toward accurate answers by employing previously generated responses as prompts, improving reasoning capabilities. Examples include: Solving complex mathematical problems, educational applications and other reasoning-related applications.

  • Self-Refine enables LLMs to provide multifaceted feedback on their outputs and iteratively refine prior outputs based on this feedback, mimicking human iterative improvement processes. Examples include: Acronym generation, dialogue response generation, commonsense generation and code optimization.
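To make the self-consistency idea above concrete, here is a hedged sketch of sampling several chain-of-thought completions and voting on the final answer. The `sample_llm` callable and the "Answer:" output convention are assumptions made for illustration, not a specific library API.

```python
from collections import Counter

def self_consistency_answer(question: str, sample_llm, n_paths: int = 5) -> str:
    """Sample several chain-of-thought completions and vote on the final answer.

    `sample_llm(prompt)` is a hypothetical function returning one completion
    whose last lines contain something like 'Answer: 42'.
    """
    answers = []
    for _ in range(n_paths):
        completion = sample_llm(question + "\nLet's think step by step.")
        for line in reversed(completion.splitlines()):
            if line.strip().lower().startswith("answer:"):
                answers.append(line.split(":", 1)[1].strip())
                break
    if not answers:
        return ""
    # Majority vote across the sampled reasoning pathways.
    best_answer, _count = Counter(answers).most_common(1)[0]
    return best_answer
```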

2. External Capabilities Methods

External capabilities enhance LLM planning through integration with specialized tools and algorithms:

  • LLM+P leverages classical planners for long-term planning using Planning Domain Definition Language (PDDL) as an intermediate interface. The process involves translating problems into PDDL descriptions, generating plans, and converting them back to natural language. Examples include: Robot task planning.

  • LLM-DP combines an LLM with symbolic planners for solving embodied tasks, leveraging the LLM's understanding of action impacts and the planner's solution-finding efficiency. Examples include: Symbolic planning, planning and reasoning problems, and ALFWorld.

  • RAP (Reasoning via Planning) implements deliberate planning by adding a world model and employing Monte Carlo Tree Search for efficient exploration to generate high-reward reasoning trajectories. Examples include: Planning and reasoning problems, math and logical reasoning problems.

Other approaches include using LLMs as commonsense world models with heuristic strategies, integrating Cognitive Architectures with LLMs, simulation-based methods representing contexts through knowledge graphs, and knowledge graph-based search methods like Think-on-Graph for identifying optimal planning paths.

3. Multi-stage Methods

Multi-stage planning divides the process into discrete phases to enhance performance in complex reasoning tasks:

  • SwiftSage combines behavior cloning and guided LLMs through two modules, the SWIFT module for rapid, intuitive thinking and the SAGE module for deliberative thinking.

  • DECKARD divides exploration into Dreaming and Awake stages, where the Dreaming stage decomposes tasks into sub-goals, and the Awake stage learns modular strategies for each sub-goal.

--> Memory

The memory system in LLM-based agents preserves and regulates knowledge, experiential data, and historical information. Memory in these agents is typically stored in textual form to enable seamless interaction with the LLM, and it falls into distinct categories:

1. Short-term Memory:

Short-term memory stores and manipulates limited quantities of transient information relevant to ongoing tasks, constrained by the LLM's context length. Implementation examples include:

  • ChatDev: Archives conversation history to enable decision-making based on inter-agent communication records

  • LangChain: Enhances efficiency by encapsulating crucial information from each interaction while preserving the most recent interactions
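As a rough illustration of short-term memory (in the spirit of conversation buffers such as LangChain's, but not its actual API), a bounded rolling window of recent turns can be replayed into each prompt:

```python
from collections import deque

class ShortTermMemory:
    """Rolling window of recent exchanges, bounded to respect the context limit."""

    def __init__(self, max_turns: int = 10):
        self.turns = deque(maxlen=max_turns)  # oldest turns are evicted first

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})

    def as_context(self) -> str:
        # Only the most recent turns are replayed into the next prompt.
        return "\n".join(f"{t['role']}: {t['content']}" for t in self.turns)
```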

2. Long-term Memory:

Long-term memory stores substantial volumes of knowledge, experiences, and historical records. Implementation examples include:

  • Voyager: Employs an expanding skill repository for storing and retrieving complex behaviors (a minimal skill-library sketch follows this list)

  • GITM: Extracts relevant textual knowledge from external knowledge bases to identify necessary materials and tools

  • ExpeL: Preserves experiences across multiple tasks to enhance performance

  • Reflexion: Stores experiences from self-reflection in long-term memory to influence future actions

  • MemGPT: Manages diverse memory hierarchies to provide extended context within limited context windows
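The skill-library sketch referenced above is a rough approximation of a Voyager-style repository, not its actual implementation: natural-language descriptions map to reusable code snippets, and retrieval here is naive keyword matching (similarity-based retrieval is sketched in the next subsection).

```python
class SkillLibrary:
    """Toy long-term memory: named skills stored as reusable code snippets."""

    def __init__(self):
        self.skills: dict[str, str] = {}  # description -> code

    def add(self, description: str, code: str) -> None:
        self.skills[description] = code

    def lookup(self, task: str) -> list[str]:
        # Naive keyword overlap; real systems rank by embedding similarity.
        words = set(task.lower().split())
        return [code for desc, code in self.skills.items()
                if words & set(desc.lower().split())]
```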

3. Memory Retrieval:

Efficient memory retrieval is essential for LLM-based agents, implemented through techniques such as:

  • Retrieval-augmented generation combines information retrieval with an LLM to produce more reliable outputs (a minimal retrieval sketch follows this list).

  • LaGR-SEQ introduces Sample Efficient Query (SEQ), which trains a secondary RL-based agent to determine when to query the LLM for solutions.

  • REMEMBER equips LLMs with long-term memory capabilities, enabling them to draw from past experiences through reinforcement learning and experience memory.

  • Synapse removes task-irrelevant information from raw states to enable more samples within a restricted context, generalizing to novel tasks through similarity-based retrieval of stored sample embeddings.

  • DT-Mem provides an internal memory module for storing and retrieving information relevant to various downstream tasks.
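Here is the minimal retrieval sketch referenced above: stored memories are ranked by cosine similarity to the query, and the top matches are placed in the prompt. The `embed` and `llm` callables are hypothetical stand-ins for any embedding model and completion model.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query, store, embed, k: int = 3) -> list[str]:
    """Return the k stored texts most similar to the query.

    `store` holds (text, embedding) pairs; `embed(text)` is a hypothetical
    embedding function.
    """
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _vec in ranked[:k]]

def answer_with_memory(query, store, embed, llm) -> str:
    # Retrieval-augmented generation: ground the LLM on retrieved memories.
    context = "\n".join(retrieve(query, store, embed))
    return llm(f"Context:\n{context}\n\nQuestion: {query}")
```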

--> Rethinking

Rethinking encompasses an LLM-based agent's capacity for introspection: evaluating prior decisions and environmental feedback. This faculty permits the agent to examine its behavior, decision-making, and learning processes, enhancing intelligence and adaptability. Several rethinking methodologies are available, such as:

1. In-Context Learning Methods

  • ReAct implements an interactive paradigm that alternates between generating task-related linguistic reasoning and actions, fostering synergistic enhancement of reasoning and action proficiencies.

  • Reflexion computes heuristics after each action and uses them for performance evaluation and improvement.
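A minimal ReAct-style loop looks roughly like the sketch below. The prompt format ("Thought:", "Action: tool[input]", "Final Answer:") and the `llm` and `tools` callables are assumptions for illustration, not the exact protocol of the original paper.

```python
import re

def react_loop(task: str, llm, tools: dict, max_steps: int = 5) -> str:
    """Alternate reasoning ('Thought') and acting ('Action'), feeding each tool
    result back as an 'Observation' until a final answer is produced."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = llm(transcript + "Thought:")
        transcript += f"Thought:{step}\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = re.search(r"Action:\s*(\w+)\[(.*?)\]", step)
        if match:
            tool_name, arg = match.groups()
            tool = tools.get(tool_name, lambda _: "unknown tool")
            transcript += f"Observation: {tool(arg)}\n"
    return "No final answer within the step budget."
```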

2. Supervised Learning Methods

Supervised learning relies on diverse sources including LLMs, human expertise, code compilers, and external knowledge:

  • CoH (Chain of Hindsight) exploits sequences of prior outputs annotated with feedback to foster model self-enhancement through supervised fine-tuning, experience replay, and grading mechanisms.

  • Process supervision has been experimentally shown to surpass outcome supervision in mathematical reasoning tasks, with active learning significantly boosting the efficacy of process supervision.

  • Introspective Tips introduces a self-examination framework based on past trajectories or expert demonstrations, generating concise but valuable insights for strategy optimization.

3. Reinforcement Learning Methods

These methods emphasize parameter enhancement through learning from historical experiences:

  • Retroformer improves agents by learning from retrospective models and employing policy gradients to autonomously modulate the LLM-based agent's prompts.

  • REMEMBER introduces a novel semi-parametric reinforcement learning methodology that combines reinforcement learning with experience memory to update capabilities through experiential analogies.

  • REX incorporates an auxiliary reward layer and assimilates concepts similar to Upper Confidence Bound scores, resulting in more robust and efficient agent performance.

  • ICPI (In-Context Policy Iteration) demonstrates the capacity to execute RL tasks without expert demonstrations or gradients by iteratively updating prompt content through trial-and-error interactions.

4. Modular Coordination Methods

These approaches typically involve multiple modules working together:

  • DIVERSITY investigates various prompts to increase reasoning pathway diversity, incorporating verification mechanisms to distinguish between favorable and unfavorable responses.

  • DEPS framework interacts with LLM planners through descriptors, interpreters, and goal selectors to improve overall success rates.

  • PET (Planning, Elimination, and Tracking) leverages LLM knowledge to streamline control problems for embodied agents, accomplishing higher-level subtasks through its three-module approach.

--> Environments

LLM-based agents interact with and learn from a variety of environments through environmental feedback, including computer, gaming, coding, real-world, and simulation environments.

1. Computer Environment

LLM-based agents engage with websites, APIs, databases, and applications across computer, web, and mobile contexts. The modes of interaction include:

  • Web Scraping: Collecting information from websites to acquire essential data

  • API Calls: Employing web APIs to access or transmit data (see the sketch at the end of this subsection)

  • Web Searching: Using search engines to discover relevant information

  • Software Interaction: Operating software applications through their interfaces

  • Database Queries: Accessing and updating databases directly

Recent research has introduced methodologies like RCI (guiding language models via natural language commands), WebArena (an independent web environment for autonomous agents), WebGPT (leveraging search engines for document retrieval), Mobile-Env (allowing agents to interact with Android OS), and SheetCopilot (facilitating spreadsheet interaction using natural language).
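Of these interaction modes, API calls are a common entry point. The sketch below shows one way such a tool could look; the endpoint URL is a placeholder, and a production agent would add authentication, retries, and error handling.

```python
import json
import urllib.parse
import urllib.request

def call_web_api(url: str, params: dict | None = None) -> dict:
    """Fetch JSON from a web API; one way to expose the 'API Calls' mode as a tool."""
    if params:
        url = f"{url}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))

# Hypothetical usage:
# data = call_web_api("https://api.example.com/weather", {"city": "Paris"})
```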

2. Gaming Environment

In gaming environments, LLM-based agents interact with virtual characters, objects, and settings. Interaction methods include:

  • Character Control: Directing in-game characters through commands

  • Environmental Interactions: Engaging with objects in the game environment

  • State Perception: Gathering status information for decision-making

Notable applications include DECKARD (LLM-guided exploration in Minecraft), VOYAGER (a Minecraft-based lifelong learning agent), GITM (translating complex goals into low-level actions), AgentSims (generating virtual towns with diverse buildings and residents), and LLM-Deliberation (a testing platform for text-based negotiation games).

3. Coding Environment

The coding environment enables LLM-based agents to compose, modify, and execute code. Interaction methodologies encompass:

  • Code Generation: Producing code snippets or complete programs

  • Code Debugging: Identifying and rectifying errors

  • Code Evaluation: Executing code and assessing performance

Key implementations include LLift (an automated agent interfacing with static analysis tools), MetaGPT (incorporating human workflows into LLM-driven collaboration), ChatDev (a virtual chat-driven software development company), and CSV (augmenting mathematical reasoning abilities through code verification).
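The Code Evaluation mode above can be sketched by executing generated code in a subprocess and capturing its output, which the agent can then inspect or feed back for debugging. This is illustrative only; a real deployment would add genuine sandboxing (containers, resource limits, network isolation).

```python
import subprocess
import sys
import tempfile

def run_generated_code(code: str, timeout: int = 10) -> tuple[str, str]:
    """Run LLM-generated Python in a subprocess and return (stdout, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(
        [sys.executable, path],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout, result.stderr

# Example: stdout, stderr = run_generated_code("print(sum(range(10)))")
```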

4. Real-World Environment

LLM-based agents can interact with physical devices, sensors, and actuators in real-world scenarios through:

  • Data Collection: Accumulating real-time data from sensors

  • Device Control: Manipulating actuators like robotic arms and drones

  • Human-Machine Interaction: Engaging in natural language communication with humans

Research in this area includes language-centric reasoning toolkit frameworks for robot manipulation, TaPA (embedded task-planning agents for physical scene constraints), and heuristic methods for guiding LLM-based agents in collaborating with humans.

5. Simulation Environment

In simulation environments, LLM-based agents interact with virtual models representing real-world systems through:

  • Model Manipulation: Adjusting parameters to explore various scenarios

  • Data Analysis: Analyzing simulation data to identify patterns and insights

  • Optimization: Determining optimal actions within constraints

Examples include TrafficGPT for traffic flow analysis and AucArena for auction simulations.

--> Action

The action capabilities of LLM-based agents involve performing actions or employing tools. These agents primarily interact through text generation, but they can also employ external tools through three key approaches:

1. Tool Employment

Tool employment forms a critical aspect of an LLM-based agent's action capabilities. Several innovative approaches have emerged in this domain:

  • MRKL (Modular Reasoning, Knowledge, and Language) integrates LLMs with external tools to address complex problems through module construction and natural language query routing.

  • TALM (Tool-Augmented Language Models) establishes connections between language models and tools, facilitating text-to-text API interactions.

  • Toolformer demonstrates the capacity of LLMs to leverage external tools, significantly enhancing performance across various tasks.

  • HuggingGPT combines multiple AI models and tools for comprehensive task planning and execution, encompassing text classification, object detection, and more.

  • Gorilla focuses on applying LLMs in API calls and program synthesis, incorporating context learning and task decomposition to improve performance.

  • RestGPT connects LLMs with RESTful APIs to address user requests through online planning and API execution.

  • TaskMatrix.AI processes inputs in various formats (text, images, videos, audio, code) and generates code that invokes APIs to complete tasks.

  • D-Bot provides database maintenance suggestions, covering knowledge detection, root cause analysis, and multi-LLM collaboration.

  • Chameleon employs various tools to address challenges and uses a natural language planner to select and combine modules for solution construction.

  • AVIS represents an autonomous visual information-seeking system that leverages LLMs to formulate strategies for using external tools and examining their outputs.
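Most of the systems above share the same underlying pattern: the model chooses a tool and its arguments, the agent executes the call, and the observation is fed back for a final answer. A generic sketch follows; the JSON contract, the toy tools, and the `llm` callable are assumptions for illustration rather than any specific framework's API.

```python
import json

TOOLS = {
    # Toy tools; real agents register richer, validated functions.
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # demo only
    "lookup": lambda term: f"(stub) definition of {term}",
}

def answer_with_tools(llm, user_request: str) -> str:
    """Single round of tool use: plan -> execute -> respond."""
    plan = llm(
        "Available tools: calculator(expr), lookup(term).\n"
        f"Request: {user_request}\n"
        'Reply with JSON such as {"tool": "calculator", "input": "2+2"}.'
    )
    call = json.loads(plan)
    observation = TOOLS[call["tool"]](call["input"])
    return llm(
        f"Request: {user_request}\nTool result: {observation}\nFinal answer:"
    )
```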

2. Tool Planning

Effective tool planning is essential for maximizing the utility of available resources:

  • ChatCoT models chain-like thinking into multi-turn dialogues, enhancing complex task handling through tool-aided reasoning.

  • TPTU introduces a comprehensive task execution framework that includes task instructions, design prompts, toolkits, LLM, results, and capabilities for task planning and tool utilization.

  • ToolLLM develops a Decision Tree based on Depth-First Search, enabling LLMs to evaluate multiple API-based reasoning paths.

  • Gentopia provides a framework for flexible customization of agents through simple configuration, integrating various language models, task formats, prompt modules, and plugins.

3. Tool Creation

Beyond using existing tools, advanced LLM-based agents can also create new tools for specific tasks:

  • LATM (LLMs As Tool Makers) is a framework for tool creation and utilization, generating tools suitable for diverse tasks through staged tool generation and task execution.

  • CRAFT focuses on developing and retrieving general-purpose tools, enabling the generation of specialized toolkits tailored for specific tasks. LLMs can extract tools from these toolkits to address complex tasks.
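In the spirit of LATM, tool creation can be sketched as follows: the LLM writes a self-contained function once, the agent compiles it, and later tasks reuse it directly. The prompt wording and the `llm` callable are assumptions, and real systems validate generated tools against test cases before trusting them.

```python
def make_tool(llm, task_description: str):
    """Ask the LLM to write a reusable function, then compile it for later calls."""
    code = llm(
        "Write a self-contained Python function named `tool` that "
        f"{task_description}. Return only the code."
    )
    namespace: dict = {}
    exec(code, namespace)  # demo only; sandbox and test generated code in practice
    return namespace["tool"]

# Hypothetical usage:
# sort_tool = make_tool(llm, "sorts a list of numbers in ascending order")
# sort_tool([3, 1, 2])  ->  [1, 2, 3]
```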

In contrast to Single-Agent systems, Multi-Agent Systems (MAS), which will be covered in greater depth in the next article, involve multiple interacting intelligent agents that require intricate coordination, with each agent typically possessing specialized domain expertise. MAS frameworks encompass dimensions such as agent granularity, knowledge heterogeneity, control distribution mechanisms, and communication protocols, and they can be further classified by role coordination (cooperative, competitive, mixed, or hierarchical) and planning approach (centralized or decentralized).

Amazon Bedrock: A Model-Based Reflex Agent

Amazon Bedrock exemplifies the implementation of a model-based reflex agent architecture in modern artificial intelligence. As a fully managed service, Bedrock leverages foundation models (FMs) to create an internal representation of complex environments while continually updating its understanding based on real-world observations. Unlike simple reflex agents that follow basic condition-action rules, Bedrock incorporates a multi-stage processing framework that aligns with the model-based reflex paradigm.

When Bedrock receives input (Sense), it dynamically constructs and updates its internal model of the relevant domain using its suite of foundation models from leading AI companies such as AI21 Labs, Anthropic, Cohere, DeepSeek, Luma, Meta, Mistral AI, and Stability AI, as well as Amazon's own Titan and Amazon Nova models. It then applies reasoning mechanisms to evaluate potential actions against this model (Reason), considering not just immediate responses but also how its actions might affect future states. Finally, it executes the chosen response through various application integrations (Act).

This model-based approach allows Bedrock to excel in environments with partial observability, where direct perception alone would be insufficient for effective decision-making. By maintaining and continuously refining its internal models with new data, Bedrock can make informed decisions even in complex, ambiguous scenarios, making it a strong platform for businesses seeking AI solutions that require a nuanced understanding of dynamic environments while balancing computational efficiency with sophisticated reasoning. Figure 4 describes the structure of a model-based reflex agent.

Figure 4: Model-based reflex agents. Based on Russell and Norvig (2009)
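To connect this back to practice, below is a minimal sketch of invoking an existing Amazon Bedrock agent with boto3; the agent ID, alias ID, and region are placeholders, and the agent itself is assumed to have been created beforehand.

```python
import uuid
import boto3

def ask_bedrock_agent(prompt: str) -> str:
    """Send a prompt to a Bedrock agent and collect the streamed reply."""
    client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
    response = client.invoke_agent(
        agentId="AGENT_ID",             # placeholder
        agentAliasId="AGENT_ALIAS_ID",  # placeholder
        sessionId=str(uuid.uuid4()),    # reuse one session ID per conversation
        inputText=prompt,
    )
    # The completion arrives as a stream of chunk events.
    return "".join(
        event["chunk"]["bytes"].decode("utf-8")
        for event in response["completion"]
        if "chunk" in event
    )

# print(ask_bedrock_agent("What can you help me with?"))
```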

Conclusion

LLM-based agent systems combine language capabilities with structured components for autonomous operation. The five fundamental elements (Planning, Memory, Rethinking, Environment, and Action) work together to create agents that can plan, remember, reflect, interact with their environments, and act effectively.

In our upcoming article, we'll explore how multiple such agents can work together in Multi-Agent Systems, examining various architectural frameworks that enable collaborative problem-solving.
