Chatbots and AI Agents
11-667: Large Language Models: Methods and Applications
What to expect on the midterm
• Conceptual questions about the content of the lecture and readings
• Topics you should prepare to be assessed on
• Transformer architecture
• Pretraining (data collection and learning objectives)
• Finetuning techniques and data (alignment, RLHF, PETM)
• Evaluation (human and automatic)
• In-context learning
• Interpretability
• Applications (search, dialog agents)
When you think of “chatbot,” what comes to mind?
• ChatGPT
• Bard
• character.ai
Implementation
1. Take a pre-trained LLM
2. Finetune it on appropriate data
3. Apply clever prompting
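For concreteness, a minimal sketch of steps 1 and 3 using the Hugging Face transformers library; the model name here is a small stand-in, and step 2 (finetuning on dialog data) is omitted:

```python
# Minimal chatbot sketch: pretrained LM + prompting. "gpt2" is a stand-in;
# a real chatbot would start from a much larger, instruction-tuned LLM.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# "Clever prompting": frame the input so the LM continues as an assistant.
prompt = "User: Recommend a fantasy novel.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs, max_new_tokens=64, do_sample=True, top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```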
What distinguishes an AI agent from a chatbot?
• An agent…
• exists within an environment
• can take actions that change its environment
• can converse with other agents within the environment
• has a persona
• has a goal
• has memories of what has previously transpired
General-purpose chatbots (ChatGPT, Bard, etc.) do not exist in an
environment they can alter, and they do not have specific goals. All
memory is implicit in the conversational history.
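A hypothetical sketch of this distinction in code; all field names are illustrative, not drawn from any of the systems discussed later:

```python
# An agent couples an LLM with explicit state; a chatbot is (roughly)
# just a function of the conversation history.
from dataclasses import dataclass, field

@dataclass
class Agent:
    persona: str                      # natural-language self-description
    goal: str                         # what the agent is trying to accomplish
    memories: list[str] = field(default_factory=list)  # explicit memory store
    location: str = ""                # position within an environment

    def act(self, observation: str) -> str:
        """Record the observation, then pick an environment-changing action."""
        self.memories.append(observation)
        return "take action"          # a real agent would query an LLM here

def chatbot_reply(history: list[str]) -> str:
    return "reply"                    # all "memory" is implicit in history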
Why care about building AI agents?
• Entertainment / video games
• Modeling real-user behavior
• For example, testing a new application with “mock” users could be less expensive than hiring
real users to test it out.
• Prerequisite for embodied agents.
• We can use agents acting in a virtual environment to measure progress
toward agents acting in a real one.
• Challenging evaluation platform for natural language understanding
and generation
Case Studies in this Lecture
• Agents in a fantasy text adventure game
• “Learning to Speak and Act in a Fantasy Text Adventure Game.” Urbanek et al., 2019.
• Diplomacy-playing agent
• “Human-level play in the game of Diplomacy by combining language models with strategic reasoning.” Bakhtin et al., 2022.
• Simulated town
• “Generative Agents: Interactive Simulacra of Human Behavior.” Park et al., 2023.
Agents in a fantasy text adventure game
• Environment:
• Locations, randomly glued together
• Each location also has some number of items
• Agents:
• Each agent is situated in the environment.
• Each agent possesses some number of items
• Agent actions:
• Emote: {applaud, cringe, cry, etc.}
• Chat with other agents
• Perform a physical action (e.g. “put robes in
closet” or “eat salmon”)
• Agents, locations, and items have natural-language descriptions.
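A hypothetical encoding of such an environment; the names and descriptions below are illustrative, not taken from the paper's actual crowdsourced data:

```python
# Toy data structures for the fantasy text adventure environment.
from dataclasses import dataclass, field

@dataclass
class Location:
    description: str                   # natural-language description
    items: list[str] = field(default_factory=list)
    neighbors: list[str] = field(default_factory=list)  # randomly glued together

@dataclass
class GameAgent:
    persona: str
    location: str
    inventory: list[str] = field(default_factory=list)

EMOTES = ["applaud", "cringe", "cry"]  # subset shown on the slide

graveyard = Location("A mist-shrouded graveyard.", ["shovel"], ["crypt"])
thief = GameAgent("I am a thief who lurks among the graves.", "graveyard", ["lockpick"])
```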
Agents in a text adventure game
Task goal: Can we generate a conversation between the thief and the gravedigger and
predict which actions/emotes they will take after each conversational utterance?
Agents in Diplomacy, a negotiation-based board game
• Seven players, each controlling a country, compete to capture supply centers (SCs) on a map.
• At each turn, players chat with each other
to decide on their actions.
• Any promises, agreements, threats, etc. are
non-binding.
• Once chatting is over, players may choose to:
• Move their units, waging war if they move into an already-occupied region
• Use their units to support other units (which
could include the units of a different player)
Task goal: An AI agent that follows the same rules and norms as the human agents, and
has as good a win-rate as skilled human players.
Agents in a simulated town
• Modeled after the video game The Sims
• 25 agents
• Each begins the simulation with a pre-
defined set of “seed memories”
• Agents do not have explicit goals
• At each step:
• Each agent outputs a natural-language statement of its action
• “write in journal”
• “walk to pharmacy”
• “talk to Joe”
• Actions and environment state are
parsed into memories, reflections,
and observations.
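A hypothetical sketch of this per-step loop, where llm() stands in for a call to an instruction-tuned model and memory parsing is reduced to storing the raw action string:

```python
# One step of a Town-Sim-style loop: each agent emits a natural-language
# action, which is then stored back as a memory.
def llm(prompt: str) -> str:
    return "write in journal"   # stub: a real system would query GPT-3 here

def simulation_step(agents: list[dict]) -> None:
    for agent in agents:
        prompt = (
            f"{agent['persona']}\n"
            f"Recent memories: {'; '.join(agent['memories'][-5:])}\n"
            "What does this agent do next? Answer with one short action."
        )
        action = llm(prompt)                       # e.g. "talk to Joe"
        # Actions (and resulting observations) become new memories.
        agent["memories"].append(f"I decided to: {action}")

agents = [{"persona": "John Lin, a pharmacist.", "memories": ["seed memory"]}]
simulation_step(agents)
```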
Where can LLMs be used in these systems?
• Dialog with other agents (who may be either human agents or other AI
agents)
• Deciding on agent intents
• Choosing what information (from the environment and from the agent’s
internal state) to condition the conversation and decision-making on.
Challenges:
• How can we convert world and agent state into natural language?
• How can we convert natural language into agent actions and environment
changes?
• Can all these tasks be accomplished with a general-purpose LM or do we
need finetuned models?
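A toy sketch of the first two conversion directions, under an assumed state schema and action grammar (neither is taken from the papers):

```python
# Both directions of the state <-> natural language bridge, simplified.
import re

def state_to_text(state: dict) -> str:
    """Serialize structured world state into a natural-language prompt."""
    return (
        f"You are in the {state['location']}. "
        f"You can see: {', '.join(state['items'])}. "
        f"Also here: {', '.join(state['agents'])}."
    )

def text_to_action(utterance: str) -> tuple[str, str] | None:
    """Parse a model's output into a (verb, object) action, if it matches."""
    match = re.match(r"(put|eat|take) (.+)", utterance.strip().lower())
    return (match.group(1), match.group(2)) if match else None

print(state_to_text({"location": "graveyard", "items": ["shovel"], "agents": ["gravedigger"]}))
print(text_to_action("eat salmon"))  # -> ('eat', 'salmon')
```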
Choosing information to condition the conversation and decision-making on.
• In many cases, there will be more information than can fit into an LM
context window. Most of this won’t be relevant.
• The Town Sim maintains a database of memories. Memories are scored by their recency, importance, and relevance to the query memory.
• Relevance is computed by taking an LM sequence embedding of the query memory and of each memory in the database, then scoring database memories by dot product with the query embedding (see the retrieval sketch at the end of this section).
• In Diplomacy, the dialog model and intent model see as input:
• dialogue history (all messages exchanged between player A and the six other players up
to time t)
• game state, action history, and metadata (current game state, recent action history,
game settings, A’s Elo rating, etc.)
• For the dialog model: A’s intended actions, and the actions A wants its conversational
partner to complete.
• In the Fantasy Text Adventure, dialog rounds were short enough that all environment information and history fit within the model's maximum sequence length.
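A minimal sketch of the Town Sim's retrieval scoring, assuming memory embeddings are precomputed by an LM encoder; the exponential recency decay and the equal weighting of the three components are simplifying assumptions (the paper combines normalized scores, with weights that could be tuned):

```python
# Score memories by relevance (dot product), recency, and importance,
# then return the top-k to place in the LM's context window.
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    return (x - x.min()) / (x.max() - x.min() + 1e-8)

def retrieve(query_emb: np.ndarray, memories: list[dict], now: int, k: int = 3) -> list[dict]:
    """memories: dicts with 'emb' (vector), 'importance' (float), 'last_access' (int)."""
    relevance = np.array([m["emb"] @ query_emb for m in memories])
    recency = np.array([0.99 ** (now - m["last_access"]) for m in memories])
    importance = np.array([m["importance"] for m in memories])
    score = normalize(relevance) + normalize(recency) + normalize(importance)
    return [memories[i] for i in np.argsort(-score)[:k]]
```

Normalizing each component before summing keeps any single signal (e.g. a few very important memories) from dominating the score.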
Deciding on agent intent
• Can we trust an LLM to choose reasonable intents?
• Fantasy Text Adventure Game
• Yes, via a finetuned BERT-based ranker
• Simulated Town
• Yes, through prompting GPT-3 with an agent’s description and memories
• Hierarchical generation: generate a broad plan first, and then generate smaller steps in
the plan
• Diplomacy
• No; instead, a reinforcement learning agent trained through self-play outputs the action intent
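A hypothetical sketch of the hierarchical generation used in the Simulated Town case, with llm() standing in for a GPT-3 call conditioned on the agent's description and memories; the prompts and the parsing are illustrative:

```python
# Hierarchical generation: ask for a broad plan, then expand each step.
import re

def llm(prompt: str) -> str:
    return "1) wake up 2) open the pharmacy 3) have dinner"  # stub response

def plan_day(persona: str, memories: list[str]) -> list[str]:
    broad = llm(f"{persona}\nMemories: {memories}\nOutline today's plan as numbered steps.")
    steps = [s.strip() for s in re.split(r"\d+\)", broad) if s.strip()]
    # Second pass: decompose each broad step into finer-grained actions.
    return [llm(f"{persona}\nBreak '{step}' into 5-15 minute actions.") for step in steps]

print(plan_day("John Lin, a pharmacist.", ["woke up at 7am"]))
```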
Dialog with other agents
• All three examples in our case study use LLMs to generate dialog.
• Diplomacy and Fantasy Text Adventure used finetuned models
• Simulated Town used instruction-tuned GPT-3 without further finetuning
• When is finetuning especially helpful?
• If the world state cannot be effectively represented in natural language.
• When bad dialog can lead to poor outcomes
• The Simulated Town paper notes that its generated dialogs tend to be very formal and stilted, likely due to GPT-3’s instruction tuning.
• An LLM is not always the right tool for the job:
• Example: a Settlers of Catan AI agent can do well with just templated text generation
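An illustration of the templated alternative; the templates and slot names are invented for this example, not taken from an actual Catan agent:

```python
# Templated generation: for a constrained game, a handful of slot-filled
# templates can cover most of the dialog an agent needs to produce.
TEMPLATES = {
    "offer": "I'll give you {give} {give_res} for {get} {get_res}.",
    "accept": "Deal! Sending you {give} {give_res}.",
    "reject": "No thanks, I need my {give_res} right now.",
}

def render(intent: str, **slots) -> str:
    return TEMPLATES[intent].format(**slots)

print(render("offer", give=2, give_res="wheat", get=1, get_res="ore"))
# -> "I'll give you 2 wheat for 1 ore."
```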
Takeaways
Quiz Question
In what kinds of scenarios would a pre-trained LLM without finetuning
not be a good choice for outputting agent intents?
