Intelligent Agents: Technology and Applications Multi-Agent Learning IST 597B, Spring 2003 John Yen
Learning Objectives How to identify goals for agent projects? How to design agents? How to identify risks/obstacles early on?
Multi-Agent Learning
Multi-Agent Learning The learned behavior can be used as a basis for more complex interactive behavior Enables the agent to participate in higher-level collaborative or adversarial learning situations Such learning would not be possible if the agent were isolated
Examples Examples of single agent learning in a multi-agent environment: Reinforcement Learning agent which incorporates information gathered by another agent (Tan, 93) Agent learning negotiating techniques of another using Bayesian Learning (Zeng & Sycara, 96) Class of multi-agent learning in which an agent attempts to model another agent
Examples Training scenario in which a novice agent learns from a knowledgeable agent (Clouse, 96) Common to all these examples is that the learning agent is interacting with other agents
Predator/Prey (Pursuit) Domain Introduced by Benda et al. (1986) Four predators and one prey Goal: to capture (or surround) the prey Not a complex real-world domain, but a toy domain that helps concretize concepts
Predator/Prey (Pursuit) Domain
Taxonomy of MAS Taxonomy organized along  the degree of heterogeneity, and  the degree of communication Homogenous, Non-Communicating Agents Heterogeneous, Non-Communicating Agents Homogenous, Communicating Agents Heterogeneous, Communicating Agents
Taxonomy of MAS
Taxonomy of MAS
1. Homogenous, Non-Communicating Agents All agents have the same internal structure Goals Domain knowledge Actions The only difference is their sensory input and the actions that they take They are situated differently in the world Korf (1992) introduces a policy for each predator based on an attractive force to the prey and a repulsive force from the other predators, as sketched below
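To make the force idea concrete, here is a minimal Python sketch of a Korf-style greedy policy. The grid moves, squared-distance scoring, and repulsion weight are illustrative assumptions, not the exact formulation in Korf (1992).

```python
# Minimal sketch of a force-based predator policy in the spirit of Korf's
# heuristic: attraction toward the prey, repulsion from the other predators.
from typing import List, Tuple

Pos = Tuple[int, int]
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # N, S, E, W, stay

def dist2(a: Pos, b: Pos) -> int:
    """Squared Euclidean distance between two grid cells."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def score(pos: Pos, prey: Pos, others: List[Pos], repulsion: float = 0.5) -> float:
    """Lower is better: attraction to the prey minus repulsion from teammates."""
    attract = dist2(pos, prey)                   # want to be close to the prey
    repel = sum(dist2(pos, o) for o in others)   # want to spread out from the others
    return attract - repulsion * repel

def korf_style_move(me: Pos, prey: Pos, others: List[Pos]) -> Pos:
    """Greedily pick the neighboring cell with the best combined force score."""
    candidates = [(me[0] + dx, me[1] + dy) for dx, dy in MOVES]
    return min(candidates, key=lambda c: score(c, prey, others))

# Example: one predator deciding its next cell given the prey and its teammates.
print(korf_style_move((0, 0), prey=(3, 3), others=[(1, 0), (0, 1), (5, 5)]))
```

Note that each predator runs the same rule independently, with no communication; cooperation, to the extent it appears, is emergent.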
1. Homogenous, Non-Communicating Agents Korf concludes that explicit cooperation is not necessary Haynes & Sen show that Korf’s heuristic does not work for certain instantiations of the domain
1. Homogenous, Non-Communicating Agents Issues: Reactive vs. deliberative agents Local vs. global perspective Modeling of other agents How to affect others Further learning opportunities
1: Reactive vs. Deliberative Agents Reactive agents do not maintain an internal state and simply retrieve pre-set behaviors Deliberative agents maintain an internal state and behave by searching through a space of behaviors, predicting the action of other agents and the effect of actions
2: Local vs. Global Perspective How much sensory input should be available to agents? (observability) Having a global view might lead to sub-optimal results Better performance by agents with less knowledge: “Ignorance is Bliss”
3: Modeling of Other Agents Since agents are identical, they can predict each other’s actions given the sensory input Recursive Modeling Method (RMM): model the internal state of another agent in order to predict its actions Each predator bases its move on the predicted moves of the other predators, and vice versa Since reasoning can recurse indefinitely, it should be limited in terms of time or recursion depth
3: Modeling of Other Agents If agents know too much, RMM could recurse indefinitely For coordination to be possible, some potential knowledge must be ignored Schmidhuber (1996) shows that agents can cooperate without modeling each other They consider each other as part of the environment
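A toy sketch of the bounded-recursion idea follows, assuming a tiny one-dimensional pursuit game: each predator predicts the other by running the same reasoning one level shallower, bottoming out at a greedy level-0 model. The game, payoff function, and depth cutoff are hypothetical and far simpler than the actual Recursive Modeling Method.

```python
# Depth-limited recursive modeling on a 1-D pursuit toy: predict the other
# agents one level shallower, then best-respond to the predicted joint move.
from typing import Dict

ACTIONS = [-1, 0, 1]          # step left, stay, step right on a line
PREY = 5                      # prey position (held fixed for this sketch)

def payoff(positions: Dict[str, int], me: str) -> float:
    """Closer to the prey is better; sharing a cell with a teammate is penalized."""
    others = [p for k, p in positions.items() if k != me]
    crowd_penalty = 2.0 if positions[me] in others else 0.0
    return -abs(positions[me] - PREY) - crowd_penalty

def predict(positions: Dict[str, int], me: str, depth: int) -> int:
    """Pick my action, modeling the other agents' choices `depth` levels deep."""
    if depth == 0:
        # Level-0 model: ignore the others and move greedily toward the prey.
        return max(ACTIONS, key=lambda a: -abs(positions[me] + a - PREY))
    best_action, best_value = 0, float("-inf")
    for a in ACTIONS:
        # Predict everyone else one level shallower, then evaluate the joint outcome.
        moved = {k: p + (a if k == me else predict(positions, k, depth - 1))
                 for k, p in positions.items()}
        value = payoff(moved, me)
        if value > best_value:
            best_action, best_value = a, value
    return best_action

start = {"p1": 4, "p2": 4}
print(predict(start, "p1", depth=2))  # p1 advances, predicting p2 will hold back to avoid crowding
```

The explicit depth parameter is the cutoff the slide calls for: without it, each agent's model of the other would recurse without bound.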
4: How to Affect Others Without communication, agents cannot affect each other directly Can affect each other indirectly in several ways They can be sensed by other agents Change the state of another agent (e.g. by pushing it) Affect each other by  stigmergy  (Becker, 94)
4: How to Affect Others Active stigmergy : an agent alters the environment so as to affect the sensory input of another agent. E.g., an agent might leave a marker for other agents to observe Passive stigmergy : altering the environment so that the effect of another agent’s actions changes. If an agent turns off the main water valve of a building, the effect of another agent turning on the faucet is altered
4: How to Affect Others Example : A number of robots in an area with many pucks scattered around. Robots reactively move straight (turning at walls) until they are pushing 3 or more pucks. Then they back up and turn away Although robots do not communicate, they can collect the pucks in a single pile over time When a robot approaches an existing pile, it adds the pucks and turns away A robot approaching an existing pile obliquely might take a puck away, but over time the desired result is accomplished
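A minimal sketch of the reactive rule just described, with no communication and no memory, only a threshold on how many pucks the robot is currently pushing; the sensor readings and action names are hypothetical placeholders for what physical robots would actually do.

```python
# Reactive puck-clustering rule: wander, push pucks, and back off at a threshold.
import random

PUSH_LIMIT = 3  # back off once the robot is pushing this many pucks

def reactive_step(pucks_being_pushed: int, at_wall: bool) -> str:
    """Return the next primitive action for one control cycle."""
    if pucks_being_pushed >= PUSH_LIMIT:
        return "back_up_and_turn"      # deposit the pucks where they are
    if at_wall:
        return "turn"                  # bounce off walls and keep wandering
    return "move_forward"              # otherwise keep pushing straight ahead

# Over many cycles and many robots, these deposits accumulate into a single pile.
for cycle in range(5):
    print(reactive_step(pucks_being_pushed=random.randint(0, 4), at_wall=False))
```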
5: Further Learning Opportunities An agent might try to learn to take actions that will not help it directly in the current situation, but may allow other agents to be more effective in the future. In Traditional RL, if an action leads to a reward by another agent, the acting agent may have no way of reinforcing that action
2. Heterogeneous, Non-Communicating Agents Can be heterogeneous in any of the following: Goals Actions Domain knowledge In the pursuit domain, the prey can be modeled as an agent Haynes et al. have used GAs and case-based reasoning to make predators learn to cooperate in the absence of communication
2. Heterogeneous, Non-Communicating Agents They also explore the possibility of evolving both the predators and the prey Predators use Korf’s greedy heuristic Though one might think this would result in repeated improvement of predator and prey with no convergence, a prey behavior emerges that always succeeds: the prey simply moves in a constant straight line Haynes et al. conclude that Korf’s greedy algorithm relies on random prey movement
2. Heterogeneous, Non-Communicating Agents Issues: Benevolence vs. competitiveness Fixed vs. learning agents Modeling of other agents Resource management Social conventions
1: Benevolence vs. Competitiveness Can be benevolent even if they have different goals (if they are willing to help each other) Selfish agents: more effective and biologically plausible Agents cooperate because it is in their own best interest
1: Benevolence vs. Competitiveness Prisoner’s dilemma : two burglars are captured. Each has to choose whether or not to confess and implicate the other. If neither confesses, they will both serve 1 year. If both confess, they will both serve 10 years. If one confesses and the other does not, the one who confessed will go free and the other will serve 20 years
1: Benevolence vs. Competitiveness
1: Benevolence vs. Competitiveness Each agent will decide to confess to maximize its own interest If both confess, they will get 10 years each If they had acted “irrationally” and kept quiet, they would each get 1 year Mor et al. (1995) show that in the repeated prisoner’s dilemma, cooperative behavior can emerge (sketched below)
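A small sketch of the payoff structure from the previous slides and of how cooperation can pay in the repeated game. The tit-for-tat strategy shown here is a standard illustration, not the specific mechanism analyzed by Mor et al. (1995); payoffs are years served, so lower is better.

```python
# Iterated prisoner's dilemma: mutual defection vs. reciprocal cooperation.
YEARS = {  # (my move, other's move) -> years I serve
    ("quiet", "quiet"): 1,
    ("confess", "confess"): 10,
    ("confess", "quiet"): 0,
    ("quiet", "confess"): 20,
}

def tit_for_tat(opponent_history):
    """Stay quiet first, then copy the opponent's last move."""
    return "quiet" if not opponent_history else opponent_history[-1]

def always_confess(opponent_history):
    return "confess"

def play(strategy_a, strategy_b, rounds=10):
    """Return total years served by each player over the repeated game."""
    hist_a, hist_b, total_a, total_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(hist_b), strategy_b(hist_a)
        total_a += YEARS[(a, b)]
        total_b += YEARS[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return total_a, total_b

print(play(always_confess, always_confess))  # (100, 100): mutual defection
print(play(tit_for_tat, tit_for_tat))        # (10, 10): cooperation emerges
```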
1: Benevolence vs. Competitiveness In zero-sum games cooperation is not sensible If a third dimension was to be added to the taxonomy, besides the degree of heterogeneity and communication, it would be benevolence vs. competitiveness
2: Fixed vs. Learning Agents Learning agents are desirable in dynamic environments Competitive vs. cooperative learning Possibility of an “arms race” in competitive learning: competing agents continually adapt to each other in more and more specialized ways, never stabilizing at a good behavior
2: Fixed vs. Learning Agents Credit-assignment problem : when the performance of an agent improves, it is not clear whether the improvement is due to better behavior by the agent or worse behavior by the opponent. The same problem arises when the performance of an agent gets worse. One solution is to fix one agent while allowing the other to learn, and then switch. This, however, encourages the arms race even more!
3: Modeling of other agents Goals, actions and domain knowledge of other agents may be unknown and need modeling Without communication, modeling is done strictly through observation RMM is good for modeling the states of homogenous agents Tambe (1995) takes it one step further, studying how agents can learn models of teams of agents
4: Resource Management Examples: Network traffic problem: several agents send information through the same network (GA) Load balancing: several users have a limited amount of computing power to share among them (RL) Braess’ Paradox (Glance et al., 1995): adding more resources to a network but getting worse performance
5: Social Conventions Imagine you are to meet a friend in Paris. You both arrive on the same day but were unable to get in touch to set a time and place. Where will you go, and when? 75% of the audience at the AAAI-95 Symposium on Active Learning answered (without prior communication) that they would go to the Eiffel Tower at noon Even without communication, agents are able to coordinate their actions
3. Homogenous, Communicating Agents Communication can be either broadcast or point-to-point Issues: Distributed sensing Distributed vision project (Matsuyama, 1997) Trafficopter system (Moukas et al., 1997) Communication content: what should they communicate? States, or goals? Further learning opportunities: when to communicate?
4. Heterogeneous, Communicating Agents Tradeoff between cost and freedom Osawa suggests predators should go through 4 phases: autonomy, communication, negotiation, and control When they stop making progress using one strategy, they should move to the next, more expensive strategy The phases are in increasing order of cost (decreasing order of freedom)
4. Heterogeneous, Communicating Agents Important issues: Understanding each other Planning communication acts Negotiation Commitment/decommitment Further learning opportunities
1: Understanding Each Other Need some set protocol for communication Aspects of the protocol: Information content: KIF (Genesereth, 92) Message Format: KQML (Finin, 94) Coordination: COOL (Barbuceanu, 95)
2: Planning Communication Acts The theory of communication as action is known as speech act theory Communication acts have preconditions and effects Effects might be to alter an agent’s belief about the state of another agent or agents
3: Negotiation Design negotiating MAS based on the law of supply and demand Contract nets (Smith, 1990): agents have their own goals, are self-interested, and have limited reasoning resources. They bid to accept tasks from other agents and can then either perform the task or subcontract it to another agent. Agents must pay to contract out their tasks (see the sketch below)
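A rough sketch of the announce-bid-award loop at the heart of a contract net, assuming hypothetical agent and task representations; real contract-net protocols involve richer message types, eligibility checks, and subcontracting.

```python
# Contract-net style negotiation: announce a task, collect bids, award the cheapest.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Bid:
    bidder: str
    price: float      # what the contractor charges to perform the task

class Contractor:
    def __init__(self, name: str, cost_per_unit: float):
        self.name = name
        self.cost_per_unit = cost_per_unit

    def bid(self, task_size: float) -> Optional[Bid]:
        """Bid only if the task is worth taking on at all."""
        if task_size <= 0:
            return None
        return Bid(self.name, task_size * self.cost_per_unit)

def award(task_size: float, contractors: List[Contractor]) -> Optional[Bid]:
    """Manager side: announce the task, collect bids, award to the cheapest bidder."""
    bids = [b for c in contractors if (b := c.bid(task_size)) is not None]
    return min(bids, key=lambda b: b.price) if bids else None

pool = [Contractor("a1", 2.0), Contractor("a2", 1.5), Contractor("a3", 3.0)]
print(award(task_size=4.0, contractors=pool))   # Bid(bidder='a2', price=6.0)
```

The same announce-bid-award pattern underlies the room-temperature market on the next slide, with heat taking the place of tasks.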
3: Negotiation MAS controlling air temperature in different rooms of a building: An agent can set the thermostat to any temperature. Depending on the actual air temperature, the agent can ‘buy’ hot or cold air from another room that has an excess. At the same time the agent can sell the excess air at the current temperature to other rooms. Modeling the loss of heat in transfer from one room to another, the agents try to buy and sell at the best possible prices.
4: Commitment/Decommitment An agent agrees to pursue a given goal regardless of how much it serves its own interest Commitments can make the system run more smoothly by making agents trust each other It is unclear how to make self-interested agents commit to others Belief/desire/intention (BDI) is a popular technique for modeling other agents Used in OASIS: air traffic control
5: Further Learning Opportunities Instead of predefining a protocol, allow the agents to learn for themselves what to communicate and how to interpret it Possible result would be more efficient communication
Q Learning Assess state-action pairs (s, a) using a Q value Learn the Q value using rewards/feedback A reward received at time t is discounted back to earlier state-action pairs (using a discount factor) The goal of learning is to find an optimal policy for selecting actions
The Q Value $Q^*(x, a) = R(x, a) + \gamma \sum_y P_{xy}(a) V^*(y)$ R: reward. $P_{xy}(a)$: the probability of reaching state y from x by taking action a. $\gamma$ (gamma): discount factor (between 0 and 1). $V^*(y)$: the expected total discounted return starting in y and following the optimal policy. A policy maps states to actions.
The Expected Total Discounted Return $V^*(x)$ for a state x is the maximal Q value among all actions that can be taken in that state (following the rest of the policy): $V^*(x) = \max_a Q^*(x, a)$
Learning Rule for the Q Value $Q(x, a) \leftarrow Q(x, a) + \alpha \big[ r + \gamma \max_{a'} Q(y, a') - Q(x, a) \big]$ $\alpha$ (alpha): learning rate
The Q-Learning Algorithm
1. For each state-action pair (x, a), initialize the table entry Q(x, a) to 0.
2. Do forever:
(a) Observe the current state x.
(b) Choose an action a that maximizes Q(x, a) over all a (occasionally taking a random exploratory action).
(c) Carry out action a in the world.
(d) Let the short-term reward be r, and the new state be y.
(e) Update Q(x, a) using the learning rule above.
(f) Set x := y.
Probability for the agent to select action $a_i$ in state x, based on the Q values (the Boltzmann distribution): $P(a_i \mid x) = \dfrac{e^{Q(x, a_i)/T}}{\sum_j e^{Q(x, a_j)/T}}$ T: “temperature” parameter that determines the randomness of decisions.
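Putting the last few slides together, here is a compact Python sketch of tabular Q-learning with Boltzmann (temperature-based) action selection; the toy chain environment and the parameter values are illustrative assumptions, not part of the original lecture.

```python
# Tabular Q-learning with Boltzmann exploration on a toy 6-state chain.
import math
import random
from collections import defaultdict

ALPHA, GAMMA, T = 0.1, 0.9, 0.5      # learning rate, discount factor, temperature
ACTIONS = [-1, +1]                   # move left or right along the chain
GOAL = 5                             # reaching state 5 yields reward 1

Q = defaultdict(float)               # Q[(state, action)], initialized to 0

def boltzmann_action(state: int) -> int:
    """Select an action with probability proportional to exp(Q / T)."""
    weights = [math.exp(Q[(state, a)] / T) for a in ACTIONS]
    return random.choices(ACTIONS, weights=weights, k=1)[0]

def step(state: int, action: int):
    """Toy environment: a bounded chain with reward 1 for reaching the goal."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

for _ in range(2000):                # training episodes
    s = 0
    while s != GOAL:
        a = boltzmann_action(s)
        s2, r = step(s, a)
        # Q(x,a) <- Q(x,a) + alpha * (r + gamma * max_a' Q(y,a') - Q(x,a))
        best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

print(sorted((s, a, round(q, 2)) for (s, a), q in Q.items()))
```

Lowering T makes the agent greedier with respect to its current Q values; raising it makes action selection closer to uniform random exploration.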
Towards Collaborative and Adversarial Learning A Case Study in Robotic Soccer Peter Stone & Manuela Veloso
Introduction Layered learning, to develop complex multi-agent behaviors from simple ones Simple multi-agent behavior in Robotic Soccer, to shoot a moving ball Passer Shooter Behavior to be learnt: When the shooter should begin to move (shooting policy)
Simple Behavior
Parameters Ball speed (fixed vs. variable) Ball trajectory (fixed vs. variable) Goal location (fixed vs. variable) Action quadrant (fixed vs. variable)
Parameters
Fixed Ball Motion Simple shooting policy: begin accelerating when the ball’s distance to its projected point of intersection with the agent’s path reaches 110 units 100% success rate if the shooter’s position is fixed 61% success rate if the shooter’s position is variable Use a neural network. Inputs to the NN (coordinate independent): Ball distance Agent distance Heading offset Output: 1 or 0 (shot successful or not) Use a random shooting policy to generate training data
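A rough sketch of this learning setup, assuming synthetic training data and scikit-learn's MLPClassifier in place of the original simulator and network; the labeling rule used to generate the fake trials is purely hypothetical and only mimics the 110-unit trigger distance mentioned above.

```python
# Train a small network to predict shot success from the three coordinate-
# independent inputs, then use its prediction as the shooting policy.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def synthetic_trial():
    """Fake trial: a shot 'succeeds' near a hypothetical 110-unit trigger distance."""
    ball_dist = rng.uniform(0, 250)
    agent_dist = rng.uniform(0, 100)
    heading_offset = rng.uniform(-90, 90)
    success = int(abs(ball_dist - 110) < 30 and abs(heading_offset) < 45)
    return [ball_dist, agent_dist, heading_offset], success

X, y = zip(*(synthetic_trial() for _ in range(2000)))
net = MLPClassifier(hidden_layer_sizes=(4,), max_iter=3000, random_state=0)
net.fit(np.array(X), np.array(y))

def should_shoot(ball_dist, agent_dist, heading_offset) -> bool:
    """Shooting policy: start accelerating only when the net predicts success."""
    return bool(net.predict([[ball_dist, agent_dist, heading_offset]])[0])

print(should_shoot(115, 40, 10), should_shoot(200, 40, 10))
```

Because the inputs are coordinate independent, the same trained policy transfers across shooter positions, which is what the variable-position experiments test.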
Neural Network
Results
Varying Ball Speed Add a fourth input to NN, Ball Speed
Varying Ball’s Trajectory Use the same shooting policy Use another NN to determine the direction the shooter should steer (shooter’s aiming policy)
Moving the Goal Can think of it as aiming for different parts of the goal Change nothing but the shooter’s knowledge of the goal location
Cooperative Learning Passing a moving ball Passer: where to aim the pass,  Shooter: where to position itself
Cooperative Learning
Adversarial Learning
References Peter Stone, Manuela Veloso, 2000, “Multi-Agent Systems: A Survey from a Machine Learning Perspective” Ming Tan, 1993, “Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents” Peter Stone, Manuela Veloso, 1998, “Toward Collaborative and Adversarial Learning: A Case Study in Robotic Soccer”
