IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. III (May-Jun. 2016), PP 18-25
www.iosrjournals.org
DOI: 10.9790/0661-1803031825
Enhancement in Decision Making with Improved Performance by
Multiagent Learning Algorithms
Deepak A. Vidhate¹, Dr. Parag Kulkarni²
¹(Research Scholar, Department of Computer Engineering, College of Engineering, Pune, India)
²(EKLaT Research, Shivajinagar, Pune, Maharashtra, India)
Abstract: In some applications, the output of the system is a sequence of actions. There is no single measure of the best action in an intermediate state; an action is good if it is part of a good policy. A single action is not important; what matters is the policy, that is, the sequence of correct actions to reach the goal. In such a case, the machine learning program should be able to assess the goodness of policies and learn from past good action sequences in order to generate a policy. A multi-agent environment is one in which there is more than one agent, where the agents interact with one another, and further, where there are restrictions on that environment such that agents may not at any given time know everything about the world that other agents know. Two features establish the study of multi-agent learning as a separate field from ordinary machine learning. Parallelism, scalability, simpler construction and cost effectiveness are the main characteristics of multi-agent systems. A multiagent learning model is given in this paper. Two multiagent learning algorithms, i.e. the Strategy Sharing and Joint Rewards algorithms, are implemented. In the Strategy Sharing algorithm, simple averaging of Q-tables is used: each Q-learning agent learns from all of its teammates by taking the average of their Q-tables. The Joint Rewards learning algorithm combines Q-learning with the idea of joint rewards. The paper shows the results and a performance comparison of the two multiagent learning algorithms.
Keywords: Joint Rewards, Multiagent, Q Learning, Reinforcement Learning, Strategy Sharing
I. Introduction
Consider the example of a market chain that has hundreds of stores all over a country, selling thousands of
goods to millions of customers. The point-of-sale terminals record the details of each transaction, i.e. date,
customer identification code, goods bought and their amount, total money spent and so forth. This typically
generates gigabytes of data every day. What the market chain wants is to be able to predict who the likely
customers for a product are. Again, the algorithm for this is not evident; it changes over time and by geographic
location. Stored data becomes useful only when it is analyzed and turned into information that we can make use
of, for example, to make predictions. We do not know exactly which people are likely to buy this product, or
another product. We would not need any analysis of the data if we knew it already. But because we do not
know, we can only collect data and hope to extract the answers to these questions from it.
We do believe that there is a process that explains the data we observe. Though we do not know the
details of the process underlying the generation of the data – for example, customer behavior – we know that it is not
completely random. People do not go to markets and buy things at random. When they buy beer, they buy chips;
they buy ice cream in summer and spices for wine in winter. There are certain patterns in the data. We may not
be able to identify the process completely, but we can still construct a good and useful approximation. That
approximation may not explain everything, but it may still account for some part of the data. Though
identifying the complete process may not be possible, patterns or regularities can still be detected.
Such patterns may help us to understand the process, or make predictions. Assuming that the near
future will not be much different from the past, future predictions can also be expected to be right. There are
many real-world problems that involve more than one entity in maximizing an outcome. For example,
consider a scenario of retail shops in which shop A sells clothes, shop B sells jewellery, shop C sells footwear,
and shop D is a wedding house. To build a single system that automates (certain aspects of) the marketing process,
the internals of all shops A, B, C, and D would have to be modeled, which is impractical. The only feasible solution is to allow the various stores
to create their own policies that accurately represent their goals and interests; these policies must then be combined into
the system with the aid of suitable techniques. The goal of each shop is to maximize profit by an increase
in sale, i.e. yield maximization. Different parameters need to be considered in this: variation in seasons, the
dependency between items, special schemes, discounts, market conditions, etc. Different shops can cooperate with each
other for yield maximization in different situations. Several independent tasks that can be handled by separate
agents could benefit from the cooperative nature of agents.
Another example of a domain that requires cooperative learning is hospital scheduling. It requires
different agents to represent the interests of different people within the hospital. Hospital employees have
different outlooks. X-ray operators may want to maximize the throughput on their machines. Nurses in the
hospital may want to minimize the patient's time in the hospital. Since different people evaluate candidate schedules with
different criteria, they must be represented by cooperative agents. In some applications, the output of the system is a sequence of
actions. There is no single measure of the best action in an intermediate state; an action is
good if it is part of a good policy. A single action is not important; what matters is the policy, that is, the
sequence of correct actions to reach the goal. To be able to generate a policy, the machine learning program
should be able to assess the quality of policies and learn from past good action sequences.
This paper is organized as follows: Section II gives the concept of multi-agent learning, and Section III
describes the multi-agent model. The Joint Rewards algorithm is given in Section IV and the Strategy Sharing algorithm is
given in Section V. Section VI gives the experimental setup and Section VII presents the result comparison of
both algorithms, with final concluding remarks and future scope.
II. Multi-Agent Learning
An agent is a computational mechanism that exhibits a high degree of autonomy. Based on information
received from the environment, the agent performs actions in its environment. A multi-agent environment is one
in which there is more than one agent, where the agents interact with one another, and further, where there are
restrictions on that environment such that agents may not at any given time know everything about the world
that other agents know[1]. Two features establish the study of multi-agent learning as a separate field from
ordinary machine learning. First, multi-agent learning addresses problem domains involving
multiple agents, so the search space considered is extraordinarily large; small changes in learned behaviors can
often result in random changes in the resultant macro-level properties of the multi-agent group as a whole, due to
the communication among those agents. Second, multi-agent learning involves multiple learners, each learning and
adapting in the context of the others; this introduces complex issues to the learning process which are not yet fully
understood[2].
Parallelism, scalability, simpler construction and cost effectiveness are the main characteristics of multi-
agent systems. Having these qualities, multiagent systems are used to resolve complex problems, search in large
domains, execute sophisticated tasks, and build more fault-tolerant and reliable systems. In most of the existing
systems, agents' behavior and coordination schemes are designed and fixed by the designer. But an agent with
incomplete and fixed knowledge and behavior cannot be adequately efficient in a dynamic, complex or
changing environment. Therefore, to gain all the benefits of applying a multi-agent system, the agent team must learn
to manage fresh, hidden and dynamic situations[3].
In almost all present multi-agent teams, agents learn independently. However, agents are not
required to learn everything from their own experiences. Each agent can observe the others and learn from their
situation and behavior. Also, agents can consult more expert agents or get guidance from them. Agents can
also share their information and learn from it, i.e. the agents can cooperate in learning.
In a single-agent system, only one agent interacts with the environment. A multiagent system (MAS)
consists of multiple agents. These agents all carry out actions and control their environment. Each agent selects
actions individually, but it is the resulting joint action which manipulates the environment and generates the
reward for the agents. This has significant consequences for the characteristics and the complexity of the
problem. This work focuses on cooperative MASs in which the agents have to optimize a shared performance
measure[4].
III. Multi-Agent Model
Parameters for Multiagent Model
General model parameters for MAS are described. Most model parameters extend the parameters from
single-agent systems[5]. A Multiagent System can be described using the following model parameters:
• A discrete time step t = 0, 1, 2, 3, . . .
• A group of n agents A = {A1, A2, . . . , An}.
• A finite set of environment states S. A state st ∈ S describes the state of the system at time step t.
• A finite set of actions Ai for every agent i. The action selected by agent i at time step t is denoted by ai,t ∈ Ai. The joint action a ∈ A = A1 × . . . × An is the vector of all individual actions.
• A reward function R : S × A → ℝ which provides agent i with a reward rt+1 ∈ R(st, at) based on the joint action at taken in state st.
• A state transition function T : S × A × S → [0, 1] which gives the transition probability p(st | at−1, st−1) that the system moves to state st when the joint action at−1 is performed in state st−1.
These parameters are very similar to the ones in the single-agent case. However, difficulties arise due
to the decentralized nature of the problem. Each agent selects actions independently, but it is the resulting joint
action that manipulates the environment and produces the reward.
Stochastic games (SGs)
Stochastic games are a very natural extension of MDPs to multiple agents.
Definition 1: A stochastic game is a tuple (n, S, A1…n, T, R1…n), where n is the number of agents, S is a set of
states, Ai is the set of actions available to agent i with A being the joint action space A1 × … × An, T is a
transition function S × A × S → [0, 1], and Ri is the reward function S × A → ℝ for the i-th agent[6].
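To make the notation concrete, the sketch below encodes the stochastic-game tuple (n, S, A1…n, T, R1…n) as a small Python container; the field names, index-based action encoding and the toy two-agent instance are illustrative assumptions, not part of the paper.

from dataclasses import dataclass
from typing import Callable, List, Tuple

# A minimal sketch of the stochastic-game tuple (n, S, A_1..n, T, R_1..n).
@dataclass
class StochasticGame:
    n: int                                              # number of agents
    states: List[str]                                   # finite state set S
    actions: List[List[int]]                            # actions[i] = action set A_i of agent i
    T: Callable[[str, Tuple[int, ...], str], float]     # T(s, joint_action, s') -> probability
    R: List[Callable[[str, Tuple[int, ...]], float]]    # R[i](s, joint_action) -> reward of agent i

# Toy two-agent, single-state instance matching the 2x2 reward tables A and B below
# (joint actions are index pairs (i, j)); values are made up for illustration.
def make_single_state_game(A, B):
    states = ["s0"]
    actions = [[0, 1], [0, 1]]
    T = lambda s, a, s2: 1.0                            # only one state, so it always stays there
    R = [lambda s, a: A[a[0]][a[1]], lambda s, a: B[a[0]][a[1]]]
    return StochasticGame(2, states, actions, T, R)

game = make_single_state_game([[1, 0], [0, 1]], [[0, 1], [1, 0]])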
Model of Multiagent Q-learning
A model of multiagent Q-learning with ε-greedy exploration is presented here. The effect of the ε-greedy
mechanism and of the presence of other agents on the learning process of one agent is studied to develop the
model. The derivation of a continuous-time equation for the Q-learning rule is first demonstrated. Then the
limits of this equation for the case of a single learner are analysed, indicating how they change
when multiple learners are considered. Finally, it is shown that the ε-greedy mechanism affects the shape of the
modeled function[7].
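Since the model relies on ε-greedy exploration, a minimal sketch of that selection rule is given below; the list-based Q-value layout and the value of ε are assumptions made only for illustration.

import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick a random action with probability epsilon, otherwise the greedy one.

    q_values: list of Q-values indexed by action; epsilon in [0, 1] (value assumed).
    """
    if random.random() < epsilon:
        return random.randrange(len(q_values))                          # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])         # exploit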
Consider a situation composed of 2 agents with 2 actions each and a single state. The reward
functions of the agents in this case can be described using tables of the form:

A = [ a11  a12 ]        B = [ b11  b12 ]
    [ a21  a22 ]            [ b21  b22 ]

where A describes the rewards for the first agent and B the rewards for the second agent. The Q-learning update
rule can be simplified for only one state as:
Qai := Qai + α(rai − Qai) ..........(1)
where Qai is the Q-value of agent a for action i, rai is the reward that agent a receives for executing action i, a denotes the agent and i the action.
Analysis
The update rule for the first agent can be rewritten as:
Qai(k + 1) − Qai(k) = α(rai(k + 1) − Qai(k)) ..........(2)
This difference equation gives the absolute growth in Qai between times k and k + 1. To obtain its
continuous-time version, consider Δt ∈ [0, 1] to be a small amount of time, and take
Qai(k + Δt) − Qai(k) ≈ Δt × α(rai(k + Δt) − Qai(k)) ..........(3)
to be the approximate growth in Qai during Δt.
If Δt = 0 the equation becomes an identity.
If Δt = 1 it becomes Qai(k + 1) − Qai(k) = α(rai(k + 1) − Qai(k)), i.e. Equation 2.
If Δt ∈ (0, 1) it is a linear approximation.
Dividing both sides of the equation by Δt gives
(Qai(k + Δt) − Qai(k)) / Δt ≈ α(rai(k + Δt) − Qai(k)) ..........(4)
Taking the limit Δt → 0 on both sides,
lim Δt→0 (Qai(k + Δt) − Qai(k)) / Δt ≈ lim Δt→0 α(rai(k + Δt) − Qai(k))
lim Δt→0 (Qai(k + Δt) − Qai(k)) / Δt ≈ α(rai(k) − Qai(k))
dQai(k)/dt ≈ α(rai(k) − Qai(k)) ..........(5)
This is an approximation of the continuous-time version of Equation 2. The solution to this equation, found by integration, is
Qai(t) = C e^(−αt) + rai ..........(6)
where C is the constant of integration.
As e^(−x) is a monotonic function and lim x→∞ e^(−x) = 0, it is easy to observe that the limit of Equation 6
when t → ∞ is rai:
lim t→∞ Qai(t) = C · lim t→∞ e^(−αt) + rai = rai ..........(7)
Consider the case in which only the first agent is learning and the second agent uses a pure strategy;
the second agent will then always generate the same reward for the first agent. In this case, the derivation above is sufficient to
prove that Qai will monotonically increase or decrease towards rai for any initial value of Qai. More particularly,
the function is monotonically increasing if Qai(0) < rai and monotonically decreasing if Qai(0) > rai. The model
built here will be used to determine how one learning strategy affects the action selection of another learning
agent.
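As a quick sanity check of Equations 1, 6 and 7, the short sketch below iterates the single-state update against a fixed reward (as when the other agent plays a pure strategy) and confirms the monotone approach of Qai towards rai; the learning rate, reward value and step count are arbitrary assumptions.

# Iterate Q := Q + alpha * (r - Q) against a fixed reward r and verify that
# Q approaches r monotonically from its initial value (illustrative values only).
def iterate_q(q0, r, alpha=0.1, steps=200):
    q, trace = q0, [q0]
    for _ in range(steps):
        q = q + alpha * (r - q)
        trace.append(q)
    return trace

trace = iterate_q(q0=0.0, r=1.0)
assert all(a <= b for a, b in zip(trace, trace[1:]))     # monotonically increasing
assert abs(trace[-1] - 1.0) < 1e-6                       # converges to r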
IV. Joint Rewards Algorithm
Two main challenges exist in extending Q-learning to a multi-agent system. One is how to deal with the
huge Q-table as the dimension increases in a multi-agent system. The other is how to exploit the
cooperative behavior among agents to obtain more efficient learning results. Joint rewards enable agents to learn
in a multiagent environment, and the experimental results show the efficiency and good convergence of the algorithm.
It is not enough for each agent to proceed selfishly in order to reach a globally optimal strategy in a multi-agent
environment, and a task cannot be accomplished with only one agent.
This section presents a joint reward learning algorithm which combines Q-learning with the idea of
joint rewards to partly meet the above two challenges. In a multi-agent system, every agent needs to maintain a Q-table
which contains information about its own and the other agents' states and actions, i.e. the situation of the whole
environment. In order to encourage cooperative behavior among agents and obtain globally optimal rewards, an agent should
take account of the other agents' actions. Here, a simplified form of vicarious reward is used because it is feasible to realize.
We call it the 'joint reward'[9]:
jri = b·pri + (1 − b)·Σj≠i prj ..........(8)
where pri is the personal reward of agent i, Σj≠i prj is the sum of the rewards of the other agents except i, and
0 < b ≤ 1 is the personal weight, denoting how much importance is given to the agent's personal reward
compared to that of the other agents. The improved update rule for the Q-learning values of agent i can be
formulated as
Qi_new(xi, ai) := (1 − α) Qi_old(xi, ai) + α (jri + β maxa Q(s, a)) ..........(9)
Algorithm 1: Multiagent learning using the Joint Reward Algorithm
1. for each agent i (0 < i < m) the learning procedure is:
2.   for all a ∈ A and s ∈ S initialize Q(s, a) = 0
3.   let t = 0
4.   loop:
5.     select the action ai which has the maximum Q value
6.     execute action ai
7.     receive an immediate reward
8.     observe the new state s
9.     calculate the joint reward as
10.      jri = b·pri + (1 − b)·Σj≠i prj   // sum of rewards of other agents except i
11.    update the Q-learning values of agent i as
         Qi_new(xi, ai) := (1 − α) Qi_old(xi, ai) + α (jri + β maxa Q(s, a))
12. end
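A compact sketch of the joint-reward update of Equations 8 and 9 is given below; the tabular state/action sizes, parameter values and helper names are assumptions made only for illustration and are not the authors' implementation.

import numpy as np

def joint_reward(personal_rewards, i, b=0.7):
    """Equation 8: jr_i = b * pr_i + (1 - b) * (sum of the other agents' rewards).
    The weight b and the reward values used below are illustrative assumptions."""
    others = sum(personal_rewards) - personal_rewards[i]
    return b * personal_rewards[i] + (1 - b) * others

def joint_reward_update(Q, s, a, s_next, jr, alpha=0.1, beta=0.9):
    """Equation 9: Q_new(s, a) = (1 - alpha) * Q_old(s, a) + alpha * (jr + beta * max_a' Q(s', a'))."""
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (jr + beta * Q[s_next].max())
    return Q

# Example with 3 shop agents, 108 states and 10 product actions (sizes assumed).
n_agents, n_states, n_actions = 3, 108, 10
Q_tables = [np.zeros((n_states, n_actions)) for _ in range(n_agents)]
rewards = [5.0, 2.0, 1.0]                   # personal rewards observed at this step (made up)
for i in range(n_agents):
    jr = joint_reward(rewards, i)
    Q_tables[i] = joint_reward_update(Q_tables[i], s=0, a=3, s_next=0, jr=jr)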
V. Strategy Sharing Algorithm
One way of sharing knowledge among multiple agents is knowledge averaging. Averaging can be divided into
two general categories, simple averaging and weighted averaging. In simple averaging, which is called strategy
sharing (SS), each Q-learning agent learns from all of its teammates by taking the average of their Q-tables[8]:
Qi_new = (1/n) Σj=1..n Qj_old ..........(10)
SS treats all agents similarly, ignoring their level of knowledge; this method does not consider the agents'
different expertise levels. The SS algorithm is described as:
Algorithm 2: Strategy Sharing Algorithm
1. initialize
2. while not EndOfLearning do
3. begin
4.   if InIndividualLearningMode then
5.   begin {Individual Learning}
6.     xi := FindCurrentState()
7.     ai := SelectAction()
8.     DoAction(ai)
9.     ri := GetReward()
10.    yi := GoToNextState()
11.    V(yi) := maxb∈actions Q(yi, b)
12.    Qi_new(xi, ai) := (1 − βi) Qi_old(xi, ai) + βi (ri + γi V(yi))
13.  end
14.  else {Multiagent Learning}
15.  begin
16.    for j := 1 to n do
17.      Qi_new := 0
18.    for j := 1 to n do
19.    begin
20.      Qi_new := Qi_new + (1/n) Qj_old   // strategy sharing by simple averaging of Q-tables
21. end
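The sketch below illustrates the simple-averaging step of Equation 10 on NumPy Q-tables; the array shapes, agent count and helper name are assumptions made only for illustration.

import numpy as np

def strategy_sharing(q_tables):
    """Equation 10: each agent's new Q-table is the simple average of all
    teammates' Q-tables (all agents are weighted equally)."""
    avg = sum(q_tables) / len(q_tables)
    return [avg.copy() for _ in q_tables]        # every agent adopts the averaged table

# Example: 3 shop agents with 108 x 10 Q-tables (sizes assumed from the setup below).
q_tables = [np.random.rand(108, 10) for _ in range(3)]
q_tables = strategy_sharing(q_tables)
assert np.allclose(q_tables[0], q_tables[1])     # after sharing, all tables are equal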
VI. Experimental Setup
Model design:
The objective is to maximize the sale of products, which depends on the price of the product, the customer age and the period of sale. This
is the information available to each agent, i.e. shop, so it becomes the state of the environment. The final result is to
maximize profit by increasing the total sale of products[10].
Input Data set:
We define the action set as the set of products that can be sold, i.e. A = {p1, p2, p3, …, p10}; hence action a ∈ A.
The state of the system is the queue of customers in a particular month for the given shop agent, so the state can be
described as
X(t) = { x1(t), x2(t), m }
where
x1 → customer queue with age ∈ { Y, M, O }, i.e. young, middle and old age customers
x2 → price of product queue ∈ { H, M, L }, i.e. high, medium, low
m → month of product sale ∈ { 1, 2, 3, 4, …, 12 }
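A minimal sketch of this state and action encoding is shown below (3 age groups × 3 price levels × 12 months = 108 states, 10 product actions); the particular index scheme is an assumption made only for illustration.

AGES = ["Y", "M", "O"]                           # young, middle, old
PRICES = ["H", "M", "L"]                         # high, medium, low
MONTHS = list(range(1, 13))                      # months 1..12
PRODUCTS = [f"p{i}" for i in range(1, 11)]       # 10 product actions

def state_index(age, price, month):
    """Map a state (age group, price level, month) to an index in 0..107."""
    return (AGES.index(age) * len(PRICES) + PRICES.index(price)) * len(MONTHS) + (month - 1)

assert len(AGES) * len(PRICES) * len(MONTHS) == 108   # minimum number of states
assert state_index("O", "L", 12) == 107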
In the system a minimum of 108 states is possible (3 age groups × 3 price levels × 12 months), and the number of
state-action pairs increases as the number of transactions increases. For simplicity, a single state is assumed for each transaction, otherwise the state
space becomes infinitely large. The shop agent observes the queue and decides on a product, i.e. an action, for each
customer/state. After every sale a reward is given to the agent. Table 1 shows a snapshot of the dataset
generated for a single shop agent.
In a particular season, the sale of one shop increases. With the help of cooperative learning, other shops
learn about the increase in sale and can take the necessary actions for their own profit maximization[11]. At time
0, the process X(t) is observed and classified into one of the states in the possible set of states (denoted by S).
After identification of the state, the agent chooses a product action from A. If the process is in state i and the agent
chooses a ∈ A, then
i. the process transitions into state j ∈ S with probability Pij(a);
ii. conditional on the event that the next state is j, the time until the next transition is a random variable with
probability distribution Fij(·|a).
After the transition occurs, a product sale action is chosen again by the agent and (i) and (ii) are repeated.
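To illustrate steps (i) and (ii), the sketch below samples one transition of such a process; the transition probabilities and the exponential holding-time distribution are placeholder assumptions, since the paper does not specify Pij(a) or Fij(·|a).

import random

def sample_transition(state, action, P, holding_time_rate=1.0):
    """One step of the process: (i) draw the next state j with probability P[state][action][j],
    (ii) draw the time until that transition from a placeholder exponential F_ij(.|a)."""
    probs = P[state][action]                     # dict: next state -> probability
    next_state = random.choices(list(probs), weights=list(probs.values()))[0]
    dwell_time = random.expovariate(holding_time_rate)
    return next_state, dwell_time

# Tiny made-up example with two states and one product action "p1".
P = {"s0": {"p1": {"s0": 0.3, "s1": 0.7}}, "s1": {"p1": {"s0": 0.5, "s1": 0.5}}}
print(sample_transition("s0", "p1", P))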
Table 1: Snapshot of Dataset used
TID | Age | Price | Month | Action Selected (Product)
1   | Y   | L     | 1     | P1, P2, P4
2   | Y   | M     | 1     | P2, P3
3   | Y   | H     | 1     | P3, P4
4   | M   | L     | 1     | P1, P2
5   | M   | M     | 1     | P1, P2, P3
6   | M   | H     | 1     | P4, P2
7   | O   | L     | 1     | P1, P3
VII. Results
The simulation results show the efficiency of the learning algorithms. The multiagent learning algorithms
are applied to the shop datasets of the cloth, jewellery and footwear shops, and the result analysis is done for one year, during which
specific numbers of products are purchased by particular customer age groups. The shop agent thereby learns how many
products are sold in a year to customers of the different age groups. Figure 1 shows the results of the
Strategy Sharing Algorithm for Products Vs Customer Age Count. Figure 2 gives the results of the Joint Reward
Algorithm for Products Vs Customer Age Count. Figure 3 shows the results of the Strategy Sharing Algorithm for
Products Vs Quantity and Figure 4 gives the results of the Joint Rewards Algorithm for Products Vs Quantity.
Figure 1: Strategy Sharing Algorithm Products Vs Customer Age Count
Figure 2: Joint Reward Algorithm Products Vs Customer Age Count
Figure 3: Strategy Sharing Algorithm Products Vs Quantity
Figure 4: Joint Rewards Algorithm Products Vs Quantity
The Joint Reward Learning algorithm gives more precise results than the Strategy Sharing learning algorithm and
gives good predictions of the products. It gives the pattern of product sales by customer age group over a period.
The Q-function values are tabulated to obtain some insights: the Q-tables show the best action (that is, the optimal
product) for the different individual states. By knowing the Q function, the shop agent can compute the best
possible product for a given state, i.e. the one that gives maximum profit. Single-agent learning, multi-agent learning,
cooperative learning and improved cooperative learning algorithms are implemented and the results are compared.
It has been shown how a shop agent can effectively use reinforcement learning to set products dynamically so as
to maximize its profit. It is believed that this is a promising approach for profit maximization in retail
market environments with limited information. The products indicated by the learned policies can be combined together to increase sales.
VIII. Conclusion
Learning algorithms are well suited for decision making. Multiagent learning has more knowledge
and information available, and sharing of information and policies is possible. Multiagent learning
always performs better compared to single-agent learning. However, multiagent learning still lacks
proper communication between agents. Enabling agents to share more knowledge and information, to use all agents'
knowledge equally, and to jointly solve the problem cooperatively is the future scope of this paper.
References
Journal Papers:
[1] Adnan M. Al-Khatib, "Cooperative Machine Learning Method", World of Computer Science and Information Technology Journal (WCSIT), ISSN: 2221-0741, Vol. 1, No. 9, pp. 380-383, 2011.
[2] Babak Nadjar Araabi, Sahar Mastoureshgh, and Majid Nili Ahmadabadi, "A Study on Expertise of Agents and Its Effects on Cooperative Q-Learning", IEEE Transactions on Evolutionary Computation, Vol. 14, pp. 23-57, 2010.
[3] Liviu Panait and Sean Luke, "Cooperative Multi-Agent Learning: The State of the Art", Journal of Autonomous Agents and Multi-Agent Systems, Volume 11, Issue 3, pp. 387-434, 2005.
[4] M.V. Nagendra Prasad and Victor R. Lesser, "Learning Situation-Specific Coordination in Cooperative Multi-agent Systems", Journal of Autonomous Agents and Multi-Agent Systems, Volume 2, Issue 2, pp. 173-207, 1999.
[5] Michael Kinney and Costas Tsatsoulis, "Learning Communication Strategies in Multiagent Systems", Journal of Applied Intelligence, Volume 9, Issue 1, pp. 71-91, 1998.
[6] Ronen Brafman and Moshe Tennenholtz, "Learning to Coordinate Efficiently: A Model-based Approach", Journal of Artificial Intelligence Research, Volume 19, Issue 1, pp. 11-23, 2003.
Books:
[7] Ethem Alpaydin, Introduction to Machine Learning, Second Edition, MIT Press (PHI).
[8] Tom Mitchell, Machine Learning, McGraw Hill International Edition.
Proceedings Papers:
[9] Hamid R. Berenji and David Vengerov, "Learning, Cooperation, and Coordination in Multi-Agent Systems", Proceedings of the 9th IEEE International Conference on Fuzzy Systems, 2000.
[10] Jun-Yuan Tao and De-Sheng Li, "Cooperative Strategy Learning In Multi-Agent Environment With Continuous State Space", IEEE International Conference on Machine Learning and Cybernetics, pp. 2107-2111, 2006.
[11] La-mei Gao, Jun Zeng, Jie Wu, and Min Li, "Cooperative Reinforcement Learning Algorithm to Distributed Power System based on Multi-Agent", 3rd International Conference on Power Electronics Systems and Applications, 2009, Digital Reference: K210509035.