Coordination issues of multi agent systems in distributed data mining

International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976 –
6480(Print), ISSN 0976 – 6499(Online) Volume 4, Issue 3, April (2013), © IAEME
COORDINATION ISSUES OF MULTI-AGENT SYSTEMS IN
DISTRIBUTED DATA MINING
Thulasi Bikku, Asst. Professor
Computer Science Department, NRI Institute of Technology,
Andhra Pradesh, INDIA.
Prof. N. Sambasiva Rao, Principal
Computer Science Department,
Vardhaman College of Engineering,
Andhra Pradesh, INDIA.
ABSTRACT
Data mining technology has evolved to extract knowledge and identify patterns and trends from
large data resources. It normally adopts a data integration method that builds a large store known as a
data warehouse: all data is gathered into a central repository, and an algorithm is then run against that
data to mine useful patterns and evaluate knowledge. However, no single data mining technique has
been proven suitable for every domain and data set. Distributed data mining (DDM) originated from the
need to mine decentralized data sources. Multi-agent systems (MAS), which draw on Artificial
Intelligence (AI), deal with complex applications that require distributed problem solving. In many
applications the individual and collective behavior of the agents depends on data observed from
scattered data sources. Since multi-agent systems are often distributed, and agents have proactive and
reactive features that are very useful for knowledge management systems, combining DDM with MAS is
attractive for data-intensive applications. The integration of multi-agent systems and distributed data
mining is also known as multi-agent based distributed data mining.
In this paper we briefly discuss the existing approaches and the importance of using agent
technology in the domain of knowledge discovery. We propose an approach to distributed data
clustering, summarize its agent-oriented implementation, and note the security attacks to which agents
may be exposed. The core problem concerns the collaborative work of distributed data resources in the
design of a multi-agent system intended for distributed data mining and classification.
Index Terms: Distributed Data Mining, Multi-Agent Systems, Multi Agent Data Mining, Multi-Agent
Based Distributed Data Mining.
INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN
ENGINEERING AND TECHNOLOGY (IJARET)
ISSN 0976 - 6480 (Print)
ISSN 0976 - 6499 (Online)
Volume 4, Issue 3, April 2013, pp. 41-48
© IAEME: www.iaeme.com/ijaret.asp
Journal Impact Factor (2013): 5.8376 (Calculated by GISI)
www.jifactor.com

1. INTRODUCTION
Data mining (DM) originated from knowledge discovery in databases (KDD). The large
variety of DM techniques developed over the past decade includes methods for pattern-based similarity
search, cluster analysis, decision-tree based classification, prediction, outlier analysis, generalization
using the data cube or attribute-oriented induction (AOI) approach, and mining of association rules [1].
Distributed data mining (DDM) mines data from data sources regardless of their physical locations. The
need for this arises because data produced locally at each site often cannot be transferred across the
network: the volume is excessive, which increases cost and raises security issues [2]. Recently, DDM
has become a critical component of knowledge-based systems because its decentralized architecture
reaches networked sources such as weather databases, financial data portals, and emerging disease
information systems. Industrial companies have recognized it as an opportunity for major revenues from
applications such as warehousing, process control, medical services, bioinformatics and customer
services, where large amounts of data are stored. Data mining still poses many challenges to the
research community. The main challenges are: 1) Data mining has to deal with huge amounts of data
located at different physical locations. 2) Data mining is a computationally intensive process involving
very large data sets, on the order of petabytes or more, so it is necessary to partition and distribute the
data for parallel processing to achieve acceptable time, cost and space performance. 3) The input data
stored for a particular domain changes rapidly because of regular updates [3]. In these cases, knowledge
has to be mined quickly and efficiently in order to remain usable and up to date.
2. MULTI-AGENTS BEHAVIOR
DDM is a complex system focusing on the distribution of data resources over the network as
well as the extraction of useful patterns from those data resources [6, 7]. Scalability is at the very core
of DDM systems, because the system configuration may change over time. Designing DDM systems
therefore involves many software engineering issues, such as reusability, extensibility, efficiency,
effectiveness, compatibility, flexibility, scalability, accuracy, privacy, security and robustness. For
these reasons, the characteristics of agents [4, 5] are useful for DDM systems.
Autonomy: DM agents operate autonomously and are self-determining. Because they have proactive
and reactive features, they can deliberately handle access to the data source in agreement with
constraints on the required autonomy of the system, data and model. This is in full compliance with the
paradigm of cooperative information systems [15].
Scalability: To reduce the workload of the network and the DM application server, DM agents migrate
to each of the local data sites in a DDM system, where they may perform mining tasks locally and then
either return with, or send, relevant pre-selected patterns to their central server for further processing.
Agents can perform tasks locally if they have sufficient knowledge and resources, and they can interact
with other agents to help complete tasks [16].
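The migration pattern described under Scalability can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `mine_locally` and `merge_patterns`, the use of item counts as "patterns", and the `top_k` pre-selection are all hypothetical choices made for the example.

```python
from collections import Counter

def mine_locally(transactions, top_k=2):
    """Hypothetical DM agent task: count items at one site and
    keep only the top_k most frequent ones (the pre-selected patterns)."""
    counts = Counter(item for t in transactions for item in t)
    return dict(counts.most_common(top_k))

def merge_patterns(local_results):
    """Hypothetical central-server step: sum the counts sent back by the agents."""
    total = Counter()
    for result in local_results:
        total.update(result)
    return dict(total)

# Two sites; the raw transactions never leave their site.
site_a = [["milk", "bread"], ["milk", "bread"], ["milk", "eggs"]]
site_b = [["bread", "eggs"], ["bread", "eggs"], ["bread", "milk"]]

summaries = [mine_locally(site, top_k=2) for site in (site_a, site_b)]
merged = merge_patterns(summaries)
```

Only the small summaries cross the network. The price is that items falling outside a site's `top_k` are dropped before merging, so the merged counts can undercount what a centralized pass over all transactions would report.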
3. STRATEGY OF LEARNING
Several systems have been implemented for distributed data mining. These systems can be classified
according to their learning strategy into three types: central learning, meta-learning, and hybrid learning.
3.1 Central learning strategy: This strategy applies when all the data can be gathered at a central data
repository and a single data model is built. The data has to be moved to the central repository,
integrated, and then mined with sequential DM algorithms [12]. This strategy is used only when the
geographical distribution of the data is very small. It is generally very expensive, because transferring
data from different sources is costly, but it provides more accurate results [10]. Agent technology is not
chosen in this strategy.

3.2 Meta-learning strategy: This strategy offers a way to mine classifiers from homogeneously
distributed data. Meta-learning follows three main steps [11]: 1) It generates base classifiers at each site
using a classifier learning algorithm. 2) It collects the base classifiers at a central site and produces
meta-level data from a separate validation set and the predictions generated by the base classifiers on it.
3) It generates the final classifier (meta-classifier) from the meta-level data via a combiner or an arbiter.
Copies of the classifier agent exist or are deployed on the nodes of the network being used [13]. Agent
technology can be preferred in this strategy.
3.3 Hybrid learning strategy: This technique combines local and centralized learning for model
building [14]. The major criticism of such systems is that it is not always possible to obtain an exact
final result, i.e. the global knowledge model obtained may differ from the one obtained by applying the
single-model approach (if possible) to the same data. Approximate results are not always a major
concern, but it is important to be aware of them.
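The three meta-learning steps can be sketched in a few lines. This is an illustrative toy under assumptions, not the method of [11]: the base learner is a one-dimensional threshold classifier, the combiner is a simple lookup table from base predictions to the majority validation label, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_stump(samples):
    """Step 1 (per site): base classifier with a threshold halfway between
    the class means. Assumes class 1 has the larger feature values."""
    zeros = [x for x, y in samples if y == 0]
    ones = [x for x, y in samples if y == 1]
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    def classify(x):
        return int(x > threshold)
    return classify

def build_meta_data(base_classifiers, validation):
    """Step 2 (central site): pair each tuple of base predictions with the true label."""
    return [(tuple(clf(x) for clf in base_classifiers), y) for x, y in validation]

def train_combiner(meta_data):
    """Step 3: simplified combiner that maps a prediction tuple to its majority label."""
    votes = defaultdict(Counter)
    for preds, y in meta_data:
        votes[preds][y] += 1
    table = {preds: c.most_common(1)[0][0] for preds, c in votes.items()}
    def meta_classify(preds):
        if preds in table:
            return table[preds]
        # Unseen prediction pattern: fall back to a plain majority vote.
        return int(sum(preds) * 2 > len(preds))
    return meta_classify

site_1 = [(1.0, 0), (2.0, 0), (6.0, 1), (7.0, 1)]    # learned threshold 4.0
site_2 = [(2.0, 0), (3.0, 0), (9.0, 1), (10.0, 1)]   # learned threshold 6.0
validation = [(1.5, 0), (3.5, 0), (5.0, 1), (8.0, 1)]

base = [train_stump(site_1), train_stump(site_2)]
combiner = train_combiner(build_meta_data(base, validation))

def final_classifier(x):
    return combiner(tuple(clf(x) for clf in base))
```

In this toy data the two base classifiers disagree on inputs between 4.0 and 6.0, and the combiner learns from the validation set to side with the first one there. Only the trained classifiers travel to the central site, not the site data.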
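A small numeric illustration of why a combined result may differ from the single-model result. The example is an assumption chosen for simplicity: the "model" is just the mean of one attribute, and the site data is made up.

```python
site_a = [10.0, 12.0, 14.0, 16.0]   # 4 records
site_b = [40.0]                     # 1 record

# Hybrid-style combination: each site reports its local mean,
# and the central step averages the local means.
local_means = [sum(s) / len(s) for s in (site_a, site_b)]
combined = sum(local_means) / len(local_means)

# Single-model approach: pool all records and compute one mean.
pooled = site_a + site_b
centralized = sum(pooled) / len(pooled)
```

Here `combined` is 26.5 while `centralized` is 18.4, because the naive combination ignores how many records each site holds. For a mean, weighting each local result by its site size recovers the exact value; for more complex models such an exact correction may not exist.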
4. MULTI AGENT-BASED DISTRIBUTED DATA MINING (MADM)
MADM takes data mining as its foundation and enhances it with agent technology [16];
this data mining technique therefore inherits the powerful properties of agents and, as a result, yields
desirable characteristics. In general, constructing an agent-based DDM (ADDM) system concerns three
key characteristics: interoperability, dynamic system configuration, and performance, discussed as
follows [8]: 1) Interoperability concerns not only the collaboration of agents within the system, but also
external interaction with new agents, which should enter the system seamlessly. The architecture of the
system must be open and flexible so that it can support this interaction, including the communication
protocol, integration policy, and service directory, and it must provide mechanisms for adding and
removing agents. The communication protocol covers message encoding, encryption, and
transportation of data between agents. The integration policy specifies how the system behaves when an
external component, such as an agent or a data site, requests to enter or leave. A negotiation and
communication mechanism must also be adopted to allow the envisaged agents to “talk” to one another.
2) Dynamic system configuration, in which the system handles a changing configuration, is a
challenging issue because of the complexity of the planning and mining algorithms. A mining task may
involve several agents and data sources, with each agent equipped with an algorithm and assigned to
given data sources. 3) Performance: in a distributed environment, tasks can be executed in parallel,
which leads to concurrency issues. Quality-of-service control over both data mining performance and
system performance is desired; however, it can be derived from both the data mining and the agent
fields.
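The message-encoding part of a communication protocol can be sketched as a small JSON envelope. This is a minimal sketch under assumptions: the field names, the `performative` field (borrowed loosely from agent communication languages such as FIPA ACL), and the agent addresses are hypothetical, and encryption and transport are deliberately left out.

```python
import json

REQUIRED_FIELDS = {"sender", "receiver", "performative", "content"}

def encode_message(sender, receiver, performative, content):
    """Serialize one agent-to-agent message to UTF-8 JSON bytes."""
    envelope = {
        "sender": sender,
        "receiver": receiver,
        "performative": performative,  # e.g. "request" or "inform" (assumed vocabulary)
        "content": content,
    }
    return json.dumps(envelope, sort_keys=True).encode("utf-8")

def decode_message(raw):
    """Parse the bytes and reject envelopes with missing fields."""
    envelope = json.loads(raw.decode("utf-8"))
    missing = REQUIRED_FIELDS - set(envelope)
    if missing:
        raise ValueError(f"malformed message, missing fields: {sorted(missing)}")
    return envelope

raw = encode_message("miner@site-a", "coordinator@hub", "inform",
                     {"task": "clustering", "clusters_found": 3})
message = decode_message(raw)
```

A fixed envelope with validated fields is one way to let a newly added agent interact with existing ones without knowing their internals, which is the interoperability concern raised above.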
The MADM framework faces a number of issues [23]:
1. Multiple Data Mining Tasks: The MADM framework must provide mechanisms for coordinating
multiple data mining tasks. The number and nature of the data mining tasks that the framework must
handle are not known in advance and are expected to evolve over time. Consequently the framework
should be designed in such a way as to anticipate future tasks.
2. Agent Coordination: The framework must be flexible, since it must accommodate new agents as
they are created in the environment and remove agents when they are no longer in use. Careful
consideration therefore needs to be given to the communication mechanisms.
3. Agent Reuse: The framework must promote the opportunistic reuse of agent services by other
agents. It has to provide mechanisms by which agents may advertise their capabilities, and ways of
finding agents that offer the capabilities required.
4. Scalability and Efficiency: The scalability of a data mining system refers to the ability of the system
to operate effectively, without a substantial or discernible reduction in performance, as the number of
data sources increases. Efficiency, on the other hand, refers to the effective use of the available system
resources.

5. Portability: A distributed data mining system should be capable of operating across multiple
environments with different hardware and software configurations, and be able to combine multiple
models with different representations. The framework should be able to operate on any major operating
system.
6. Compatibility: Combining multiple models of data mining results has been receiving increasing
attention in the data mining research literature. Much of the prior work on combining multiple models
assumes that all models originate from the same database or from different databases with an identical
schema. This is not always the case, and differences in the type and number of attributes among
different data sets are not uncommon. The model computed at a single database is directly dependent
on the format of the underlying data.
7. Adaptivity and Extendibility: Most data mining systems operate in environments that are likely to
change, a phenomenon known as concept drift. The MADM framework has to adapt to such changes
and continue to work effectively. It should also be extendible, providing the means to easily accept and
incorporate new data sources and new DM techniques.
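Issues 2 and 3 in the list above (adding and removing agents, advertising and finding capabilities) can be illustrated with a small in-memory service directory. The class name, method names, and agent names are hypothetical; a real framework would add naming, persistence, and remote access.

```python
class ServiceDirectory:
    """Hypothetical registry where agents advertise what they can do."""

    def __init__(self):
        self._capabilities = {}  # agent name -> set of capability names

    def register(self, agent, capabilities):
        """Add a new agent, or replace the advertisement of an existing one."""
        self._capabilities[agent] = set(capabilities)

    def deregister(self, agent):
        """Remove an agent that is no longer in use (no error if unknown)."""
        self._capabilities.pop(agent, None)

    def find(self, capability):
        """Return the names of all agents advertising the capability, sorted."""
        return sorted(agent for agent, caps in self._capabilities.items()
                      if capability in caps)

directory = ServiceDirectory()
directory.register("agent-1", ["clustering"])
directory.register("agent-2", ["clustering", "classification"])
clustering_agents = directory.find("clustering")   # both agents

directory.deregister("agent-1")
remaining = directory.find("clustering")           # only agent-2 is left
```

Because lookups go through the directory rather than through hard-coded agent references, agents can join or leave without changes to the agents that use their services.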
An ADDM framework can be generalized into a set of components and viewed as depicted in
Fig. 4.1. We may generalize the activities of the architecture into request and response, each of which
involves a different set of components. The basic components of an ADDM system are as follows.
Fig. 4.1: Overview of ADDM system.
Data: Data is the base layer of the architecture. In a distributed environment, data can be hosted in
various forms, such as transactional databases, online relational databases, data streams,
object-relational databases, multimedia databases and web pages, and the purpose of the data may
vary.
5. PROPOSED SCHEME
Here we propose a scheme for the coordination of agents in large-scale distributed systems,
which is becoming an increasingly challenging task. Continuous involvement of users and
administrators is generally limited in large-scale distributed environments. System support is also
needed for configuration and reorganization when the system grows or shrinks with the addition or
removal of resources. The primary goal of distributed system management is to ensure efficient use of
resources and provide timely service to users at an effective computational cost. Most distributed
system management techniques still follow the centralized model, based on the client-server model,
because it gives accurate results. Although it is accurate, centralization also has some problems: 1) it
can cause traffic overload, and processing at the originating node may degrade that node's
performance; 2) it does not scale as the complexity of the network increases; 3) a fault in the central
originating node can leave the system without a manager.

One approach is distributed management, where management tasks are spread across the
managed infrastructure and are carried out at the managed resources. The goal is to minimize
management-related network traffic and to speed up management tasks by distributing operations
across data resources. The new trend in distributed system management involves using multiple agents
to manage the resources of distributed systems. Agents have the capability to travel autonomously
(carrying their execution state and code) among different data repositories to complete their task. The
route may be predetermined or chosen dynamically depending on the results at each local data
repository. The agents can share a common goal (as in an ant colony), or they can pursue their own
interests.
5.1 Multi-agent system structure
The multi-agent system structure assumes that each node in the system has a set of agents
residing and running on it [21, 22]. The agent types are the following:
Client agent (CA) sends or receives service requests, initiated by the user, from the system. The CA
may receive the request directly from the local user. Otherwise, it receives the request from an
exporter agent arriving from another node with a user's request.
Service list agent (SLA) holds a list of the resource agents in the system. It receives the request from
the CA and sends it to the resource availability agent. If the reply indicates that the requested resource
is local, the service list agent delivers the request to the categorizer agent. Otherwise, it returns the
request to the nearby CA.
Resource availability agent (RAA) indicates whether the requested resource is free and available for
use, and whether it is local or remote. It receives the request from the service list agent and checks the
status of the requested data resource by accessing the Management Information Base (MIB). The agent
then constructs its reply from the information retrieved about the data resource.
Resource agent (RSA) is responsible for the operation and control of the resource. This agent executes
the request on the resource. Each node may have zero or more RSAs.
Router agent (RA) provides the network path to the requested resource when remote resources are
accessed. Before being dispatched, the exporter agent asks the router agent for the path to the requested
resource, and the router agent delivers the routing path to the exporter agent.
Categorizer agent (CZA) allocates a suitable resource agent to perform the user's request. This agent
receives its input from the service list agent. It then tries to find a suitable nearby free resource agent to
perform the requested service.
Exporter agent (EA) is a mobile agent that carries the user request along the path identified by the RA
to reach the node that has the required resource. It passes the requested resource id to the RA and then
receives the reply. If the router agent has no information about the requested resource, the EA tries to
locate the resource in the system based on the user's request. Two additional mobile agent types exist
in the system.
Representative agent (RPA) is a mobile agent that is launched in each sub network. It is responsible
for traversing the sub network nodes in place of the exporter agent, performing the task requested by
the user, and carrying the results back to the exporter agent.
Collector agent (CTA) is a mobile agent that is launched from the last sub network visited by the
exporter agent. It is launched when results from that sub network become available. This agent follows
the itinerary of the exporter agent's trip in reverse. The CTA collects the results from the representative
agents and carries them to the source node. All mobile agents used here are of the interrupt-driven type.
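The interplay of the three mobile agents can be simulated sequentially. This is a sketch under assumptions rather than the proposed system itself: agents are plain functions, the network is a nested dictionary, the task is a simple record count, and real agents would migrate and run concurrently instead of being called in a loop.

```python
def representative_agent(subnet_nodes, task):
    """Visit every node of one sub network and apply the task to its local data."""
    return [task(data) for data in subnet_nodes.values()]

def exporter_agent(network, itinerary, task):
    """Visit sub networks in itinerary order, launching one representative in each."""
    return {subnet: representative_agent(network[subnet], task)
            for subnet in itinerary}

def collector_agent(results_by_subnet, itinerary):
    """Start at the last sub network visited and follow the itinerary in reverse,
    picking up each representative's results on the way back to the source."""
    collected = []
    for subnet in reversed(itinerary):
        collected.extend(results_by_subnet[subnet])
    return collected

# Hypothetical network: sub network -> node -> local records.
network = {
    "subnet-1": {"n1": [1, 2, 3], "n2": [4]},
    "subnet-2": {"n3": [5, 6]},
}
itinerary = ["subnet-1", "subnet-2"]

results = exporter_agent(network, itinerary, task=len)   # record count per node
collected = collector_agent(results, itinerary)
total_records = sum(collected)
```

The collector returns the results of `subnet-2` first because it starts where the exporter's trip ended.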

5.2 System’s operation
The activity cycle of the agents residing inside a local data repository is as follows. The
client agent (CA) receives the service request either from the user or, if the request comes from
another subnet, from an exporter agent (EA). The client agent then asks the service list agent (SLA)
whether a resource agent exists that can perform the request. The service list agent checks the
availability of the required resource agent by consulting the resource availability agent. The reply of
the resource availability agent states whether or not the resource is locally available and whether or
not there is a resource agent that can perform the requested service. If the resource availability agent
accepts the request, the service list agent asks the categorizer agent to allocate a suitable resource
agent to the requested service. Otherwise, the service list agent informs the client agent of the
rejection, and the request is passed to the exporter agent if the requested resource is in another
subnet. The exporter agent asks the router agent for the path to the required resource agent. Once the
path is determined, the exporter agent is dispatched through the network channel to the destination
node identified by that path. If the router agent has no information about the location of the required
resource agent, the exporter agent searches the distributed system to find the location of the required
resource agent and assigns the required task to it. As shown in Fig. 5.2.2, the exporter agent
traverses the sub networks of the distributed system during its trip. At each sub network, a
representative agent is launched to traverse the local nodes of that sub network, perform the required
task, and carry the results of that task back so that they can be provided to the client agent. The
agents of the social interface described in Fig. 5.2.1 are implemented at each node in the system.
There are two approaches to collect results of the required task and send these results back to the
source. The results of the local resources are transferred to the originating source and combined to
give the requested query output.
Fig. 5.2.2: Overview of network architecture MADM.
Thus the proposed scheme processes queries in parallel using multi-agent system
technology.
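The decision path in Section 5.2 can be traced with a short function. This is an illustrative sketch, not the system's code: the function name, the set of free local resources (a stand-in for the MIB lookup), and the routing table format are assumptions, and the case of a local but busy resource is not modeled.

```python
def handle_request(resource_id, local_free, routing_table):
    """Return the sequence of agent steps for one request at one node.

    local_free: set of resource ids that are local and free (assumed MIB stand-in)
    routing_table: dict mapping resource id -> list of node names (path known to the RA)
    """
    trace = ["CA: request received", "SLA: ask RAA about " + resource_id]
    if resource_id in local_free:
        trace.append("RAA: local and free")
        trace.append("CZA: allocate resource agent")
        trace.append("RSA: execute request")
        return trace
    trace.append("RAA: not available locally")
    trace.append("SLA: rejection returned to CA, request passed to EA")
    path = routing_table.get(resource_id)
    if path is not None:
        trace.append("RA: path " + " -> ".join(path))
        trace.append("EA: dispatched to " + path[-1])
    else:
        trace.append("RA: path unknown")
        trace.append("EA: search sub networks for " + resource_id)
    return trace

local_trace = handle_request("db1", {"db1"}, {})
remote_trace = handle_request("db2", set(), {"db2": ["node1", "node4"]})
unknown_trace = handle_request("db3", set(), {})
```

The three calls cover the three branches described in the text: local execution, dispatch along a known path, and a search when the router agent has no information.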

6. CONCLUSIONS
A Distribution Management System (DMS) is a collection of applications designed to
analyze, monitor and control the entire distribution network efficiently and reliably. This article
describes a new multi-agent system for the management of distributed systems. The system is proposed
to optimize the execution of management functions in distributed systems at an effective computational
cost. The proposed system can locate, monitor, and manage resources in the system. Its new technique
allows management tasks to be submitted to the sub networks of the distributed system and executed in
parallel. The proposed system uses two additional mobile agents: the representative agent and the
collector agent. The first is used to submit tasks to the sub networks of the distributed system, and the
other collects results from these sub networks. The proposed system is compared against traditional
management techniques in terms of response time, speedup, and efficiency. The performance results
indicate a significant improvement in response time, speedup, efficiency, and scalability compared to
traditional techniques. The use of the JVM in the implementation of the proposed system gives the
system a degree of portability, so that it can be used on any platform that runs a JVM. Therefore, it is
desirable to use the proposed system in the management of distributed systems. The proposed system is
limited to high-speed networks with a bandwidth of more than 100 Mb/s. Future research will address
the security of data, of mobile agents, and of the hosts that receive them in the context of public
networks. Mobile agents should be protected against potentially malicious hosts. The hosts should also
be protected against malicious actions that may be performed by the mobile code they receive and
execute. Effective security must be provided for the multiple agents in the distributed data mining
environment.
REFERENCES:
[1] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery
and Data Mining. In Proceedings of the Association for the Advancement of Artificial
Intelligence(AAAI) Press/MIT, 1996.
[2] F. Provost. Distributed Data Mining: Scaling Up and Beyond. In Proceedings of the Advances in
Distributed and Parallel Knowledge Discovery, MIT/AAAI Press, Cambridge, MA, New York. pages
(3-27), 1999.
[3] R. Schollmeier. A Definition of Peer-to-Peer Networking for the Classification of Peer-to-Peer
Architectures and Applications. In Proceedings of the First International Conference on Peer-to-Peer
Computing (P2P) IEEE, 2001.
[4] I. Rudowsky. Intelligent Agents, volume 14, pages (275-290). In Proceedings of the
Communications of the Association for Information Systems, Springer, London, England, 2004.
[5] “An Introduction to Multiagent Systems” by Michael Wooldridge. Published in February 2002 by
John Wiley & Sons (Chichester, England). ISBN 0 47149691X.
[6] T. Marwala and E. Hurwitz. Multi-Agent Modelling using intelligent agents in a game of Lerpa.
eprint arXiv:0706.0280, 2007.
[7] B. van Aardt and T. Marwala. A Study in a Hybrid Centralised-Swarm Agent Community. In
Proceedings of the IEEE 3rd International Conference on Computational Cybernetics, Mauritius, pages
(169-74), 2005.
[8] A. Symeonidis and P. Mitkas. Agent Intelligence Through Data Mining, volume XXVI, pages
(0-206). In Proceedings of the Multi-agent Systems, Artificial Societies, and Simulated Organizations,
2006.
[9] L. Cao, C. Luo, and C. Zhang. Agent-Mining Interaction: An Emerging Area. In Proceedings of the
AIS-ADM, LNAI 4476, Springer - Verlag, Berlin, Germany, pages (60-73), 2007.
[10] R. Bose and V. Sugumaran. IDM: An Intelligent Software Agent Based Data Mining Environment.
In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 1998.

[11] J. Botía, A. Gómez-Skarmeta, M. Valdés, and A. Metala. A meta-learning architecture. In
Proceedings of the Computational Intelligence. Theory and Applications, 2206/2001, pages (688-698),
2001.
[12] J. Balter, A. Labarre-Vila, D. Zibelin, and C. Garbay. A Platform Integrating Knowledge and Data
Management for EMG Studies. In Proceedings of the Artificial Intelligence in Medicine (AIME), 2101,
pages (417-420), 2001.
[13] R. Grossman and A. Turinsky. A framework for finding distributed data mining strategies that are
intermediate between centralized strategies and in-place strategies. In Proceedings of the KDD
Workshop on Distributed Data Mining, 2000.
[14] Souptik Datta, Kanishka Bhaduri, Chris Giannella, Ran Wolff, Hillol Kargupta, "Distributed Data
Mining in Peer-to-Peer Networks," IEEE Internet Computing, vol. 10, no. 4, pp. 18-26, July/Aug. 2006,
doi:10.1109/MIC.2006.74
[15] R. Lakshman Naik, D. Ramesh and B. Manjula, “Instances Selection Using Advance Data Mining
Techniques”, International Journal Of Computer Engineering & Technology (IJCET) Volume 3, Issue 2,
2012, pp. 47 - 53, ISSN Print : 0976 – 6367, ISSN Online : 0976 – 6375.
[16] Mr. M. Karthikeyan, Mr. M. Suriya Kumar and Dr. S. Karthikeyan, “A Literature Review on the
Data Mining and Information Security” International Journal of Computer Engineering & Technology
(IJCET) Volume 3, Issue 1, 2012, pp. 141 - 146, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[17] R. Manickam, D. Boominath and V. Bhuvaneswari, “An Analysis of Data Mining: Past, Present
and Future”, International journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1,
2012, pp. 1 - 9, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
