Data Mining Applications For Empowering
Knowledge Societies 1st Edition Hakikur Rahman
Data Mining Applications
for Empowering
Knowledge Societies
Hakikur Rahman
Sustainable Development Networking Foundation (SDNF), Bangladesh
Hershey • New York
Information Science Reference
Director of Editorial Content: Kristin Klinger
Managing Development Editor: Kristin M. Roth
Assistant Managing Development Editor: Jessica Thompson
Assistant Development Editor: Deborah Yahnke
Senior Managing Editor: Jennifer Neidig
Managing Editor: Jamie Snavely
Assistant Managing Editor: Carole Coulson
Copy Editor: Erin Meyer
Typesetter: Sean Woznicki
Cover Design: Lisa Tosheff
Printed at: Yurchak Printing Inc.
Published in the United States of America by
Information Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue, Suite 200
Hershey PA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: cust@igi-global.com
Web site: http://www.igi-global.com
and in the United Kingdom by
Information Science Reference (an imprint of IGI Global)
3 Henrietta Street
Covent Garden
London WC2E 8LU
Tel: 44 20 7240 0856
Fax: 44 20 7379 0609
Web site: http://www.eurospanbookstore.com
Copyright © 2009 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by
any means, electronic or mechanical, including photocopying, without written permission from the publisher.
Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does
not indicate a claim of ownership by IGI Global of the trademark or registered trademark.
Library of Congress Cataloging-in-Publication Data
Data mining applications for empowering knowledge societies / Hakikur Rahman, editor.
p. cm.
Summary: “This book presents an overview on the main issues of data mining, including its classification, regression, clustering, and
ethical issues”--Provided by publisher.
Includes bibliographical references and index.
ISBN 978-1-59904-657-0 (hardcover) -- ISBN 978-1-59904-659-4 (ebook)
1. Data mining. 2. Knowledge management. I. Rahman, Hakikur, 1957-
QA76.9.D343D38226 2009
005.74--dc22
2008008466
British Cataloguing in Publication Data
A Cataloguing in Publication record for this book is available from the British Library.
All work contributed to this book set is original material. The views expressed in this book are those of the authors, but not necessarily of
the publisher.
If a library purchased a print copy of this publication, please go to http://www.igi-global.com/agreement for information on activating
the library's complimentary electronic access to this publication.
Table of Contents
Foreword
Preface
Acknowledgment
Section I
Education and Research
Chapter I
Introduction to Data Mining Techniques via Multiple Criteria Optimization
Approaches and Applications
Yong Shi, University of the Chinese Academy of Sciences, China
and University of Nebraska at Omaha, USA
Yi Peng, University of Nebraska at Omaha, USA
Gang Kou, University of Nebraska at Omaha, USA
Zhengxin Chen, University of Nebraska at Omaha, USA
Chapter II
Making Decisions with Data: Using Computational Intelligence Within a
Business Environment
Kevin Swingler, University of Stirling, Scotland
David Cairns, University of Stirling, Scotland
Chapter III
Data Mining Association Rules for Making Knowledgeable Decisions
A.V. Senthil Kumar, CMS College of Science and Commerce, India
R. S. D. Wahidabanu, Govt. College of Engineering, India
Section II
Tools, Techniques, Methods
Chapter IV
Image Mining: Detecting Deforestation Patterns Through Satellites
Marcelino Pereira dos Santos Silva, Rio Grande do Norte State University, Brazil
Gilberto Câmara, National Institute for Space Research, Brazil
Maria Isabel Sobral Escada, National Institute for Space Research, Brazil
Chapter V
Machine Learning and Web Mining: Methods and Applications in Societal Benefit Areas
Georgios Lappas, Technological Educational Institution of Western Macedonia,
Kastoria Campus, Greece
Chapter VI
The Importance of Data Within Contemporary CRM
Diana Luck, London Metropolitan University, UK
Chapter VII
Mining Allocating Patterns in Investment Portfolios
Yanbo J. Wang, University of Liverpool, UK
Xinwei Zheng, University of Durham, UK
Frans Coenen, University of Liverpool, UK
Chapter VIII
Application of Data Mining Algorithms for Measuring Performance Impact
of Social Development Activities
Hakikur Rahman, Sustainable Development Networking Foundation (SDNF), Bangladesh
Section III
Applications of Data Mining
Chapter IX
Prospects and Scopes of Data Mining Applications in Society Development Activities
Hakikur Rahman, Sustainable Development Networking Foundation, Bangladesh
Chapter X
Business Data Warehouse: The Case of Wal-Mart
Indranil Bose, The University of Hong Kong, Hong Kong
Lam Albert Kar Chun, The University of Hong Kong, Hong Kong
Leung Vivien Wai Yue, The University of Hong Kong, Hong Kong
Li Hoi Wan Ines, The University of Hong Kong, Hong Kong
Wong Oi Ling Helen, The University of Hong Kong, Hong Kong
Chapter XI
Medical Applications of Nanotechnology in the Research Literature
Ronald N. Kostoff, Office of Naval Research, USA
Raymond G. Koytcheff, Office of Naval Research, USA
Clifford G.Y. Lau, Institute for Defense Analyses, USA
Chapter XII
Early Warning System for SMEs as a Financial Risk Detector
Ali Serhan Koyuncugil, Capital Markets Board of Turkey, Turkey
Nermin Ozgulbas, Baskent University, Turkey
Chapter XIII
What Role is “Business Intelligence” Playing in Developing Countries?
A Picture of Brazilian Companies
Maira Petrini, Fundação Getulio Vargas, Brazil
Marlei Pozzebon, HEC Montreal, Canada
Chapter XIV
Building an Environmental GIS Knowledge Infrastructure
Inya Nlenanya, Center for Transportation Research and Education,
Iowa State University, USA
Chapter XV
The Application of Data Mining for Drought Monitoring and Prediction
Tsegaye Tadesse, National Drought Mitigation Center, University of Nebraska, USA
Brian Wardlow, National Drought Mitigation Center, University of Nebraska, USA
Michael J. Hayes, National Drought Mitigation Center, University of Nebraska, USA
Compilation of References
About the Contributors
Index
Detailed Table of Contents
Foreword
Preface
Acknowledgment
Section I
Education and Research
Chapter I
Introduction to Data Mining Techniques via Multiple Criteria Optimization
Approaches and Applications
Yong Shi, University of the Chinese Academy of Sciences, China
and University of Nebraska at Omaha, USA
Yi Peng, University of Nebraska at Omaha, USA
Gang Kou, University of Nebraska at Omaha, USA
Zhengxin Chen, University of Nebraska at Omaha, USA
This chapter presents an overview of a series of multiple criteria optimization-based data mining
methods that utilize multiple criteria programming to solve various data mining problems and outlines
some research challenges. It also points out several research opportunities for the data mining
community.
Chapter II
Making Decisions with Data: Using Computational Intelligence Within a
Business Environment
Kevin Swingler, University of Stirling, Scotland
David Cairns, University of Stirling, Scotland
This chapter identifies important barriers to the successful application of computational intelligence
techniques in a commercial environment and suggests a number of ways in which they may be overcome.
It further identifies a few key conceptual, cultural, and technical barriers and describes the different
ways in which they affect business users and computational intelligence practitioners. The chapter
aims to provide insight for its readers through the outcome of a successful computational
intelligence project.
Chapter III
Data Mining Association Rules for Making Knowledgeable Decisions
A.V. Senthil Kumar, CMS College of Science and Commerce, India
R. S. D. Wahidabanu, Govt. College of Engineering, India
This chapter describes two popular data mining techniques used to find frequent large itemsets in a
database. The first is the closed directed graph approach, in which the algorithm scans the database
once, counting possible 2-itemsets; only the 2-itemsets with a minimum support are used to form the
closed directed graph, which is then used to explore possible frequent large itemsets in the database.
The second is a dynamic hashing algorithm, in which large 3-itemsets are generated at an earlier stage;
this reduces the size of the transaction database after trimming and thereby reduces the cost of later
iterations. The chapter envisages that these techniques may help researchers not only to understand
how frequent large itemsets are generated, but also to find association rules among transactions
within relational databases and make knowledgeable decisions.
Section II
Tools, Techniques, Methods
Chapter IV
Image Mining: Detecting Deforestation Patterns Through Satellites
Marcelino Pereira dos Santos Silva, Rio Grande do Norte State University, Brazil
Gilberto Câmara, National Institute for Space Research, Brazil
Maria Isabel Sobral Escada, National Institute for Space Research, Brazil
This chapter presents relevant definitions in the remote sensing and image mining domain, refers to
related work in this field, and demonstrates the importance of appropriate tools and techniques for
analyzing satellite images and extracting knowledge from this kind of data. As a case study, the
deforestation problem in Amazonia is discussed, and an effort is made to develop a strategy for dealing
with challenges involving Earth observation resources. The purpose is to present new approaches and
research directions in remote sensing image mining and to demonstrate how to increase the analysis
potential of such huge strategic data for the benefit of researchers.
Chapter V
Machine Learning and Web Mining: Methods and Applications in Societal Benefit Areas
Georgios Lappas, Technological Educational Institution of Western Macedonia,
Kastoria Campus, Greece
This chapter reviews contemporary research on machine learning and Web mining methods related to
areas of social benefit. It further demonstrates that machine learning and Web mining methods
may provide intelligent Web services of social interest. The chapter also discusses the recent growth
of interest among researchers in using advanced computational methods, such as machine learning
and Web mining, to provide better services to the public.
Chapter VI
The Importance of Data Within Contemporary CRM
Diana Luck, London Metropolitan University, UK
This chapter examines the importance of customer relationship management (CRM) in product
development and service elements as well as in organizational structure and strategies, where data
serves as the pivotal dimension around which the concept of CRM revolves in contemporary terms.
It then tries to demonstrate how these processes are associated with data management, namely data
collection, data collation, data storage, and data mining, and how they are becoming essential
components of CRM in both theoretical and practical aspects.
Chapter VII
Mining Allocating Patterns in Investment Portfolios
Yanbo J. Wang, University of Liverpool, UK
Xinwei Zheng, University of Durham, UK
Frans Coenen, University of Liverpool, UK
This chapter introduces the concept of “one-sum” weighted association rules (WARs) and names
such WARs allocating patterns (ALPs). An algorithm is proposed to extract hidden and interesting
ALPs from data. The chapter further points out that ALPs can be applied in portfolio management:
by modeling a collection of investment portfolios as a one-sum weighted transaction database,
ALPs can be applied to guide future investment activities.
Chapter VIII
Application of Data Mining Algorithms for Measuring Performance Impact
of Social Development Activities
Hakikur Rahman, Sustainable Development Networking Foundation (SDNF), Bangladesh
This chapter focuses on data mining applications and their use in devising performance-measuring
tools for social development activities. It provides justifications for including data mining algorithms
in specifically derived monitoring and evaluation tools that may be used for various social
development applications. Specifically, the chapter gives in-depth analytical observations for
establishing knowledge centers with a range of approaches and puts forward a few research issues and
challenges in transforming contemporary human society into a knowledge society.
Section III
Applications of Data Mining
Chapter IX
Prospects and Scopes of Data Mining Applications in Society Development Activities
Hakikur Rahman, Sustainable Development Networking Foundation, Bangladesh
Chapter IX focuses on a few areas of social development processes and puts forward hints on the
application of data mining tools through which decision making would be easier. It then puts forward
potential areas of society development initiatives where data mining applications can be incorporated.
The focus areas range from basic social services, such as education, health care, general commodities,
tourism, and ecosystem management, to advanced uses, such as database tomography.
Chapter X
Business Data Warehouse: The Case of Wal-Mart
Indranil Bose, The University of Hong Kong, Hong Kong
Lam Albert Kar Chun, The University of Hong Kong, Hong Kong
Leung Vivien Wai Yue, The University of Hong Kong, Hong Kong
Li Hoi Wan Ines, The University of Hong Kong, Hong Kong
Wong Oi Ling Helen, The University of Hong Kong, Hong Kong
This chapter focuses on the business data warehouse and discusses the retailing giant Wal-Mart.
The planning and implementation of the Wal-Mart data warehouse are described, and its integration
with the operational systems is discussed. The chapter also highlights some of the problems
encountered during the development of the data warehouse and provides some recommendations
for the future of the Wal-Mart data warehouse.
Chapter XI
Medical Applications of Nanotechnology in the Research Literature
Ronald N. Kostoff, Office of Naval Research, USA
Raymond G. Koytcheff, Office of Naval Research, USA
Clifford G.Y. Lau, Institute for Defense Analyses, USA
Chapter XI examines the medical applications literature associated with nanoscience and
nanotechnology research. For this research, the authors retrieved about 65,000 nanotechnology records in
2005 from the Science Citation Index/Social Science Citation Index (SCI/SSCI) using a comprehensive
300+ term query; in this chapter they intend to facilitate the nanotechnology transition process by
identifying the significant application areas. Specifically, the chapter identifies the main
nanotechnology health applications from today’s vantage point, as well as the related science and
infrastructure. The medical applications were ascertained through a fuzzy clustering process, and
metrics were generated using text mining to extract technical intelligence for specific medical
applications and application groups.
Chapter XII
Early Warning System for SMEs as a Financial Risk Detector
Ali Serhan Koyuncugil, Capital Markets Board of Turkey, Turkey
Nermin Ozgulbas, Baskent University, Turkey
This chapter introduces an early warning system for SMEs (SEWS) as a financial risk detector
based on data mining. In developing the early warning system, the chapter assembles a system in
which qualitative and quantitative data about the requirements of enterprises are taken into
consideration. Moreover, it targets a utilitarian model that is easy to understand, interpret, and
apply, by discovering the implicit relationships in the data and identifying the effect level of every
factor related to the system. The chapter ultimately shows a way of empowering the knowledge society
from the SME point of view by designing an early warning system based on data mining.
Chapter XIII
What Role is “Business Intelligence” Playing in Developing Countries?
A Picture of Brazilian Companies
Maira Petrini, Fundação Getulio Vargas, Brazil
Marlei Pozzebon, HEC Montreal, Canada
Chapter XIII looks at various business intelligence (BI) projects in developing countries and
specifically highlights Brazilian BI projects. Within a broad enquiry into the role BI is playing in
developing countries, two specific research questions are explored in this chapter. The first is
whether the approaches, models, or frameworks are tailored to the particularities and the
contextually situated business strategy of each company, or whether they are “standard” and imported
from “developed” contexts. The second is what type of information is being considered for
incorporation by BI systems: whether it is formal or informal in nature; whether it is gathered
from internal or external sources; whether there is a trend that favors some areas, like finance or
marketing, over others, or whether there is a concern with maintaining multiple perspectives; who in
the firms is using BI systems; and so forth.
Chapter XIV
Building an Environmental GIS Knowledge Infrastructure
Inya Nlenanya, Center for Transportation Research and Education,
Iowa State University, USA
In Chapter XIV, the author proposes a simple and accessible conceptual geographical information system
(GIS) based knowledge discovery interface that can be used as a decision-making tool. The chapter also
addresses some issues that might make this knowledge infrastructure stimulate sustainable development,
with particular emphasis on the sub-Saharan African region.
Chapter XV
The Application of Data Mining for Drought Monitoring and Prediction
Tsegaye Tadesse, National Drought Mitigation Center, University of Nebraska, USA
Brian Wardlow, National Drought Mitigation Center, University of Nebraska, USA
Michael J. Hayes, National Drought Mitigation Center, University of Nebraska, USA
Chapter XV discusses the application of data mining to develop drought monitoring utilities, which
enable monitoring and prediction of drought’s impact on vegetation conditions. The chapter also
summarizes current research using data mining approaches to build various types of drought monitoring
tools and explains how they are being integrated with decision support systems, focusing specifically
on drought monitoring and prediction in the United States.
Compilation of References
About the Contributors
Index
Foreword
Advances in information technology and data collection methods have led to the availability of larger
data sets in government and commercial enterprises, and in a wide variety of scientific and engineering
disciplines. Consequently, researchers and practitioners have an unprecedented opportunity to analyze
these data more rigorously and to extract intelligent and useful information from them.
The traditional approach to data analysis for decision making has been to merge business
and scientific expertise with statistical modeling techniques in order to develop experimentally verified
solutions for explicit problems. In recent years, a number of trends have emerged that have started to
challenge this traditional approach. One trend is the increasing accessibility of large volumes of high-
dimensional data, occupying database tables with many millions of rows and many thousands of col-
umns. Another trend is the increasing dynamic demand for rapidly building and deploying data-driven
analytics. A third trend is the increasing necessity to present analysis results to end-users in a form that
can be readily understood and assimilated so that end-users can gain the insights they need to improve
the decisions they make.
Data mining tools sweep through databases and identify previously hidden patterns in one step. An
example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products
that are often purchased together. Other pattern discovery problems include detecting fraudulent credit
card transactions and identifying anomalous data that could represent data entry keying errors. Data
mining algorithms embody techniques that have existed for at least 10 years, but have only recently
been implemented as mature, reliable, understandable tools that consistently outperform older statisti-
cal methods.
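The retail example above, finding products that are often purchased together, can be illustrated with a minimal sketch. It is not taken from the book: the function name `frequent_pairs`, the sample baskets, and the minimum support threshold are all hypothetical, and the approach shown is plain pair counting rather than any specific algorithm discussed in later chapters.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that co-occur in at least `min_support` baskets."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair is counted once per basket
        # and always appears in the same (alphabetical) order.
        items = sorted(set(basket))
        counts.update(combinations(items, 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical sales data: each inner list is one customer's basket.
baskets = [
    ["bread", "milk", "diapers"],
    ["beer", "diapers", "bread"],
    ["milk", "diapers", "beer"],
    ["bread", "milk"],
    ["beer", "diapers"],
]

pairs = frequent_pairs(baskets, min_support=3)
print(pairs)
```

With this sample data, only the pair that appears in at least three baskets survives the threshold. Counting every pair is quadratic in basket size, so a sketch like this suits only small illustrations.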
This book has specifically focused on applying data mining techniques to design, develop, and
evaluate social advancement processes that have been applied in several developing economies. This
book provides an overview of the main issues of data mining (including its classification, regression,
clustering, association rules, trend detection, feature selection, intelligent search, data cleaning, privacy
and security issues, etc.) and knowledge enhancing processes as well as a wide spectrum of data mining
applications such as computational natural science, e-commerce, environmental study, financial market
study, network monitoring, social service analysis, and so forth.
This book will be valuable to researchers, academics, and practitioners, including GOs and
NGOs, for further research and study, especially those working on the monitoring and evaluation of
projects and on follow-up activities for development projects, and it will be an invaluable scholarly
resource for development practitioners.
Dr. Abdul Matin Patwari
Vice Chancellor, The University of Asia Pacific
Dhaka, Bangladesh.
Preface
Data mining may be characterized as the process of extracting intelligent information from large amounts
of raw data, and it is steadily becoming a pervasive technology in activities as diverse as using historical
data to predict the success of an awareness raising campaign by looking into pattern sequence formations,
or a promotional operation by looking into pattern sequence transformations, or a monitoring tool by
looking into pattern sequence repetitions, or an analysis tool by looking into pattern sequence formations.
Theories and concepts of data mining were added to the database arena only recently, and research in
this area goes back little more than a decade. Only minor research and development activity was
observed in the 1990s, alongside the immense prospects of information and communication technologies
(ICTs). Organized and coordinated research on data mining started in 2001, with the advent of various
workshops, seminars, promotional campaigns, and funded research. International conferences on data
mining organized by the Institute of Electrical and Electronics Engineers, Inc. (since 2001), the Wessex
Institute of Technology (since 1999), the Society for Industrial and Applied Mathematics (since 2001), the
Institute of Computer Vision and applied Computer Sciences (since 1999), and the World Academy of
Science are among the leaders in creating awareness of advanced research activities in data mining and
its effective applications. Furthermore, these events reveal that the theme of research has been shifting
from fundamental data mining to information engineering and/or information management over these years.
Data mining is a promising and relatively new area of research and development, which can provide
important advantages to the users. It can yield substantial knowledge from data primarily gathered
through a wide range of applications. Various institutions have derived considerable benefits from its
application, and many other industries and disciplines are now applying the methodology to increasing
effect for their benefit.
Subsequently, collective efforts in the machine learning, artificial intelligence, statistics, and database
communities have been reinforcing technologies of knowledge discovery in databases to extract valuable
information from massive amounts of data in support of intelligent decision making. Data mining aims
to develop algorithms for extracting new patterns from the facts recorded in a database, and until now,
data mining tools have adopted techniques from statistics, network modeling, and visualization to classify
data and identify patterns. Ultimately, knowledge discovery aims to enable an information system to
transform information to knowledge through hypothesis testing and theory formation. It sets new challenges
for database technology: new concepts and methods are needed for basic operations, query languages, and
query processing strategies (Witten & Frank, 2005; Yuan, Buttenfield, Gehagen & Miller, 2004).
However, data mining does not provide any straightforward analysis, nor does it necessarily equate
with machine learning, especially when databases are relatively large. Furthermore, an exhaustive
statistical analysis is not possible, though many data mining methods contain a degree of nondeterminism
that enables them to scale to massive datasets.
At the same time, successful applications of data mining are not common, despite the vast literature
now accumulating on the subject. The reason is that, although it is relatively straightforward to find
pattern or structure in data, establishing its relevance and explaining its cause are both very difficult
tasks. In addition, much of what has been discovered so far may well be known to the expert.
Therefore, addressing these problematic issues requires the synthesis of underlying theory from
databases, statistics, algorithms, machine learning, and visualization (Giudici, 2003; Hastie, Tibshirani
& Friedman, 2001; Yuan, Buttenfield, Gehagen & Miller, 2004).
Along these perspectives, to enable practitioners to improve their research and participate actively
in solving practical problems related to data explosion, optimum searching, qualitative content
management, improved decision making, and intelligent data mining, a complete guide is the need of the hour.
A book featuring all these aspects can fill a pressing knowledge gap in the contemporary
world.
Furthermore, data mining is no longer an independent research subject. To understand
its essential insights and effective implementations, one must broaden one's knowledge across
multiple dimensions. Therefore, in this era of information revolution, data mining should be treated as a
cross-cutting and cross-sectoral feature. At the same time, data mining is becoming an interdisciplinary
field of research driven by a variety of multidimensional applications. On one hand, it entails techniques
from machine learning, pattern recognition, statistics, algorithms, databases, linguistics, and visualization.
On the other hand, one finds applications to understand human behavior, such as that of the end user of
an enterprise. It also helps entrepreneurs to perceive the type of transactions involved, including those
needed to evaluate risks or detect scams.
The reality of data explosion in multidimensional databases is a surprising and widely misunderstood
phenomenon. For those about to use an OLAP (online analytical processing) product, it is critically
important to understand what data explosion is, what causes it, and how it can be avoided, because the
consequences of ignoring data explosion can be very costly and, in most cases, result in project failure
(Applix, 2003). Meanwhile, enterprise data requirements grow at 50-100% a year, creating a constant
storage infrastructure management challenge (Intransa, 2005).
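To make the idea of data explosion concrete, the following is a minimal Python sketch, not taken from the cited white paper, that counts how many aggregate cells a fully precomputed cube would hold for a small, sparse fact table. The function name `cube_cell_count` and the toy rows are illustrative assumptions; real OLAP products also aggregate along hierarchies within each dimension, which this sketch ignores.

```python
from itertools import combinations

def cube_cell_count(rows, n_dims):
    """Count the distinct cells across every group-by of a full cube.

    rows   -- iterable of tuples, one dimension value per position
    n_dims -- number of dimensions in each row
    """
    rows = list(rows)
    total = 0
    # Every subset of the dimensions defines one group-by (cuboid).
    for k in range(n_dims + 1):
        for dims in combinations(range(n_dims), k):
            # Distinct value combinations are the cells stored for this group-by.
            total += len({tuple(row[d] for d in dims) for row in rows})
    return total

# Hypothetical fact table: (product, region, month) with only three stored facts.
facts = [
    ("tea", "north", "jan"),
    ("tea", "south", "feb"),
    ("rice", "north", "feb"),
]
print(len(facts), cube_cell_count(facts, 3))
```

In this toy case, three input facts expand to 19 precomputed cells, and the ratio tends to worsen as dimensions are added and the data become sparser.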
Concurrently, the database community draws much of its motivation from the vast digital datasets
now available online and the computational problems involved in analyzing them. Almost without
exception, current databases and database management systems are designed without regard to knowledge or content,
so the access methods and query languages they provide are often inefficient or unsuitable for mining
tasks. The functionality of some existing methods can be approximated either by sampling the data or
by reexpressing the data in a simpler form. Reexpression algorithms attempt to encapsulate all the important
structure contained in the original data, so that information loss is minimal and mining algorithms can
function more efficiently. Sampling strategies, in turn, must try to avoid bias, which is difficult if the
target and its explanation are unknown.
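As one illustration of sampling that avoids bias toward any position in the data, the sketch below uses reservoir sampling (Algorithm R) to draw a uniform random sample from a dataset too large to hold in memory. It is an assumed example rather than a method named in the sources cited here, and it addresses only positional bias, not the harder problem of bias relative to an unknown target. The function name `reservoir_sample` and its parameters are hypothetical.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Return up to k items drawn uniformly at random from an iterable."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample

# Draw 10 record identifiers from a stream of one million.
picked = reservoir_sample(range(1_000_000), 10, seed=42)
print(len(picked))
```

Each record ends up in the sample with equal probability, and the data are read only once.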
These issues relate to the core technology aspects of data mining. Apart from the intricate technology
context, the application of data mining methods lags in the development context. Lack of data has been
found to inhibit the ability of organizations to fully assist clients, and lack of knowledge made the
government vulnerable to the influence of outsiders who did have access to data from countries overseas.
Furthermore, disparity in data collection demands coordinated data archiving and data sharing, which
are crucial for developing countries.
The technique of data mining enables governments, enterprises, and private organizations to carry
out mass surveillance and personalized profiling, in most cases without any controls or right of access
to examine this data. However, to raise the human capacity and establish effective knowledge systems
from the applications of data mining, the main focus should be on sustainable use of resources and the
associated systems under the specific context (ecological, climatic, social, and economic conditions) of
developing countries. Research activities should also focus on sustainable management of vulnerable
resources and apply integrated management techniques, with a view to supporting the implementation of
the provisions related to research and sustainable use of existing resources (EC, 2005).
To obtain the advantages of data mining applications, the scientific issues and aspects of archiving scientific
and technology data can include the discipline-specific needs and practices of scientific communities as
well as interdisciplinary assessments and methods. In this context, data archiving can be seen primarily
as a program of practices and procedures that support the collection and long-term preservation of, low-cost
access to, and dissemination of scientific and technology data. The tasks of data archiving include
digitizing data, gathering digitized data into archive collections, describing the collected data to
support long-term preservation, decreasing the risks of losing data, and providing easy ways to make the
data accessible. Hence, data archiving and the associated data centers need to be part of the day-to-day
practice of science. This is particularly important now that much new data is collected and generated
digitally, and regularly (Codata, 2002; Mohammadian, 2004).
So far, data mining has existed in the form of discrete technologies. Recently, its integration into many
other forms of ICT has become attractive as various organizations possessing huge databases began to
realize the potential of the information hidden there (Hernandez, Göhring & Hopmann, 2004). In this regard, the
Internet can be a tremendous tool for the collection and exchange of information, best practices, success
cases, and vast quantities of data. But it is also becoming increasingly congested, and its popular use raises
issues about authentication and evaluation of information and data. Interoperability is another issue
that poses significant challenges. The growing number, volume, and complexity of data sources, together with
the high-speed connectivity of the Internet, are making interoperability and data integration an important
research and industry focus. Moreover,
incompatibilities between data formats, software systems, methodologies, and analytical models are
creating barriers to the easy flow and creation of data, information, and knowledge (Carty, 2002). All of these
demand not only a technological revolution but also a tremendous uplift of human capacity as a whole.
Therefore, the challenge of human development, taking into account the social and economic background
while protecting the environment, confronts decision makers such as national governments, local communities,
and development organizations. A question arises: how can new information and communication technology
be applied to fulfill this task (Hernandez, Göhring & Hopmann, 2004)? This book gives
a review of data mining and decision support techniques and their requirements for achieving sustainable
outcomes. It looks into authenticated global approaches to data mining and shows its capabilities as
an effective instrument on the basis of its application in real projects in developing countries. The
applications cover the development of algorithms, computer security, open and distance learning, online
analytical processing, scientific modeling, simple warehousing, and social and economic development
processes.
Applying data mining techniques in various aspects of social development processes could thereby
empower society with proper knowledge and yield economic benefits by raising its
economic capabilities.
On the other hand, coupled with linguistic techniques, data mining has produced the new field of text
mining. This has considerably increased the applications of data mining to extract ideas and sentiment
from a wide range of sources, and opened up new possibilities for data mining to act as a bridge
between the technology and physical sciences and those related to social sciences. Furthermore, data
mining today is recognized as an important tool to analyze and understand the information collected
by governments, businesses, and scientific centers. In this context, novel data, text, and Web mining
application areas are emerging fast, and these developments call for new perspectives and approaches
in the form of inclusive research.
Similarly, info-miners in the distance learning community are using one or more info-mining tools.
They offer high-quality open and distance learning (ODL) information retrieval and search services.
Thus, ICT-based info-mining services will likely produce huge digital libraries such as e-books,
journals, reports, and databases on DVD and similar high-density information storage media. Most of
these off-line formats are PC-accessible and can store considerably more information per unit than a
CD-ROM (COL, 2003). Hence, knowledge enhancement processes can be significantly improved through
proper use of data mining techniques.
Thus, data mining techniques are gradually becoming essential components of corporate intelligence
systems and are progressively evolving into a pervasive technology within activities that range from
using historical data to predict the success of an awareness campaign or a promotional
operation, to searching for succession patterns used as monitoring tools, to analyzing genome chains
and forming knowledge banks. In reality, data mining is becoming an interdisciplinary field driven
by various multidimensional applications. On one hand, it involves schemes from machine learning,
pattern recognition, statistics, algorithms, databases, linguistics, and visualization. On the other hand, one
finds its applications to understand human behavior, to understand the type of transactions involved,
or to evaluate risks or detect fraud in an enterprise. Data mining can yield substantial knowledge from
raw data that are primarily gathered for a wide range of applications. Various institutions have derived
significant benefits from its application, and many other industries and disciplines are now applying the
modus operandi to increasing effect for their overall management development.
This book tries to examine the meaning and role of data mining in terms of social development
initiatives and their outcomes in developing economies in terms of upholding knowledge dimensions. At the
same time, it gives an in-depth look into the critical management of information in developed countries
from a similar point of view. Furthermore, this book provides an overview of the main issues of data
mining (including classification, regression, clustering, association rules, trend detection, feature
selection, intelligent search, data cleaning, and privacy and security issues) and knowledge-enhancing
processes, as well as a wide spectrum of data mining applications such as computational natural science,
e-commerce, environmental study, business intelligence, network monitoring, social service analysis,
and so forth to empower the knowledge society.
Where the Book Stands
In the global context, a combination of continual technological innovation and increasing competitiveness
makes the management of information a huge challenge and requires decision-making processes built
on reliable and timely information, gathered from available internal and external sources. Although
the volume of acquired information is increasing immensely, this does not mean that people are able
to derive appropriate value from it (Maira & Marlei, 2003). This calls for careful investigation
of information archival strategies and demands years of continuous investment in order to put in
place a technological platform that supports all development processes and strengthens the efficiency
of the operational structure. Most organizations are supposed to have reached a level where
the implementation of IT solutions for strategic levels becomes achievable and essential. This context
explains the emergence of the domain generally known as “intelligent data mining”, seen as an answer
to the current demands in terms of data and information for decision-making with the intensive utilization
of information technology.
The objective of the book is to examine the meaning and role of data mining in a particular context
(i.e., in terms of development initiatives and their outcomes), especially in developing countries and
transitional economies. If the management of information is a challenge even to enterprises in developed
countries, what can be said about organizations struggling in unstable contexts such as developing ones?
The book also covers data mining applications in the context of developed countries.
With the unprecedented rate at which data is being collected today in almost all fields of human
endeavor, there is an emerging demand to extract useful information from it for the economic and
scientific benefit of society. Intelligent data mining enables the community to take advantage of the
gathered data and information by making intelligent decisions. This increases the knowledge content of
each member of the community, if it can be applied to practical usage areas. Eventually, a knowledge
base is created and a knowledge-based society can be established.
Data mining involves the process of automatic discovery of patterns, sequences,
transformations, associations, and anomalies in massive databases, and is an enormously interdisciplinary
field representing the confluence of several disciplines, including database systems, data warehousing,
machine learning, statistics, algorithms, data visualization, and high-performance computing (LCPS,
2001; UN, 2004). A book of this nature, encompassing such a wide-ranging subject area, has been missing
from the contemporary global market; this book intends to fill that knowledge gap.
In this context, this book provides an overview of the main issues of data mining (including
classification, regression, clustering, association rules, trend detection, feature selection, intelligent search,
data cleaning, privacy and security issues, etc.) and knowledge-enhancing processes, as well as a
wide spectrum of data mining applications such as computational natural science, e-commerce,
environmental study, financial market study, machine learning, Web mining, nanotechnology, e-tourism,
and social service analysis.
Apart from providing insight into the advanced context of data mining, this book emphasizes:
• Development and availability of shared data, metadata, and products commonly required across
diverse societal benefit areas
• Promoting research efforts that are necessary for the development of tools required in all societal
benefit areas
• Encouraging and facilitating the transition from research to operations of appropriate systems and
techniques
• Facilitating partnerships between operational groups and research groups
• Developing recommended priorities for new or augmented efforts in human capacity building
• Contributing to, accessing, and retrieving data from global data systems and networks
• Encouraging the adoption of existing and new standards to support broader data and information
usability
• Data management approaches that encompass a broad perspective on the observation of data life
cycle, from input through processing, archiving, and dissemination, including reprocessing, analysis
and visualization of large volumes and diverse types of data
• Facilitating recording and storage of data in clearly defined formats, with metadata and quality
indications to enable search, retrieval, and archiving as easily accessible data sets
• Facilitating user involvement and conducting outreach at global, regional, national and local levels
• Complete and open exchange of data, metadata, and products within relevant agencies and national
policies and legislation
Organization of Chapters
Altogether, this book has fifteen chapters, divided into three sections: Education and
Research; Tools, Techniques, Methods; and Applications of Data Mining. Section I has three chapters, which
discuss policy and decision-making approaches of data mining for sociodevelopment aspects in
technical and semitechnical contexts. Section II comprises five chapters, which illustrate tools,
techniques, and methods of data mining applications for various human development processes and
scientific research. The third section has seven chapters, which present various case studies,
practical applications, and research activities on data mining applications that are being used in social
development processes for empowering knowledge societies.
Chapter I provides an overview of a series of multiple criteria optimization-based data mining
methods that utilize multiple criteria programming (MCP) to solve various data mining problems. The authors
state that data mining is being established on the basis of many disciplines, such as machine learning,
databases, statistics, computer science, and operations research, and that each field comprehends data mining
from its own perspective by making distinct contributions. They further state that due to the difficulty of
assessing the accuracy of hidden data and increasing the predicting rate in a complex large-scale database,
researchers and practitioners have always desired to seek new or alternative data mining techniques.
Finally, this chapter outlines a few research challenges and opportunities.
Chapter II identifies some important barriers to the successful application of computational
intelligence (CI) techniques in a commercial environment and suggests various ways in which they may be
overcome. It states that CI offers new opportunities to businesses that wish to improve the efficiency of
their operations. In this context, this chapter further identifies a few key conceptual, cultural, and
technical barriers and describes different ways in which they affect the business users and the CI practitioners.
This chapter aims to provide knowledgeable insight for its readers through the outcome of a successful
computational intelligence project and expects that by enabling both parties to understand each other’s
perspectives, the true potential of CI may be realized.
Chapter III describes two data mining techniques that are used to explore frequent large itemsets
in the database. In the first technique, called the closed directed graph approach, the algorithm scans the
database once, counting the possible 2-itemsets; only the 2-itemsets with a minimum
support are used to form the closed directed graph, which is then used to explore frequent large itemsets in the database.
In the second technique, the dynamic hashing algorithm, large 3-itemsets are generated at an earlier stage,
which reduces the size of the transaction database after trimming and thereby reduces the cost of later
iterations. Furthermore, this chapter predicts that the techniques may help researchers not only to
understand how frequent large itemsets are generated, but also to find association rules among transactions
within relational databases, and to make knowledgeable decisions.
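The single-scan counting step described for the first technique can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's algorithm: support is taken as an absolute transaction count, the graph simply links the smaller item of each frequent pair to the larger, and the names `frequent_pairs` and `pair_graph` and the sample baskets are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Scan the transactions once and keep 2-itemsets meeting min_support."""
    counts = Counter()
    for items in transactions:
        # Sort so that each pair has one canonical (smaller, larger) form.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

def pair_graph(pairs):
    """Build a directed graph with an edge from the smaller item to the larger."""
    graph = defaultdict(set)
    for smaller, larger in pairs:
        graph[smaller].add(larger)
    return dict(graph)

# Hypothetical transaction database.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "eggs", "milk"],
]
pairs = frequent_pairs(baskets, min_support=3)
print(pairs)
print(pair_graph(pairs))
```

Larger candidate itemsets could then be proposed by following paths in the graph, although how the chapter does this is not stated here.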
Every day, different satellites capture data of distinct kinds, among which images
are processed and stored by many institutions. In Chapter IV, the authors present relevant definitions in
the remote sensing and image mining domain, referring to related work in this field and indicating
the importance of appropriate tools and techniques to analyze satellite images and extract knowledge
from this kind of data. As a case study, the Amazonia deforestation problem is discussed, as well as
INPE’s effort to develop and spread technology to deal with challenges involving Earth observation
resources. The purpose is to present relevant technologies, new approaches, and research directions on
remote sensing image mining, and to demonstrate how to increase the analysis potential of such huge
strategic data for the benefit of researchers.
Chapter V reviews contemporary research on machine learning and Web mining methods that are
related to areas of social benefit. It demonstrates that machine learning and Web mining methods may
provide intelligent Web services of social interest. The chapter also reveals a growing interest in using
advanced computational methods, such as machine learning and Web mining, for better services to the
public, as most research identified in the literature has been conducted during recent years. The chapter
tries to assist researchers and academics from different disciplines to understand how Web mining and
machine learning methods are applied to Web data. Furthermore, it aims to provide the latest
developments in research in this field that are related to societal benefit areas.
In recent times, customer relationship management (CRM) can be related to sales, marketing, and
even services automation. Additionally, the concept of CRM is increasingly associated with cost savings
and streamlined processes as well as with the engendering, nurturing, and tracking of relationships with
customers. Chapter VI seeks to illustrate how, although the product and service elements as well as
organizational structure and strategies are central to CRM, data is the pivotal dimension around which the
concept revolves in contemporary terms. It then tries to demonstrate how these processes are
associated with data management, namely data collection, data collation, data storage, and data mining,
which are becoming essential components of CRM in both theoretical and practical aspects.
In Chapter VII, the authors introduce the concept of “one-sum” weighted association rules
(WARs) and name such WARs allocating patterns (ALPs). An algorithm is also proposed to
extract hidden and interesting ALPs from data. The chapter further points out that ALPs can be applied in
portfolio management. This can be done by modeling a collection of investment portfolios as a one-sum
weighted transaction-database that contains hidden ALPs; those ALPs, mined from the given
portfolio data, can then be applied to guide future investment activities.
Chapter VIII focuses on data mining applications and their utilization in formulating performance-
measuring tools for social development activities. In this context, this chapter provides justifications for
including data mining algorithms to establish specifically derived monitoring and evaluation tools for
various social development applications. In particular, this chapter gives in-depth analytical observations on
establishing knowledge centers with a range of approaches, and finally it puts forward a few research issues
and challenges for transforming contemporary human society into a knowledge society.
Chapter IX highlights a few areas of development and suggests applications of data mining tools
through which decision-making would be easier. Subsequently, this chapter puts forward potential
areas of society development initiatives where data mining applications can be introduced. The focus
area may range from basic education, health care, general commodities, tourism, and ecosystem
management to advanced uses, like database tomography. This chapter also presents some future challenges and
recommendations in terms of using data mining applications for empowering knowledge society.
Chapter X focuses on the business data warehouse and discusses the retailing giant, Wal-Mart. In this
chapter, the planning and implementation of the Wal-Mart data warehouse are described and its
integration with the operational systems is discussed. It also highlights some of the problems that were
encountered during the development process of the data warehouse, and provides some
recommendations for the future.
Chapter XI examines the medical applications literature associated with nanoscience and nanotechnology
research. The authors retrieved about 65,000 nanotechnology records in 2005 from the Science
Citation Index/Social Science Citation Index (SCI/SSCI) using a comprehensive 300+ term query. This
chapter intends to facilitate the nanotechnology transition process by identifying the significant
application areas. It also identifies the main nanotechnology health applications from today’s vantage point, as
well as the related science and infrastructure. The medical applications were identified through a fuzzy
clustering process, and metrics were generated using text mining to extract technical intelligence for
specific medical applications and application groups.
Chapter XII introduces an early warning system for SMEs (SEWS) as a financial risk detector
that is based on data mining. Through a study, this chapter composes a system in which qualitative and
quantitative data about the requirements of enterprises are taken into consideration during the
development of an early warning system. Moreover, in forming this system, the chapter targets a utilitarian
model that is easy to understand, interpret, and apply, by discovering the implicit relationships
between the data and identifying the effect level of every factor related to the system. This chapter
also shows the way of empowering knowledge society from the SMEs’ point of view by designing an early
warning system based on data mining. Using this system, SME managers could easily gain financial
and risk management knowledge without any prior knowledge or expertise.
Chapter XIII looks at various business intelligence (BI) projects in developing countries, and
specifically focuses on Brazilian BI projects. The authors pose this question: if the management of IT is
a challenge for companies in developed countries, what can be said about organizations struggling in
unstable contexts such as those often prevailing in developing countries? Within this broad enquiry about
the role that BI plays in developing countries, two specific research questions are explored in this chapter.
The purpose of the first question is to determine whether those approaches, models, or frameworks are
tailored for particularities and the contextually situated business strategy of each company, or if they are
“standard” and imported from “developed” contexts. The purpose of the second one is to analyze: what
type of information is being considered for incorporation by BI systems; whether they are formal or
informal in nature; whether they are gathered from internal or external sources; whether there is a trend
that favors some areas, like finance or marketing, over others, or if there is a concern with maintaining
multiple perspectives; who in the firms is using BI systems, and so forth.
Technologies such as geographic information systems (GIS) enable geo-spatial information to be
gathered, modified, integrated, and mapped easily and cost effectively. However, these technologies
generate both opportunities and challenges for achieving wider and more effective use of geo-spatial
information in stimulating and maintaining sustainable development through sound policy making. In
Chapter XIV, the author proposes a simple and accessible conceptual knowledge discovery interface
that can be used as a tool. Moreover, the chapter addresses some issues that might make this knowledge
infrastructure stimulate sustainable development, with particular emphasis on the sub-Saharan African region.
Finally, Chapter XV discusses the application of data mining to develop drought monitoring tools
that enable monitoring and prediction of drought’s impact on vegetation conditions. The chapter also
summarizes current research using data mining approaches (e.g., association rules and decision-tree
methods) to develop various types of drought monitoring tools and briefly explains how they are being
integrated with decision support systems. This chapter also introduces how data mining can be used to
enhance drought monitoring and prediction in the United States, and at the same time, assist others to
understand how similar tools might be developed in other parts of the world.
Conclusion
Data mining is becoming an essential tool in science, engineering, industrial processes, healthcare, and
medicine. The datasets in these fields are large, complex, and often noisy. Extracting knowledge
from raw datasets therefore requires the use of sophisticated, high-performance, and principled analysis techniques
and algorithms, based on sound statistical foundations. In turn, these techniques require powerful
visualization technologies; implementations that must be carefully tuned for enhanced performance; and software
systems that are usable by scientists, engineers, and physicians as well as researchers.
Data mining, as stated earlier, denotes the extraction of hidden predictive information from large
databases, and it is a powerful new technology with great potential to help enterprises focus on the most
important information in their data warehouses. Data mining tools predict future trends and behaviors,
allowing entrepreneurs to make proactive, knowledge-driven decisions. The automated, prospective
analyses offered by data mining move beyond the analyses of past events provided by retrospective
constituents typical of decision support systems. Data mining tools can answer business questions that
traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding
predictive information that experts may miss because it lies outside their expectations.
In effect, data mining techniques are the result of a long process of research and product development.
This evolution began when business data was first stored on computers, continued with improvements
in data access, and more recently, generated technologies that allow users to navigate through their data
in real time. Thus, data mining takes this evolutionary progression beyond retrospective data access
and navigation to prospective and proactive information delivery. Furthermore, data mining algorithms
allow researchers to devise unique decision-making tools from emancipated data varying in nature.
Above all, by applying data mining techniques, extremely valuable utilities can be devised that could raise
the knowledge content at each tier of society.
However, in terms of accumulated literature and research contexts, not many publications are
available in the field of data mining applications for social development, especially in the form
of a book. Taking this as a baseline, compiled literature seems to be extremely valuable in the context
of utilizing data mining and other information techniques for the improvement of skills development,
knowledge management, and societal benefits. Similarly, Internet search engines do not return sufficient
bibliographies in the field of data mining from a development perspective. Given the high demand from
researchers in the field of ICTD, a book of this format stands to be unique. Moreover, utilization of
new ICTs in the form of data mining deserves appropriate intervention for their diffusion at local,
national, regional, and global levels.
It is assumed that numerous individuals, academics, researchers, engineers, and professionals from
government and nongovernment security and development organizations will be interested in this increasingly
important topic for carrying out implementation strategies towards their national development. This book
will assist its readers to understand the key practical and research issues related to applying data
mining in development data analysis, cyber acclamations, digital deftness, contemporary CRM, investment
portfolios, early warning systems in SMEs, business intelligence, and intrinsic nature in the context of
society uplift as a whole and the use of data and information for empowering knowledge societies.
Most books on data mining deal merely with technology aspects, despite the diversified nature of its
various applications along many tiers of human endeavor. A few activities in recent
years have been producing high-quality proceedings, but it is felt that a compilation of contents of this nature,
drawn from advanced research carried out globally, may produce a book in demand
among researchers.
References
Applix (2003). OLAP data scalability: Ignore the OLAP data explosion at great cost. A White Paper.
Westborough, MA: Applix, Inc.
Carty, A. J. (2002, September 29). Scientific and technical data: Extending the frontiers of research.
In Proceedings of the Opening Address at the 18th International CODATA Conference, Montreal, Quebec.
Codata (2002, May 21-22). In Proceedings of the Workshop on Archiving Scientific and Technical Data,
Committee on Data for Science and Technology (CODATA), Pretoria, South Africa.
COL (2003). Find information faster: COL’s “Info-mining” tools. Vancouver, BC: Clippings, Com-
monwealth of Learning.
EC (2005). Integrating and strengthening the European Research Area, 2005 Work Programme (SP1-
10). European Commission.
Hernandez, V., Göhring, W., & Hopmann, C. (2004, Nov. 30-Dec. 3). Sustainable decision support for
environmental problems in developing countries: Applying multicriteria spatial analysis on the Nicara-
gua Development Gateway niDG. In Proceedings of the Workshop on Binding EU-Latin American IST
Research Initiatives for Enhancing Future Co-Operation. Santo Domingo, Costa Rica.
Giudici, P. (2003). Applied data mining: Statistical methods for business and industry. John Wiley.
Hastie, T., Tibshirani, R., & Friedman, J. (2001) (Eds.). The elements of statistical learning: Data min-
ing, inference, and prediction. Springer Verlag.
Intransa (2005). Managing storage growth with an affordable and flexible IP SAN: A highly cost-effective
storage solution that leverages existing IT resources. San Jose, CA: Intransa, Inc.
LCPS (2001, September 11-12). Draft workshop report. In Proceedings of the International Consulta-
tive Workshop, The Digital Initiative for Development Agency (DID), The Lebanese Center for Policy
Studies (LCPS), Beirut.
Maira, P. & Marlei, P. (2003, June 16-21). The value of “business intelligence” in the context of devel-
oping countries. In Proceedings of the 11th European Conference on Information Systems, ECIS 2003,
Naples, Italy. Retrieved April 6, 2008, http://is2.lse.ac.uk/asp/aspecis/20030119.pdf
Mohammadian, M. (2004). Intelligent agents for data mining and information retrieval. Hershey, PA:
Idea Group Publishing.
UN (2004, June 16). Draft Sao Paulo Consensus, UNCTAD XI Multi-Stakeholder Partnerships, United
Nations Conference on Trade and Development, TD/L.380/Add.1, Sao Paulo.
Witten, I. H. & Frank, E. (2005). Data mining: Practical machine learning tools and techniques (2nd
ed). Morgan Kaufmann.
Yuan, M., Buttenfield, B., Gehagen, M. & Miller, H. (2004). Geospatial data mining and knowledge
discovery. In R. B. McMaster & E. L. Usery (Eds.), A research agenda for geographic information sci-
ence (pp. 365-388). Boca Raton, FL: CRC Press.
Acknowledgment
The editor would like to acknowledge the assistance of all involved in the entire accretion of
manuscripts, painstaking review process, and methodical revision of the book, without whose support
the project could not have been satisfactorily completed. I am indebted to all the authors who provided
their relentless and generous support; the reviewers who were most helpful and provided comprehensive,
thorough, and creative comments are Ali Serhan Koyuncugil, Georgios Lappas, and Paul Henman.
Thanks go to my close friends at UNDP, and colleagues at SDNF and ICMS, for their wholehearted
encouragement during the entire process.
Special thanks also go to the dedicated publishing team at IGI Global, particularly to Kristin Roth,
Jessica Thompson, and Jennifer Neidig for their continuous suggestions, support, and feedback via
e-mail for keeping the project on schedule, and to Mehdi Khosrow-Pour and Jan Travers for their
enduring professional support. Finally, I would like to thank all my family members for their love
and support throughout this period.
Hakikur Rahman, Editor
SDNF, Bangladesh
September 2007
Section I
Education and Research
Chapter I
Introduction to Data Mining
Techniques via Multiple Criteria
Optimization Approaches and
Applications
Yong Shi
University of the Chinese Academy of Sciences, China
and University of Nebraska at Omaha, USA
Yi Peng
University of Nebraska at Omaha, USA
Gang Kou
University of Nebraska at Omaha, USA
Zhengxin Chen
University of Nebraska at Omaha, USA
Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Abstract
This chapter provides an overview of a series of multiple criteria optimization-based data mining meth-
ods, which utilize multiple criteria programming (MCP) to solve data mining problems, and outlines
some research challenges and opportunities for the data mining community. To achieve these goals, this
chapter first introduces the basic notions and mathematical formulations for multiple criteria optimiza-
tion-based classification models, including the multiple criteria linear programming model, multiple
criteria quadratic programming model, and multiple criteria fuzzy linear programming model. Then it
presents the real-life applications of these models in credit card scoring management, HIV-1 associated
dementia (HAD) neuronal damage and dropout, and network intrusion detection. Finally, the chapter
discusses research challenges and opportunities.
Introduction
Data mining has become a powerful information technology tool in today’s competitive business
world. As the sizes and varieties of electronic datasets grow, the interest in data mining is
increasing rapidly. Data mining is established on the basis of many disciplines, such as machine
learning, databases, statistics, computer science, and operations research. Each field comprehends
data mining from its own perspective and makes its distinct contributions. It is this
multidisciplinary nature that brings vitality to data mining. One of the application roots of data
mining can be regarded as statistical data analysis in the pharmaceutical industry. Nowadays the
financial industry, including commercial banks, has benefited from the use of data mining. In
addition to statistics, decision trees, neural networks, rough sets, fuzzy sets, and support vector
machines have gradually become popular data mining methods over the last 10 years. Due to the
difficulty of assessing the accuracy of hidden data and increasing the prediction rate in a complex
large-scale database, researchers and practitioners have always desired to seek new or alternative
data mining techniques. This is a key motivation for the proposed multiple criteria
optimization-based data mining methods.
The objective of this chapter is to provide an overview of a series of multiple criteria
optimization-based methods, which utilize multiple criteria programming (MCP) to solve
classification problems. In addition to giving an overview, this chapter lists some data mining
research challenges and opportunities for the data mining community. To achieve these goals, the
next section introduces the basic notions and mathematical formulations for three multiple criteria
optimization-based classification models: the multiple criteria linear programming model, multiple
criteria quadratic programming model, and multiple criteria fuzzy linear programming model. The
third section presents some real-life applications of these models, including credit card scoring
management, classifications on HIV-1 associated dementia (HAD) neuronal damage and dropout, and
network intrusion detection. The chapter then outlines research challenges and opportunities, and
the conclusion is presented.
Multiple Criteria Optimization-Based Classification Models
This section explores solving classification problems, one of the major areas of data mining,
through the use of multiple criteria mathematical programming-based methods (Shi, Wise, Luo, &
Lin, 2001; Shi, Peng, Kou, & Chen, 2005). Such methods have shown strong applicability in
solving a variety of classification problems (e.g., Kou et al., 2005; Zheng et al., 2004).
Classification
Although the definition of classification in data mining varies, the basic idea of classification
can be generally described as to “predict the most likely state of a categorical variable (the
class) given the values of other variables” (Bradley, Fayyad, & Mangasarian, 1999, p. 6).
Classification is a two-step process. The first step constructs a predictive model based on a
training dataset. The second step applies the predictive model constructed in the first step to a
testing dataset. If the classification accuracy on the testing dataset is acceptable, the model can
be used to predict unknown data (Han & Kamber, 2000; Olson & Shi, 2005).
Using multiple criteria programming, the classification task can be defined as follows: for a
given set of variables in the database, the boundaries between the classes are represented by
scalars in the constraint availabilities. Then, the standards of classification are measured by
minimizing the total overlapping of data and maximizing the distances of every data record to its
class boundary simultaneously. Through the algorithms of MCP, an “optimal” solution of variables
(a so-called classifier) for the data observations is determined for the separation of the given
classes. Finally, the resulting classifier can be used to predict the unknown data for discovering
the hidden patterns of data as possible knowledge. Note that MCP differs from the well-known
support vector machine (SVM) (e.g., Mangasarian, 2000; Vapnik, 2000). While the former uses
multiple measurements to separate each data record from different classes, the latter searches for
the minority of the data (support vectors) to represent the majority in classifying the data.
However, both can be generally regarded as belonging to the same category of optimization
approaches to data mining.
In the following, we first discuss a general-
ized multi-criteria programming model formula-
tion, and then explore several variations of the
model.
A Generalized Multiple Criteria
Programming Model Formulation
This section introduces a generalized multi-criteria programming method for classification. Simply
speaking, this method classifies observations into distinct groups based on two criteria for data
separation. The following models represent this concept mathematically:
Given an $r$-dimensional attribute vector $a = (a_1, \ldots, a_r)$, let
$A_i = (A_{i1}, \ldots, A_{ir}) \in \mathbb{R}^r$ be one of the sample records of these attributes,
where $i = 1, \ldots, n$; $n$ represents the total number of records in the dataset. Suppose two
groups $G_1$ and $G_2$ are predefined. A boundary scalar $b$ can be selected to separate these two
groups. A vector $X = (x_1, \ldots, x_r)^T \in \mathbb{R}^r$ can be identified to establish the
following linear inequalities (Fisher, 1936; Shi et al., 2001):
• $A_i X < b, \ \forall A_i \in G_1$
• $A_i X \ge b, \ \forall A_i \in G_2$
To formulate the criteria and complete constraints for data separation, some variables need to be
introduced. In the classification problem, $A_i X$ is the score for the $i$th data record. Let
$\alpha_i$ be the overlapping of the two-group boundary for record $A_i$ (external measurement) and
$\beta_i$ be the distance of record $A_i$ from its adjusted boundary (internal measurement). The
overlapping $\alpha_i$ means the distance of record $A_i$ to the boundary $b$ if $A_i$ is
misclassified into another group. For instance, in Figure 1 the “black dot” located to the right of
the boundary $b$ belongs to $G_1$, but it was misclassified by the boundary $b$ to $G_2$. Thus, the
distance between $b$ and the “dot” equals $\alpha_i$. The adjusted boundary is defined as
$b - \alpha^*$ or $b + \alpha^*$, where $\alpha^*$ represents the maximum of overlapping (Freed &
Glover, 1981, 1986). Then, a mathematical function $f(\alpha)$ can be used to describe the relation
of all overlapping $\alpha_i$, while another mathematical function $g(\beta)$ represents the
aggregation of all distances $\beta_i$. The final classification accuracies depend on
simultaneously minimizing $f(\alpha)$ and maximizing $g(\beta)$. Thus, a generalized bi-criteria
programming method for classification can be formulated as:
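(Generalized Model) Minimize $f(\alpha)$ and Maximize $g(\beta)$
Subject to:
$$A_i X - \alpha_i + \beta_i - b = 0, \quad \forall A_i \in G_1,$$
$$A_i X + \alpha_i - \beta_i - b = 0, \quad \forall A_i \in G_2,$$
where $A_i$, $i = 1, \ldots, n$ are given, $X$ and $b$ are unrestricted, and
$\alpha = (\alpha_1, \ldots, \alpha_n)^T$, $\beta = (\beta_1, \ldots, \beta_n)^T$;
$\alpha_i, \beta_i \ge 0$, $i = 1, \ldots, n$.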
All variables and their relationships are represented in Figure 1. There are two groups in Figure
1: “black dots” indicate $G_1$ data objects, and “stars” indicate $G_2$ data objects. There is one
misclassified data object from each group if the boundary scalar $b$ is used to classify these two
groups, whereas the adjusted boundaries $b - \alpha^*$ and $b + \alpha^*$ separate the two groups
without misclassification.
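As a rough illustration of the quantities in the generalized model, the following Python sketch computes the score $A_i X$, the overlap $\alpha_i$, and the distance $\beta_i$ for a few made-up records, given a fixed weight vector $X$ and boundary $b$. The records, the values of `X` and `b`, and the helper names are hypothetical. The sketch also assumes that at most one of $\alpha_i$ and $\beta_i$ is nonzero for each record, which the constraints alone do not require.

```python
def score(record, weights):
    """Score A_i X: dot product of one record with the weight vector X."""
    return sum(a * x for a, x in zip(record, weights))


def overlap_and_distance(record, group, weights, b):
    """Return (alpha_i, beta_i) satisfying the generalized-model constraint.

    G1: A_i X - alpha_i + beta_i - b = 0
    G2: A_i X + alpha_i - beta_i - b = 0
    Assumption (not from the source): at most one of the two is nonzero.
    """
    d = score(record, weights) - b
    if group == "G1":    # correctly classified when A_i X < b
        return max(0.0, d), max(0.0, -d)
    if group == "G2":    # correctly classified when A_i X >= b
        return max(0.0, -d), max(0.0, d)
    raise ValueError("group must be 'G1' or 'G2'")


# Hypothetical two-attribute records with their known groups.
records = [((1.0, 1.0), "G1"), ((2.0, 3.0), "G1"),
           ((4.0, 4.0), "G2"), ((1.0, 2.0), "G2")]
X = (1.0, 1.0)   # assumed weight vector
b = 4.0          # assumed boundary

results = [overlap_and_distance(r, g, X, b) for r, g in records]
for (r, g), (alpha, beta) in zip(records, results):
    print(g, r, "score =", score(r, X), "alpha =", alpha, "beta =", beta)
```

In this toy data the second and fourth records fall on the wrong side of $b$, so they receive a positive $\alpha_i$ and a zero $\beta_i$; the other two are correctly separated and receive only a positive $\beta_i$.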
Based on the above generalized model, the following subsection formulates a multiple criteria
linear programming (MCLP) model and a multiple criteria quadratic programming (MCQP) model.
Multiple Criteria Linear and Quadratic
Programming Model Formulation
Different forms of $f(\alpha)$ and $g(\beta)$ in the generalized model will affect the
classification criteria. Commonly $f(\alpha)$ (or $g(\beta)$) can be component-wise and
non-increasing (or non-decreasing) functions. For example, in order to utilize the computational
power of some existing mathematical programming software packages, a sub-model can be set up by
using the norm to represent $f(\alpha)$ and $g(\beta)$. This means that we can assume
$f(\alpha) = \|\alpha\|_p$ and $g(\beta) = \|\beta\|_q$. To transform the bi-criteria problem of
the generalized model into a single-criterion problem, we use weights $w_\alpha > 0$ and
$w_\beta > 0$ for $\|\alpha\|_p$ and $\|\beta\|_q$, respectively. The values of $w_\alpha$ and
$w_\beta$ can be pre-defined in the process of identifying the optimal solution. Thus, the
generalized model is converted into a single-criterion mathematical programming model as:
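Model 1: Minimize $w_\alpha \|\alpha\|_p - w_\beta \|\beta\|_q$
Subject to:
$$A_i X - \alpha_i + \beta_i - b = 0, \quad \forall A_i \in G_1,$$
$$A_i X + \alpha_i - \beta_i - b = 0, \quad \forall A_i \in G_2,$$
where $A_i$, $i = 1, \ldots, n$ are given, $X$ and $b$ are unrestricted, and
$\alpha = (\alpha_1, \ldots, \alpha_n)^T$, $\beta = (\beta_1, \ldots, \beta_n)^T$;
$\alpha_i, \beta_i \ge 0$, $i = 1, \ldots, n$.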
Based on Model 1, mathematical programming models with any norm can be theoretically defined. This
study is interested in formulating a linear and a quadratic programming model. Let $p = q = 1$;
then $\|\alpha\|_1 = \sum_{i=1}^{n} \alpha_i$ and $\|\beta\|_1 = \sum_{i=1}^{n} \beta_i$. Let
$p = q = 2$; then, taking the squared norm, $\|\alpha\|_2^2 = \sum_{i=1}^{n} \alpha_i^2$ and
$\|\beta\|_2^2 = \sum_{i=1}^{n} \beta_i^2$. With these choices, the objective function in Model 1
yields an MCLP model or an MCQP model.
Model 2: MCLP
$$\text{Minimize } w_\alpha \sum_{i=1}^{n} \alpha_i - w_\beta \sum_{i=1}^{n} \beta_i$$
Subject to:
$$A_i X - \alpha_i + \beta_i - b = 0, \quad \forall A_i \in G_1,$$
$$A_i X + \alpha_i - \beta_i - b = 0, \quad \forall A_i \in G_2,$$
where $A_i$, $i = 1, \ldots, n$ are given, $X$ and $b$ are unrestricted, and
$\alpha = (\alpha_1, \ldots, \alpha_n)^T$, $\beta = (\beta_1, \ldots, \beta_n)^T$;
$\alpha_i, \beta_i \ge 0$, $i = 1, \ldots, n$.
Figure 1. Two-group classification model (groups $G_1$ and $G_2$; boundary $A_i X = b$ and adjusted
boundaries $A_i X = b - \alpha^*$ and $A_i X = b + \alpha^*$)
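Model 3: MCQP
$$\text{Minimize } w_\alpha \sum_{i=1}^{n} \alpha_i^2 - w_\beta \sum_{i=1}^{n} \beta_i^2$$
Subject to:
$$A_i X - \alpha_i + \beta_i - b = 0, \quad \forall A_i \in G_1,$$
$$A_i X + \alpha_i - \beta_i - b = 0, \quad \forall A_i \in G_2,$$
where $A_i$, $i = 1, \ldots, n$ are given, $X$ and $b$ are unrestricted, and
$\alpha = (\alpha_1, \ldots, \alpha_n)^T$, $\beta = (\beta_1, \ldots, \beta_n)^T$;
$\alpha_i, \beta_i \ge 0$, $i = 1, \ldots, n$.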
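To make the difference between the two objectives concrete, the short Python sketch below evaluates the MCLP and MCQP objective values for one fixed pair of overlap and distance vectors. The vectors and the weights are made-up numbers, and the function names are hypothetical; the sketch only evaluates the objectives and does not solve either model.

```python
def mclp_objective(alpha, beta, w_alpha, w_beta):
    """Linear objective: w_alpha * sum(alpha_i) - w_beta * sum(beta_i)."""
    return w_alpha * sum(alpha) - w_beta * sum(beta)


def mcqp_objective(alpha, beta, w_alpha, w_beta):
    """Quadratic objective: w_alpha * sum(alpha_i^2) - w_beta * sum(beta_i^2)."""
    return (w_alpha * sum(a * a for a in alpha)
            - w_beta * sum(b * b for b in beta))


# Hypothetical overlaps and distances for four records.
alpha = [0.0, 1.0, 0.0, 1.0]
beta = [2.0, 0.0, 4.0, 0.0]

mclp_value = mclp_objective(alpha, beta, w_alpha=2.0, w_beta=1.0)
mcqp_value = mcqp_objective(alpha, beta, w_alpha=2.0, w_beta=1.0)
print("MCLP objective:", mclp_value)
print("MCQP objective:", mcqp_value)
```

Because the quadratic form squares each term, a single record with a large distance contributes far more to the MCQP objective than to the MCLP objective.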
Remark
There are some issues related to MCLP and MCQP that can be briefly addressed here:
1. In the process of finding an optimal solution for the MCLP problem, if some $\beta_i$ is too
large with given $w_\alpha > 0$ and $w_\beta > 0$ and all $\alpha_i$ relatively small, the problem
may have an unbounded solution. In real applications, data with large $\beta_i$ can be detected as
“outlier” or “noisy” in the data preprocessing, and should be removed before classification.
2. Note that although variables $X$ and $b$ are unrestricted in the above models, $X = 0$ is an
“insignificant case” in terms of data separation, and therefore it should be ignored in the process
of solving the problem. Setting $b = 0$, however, may result in a solution for the data separation,
depending on the data structure. From experimental studies, a pre-defined value of $b$ can quickly
lead to an optimal solution if the user fully understands the data structure.
3. Some variations of the generalized model, such as MCQP, are NP-hard problems. Developing
algorithms directly to solve these models can be a challenge. Although in application we can
utilize some existing commercial software, the related theoretical problem will be addressed later
in this chapter.
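The unboundedness mentioned in the first remark can be seen on a small example. The Python sketch below uses made-up, perfectly separable data with $b = 0$ and scales a fixed direction for $X$ by increasing factors. Every point it evaluates is feasible for the MCLP constraints, so an objective that keeps decreasing along the sequence indicates that the linear program has no finite minimum for this data. The data, weights, and function name are hypothetical, and the one-nonzero-per-record choice of $\alpha_i$ and $\beta_i$ is an assumption of the sketch.

```python
def mclp_objective_for(weights, b, data, w_alpha, w_beta):
    """MCLP objective at a feasible point built from X = weights and b.

    For each record, alpha_i and beta_i are chosen so that the group
    constraint holds with at most one of them nonzero (an assumption).
    """
    total = 0.0
    for record, group in data:
        d = sum(a * x for a, x in zip(record, weights)) - b
        if group == "G1":
            alpha, beta = max(0.0, d), max(0.0, -d)
        else:
            alpha, beta = max(0.0, -d), max(0.0, d)
        total += w_alpha * alpha - w_beta * beta
    return total


# Hypothetical separable data: G1 scores are negative, G2 scores positive.
data = [((-1.0, -2.0), "G1"), ((-2.0, -1.0), "G1"),
        ((1.0, 2.0), "G2"), ((2.0, 1.0), "G2")]
b = 0.0

scales = (1.0, 10.0, 100.0)
values = [mclp_objective_for((t, t), b, data, 1.0, 1.0) for t in scales]
for t, v in zip(scales, values):
    print("scale", t, "objective", v)
```

In practice this is one reason why the source recommends removing records with very large $\beta_i$ and fixing $b$ in advance; any further safeguard, such as bounding the entries of $X$, would be an addition beyond what the chapter states.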
Multiple Criteria Fuzzy Linear
Programming Model Formulation
It has been recognized that in many decision-making problems, instead of finding the existing
“optimal solution” (a goal value), decision makers often approach a “satisfying solution” between
upper and lower aspiration levels that can be represented by the upper and lower bounds of
acceptability for objective payoffs, respectively (Charnes & Cooper, 1961; Lee, 1972; Shi & Yu,
1989; Yu, 1985). This idea, which has an important and pervasive impact on human decision making
(Lindsay & Norman, 1972), is called the decision makers’ goal-seeking concept. Zimmermann (1978)
employed it as the basis of his pioneering work on FLP. When FLP is adopted to classify the ‘good’
and ‘bad’ data, a fuzzy (satisfying) solution is used to meet a threshold for the accuracy rate of
classifications, although the fuzzy solution is only a near-optimal solution.
According to Zimmermann (1978), in formulating an FLP problem, the objectives (Minimize
$\sum_i \alpha_i$ and Maximize $\sum_i \beta_i$) and constraints ($A_i X = b + \alpha_i - \beta_i$,
$A_i \in G$; $A_i X = b - \alpha_i + \beta_i$, $A_i \in B$) of the generalized model are redefined
as fuzzy sets $F$ and $X$ with corresponding membership functions $\mu_F(x)$ and $\mu_X(x)$,
respectively. In this case the fuzzy decision set $D$ is defined as $D = F \cap X$, and the
membership function is defined as $\mu_D(x) = \min\{\mu_F(x), \mu_X(x)\}$. In a maximal problem,
$x_1$ is a “better” decision than $x_2$ if $\mu_D(x_1) \ge \mu_D(x_2)$. Thus, it is appropriate to
select $x^*$ such that
$$\max_x \mu_D(x) = \max_x \min\{\mu_F(x), \mu_X(x)\} = \min\{\mu_F(x^*), \mu_X(x^*)\}$$
is the maximized solution.
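Let $y_{1L}$ be Minimize $\sum_i \alpha_i$ and $y_{2U}$ be Maximize $\sum_i \beta_i$; then one can
assume the value of Maximize $\sum_i \alpha_i$ to be $y_{1U}$ and that of Minimize
$\sum_i \beta_i$ to be $y_{2L}$. If the “upper bound” $y_{1U}$ and the “lower bound” $y_{2L}$ do
not exist for the formulations, they can be estimated. Let
$F_1 = \{x: y_{1L} \le \sum_i \alpha_i \le y_{1U}\}$ and
$F_2 = \{x: y_{2L} \le \sum_i \beta_i \le y_{2U}\}$; their membership functions can be expressed
respectively by:
$$\mu_{F_1}(x) = \begin{cases}
1, & \text{if } \sum_i \alpha_i \ge y_{1U} \\
\dfrac{\sum_i \alpha_i - y_{1L}}{y_{1U} - y_{1L}}, & \text{if } y_{1L} < \sum_i \alpha_i < y_{1U} \\
0, & \text{if } \sum_i \alpha_i \le y_{1L}
\end{cases}$$
and
$$\mu_{F_2}(x) = \begin{cases}
1, & \text{if } \sum_i \beta_i \ge y_{2U} \\
\dfrac{\sum_i \beta_i - y_{2L}}{y_{2U} - y_{2L}}, & \text{if } y_{2L} < \sum_i \beta_i < y_{2U} \\
0, & \text{if } \sum_i \beta_i \le y_{2L}
\end{cases}$$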
Then the fuzzy set of the objective functions is $F = F_1 \cap F_2$, and its membership function
is $\mu_F(x) = \min\{\mu_{F_1}(x), \mu_{F_2}(x)\}$. Using the crisp constraint set
$X = \{x: A_i X = b + \alpha_i - \beta_i, A_i \in G;\ A_i X = b - \alpha_i + \beta_i, A_i \in B\}$,
the fuzzy set of the decision problem is $D = F_1 \cap F_2 \cap X$, and its membership function is
$\mu_D(x) = \mu_{F_1 \cap F_2 \cap X}(x)$.
Zimmermann (1978) has shown that the “optimal solution” of
$$\max_x \mu_D(x) = \max_x \min\{\mu_{F_1}(x), \mu_{F_2}(x), \mu_X(x)\}$$
is an efficient solution of a variation of the generalized model when
$f(\alpha) = \sum_i \alpha_i$ and $g(\beta) = \sum_i \beta_i$. Then, this problem is equivalent to
the following linear program (He, Liu, Shi, Xu, & Yan, 2004):
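Model 4: FLP
Maximize $\xi$
Subject to:
$$\xi \le \frac{\sum_i \alpha_i - y_{1L}}{y_{1U} - y_{1L}},$$
$$\xi \le \frac{\sum_i \beta_i - y_{2L}}{y_{2U} - y_{2L}},$$
$$A_i X = b + \alpha_i - \beta_i, \quad A_i \in G,$$
$$A_i X = b - \alpha_i + \beta_i, \quad A_i \in B,$$
where $A_i$, $y_{1L}$, $y_{1U}$, $y_{2L}$, and $y_{2U}$ are known, $X$ and $b$ are unrestricted,
and $\alpha_i, \beta_i, \xi \ge 0$.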
Note that Model 4 will produce a value of $\xi$ with $1 > \xi \ge 0$. To avoid the trivial
solution, one can set up $\xi > \varepsilon \ge 0$, for a given $\varepsilon$. Therefore, seeking
the maximum $\xi$ in the FLP approach becomes the standard of determining the classifications
between ‘good’ and ‘bad’ records in the database. A graphical illustration of this approach can be
seen from Figure 2; any point of the hyperplane $0 < \xi < 1$ over the shadow area represents a
possible determination of classifications by the FLP method. Whenever Model 4 has been trained to
meet the given threshold $\tau$, it is said that the better classifier has been identified.
A procedure of using the FLP method for data classification can be captured by the flowchart of
Figure 2. Note that although the boundary of the two classes, $b$, is an unrestricted variable in
Model 4, it can be presumed by the analyst according to the structure of a particular database.
First, choosing a proper value of $b$ can speed up solving Model 4. Second, given a threshold
$\tau$, the best data separation can be selected from a number of results determined by different
$b$ values. Therefore, the parameter $b$ plays a key role in this chapter in achieving and
guaranteeing the desired accuracy rate $\tau$. For this reason, the FLP classification method uses
$b$ as an important control parameter, as shown in Figure 2.
Real-Life Applications Using Multiple Criteria Optimization Approaches
The models of multiple criteria optimization data mining in this chapter have been applied in
credit card portfolio management (He et al., 2004; Kou, Liu, Peng, Shi, Wise, & Xu, 2003; Peng,
Kou, Chen, & Shi, 2004; Shi et al., 2001; Shi, Peng, Xu, & Tang, 2002; Shi et al., 2005),
HIV-1-mediated neural dendritic and synaptic damage treatment (Zheng et al., 2004), network
intrusion detection (Kou et al., 2004a; Kou, Peng, Chen, Shi, & Chen, 2004b), and firms bankruptcy
analyses (Kwak, Shi, Eldridge, & Kou, 2006). These approaches are also being applied in other
ongoing real-life data mining projects, such as anti-gene and antibody analyses, petroleum drilling
and exploration, fraud management, and financial risk evaluation. In order to let the reader
understand the usefulness of the models, the key experiences in some applications are reported
below.
Credit Card Portfolio Management
The goal of credit card accounts classification is to produce a “blacklist” of the credit
cardholders; this list can help creditors to take proactive steps to minimize charge-off loss. In
this study, credit card accounts are classified into two groups: ‘good’ or ‘bad’. From the
technical point of view, we first need to construct a number of classifiers and then choose one
that can find more bad records. The research procedure consists of five steps. The first step is
data cleaning. Within this step, missing data cells and outliers are removed from the dataset. The
second step is data transformation. The dataset is transformed in accord with the format
requirements of MCLP software (Kou & Shi, 2002) and LINGO 8.0, which is a software tool for solving
nonlinear programming problems (LINDO Systems Inc.). The third step is dataset selection. The
training dataset and the testing dataset are selected according to a heuristic process. The fourth
step is model formulation and classification. The two-group MCLP and MCQP models are applied to the
training dataset to obtain optimal solutions. The solutions are then applied to the testing
dataset, within which class labels are removed for validation, to compute a score for each record.
Based on these scores, each record is predicted as either bad (bankrupt account) or good (current
account). By comparing the predicted labels with the original labels of the records, the
classification accuracies of the multiple-criteria models can be determined. If the classification
accuracy is acceptable to the data analysts, this solution will be applied to future unknown credit
card records or applications to make predictions. Otherwise, data analysts can modify the boundary
and attribute values to get another set of optimal solutions. The fifth step is results
presentation. The acceptable classification results are summarized in tables or figures and
presented to end users.
Figure 2. A flowchart of the fuzzy linear programming classification method
Credit Card Dataset
The credit card dataset used in this chapter is provided by a major U.S. bank. It contains 5,000
records and 102 variables (38 original variables and 64 derived variables). The data were collected
from June 1995 to December 1995, and the cardholders were from 28 states of the United States. Each
record has a class label to indicate its credit status: either ‘good’ or ‘bad’. ‘Bad’ indicates a
bankruptcy credit card account and ‘good’ indicates a good status account. Among these 5,000
records, 815 are bankruptcy accounts and 4,185 are good status accounts. The 38 original variables
can be divided into four categories: balance, purchase, payment, and cash advance. The 64 derived
variables are created from the original 38 variables to reinforce the comprehension of cardholders’
behaviors, such as times over-limit in last two years, calculated interest rate, cash as percentage
of balance, purchase as percentage to balance, payment as percentage to balance, and purchase as
percentage to payment. For the purpose of credit card classification, the 64 derived variables were
chosen to compute the model since they provide more precise information about credit cardholders’
behaviors.
Experimental Results of MCLP
Inspired by the k-fold cross-validation method in classification, this study proposed a heuristic
process for training and testing dataset selections. Standard k-fold cross-validation is not used
because the majority-vote ensemble method used later on in this chapter may need hundreds of
voters. If standard k-fold cross-validation were employed, k would have to be in the hundreds. The
following paragraph describes the heuristic process.
First, the bankruptcy dataset (815 records) is divided into 100 intervals (each interval has eight
records). Within each interval, seven records are randomly selected. The number seven is determined
according to empirical results of k-fold cross-validation. Thus 700 ‘bad’ records are obtained.
Second, the good-status dataset (4,185 records) is divided into 100 intervals (each interval has 41
records). Within each interval, seven records are randomly selected. Thus a total of 700 ‘good’
records is obtained. Third, the 700 bankruptcy and 700 current records are combined to form a
training dataset. Finally, the remaining 115 bankruptcy and 3,485 current accounts become the
testing dataset. According to this procedure, the total number of possible combinations of this
selection equals $(C_8^7 \times C_{41}^7)^{100}$. Thus, the possibility of getting identical
training or testing datasets is approximately zero. The across-the-board thresholds of 65% and 70%
are set for the ‘bad’ and ‘good’ class, respectively. The values of the thresholds are determined
from previous experience. The classification results whose predictive accuracies are below these
thresholds will be filtered out.
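The selection heuristic can be sketched in a few lines of Python. The sketch below draws seven records from each of 100 consecutive intervals and sends everything else to the testing set. The placeholder records, the function name `interval_sample`, and the fixed random seed are hypothetical. It also assumes that the records left over after forming 100 full intervals (15 bankruptcy and 85 good-status records) go to the testing set, which is consistent with the stated testing set sizes but is not spelled out in the text.

```python
import math
import random


def interval_sample(records, n_intervals, interval_size, per_interval, rng):
    """Draw per_interval records from each consecutive interval.

    Records that are not drawn, including any left over after the last
    full interval (an assumption), are returned as the testing part.
    """
    training_idx = set()
    for k in range(n_intervals):
        start = k * interval_size
        interval = range(start, start + interval_size)
        training_idx.update(rng.sample(interval, per_interval))
    training = [records[i] for i in sorted(training_idx)]
    testing = [r for i, r in enumerate(records) if i not in training_idx]
    return training, testing


rng = random.Random(0)                        # fixed seed for repeatability
bad = [("bad", i) for i in range(815)]        # placeholder bankruptcy records
good = [("good", i) for i in range(4185)]     # placeholder good-status records

bad_train, bad_test = interval_sample(bad, 100, 8, 7, rng)
good_train, good_test = interval_sample(good, 100, 41, 7, rng)
training = bad_train + good_train
testing = bad_test + good_test

# Number of distinct selections: (C(8,7) * C(41,7)) ** 100.
combinations = (math.comb(8, 7) * math.comb(41, 7)) ** 100
print(len(training), len(testing), "digits:", len(str(combinations)))
```

Calling `interval_sample` again with a different seed gives another training and testing pair, which is how many distinct splits can be generated for the ensemble described later.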
The whole research procedure can be summarized using the following algorithm:
Algorithm 1
Input: The data set $A = \{A_1, A_2, A_3, \ldots, A_n\}$, boundary $b$
Output: The optimal solution $X^* = (x_1^*, x_2^*, x_3^*, \ldots, x_{64}^*)$, the classification
score $MCLP_i$
Step 1: Generate the Training set and the Testing set from the credit card data set.
Step 2: Apply the two-group MCLP model to compute the optimal solution
$X^* = (x_1^*, x_2^*, \ldots, x_{64}^*)$ as the best weights of all 64 variables with given values
of control parameters $(b, \alpha^*, \beta^*)$ in the Training set.
Step 3: The classification score $MCLP_i = A_i X^*$ of each observation in the Training set is
calculated and compared against the boundary $b$ to check the performance measures of the
classification.
Step 4: If the classification result of Step 3 is acceptable (i.e., the found performance measure
is larger than or equal to the given threshold), go to the next step. Otherwise, arbitrarily choose
different values of control parameters $(b, \alpha^*, \beta^*)$ and go to Step 1.
Step 5: Use $X^* = (x_1^*, x_2^*, \ldots, x_{64}^*)$ to calculate the MCLP scores for all $A_i$ in
the Testing set and conduct the performance analysis. If it produces a satisfying classification
result, go to the next step. Otherwise, go back to Step 1 to reformulate the Training Set and
Testing Set.
Step 6: Repeat the whole process until a preset number (e.g., 999) of different $X^*$ are
generated for the future ensemble method.
End.
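The scoring and acceptance checks in Steps 3 to 5 can be sketched as follows, assuming a weight vector $X^*$ has already been obtained from some solver. The sketch classifies a record into $G_1$ when its score is below $b$ and into $G_2$ otherwise, computes the accuracy of each class, and compares it with per-class thresholds. The toy records, the weight vector, and the mapping of the 65% and 70% thresholds onto $G_1$ and $G_2$ are illustrative assumptions, not details given in the chapter.

```python
def classify(record, weights, b):
    """Predict G1 when the score A_i X* falls below b, otherwise G2."""
    s = sum(a * x for a, x in zip(record, weights))
    return "G1" if s < b else "G2"


def class_accuracies(data, weights, b):
    """Per-class accuracy: correctly classified records / records in class."""
    correct = {"G1": 0, "G2": 0}
    total = {"G1": 0, "G2": 0}
    for record, group in data:
        total[group] += 1
        if classify(record, weights, b) == group:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}


def acceptable(accuracies, thresholds):
    """True when every class meets or exceeds its threshold."""
    return all(accuracies[g] >= thresholds[g] for g in thresholds)


# Hypothetical records, weights, and boundary.
data = [((1.0, 1.0), "G1"), ((2.0, 3.0), "G1"),
        ((4.0, 4.0), "G2"), ((1.0, 2.0), "G2")]
x_star = (1.0, 1.0)
b = 4.0

# Assumed mapping: G1 plays the role of 'bad', G2 the role of 'good'.
thresholds = {"G1": 0.65, "G2": 0.70}

acc = class_accuracies(data, x_star, b)
print(acc, "acceptable:", acceptable(acc, thresholds))
```

With real data, a result that fails the check would send the procedure back to Step 1, as the algorithm describes.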
Applying Algorithm 1 to the credit card dataset, classification results were obtained and
summarized. Due to the space limitation, only a part (10 out of the total 500 cross-validation
results) of the results is summarized in Table 1 (Peng et al., 2004). The columns “Bad” and “Good”
refer to the number of records that were correctly classified as “bad” and “good,” respectively.
The column “Accuracy” was calculated using correctly classified records divided by the total
records in that class. For instance, the 80.43% accuracy of Dataset 1 for bad records in the
training dataset was calculated using 563 divided by 700 and means that 80.43% of bad records were
correctly classified. The average predictive accuracies for bad and good groups in the training
dataset are 79.79% and 78.97%, and the average predictive accuracies for bad and good groups in the
testing dataset are 68% and 74.39%. The results demonstrated that a good separation of bankruptcy
and good status credit card accounts is observed with this method.
Cross         Training Set (700 Bad + 700 Good)       Testing Set (115 Bad + 3485 Good)
Validation    Bad    Accuracy   Good   Accuracy       Bad    Accuracy   Good   Accuracy
DataSet 1     563    80.43%     557    79.57%         78     67.83%     2575   73.89%
DataSet 2     546    78.00%     546    78.00%         75     65.22%     2653   76.13%
DataSet 3     564    80.57%     560    80.00%         75     65.22%     2550   73.17%
DataSet 4     553    79.00%     553    79.00%         78     67.83%     2651   76.07%
DataSet 5     548    78.29%     540    77.14%         78     67.83%     2630   75.47%
DataSet 6     567    81.00%     561    80.14%         79     68.70%     2576   73.92%
DataSet 7     556    79.43%     548    78.29%         77     66.96%     2557   73.37%
DataSet 8     562    80.29%     552    78.86%         79     68.70%     2557   73.37%
DataSet 9     566    80.86%     557    79.57%         83     72.17%     2588   74.26%
DataSet 10    560    80.00%     554    79.14%         80     69.57%     2589   74.29%
Table 1. MCLP credit card accounts classification
Improvement of MCLP Experimental
Results with Ensemble Method
In credit card bankruptcy predictions, even a small percentage of increase in the classification
accuracy can save creditors millions of dollars. Thus it is necessary to investigate possible
techniques that can improve MCLP classification results. The technique studied in this experiment
is majority-vote ensemble. An ensemble consists of two fundamental elements: a set of trained
classifiers and an aggregation mechanism that organizes these classifiers into the output ensemble.
The aggregation mechanism can be an average or a majority vote (Zenobi & Cunningham, 2002).
Weingessel, Dimitriadou, and Hornik (2003) have reviewed a series of ensemble-related publications
(Dietterich, 2000; Lam, 2000; Parhami, 1994; Bauer & Kohavi, 1999; Kuncheva, 2000). Previous
research has shown that an ensemble can help to increase classification accuracy and stability
(Opitz & Maclin, 1999). A part of MCLP’s optimal solutions was selected to form ensembles. Each
solution will have one vote for each credit card record, and the final classification result is
determined by the majority votes. Algorithm 2 describes the ensemble process:
Algorithm 2
Input: The data set $A = \{A_1, A_2, A_3, \ldots, A_n\}$,
boundary $b$, a certain number of solutions,
$X^* = (x_1^*, x_2^*, x_3^*, \ldots, x_{64}^*)$
Output: The classification score $MCLP_i$ and
the prediction $P_i$
Step 1: A committee of certain odd number
of classifiers $X^*$ is formed.
Step 2: The classification score $MCLP_i = A_i X^*$
against each observation is calculated
against the boundary $b$ by every member of
the committee. The performance measures
of the classification will be decided by
majorities of the committee. If more than
half of the committee members agreed in
the classification, then the prediction $P_i$ for
this observation is successful, otherwise the
prediction is failed.
Step 3: The accuracy for each group will be
computed by the percentage of successful
classification in all observations.
End.
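
The voting step of Algorithm 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the convention that a score at or above the boundary b means "Good" is an assumption made only for the example.

```python
import numpy as np

def majority_vote(A, solutions, b):
    """Return True where more than half of the committee votes "Good".

    A         : (n, d) array of observations A_i
    solutions : (k, d) array, one weight vector X* per committee member
    b         : boundary shared by all members
    Assumption: a score A_i X* >= b is read as a "Good" vote.
    """
    solutions = np.asarray(solutions, dtype=float)
    if len(solutions) % 2 == 0:
        raise ValueError("the committee needs an odd number of members")
    scores = np.asarray(A, dtype=float) @ solutions.T   # shape (n, k)
    votes_good = scores >= b                            # one vote per member
    return votes_good.sum(axis=1) > len(solutions) / 2

def group_accuracy(pred_good, is_good):
    """Fraction of correctly classified records in the Bad and Good groups."""
    pred_good = np.asarray(pred_good, dtype=bool)
    is_good = np.asarray(is_good, dtype=bool)
    bad_acc = float(np.mean(~pred_good[~is_good]))
    good_acc = float(np.mean(pred_good[is_good]))
    return bad_acc, good_acc
```

An odd committee size rules out ties, which is why Step 1 asks for an odd number of classifiers.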
The results of applying Algorithm 2 are summarized
in Table 2 (Peng et al., 2004). The average
predictive accuracies for bad and good groups in
the training dataset are 80.8% and 80.6%, and
the average predictive accuracies for bad and
good groups in the testing dataset are 72.17% and
76.4%. Compared with the previous results, the ensemble
technique improves the classification accuracies.
In particular, for bad records in the testing set,
the average accuracy increased by 4.17 percentage
points. Since bankruptcy accounts are the major cause
of creditors' loss, predictive accuracy for bad
records is considered to be more important than
for good records.
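
As a quick check of the reported gain, the averages can be recomputed from the "Bad" testing accuracies printed in Tables 1 and 2. The variable names below are illustrative only; the numbers are copied from the two tables.

```python
# "Bad" testing accuracies (%) as printed in Table 1 (single MCLP solutions)
table1_bad_test = [67.83, 65.22, 65.22, 67.83, 67.83,
                   68.70, 66.96, 68.70, 72.17, 69.57]
# "Bad" testing accuracies (%) as printed in Table 2 (ensembles of 9 to 399 voters)
table2_bad_test = [70.43, 72.17, 72.17, 73.04, 73.04]

def mean(values):
    return sum(values) / len(values)

single_avg = mean(table1_bad_test)      # about 68.00
ensemble_avg = mean(table2_bad_test)    # about 72.17
gain = ensemble_avg - single_avg        # about 4.17 percentage points
print(f"{single_avg:.2f} -> {ensemble_avg:.2f} (+{gain:.2f} points)")
```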
Experimental Results of MCQP
Based on the MCQP model and the research
procedure described in previous sections, similar
experiments were conducted to get MCQP results.
LINGO 8.0 was used to compute the optimal solutions.
The whole research procedure for MCQP
is summarized in Algorithm 3:
Ensemble Results   Training Set (700 Bad + 700 Good)      Testing Set (115 Bad + 3485 Good)
No. of Voters      Bad   Accuracy   Good   Accuracy       Bad   Accuracy   Good   Accuracy
9                  563   80.43%     561    80.14%         81    70.43%     2605   74.75%
99                 565   80.71%     563    80.43%         83    72.17%     2665   76.47%
199                565   80.71%     566    80.86%         83    72.17%     2656   76.21%
299                568   81.14%     564    80.57%         84    73.04%     2697   77.39%
399                567   81.00%     567    81.00%         84    73.04%     2689   77.16%
Table 2. MCLP credit card accounts classification with ensemble
Algorithm 3
Input: The data set $A = \{A_1, A_2, A_3, \ldots, A_n\}$,
boundary $b$
Output: The optimal solution,
$X^* = (x_1^*, x_2^*, x_3^*, \ldots, x_{64}^*)$, the classification score
$MCQP_i$
Step 1: Generate the Training set and Testing
set from the credit card data set.
Step 2: Apply the two-group MCQP model
to compute the compromise solution
$X^* = (x_1^*, x_2^*, \ldots, x_{64}^*)$ as the best weights of all
64 variables with given values of control
parameters $(b, \alpha^*, \beta^*)$ using LINGO 8.0
software.
Step 3: The classification score $MCQP_i = A_i X^*$
against each observation is calculated
against the boundary $b$ to check the performance
measures of the classification.
Step 4: If the classification result of Step 3
is acceptable (i.e., the found performance
measure is larger or equal to the given
threshold), go to the next step. Otherwise,
choose different values of control parameters
$(b, \alpha^*, \beta^*)$ and go to Step 1.
Step 5: Use $X^* = (x_1^*, x_2^*, \ldots, x_{64}^*)$ to calculate
the MCQP scores for all $A_i$ in the test set
and conduct the performance analysis. If it
produces a satisfying classification result,
go to the next step. Otherwise, go back to
Step 1 to reformulate the Training Set and
Testing Set.
Step 6: Repeat the whole process until a
preset number of different $X^*$ are generated.
End.
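
The control flow of Steps 2 to 5 can be sketched as below. This is a simplified illustration under stated assumptions, not the procedure as implemented with LINGO 8.0: `solve` is a hypothetical callable standing in for the solver run, a score below b is assumed to mean "Bad", and a failed test check simply moves on to the next parameter setting instead of regenerating the training and testing sets.

```python
import numpy as np

def accuracy_by_group(A, is_bad, x, b):
    """Score every record as A_i x and compare it with the boundary b.
    Assumption: a score below b is classified as "Bad"."""
    pred_bad = (A @ x) < b
    bad_acc = float(np.mean(pred_bad[is_bad]))
    good_acc = float(np.mean(~pred_bad[~is_bad]))
    return bad_acc, good_acc

def search_solution(train, test, solve, param_grid, threshold):
    """Try control parameters until both training and testing accuracies
    reach the threshold. `solve(A, is_bad, b, alpha, beta)` is a
    hypothetical stand-in for the optimization run and returns weights x."""
    A_tr, bad_tr = train
    A_te, bad_te = test
    for b, alpha, beta in param_grid:
        x = solve(A_tr, bad_tr, b, alpha, beta)                  # Step 2
        if min(accuracy_by_group(A_tr, bad_tr, x, b)) < threshold:
            continue                                             # Step 4: new parameters
        if min(accuracy_by_group(A_te, bad_te, x, b)) >= threshold:
            return x, b                                          # Step 5 passed
    return None
```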
A part (10 out of the total 38 results) of the
results is summarized in Table 3.
The average predictive accuracies for bad and
good groups in the training dataset are 86.61%
and 73.29%, and the average predictive accuracies
for bad and good groups in the testing dataset
are 81.22% and 68.25%. Compared with MCLP,
MCQP has lower predictive accuracies for good
records. Nevertheless, bad group classification
accuracies of the testing set using MCQP increased
from 68% to 81.22%, which is a remarkable
improvement.
Cross        Training Set (700 Bad + 700 Good)      Testing Set (115 Bad + 3485 Good)
Validation   Bad   Accuracy   Good   Accuracy       Bad   Accuracy   Good   Accuracy
DataSet 1    602   86.00%     541    77.29%         96    83.48%     2383   68.38%
DataSet 2    614   87.71%     496    70.86%         93    80.87%     2473   70.96%
DataSet 3    604   86.29%     530    75.71%         95    82.61%     2388   68.52%
DataSet 4    616   88.00%     528    75.43%         95    82.61%     2408   69.10%
DataSet 5    604   86.29%     547    78.14%         90    78.26%     2427   69.64%
DataSet 6    614   87.71%     502    71.71%         94    81.74%     2328   66.80%
DataSet 7    610   87.14%     514    73.43%         95    82.61%     2380   68.29%
DataSet 8    582   83.14%     482    68.86%         93    80.87%     2354   67.55%
DataSet 9    614   87.71%     479    68.43%         90    78.26%     2295   65.85%
DataSet 10   603   86.14%     511    73.00%         93    80.87%     2348   67.37%
Table 3. MCQP credit card accounts classification

Improvement of MCQP with Ensemble
Method
Similar to the MCLP experiment, the majority-vote
ensemble discussed previously was applied
to MCQP to examine whether it can make an
improvement. The results are presented in Table
4. The average predictive accuracies for bad and
good groups in the training dataset are 89.18%
and 74.68%, and the average predictive accuracies
for bad and good groups in the testing dataset are
85.61% and 68.67%. Compared with the previous
MCQP results, the majority-vote ensemble improves
the total classification accuracies. In particular, for
bad records in the testing set, the average accuracy
increased by 4.39 percentage points.
Experimental Results of Fuzzy Linear
Programming
Applying the fuzzy linear programming model
discussed earlier in this chapter to the same credit
card dataset, we obtained some FLP classification
results. These results are compared with the
decision tree, MCLP, and neural networks (see
Tables 5 and 6). The decision tree software is
the commercial version called C5.0 (C5.0, 2004),
while the software for both the neural network and
MCLP was developed at the Data Mining Lab,
University of Nebraska at Omaha, USA (Kou &
Shi, 2002).
Note that in both Table 5 and Table 6, the
columns Tg and Tb respectively represent the
number of good and bad accounts identified by a
method, while the rows Good and Bad represent
the actual numbers of the accounts.
Classifications on HIV-1 Mediated
Neural Dendritic and Synaptic
Damage Using MCLP
The ability to identify neuronal damage in the
dendritic arbor during HIV-1-associated dementia
(HAD) is crucial for designing specific therapies
for the treatment of HAD. A two-class model of
multiple criteria linear programming (MCLP) was
proposed to classify such HIV-1 mediated neuronal
dendritic and synaptic damages. Given certain
classes, including treatments with brain-derived
neurotrophic factor (BDNF), glutamate, gp120,
or non-treatment controls from our in vitro
experimental systems, we used the two-class MCLP
model to determine the data patterns between
classes in order to gain insight about neuronal
dendritic and synaptic damages under different
treatments (Zheng et al., 2004). This knowledge
can be applied to the design and study of specific
therapies for the prevention or reversal of neuronal
damage associated with HAD.
Ensemble Results   Training Set (700 Bad + 700 Good)      Testing Set (115 Bad + 3485 Good)
No. of Voters      Bad   Accuracy   Good   Accuracy       Bad   Accuracy   Good   Accuracy
3                  612   87.43%     533    76.14%         98    85.22%     2406   69.04%
5                  619   88.43%     525    75.00%         95    82.61%     2422   69.50%
7                  620   88.57%     525    75.00%         97    84.35%     2412   69.21%
9                  624   89.14%     524    74.86%         100   86.96%     2398   68.81%
11                 625   89.29%     525    75.00%         99    86.09%     2389   68.55%
13                 629   89.86%     517    73.86%         100   86.96%     2374   68.12%
15                 629   89.86%     516    73.71%         98    85.22%     2372   68.06%
17                 632   90.29%     520    74.29%         99    86.09%     2379   68.26%
19                 628   89.71%     520    74.29%         100   86.96%     2387   68.49%
Table 4. MCQP credit card accounts classification with ensemble
Database
The data produced by laboratory experimentation
and image analysis was organized into a database
composed of four classes (G1-G4), each of which
has nine attributes. The four classes are defined
as follows:
• G1: Treatment with the neurotrophin BDNF
(brain-derived neurotrophic factor; 0.5
ng/ml, 5 ng/ml, 10 ng/ml, and 50 ng/ml).
This factor promotes neuronal cell survival
and has been shown to enrich neuronal cell
cultures (Lopez et al., 2001; Shibata et al.,
2003).
• G2: Non-treatment. Neuronal cells are kept
in their normal media used for culturing
(Neurobasal media with B27, which is a neuronal
cell culture maintenance supplement
from Gibco, with glutamine and penicillin-streptomycin).
• G3: Treatment with glutamate (10, 100, and
1,000 µM). At low concentrations, glutamate
acts as a neurotransmitter in the brain.
However, at high concentrations, it has been
shown to be a neurotoxin by over-stimulating
NMDA receptors. This factor has been
shown to be upregulated in HIV-1-infected
macrophages (Jiang et al., 2001) and thereby
linked to neuronal damage by HIV-1-infected
macrophages.
• G4: Treatment with gp120 (1 nM), an
HIV-1 envelope protein. This protein could
interact with receptors on neurons and interfere
with cell signaling leading to neuronal
damage, or it could also indirectly induce
neuronal injury through the production of
other neurotoxins (Hesselgesser et al., 1998;
Kaul, Garden, & Lipton, 2001; Zheng et al.,
1999).
Decision Tree     Tg     Tb     Total
Good              138    2      140
Bad               13     127    140
Total             151    129    280

Neural Network    Tg     Tb     Total
Good              116    24     140
Bad               14     126    140
Total             130    150    280

MCLP              Tg     Tb     Total
Good              134    6      140
Bad               7      133    140
Total             141    139    280

FLP               Tg     Tb     Total
Good              127    13     140
Bad               13     127    140
Total             140    140    280
Table 5. Learning comparisons on balanced 280 records

Decision Tree     Tg     Tb     Total
Good              2180   2005   4185
Bad               141    674    815
Total             2321   2679   5000

Neural Network    Tg     Tb     Total
Good              2814   1371   4185
Bad               176    639    815
Total             2990   2010   5000

MCLP              Tg     Tb     Total
Good              3160   1025   4185
Bad               484    331    815
Total             3644   1356   5000

FLP               Tg     Tb     Total
Good              2498   1687   4185
Bad               113    702    815
Total             2611   2389   5000
Table 6. Comparisons on prediction of 5,000 records

The nine attributes are defined as:
• x1 = The number of neurites
• x2 = The number of arbors
• x3 = The number of branch nodes
• x4 = The average length of arbors
• x5 = The ratio of neurite to arbor
• x6 = The area of cell bodies
• x7 = The maximum length of the arbors
• x8 = The culture time (during this time,
the neuron grows normally and BDNF,
glutamate, or gp120 have not been added
to affect growth)
• x9 = The treatment time (during this time,
the neuron was growing under the effects
of BDNF, glutamate, or gp120)
The database used in this chapter contained
2,112 observations. Among them, 101 are on G1,
1,001 are on G2, 229 are on G3, and 781 are on
G4.
Compared with the traditional mathematical
tools in classification, such as neural networks,
decision trees, and statistics, the two-class MCLP
approach is simple and direct, free of statistical
assumptions, and flexible in that it allows decision
makers to play an active part in the analysis
(Shi, 2001).
Results of Empirical Study Using
MCLP
By using the two-class model for the classifications
on {G1, G2, G3, G4}, there are six possible
pairings: G1 vs. G2; G1 vs. G3; G1 vs. G4; G2
vs. G3; G2 vs. G4; and G3 vs. G4. The pairings
G1 vs. G3 and G1 vs. G4 are treated as redundant
and are therefore not considered. G1
through G3 or G4 is a continuum: G1 represents
an enrichment of neuronal cultures, G2 is basal or
maintenance of neuronal culture, and G3 and G4
both represent damage to neuronal cultures. There
would never be a jump from G1 to G3 or G4 without
passing through G2. We therefore used the following
four two-class pairs: G1 vs. G2; G2 vs. G3; G2
vs. G4; and G3 vs. G4. The meanings of these
two-class pairs are:
• G1 vs. G2 shows that BDNF should enrich the
neuronal cell cultures and increase neuronal
network complexity, that is, more dendrites
and arbors, more length to dendrites, and so
forth.
• G2 vs. G3 indicates that glutamate should
damage neurons and lead to a decrease in
dendrite and arbor number, including dendrite
length.
• G2 vs. G4 should show that gp120 causes
neuronal damage leading to a decrease in
dendrite and arbor number and dendrite
length.
• G3 vs. G4 provides information on the possible
difference between glutamate toxicity
and gp120-induced neurotoxicity.
Given a threshold for the training process, which can
be any performance measure, we carried out
the following steps:
Algorithm 4
Step 1: For each class pair, we used the Linux
code of the two-class model to compute the
compromise solution $X^* = (x_1^*, \ldots, x_9^*)$ as the
best weights of all nine neuronal variables
with given values of control parameters
$(b, \alpha^*, \beta^*)$.
Step 2: The classification score $MCLP_i = A_i X^*$
of each observation is calculated and compared
against the boundary $b$ to check
the performance measures of the classification.
Step 3: If the classification result of Step 2
is acceptable (i.e., the given performance
measure is larger than or equal to the given
threshold), go to Step 4. Otherwise, choose
different values of control parameters
$(b, \alpha^*, \beta^*)$ and go to Step 1.
Step 4: For each class pair, use
$X^* = (x_1^*, \ldots, x_9^*)$ to calculate the MCLP scores for all $A_i$
in the test set and conduct the performance
analysis.
According to the nature of this research, we
define the following terms, which are
widely used in performance analysis:
TP (True Positive) = the number of records
in the first class that have been classified correctly
FP (False Positive) = the number of records
in the second class that have been classified
into the first class
TN (True Negative) = the number of records
in the second class that have been classified
correctly
FN (False Negative) = the number of records
in the first class that have been classified into
the second class
Then we have four different performance
measures:
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{Positive Predictivity} = \frac{TP}{TP + FP}$$
$$\text{False-Positive Rate} = \frac{FP}{TN + FP}$$
$$\text{Negative Predictivity} = \frac{TN}{FN + TN}$$
The “positive” represents the first-class label
while the “negative” represents the second-class
label in the same class pair. For example, in the
class pair {G1 vs. G2}, a record of G1 is “positive”
while a record of G2 is “negative.” Among the
above four measures, more attention is paid to
sensitivity and the false-positive rate because both
measure the correctness of classification in class-pair
data analyses. Note that in a given class
pair, the sensitivity represents the correct classification
rate of the first class, and one minus the false-positive
rate is the correct classification rate of the second class by
the above measure definitions.
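
The four measures can be computed directly from the confusion counts. The helper below is an illustrative sketch (the function name is hypothetical); the example call uses the G1 vs. G2 test counts reported in Table 7.

```python
def performance_measures(tp, fn, fp, tn):
    """Four measures from the confusion counts of one class pair.
    The first class of the pair is "positive", the second "negative"."""
    return {
        "sensitivity": tp / (tp + fn),
        "positive_predictivity": tp / (tp + fp),
        "false_positive_rate": fp / (tn + fp),
        "negative_predictivity": tn / (fn + tn),
    }

# Test-set counts for {G1 vs. G2} as reported in Table 7
g1_vs_g2_test = performance_measures(tp=11, fn=9, fp=280, tn=632)
for name, value in g1_vs_g2_test.items():
    print(f"{name}: {100 * value:.2f}%")
```

With only 20 G1 records against 912 G2 records in this test set, even a moderate false-positive rate produces far more false positives than true positives, which is why the positive predictivity is so low.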
Considering the limited data availability in this
pilot study, we set the across-the-board threshold
of 55% for sensitivity [or 55% for (1 - false positive
rate)] to select the experimental results from
training and test processes. All 20 of the training
and test sets, over the four class pairs, have been
computed using the above procedure. The results
against the threshold are summarized in Tables
7 to 10. As seen in these tables, the sensitivities
for the comparison of all four pairs are at least
55%, indicating that good separation among
individual pairs is observed with this method.
The results are then analyzed in terms of both
positive predictivity and negative predictivity
for the prediction power of the MCLP method
on neuron injuries. In Table 7, G1 is the number
of observations predefined as BDNF treatment,
G2 is the number of observations predefined as
non-treatment, N1 is the number of observations
classified as BDNF treatment, and N2
is the number of observations classified as
non-treatment.

Training   N1          N2          Sensitivity   Positive       False Positive   Negative
                                                 Predictivity   Rate             Predictivity
G1         55 (TP)     34 (FN)     61.80%        61.80%         38.20%           61.80%
G2         34 (FP)     55 (TN)

Test       N1          N2          Sensitivity   Positive       False Positive   Negative
                                                 Predictivity   Rate             Predictivity
G1         11 (TP)     9 (FN)      55.00%        3.78%          30.70%           98.60%
G2         280 (FP)    632 (TN)
Table 7. Classification results with G1 vs. G2

Training   N2          N3          Sensitivity   Positive       False Positive   Negative
                                                 Predictivity   Rate             Predictivity
G2         126 (TP)    57 (FN)     68.85%        68.48%         31.69%           68.68%
G3         58 (FP)     125 (TN)

Test       N2          N3          Sensitivity   Positive       False Positive   Negative
                                                 Predictivity   Rate             Predictivity
G2         594 (TP)    224 (FN)    72.62%        99.32%         8.70%            15.79%
G3         4 (FP)      42 (TN)
Table 8. Classification results with G2 vs. G3

Training   N2          N4          Sensitivity   Positive       False Positive   Negative
                                                 Predictivity   Rate             Predictivity
G2         419 (TP)    206 (FN)    67.04%        65.88%         34.72%           66.45%
G4         217 (FP)    408 (TN)

Test       N2          N4          Sensitivity   Positive       False Positive   Negative
                                                 Predictivity   Rate             Predictivity
G2         216 (TP)    160 (FN)    57.45%        80.90%         32.90%           39.39%
G4         51 (FP)     104 (TN)
Table 9. Classification results with G2 vs. G4

Training   N3          N4          Sensitivity   Positive       False Positive   Negative
                                                 Predictivity   Rate             Predictivity
G3         120 (TP)    40 (FN)     57.45%        80.90%         24.38%           75.16%
G4         39 (FP)     121 (TN)

Test       N3          N4          Sensitivity   Positive       False Positive   Negative
                                                 Predictivity   Rate             Predictivity
G3         50 (TP)     19 (FN)     72.46%        16.78%         40.00%           95.14%
G4         248 (FP)    372 (TN)
Table 10. Classification results with G3 vs. G4

The meanings of other pairs in Tables
8 to 10 can be similarly explained. In Table 7
for {G1 vs. G2}, both positive predictivity and
negative predictivity are the same (61.80%) in the
training set. However, the negative predictivity
of the test set (98.60%) is much higher than
the positive predictivity (3.78%). The prediction
of G1 in the training set is better than that
of the test set, while the prediction of G2 in test
outperforms that of training. This is due to the
small size of G1. In Table 8 for {G2 vs. G3}, the
positive predictivity (68.48%) is almost equal to
the negative predictivity (68.68%) of the training
set. The positive predictivity (99.32%) is much
higher than the negative predictivity (15.79%) of
the test set. As a result, the prediction of G2 in
the test set is better than in the training set, but
the prediction of G3 in the training set is better
than in the test set.
The case of Table 9 for {G2 vs. G4} is similar
to that of Table 8 for {G2 vs. G3}. We see that the
separation of G2 in test (80.90%) is better than in
training (65.88%), while the separation of G4 in
training (66.45%) is better than in test (39.39%). In
the case of Table 10 for {G3 vs. G4}, the positive
predictivity (80.90%) is higher than the negative
predictivity (75.16%) of the training set. Then,
the positive predictivity (16.78%) is much lower
than the negative predictivity (95.14%) of the test
set. The prediction of G3 in training (80.90%) is
better than that of test (16.78%), and the predic-
tion of G4 in test (95.14%) is better than that of
training (75.16%).
In summary, we observed that the predictions
of G2 in test for {G1 vs. G2}, {G2 vs. G3}, and
{G2 vs. G4} are always better than those in training.
The predictions of G3 in training for {G2 vs.
G3} and {G3 vs. G4} are better than those in test.
Finally, the prediction of G4 for {G2 vs. G4} in
training reverses that of {G3 vs. G4} in test. If
we emphasize the test results, these results are
favorable to G2. This may be due to the size of
G2 (non-treatment), which is larger than all other
classes. The classification results can change if the
sizes of G1, G3, and G4 increase significantly.
Network Intrusion Detection
Network intrusions are malicious activities that
aim to misuse network resources. Although
various approaches have been applied to network
intrusion detection, such as statistical analysis,
sequence analysis, neural networks, machine
learning, and artificial immune systems, this field
is far from maturity, and new solutions are worthy
of investigation. Since intrusion detection can be
treated as a classification problem, it is feasible to
apply a multiple-criterion classification model to
this type of application. The objective of this
experiment is to examine the applicability of MCLP
and MCQP models in intrusion detection.
KDD Dataset
The KDD-99 dataset provided by DARPA was
used in our intrusion detection test. The KDD-99
dataset includes a wide variety of intrusions simulated
in a military network environment. It was
used in the 1999 KDD-CUP intrusion detection
contest. Since the contest, KDD-99 has become
a de facto standard dataset for intrusion detection
experiments. Within the KDD-99 dataset, each
connection has 38 numerical variables and is
labeled as normal or attack. There are four main
categories of attacks: denial-of-service (DOS),
unauthorized access from a remote machine
(R2L), unauthorized access to local root privileges
(U2R), and surveillance and other probing. The training
dataset contains a total of 24 attack types,
while the testing dataset contains an additional
14 types (Stolfo, Fan, Lee, Prodromidis, & Chan,
2000). Because the number of attacks for R2L,
U2R, and probing is relatively small, this experiment
focused on DOS.
Experimental Results of MCLP
Following the heuristic process described in
this chapter, training and testing datasets were
selected: first, the ‘normal’ dataset (812,813
records) was divided into 100 intervals (each
interval has 8,128 records). Within each interval,
20 records were randomly selected. Second, the
‘DOS’ dataset (247,267 records) was divided into
100 intervals (each interval has 2,472 records).
Within each interval, 20 records were randomly
selected. Third, the 2,000 normal and 2,000 DOS
records were combined to form a training dataset.
Because KDD-99 has over 1 million records, and
4,000 training records represent less than 0.4%
of it, the whole KDD-99 dataset is used for test-
ing. Various training and testing datasets can be
obtained by repeating this process. Considering
the previous high detection rates of KDD-99 by
other methods, the across-the-board threshold
of 95% was set for both normal and DOS. Since
training dataset classification accuracies are all
100%, only the testing results (10 out of the total
300) are summarized in Table 11 (Kou
et al., 2004a). The average predictive accuracies
for normal and DOS groups in the testing dataset
are 98.94% and 99.56%.
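
The interval-based sampling described above can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function name and the fixed seed are hypothetical, records are addressed by index, and any records left over after dividing into equal intervals are simply ignored.

```python
import random

def interval_sample(n_records, n_intervals=100, per_interval=20, seed=0):
    """Split record indices 0..n_records-1 into equal intervals and draw
    `per_interval` distinct indices at random from each interval."""
    rng = random.Random(seed)
    size = n_records // n_intervals        # 812,813 -> 8,128; 247,267 -> 2,472
    picks = []
    for k in range(n_intervals):
        start = k * size
        picks.extend(rng.sample(range(start, start + size), per_interval))
    return picks

normal_idx = interval_sample(812_813)      # 2,000 indices into the normal records
dos_idx = interval_sample(247_267, seed=1) # 2,000 indices into the DOS records
```

Repeating the draw with different seeds yields the different training sets mentioned in the text, while the full dataset stays available for testing.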
Improvement of MCLP with Ensemble
Method
The majority-vote ensemble method demonstrated
its superior performance in credit card accounts
classification. Can it improve the classification
accuracy of network intrusion detection? To answer
this question, the majority-vote ensemble was
applied to the KDD-99 dataset. Ensemble results
are summarized in Table 12 (Kou et al., 2004a).
The average predictive accuracies for normal and
DOS groups in the testing dataset are 99.61% and
99.78%. Both normal and DOS predictive accuracies
have been slightly improved.
Cross        Testing Set (812813 Normal + 247267 DOS)
Validation   Normal   Accuracy   DOS      Accuracy
DataSet 1    804513   98.98%     246254   99.59%
DataSet 2    808016   99.41%     246339   99.62%
DataSet 3    802140   98.69%     245511   99.29%
DataSet 4    805151   99.06%     246058   99.51%
DataSet 5    805308   99.08%     246174   99.56%
DataSet 6    799135   98.32%     246769   99.80%
DataSet 7    805639   99.12%     246070   99.52%
DataSet 8    802938   98.79%     246566   99.72%
DataSet 9    805983   99.16%     245498   99.28%
DataSet 10   802765   98.76%     246641   99.75%
Table 11. MCLP KDD-99 classification results

Number of Voters   Normal   Accuracy   DOS      Accuracy
3                  809567   99.60%     246433   99.66%
5                  809197   99.56%     246640   99.75%
7                  809284   99.57%     246690   99.77%
9                  809287   99.57%     246737   99.79%
11                 809412   99.58%     246744   99.79%
13                 809863   99.64%     246794   99.81%
15                 809994   99.65%     246760   99.79%
17                 810089   99.66%     246821   99.82%
19                 810263   99.69%     246846   99.83%
Table 12. MCLP KDD-99 classification results with ensemble
Experimental Results of MCQP
An MCQP procedure similar to the one used in credit
card accounts classification was used to classify the
KDD-99 dataset. A part of the results is summarized
in Table 13 (Kou et al., 2004b). These
results are slightly better than those of MCLP.
Improvement of MCQP with Ensemble
Method
The majority-vote ensemble was used on MCQP
results, and a part of the outputs is summarized in
Table 14 (Kou et al., 2004b). The average predictive
accuracies for normal and DOS groups in the
testing dataset are 99.86% and 99.82%. Although
the increase in classification accuracy is small,
both normal and DOS predictive accuracies have
been improved compared with the previous 99.54%
and 99.64%.

Cross        Testing Set (812813 Normal + 247267 DOS)
Validation   Normal   Accuracy   DOS      Accuracy
DataSet 1    808142   99.43%     245998   99.49%
DataSet 2    810689   99.74%     246902   99.85%
DataSet 3    807597   99.36%     246491   99.69%
DataSet 4    808410   99.46%     246256   99.59%
DataSet 5    810283   99.69%     246090   99.52%
DataSet 6    809272   99.56%     246580   99.72%
DataSet 7    806116   99.18%     246229   99.58%
DataSet 8    808143   99.43%     245998   99.49%
DataSet 9    811806   99.88%     246433   99.66%
DataSet 10   810307   99.69%     246702   99.77%
Table 13. MCQP KDD-99 classification results

No. of Voters   Normal   Accuracy   DOS      Accuracy
3               810126   99.67%     246792   99.81%
5               811419   99.83%     246930   99.86%
7               811395   99.83%     246830   99.82%
9               811486   99.84%     246795   99.81%
11              812030   99.90%     246845   99.83%
13              812006   99.90%     246788   99.81%
15              812089   99.91%     246812   99.82%
17              812045   99.91%     246821   99.82%
19              812069   99.91%     246817   99.82%
21              812010   99.90%     246831   99.82%
23              812149   99.92%     246821   99.82%
25              812018   99.90%     246822   99.82%
Table 14. MCQP KDD-99 classification results with ensemble
Research Challenges and
Opportunities
Although the above multiple criteria optimization
data mining methods have been applied in real-life
applications, there are a number of challenging
problems in mathematical modeling. While some
of the problems are currently under investigation,
others remain to be explored.
Variations and Algorithms of
Generalized Models
Given Model 1, if p = 2 and q = 1, it becomes a convex
quadratic program, which can be solved by using
a known convex quadratic programming algorithm.
However, when p = 1 and q = 2, Model 1 is a
concave quadratic program; and when p = 2 and q = 2,
we have Model 3 (MCQP), which is an indefinite
quadratic problem. Since both concave quadratic
programming and MCQP are NP-hard problems,
it is very difficult to find a global optimal solution.
We are working on both cases to develop direct
algorithms that can converge to local optima in
classification (Zhang, Shi, & Zhang, 2005).
Kernel Functions for Data
Observations
The generalized model in the chapter has a natural
connection with known support vector machines
(SVM) (Mangasarian, 2000; Vapnik, 2000) since
they both belong to the category of optimization-based
data mining methods. However, they
differ in how they identify the classifiers. As
we mentioned before, while the multiple criteria
optimization approaches in this chapter use the
overlapping and interior distance as two standards
to measure the separation of each observation in
the dataset, SVM selects a minority of observations
(support vectors) to represent the majority
of the rest of the observations. Therefore, in
experimental studies and real applications, SVM
may have a high accuracy in the training set but a
lower accuracy in the testing result. Nevertheless,
the use of kernel functions in SVM has shown its
efficiency in handling nonlinear datasets. How to
adopt kernel functions into the multiple criteria
optimization approaches can be an interesting
research problem. Kou, Peng, Shi, and Chen
(2006) explored some possibilities in this research
direction. The basic idea is outlined below.
First, we can rewrite the generalized model
(Model 1) similar to the approach of SVM.
Suppose the two classes $G_1$ and $G_2$ are under
consideration. Then, an $n \times n$ diagonal matrix $Y$,
which only contains +1 or -1, indicates the class
membership. A -1 in row $i$ of matrix $Y$ indicates
the corresponding record $A_i \in G_1$, and a +1 in row
$i$ of matrix $Y$ indicates the corresponding record
$A_i \in G_2$. The constraints in Model 1,
$A_i X = b + \alpha_i - \beta_i$, $\forall A_i \in G_1$ and
$A_i X = b - \alpha_i + \beta_i$, $\forall A_i \in G_2$, are
converted as: $Y(\langle A \cdot X \rangle - eb) = \alpha - \beta$, where
$e = (1, 1, \ldots, 1)^T$, $\alpha = (\alpha_1, \ldots, \alpha_n)^T$, and
$\beta = (\beta_1, \ldots, \beta_n)^T$. In
order to maximize the distance $2 / \|X\|_2$ between the
two adjusted bounding hyperplanes, the function
$\frac{1}{2}\|X\|_2^2$ should also be minimized. Let s = 2, q = 1,
and p = 1, then a simple quadratic programming
(SQP) variation of Model 1 can be built as:
Model 5: SQP
Minimize $\frac{1}{2}\|X\|_2^2 + w_\alpha \sum_{i=1}^{n} \alpha_i - w_\beta \sum_{i=1}^{n} \beta_i$
Subject to $Y(\langle A \cdot X \rangle - eb) = \alpha - \beta$, where
$e = (1, 1, \ldots, 1)^T$, $\alpha = (\alpha_1, \ldots, \alpha_n)^T \ge 0$ and
$\beta = (\beta_1, \ldots, \beta_n)^T \ge 0$.
Using the Lagrange function to represent Model
5, one can get an equivalent of the Wolfe dual
problem of Model 5, expressed in terms of the
multipliers $\theta_i$ as:
Model 6: Dual of SQP
Maximize $-\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \theta_i y_i \theta_j y_j (A_i \cdot A_j) + \sum_{i=1}^{n} \theta_i$
Subject to $\sum_{i=1}^{n} \theta_i y_i = 0$, $w_\beta \le \theta_i \le w_\alpha$,
where $w_\beta < w_\alpha$ are given, $1 \le i \le n$.
The global optimal solution of the primal
problem of Model 5 can be obtained from the
solution of the Wolfe dual problem:
$$X^* = \sum_{i=1}^{n} \theta_i^* y_i A_i, \qquad b^* = y_j - \sum_{i=1}^{n} y_i \theta_i^* (A_i \cdot A_j).$$
As a result, the classification decision function
becomes:
$$\operatorname{sgn}\big((X^* \cdot B) - b^*\big) \begin{cases} > 0, & B \in G_1, \\ \le 0, & B \in G_2. \end{cases}$$
We observe that because the form $(A_i \cdot A_j)$ of
Model 6 is an inner product in the vector space, it
can be substituted by a positive semi-definite kernel
$K(A_i, A_j)$ without affecting the mathematical
modeling process. In general, a kernel function
refers to a real-valued function on $\chi \times \chi$ for all
$A_i, A_j \in \chi$. Thus, Model 6 can be easily transformed
to a nonlinear model by replacing $(A_i \cdot A_j)$ with some
positive semi-definite kernel function $K(A_i, A_j)$.
Use of kernel functions in multiple criteria optimization
approaches can extend their applicability
to linearly inseparable datasets. However, there are
some theoretical difficulties in directly introducing
a kernel function into Model 5. How to overcome them
deserves a careful study. Future studies may be
done on establishing a theoretical guideline for
selection of a kernel that is optimal in achieving
a satisfactory credit analysis result. Another open
problem is to study the subject of reducing computational
cost and improving algorithm efficiency
for high dimensional or massive datasets.
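
The kernel substitution can be illustrated with a short NumPy sketch. It evaluates a dual objective of the Model 6 form with the inner products collected in a Gram matrix, so that swapping the linear kernel for another positive semi-definite kernel changes nothing else. The function names, the multiplier name `theta`, and the choice of the Gaussian (RBF) kernel are assumptions made for this example only; no solver is included.

```python
import numpy as np

def linear_kernel(A, B):
    """Plain inner products (A_i . B_j), i.e., no kernel substitution."""
    return A @ B.T

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||A_i - B_j||^2), a common PSD choice."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def dual_objective(theta, y, K):
    """-1/2 * sum_ij theta_i y_i theta_j y_j K_ij + sum_i theta_i,
    where K is the Gram matrix of whichever kernel is used."""
    v = theta * y
    return float(-0.5 * v @ K @ v + theta.sum())

A = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
theta = np.array([1.0, 1.0])
print(dual_objective(theta, y, linear_kernel(A, A)))   # linear version
print(dual_objective(theta, y, rbf_kernel(A, A)))      # same objective, nonlinear kernel
```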
Choquet Integrals and Non-Additive
Set Function
Considering the r-dimensional attribute vector
$a = (a_1, \ldots, a_r)$ in the classification problem, let $P(a)$
denote the power set of $a$. We use $f(a_1), \ldots, f(a_r)$ to
denote the values of each attribute in an observation.
The procedure of calculating a Choquet
integral can be given as (Wang & Wang, 1997):
$$\int f \, d\mu = \sum_{j=1}^{r} \left[ f(a'_j) - f(a'_{j-1}) \right] \times \mu(\{a'_j, a'_{j+1}, \ldots, a'_r\}),$$
where $\{a'_1, a'_2, \ldots, a'_r\}$ is a permutation of
$a = (a_1, \ldots, a_r)$ such that $f(a'_0) = 0$ and
$f(a'_1), \ldots, f(a'_r)$ is non-decreasingly ordered:
$f(a'_1) \le \ldots \le f(a'_r)$. The non-additive set function is defined as
$\mu: P(a) \to (-\infty, +\infty)$, where $\mu(\emptyset) = 0$. We use $\mu_i$ to
denote the values of the set function $\mu$, where $i = 1, \ldots, 2^r$.
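
A small sketch may help make the calculation concrete. It follows the standard discrete Choquet integral, in which the j-th increment is weighted by the measure of the set of attributes whose values are at least as large as the j-th smallest one. The function name and the toy measure are illustrative assumptions; the set function is stored as a dictionary keyed by frozensets of attribute names.

```python
def choquet_integral(f, mu):
    """Discrete Choquet integral of f with respect to the set function mu.

    f  : dict mapping attribute name -> value f(a)
    mu : dict mapping frozenset of attribute names -> measure value
    """
    order = sorted(f, key=f.get)          # a'_1, ..., a'_r with f non-decreasing
    total, previous = 0.0, 0.0            # previous plays the role of f(a'_0) = 0
    for j, attr in enumerate(order):
        upper_set = frozenset(order[j:])  # {a'_j, ..., a'_r}
        total += (f[attr] - previous) * mu[upper_set]
        previous = f[attr]
    return total

# Toy example with r = 2 attributes and a non-additive measure:
# mu({a1}) + mu({a2}) = 0.9, which differs from mu({a1, a2}) = 1.0
f = {"a1": 0.2, "a2": 0.5}
mu = {frozenset(["a1", "a2"]): 1.0, frozenset(["a2"]): 0.6, frozenset(["a1"]): 0.3}
value = choquet_integral(f, mu)           # 0.2 * 1.0 + (0.5 - 0.2) * 0.6
```

When the measure is additive, the result reduces to an ordinary weighted sum of the attribute values, which is the linear combination the Choquet form generalizes.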
Introducing the Choquet measure into the
generalized model means using the Choquet integral
as a representative
of the left-hand side of the constraints in Model
1. This variation for the non-additive data mining
problem is (Yan, Wang, Shi, & Chen, 2005):
Model 7: Choquet Form
Minimize $f(\alpha)$ and Maximize $g(\beta)$
Subject to:
$\int f \, d\mu - \alpha_i + \beta_i - b = 0$, $\forall A_i \in G_1$,
$\int f \, d\mu + \alpha_i - \beta_i - b = 0$, $\forall A_i \in G_2$,
where $\int f \, d\mu$ denotes the Choquet integral with
respect to a signed fuzzy measure to aggregate
the attributes of an observation $f$, $b$ is unrestricted,
and $\alpha = (\alpha_1, \ldots, \alpha_n)^T$, $\beta = (\beta_1, \ldots, \beta_n)^T$;
$\alpha_i, \beta_i \ge 0$, $i = 1, \ldots, n$.
Model 7 replaces the linear
combination of all the attributes $A_i X$ in the left-hand
side of the constraints with the Choquet integral
representation $\int f \, d\mu$. The number of parameters,
denoted by $\mu_i$, increases from $r$ to $2^r$ ($r$ is the number
of attributes). Determining the parameters
through a linear programming framework is not
easy. We are still working on this problem and
shall report the significant results.
Conclusion
As Usama Fayyad pointed out at the KDD-03
Panel, data mining must attract the participation
of the relevant communities to avoid re-inventing
wheels and to bring the field an auspicious future
(Fayyad, Piatetsky-Shapiro, & Uthurusamy,
2003). One relevant field from which data mining
has not attracted enough participation is
optimization. This chapter summarizes a series of
research activities that apply multiple criteria
decision-making methods to classification problems
in data mining. Specifically, this chapter describes
variations of multiple criteria optimization-based
models and applies these models to credit card
scoring management, HIV-1 associated dementia
(HAD) neuronal damage and dropout, and network
intrusion detection, and notes their potential
in various real-life problems.
Acknowledgment
Since 1998, this research has been partially
supported by a number of grants, including First
Data Corporation, USA; DUE-9796243, the
National Science Foundation of USA; U.S. Air
Force Research Laboratory (PR No. E-3-1162);
National Excellent Youth Fund #70028101,
Key Project #70531040, #70472074, National
Natural Science Foundation of China; 973 Project
#2004CB720103, Ministry of Science and
Technology, China; K.C. Wong Education Foundation
(2001, 2003), Chinese Academy of Sciences; and
BHP Billiton Co., Australia.
References
Bradley, P.S., Fayyad, U.M., & Mangasarian,
O.L. (1999). Mathematical programming for data
mining: Formulations and challenges. INFORMS
Journal on Computing, 11, 217-238.
Bauer, E., & Kohavi, R. (1999). An empirical
comparison of voting classification algorithms:
Bagging, boosting, and variants. Machine
Learning, 36, 105-139.
C 5.0. (2004). Retrieved from
http://www.rulequest.com/see5-info.html
Charnes, A., & Cooper, W.W. (1961). Management
models and industrial applications of linear
programming (vols. 1 & 2). New York: John
Wiley & Sons.
Dietterich, T. (2000). Ensemble methods in
machine learning. In Kittler & Roli (Eds.), Multiple
classifier systems (pp. 1-15). Berlin:
Springer-Verlag (Lecture Notes in Pattern Recognition
1857).
Fayyad, U.M., Piatetsky-Shapiro, G., &
Uthurusamy, R. (2003). Summary from the KDD-03 Panel:
Data mining: The next 10 years. ACM SIGKDD
Explorations Newsletter, 5(2), 191-196.
Fisher, R.A. (1936). The use of multiple
measurements in taxonomic problems. Annals of Eugenics,
7, 179-188.
Freed, N., & Glover, F. (1981). Simple but
powerful goal programming models for discriminant
problems. European Journal of Operational
Research, 7, 44-60.
Freed, N., & Glover, F. (1986). Evaluating
alternative linear programming models to solve the
two-group discriminant problem. Decision
Science, 17, 151-162.
Han, J.W., & Kamber, M. (2000). Data mining:
Concepts and techniques. San Diego: Academic
Press.
He, J., Liu, X., Shi, Y., Xu, W., & Yan, N. (2004).
Classifications of credit cardholder behavior by
using fuzzy linear programming. International
Journal of Information Technology and Decision
Making, 3, 633-650.
Hesselgesser, J., Taub, D., Baskar, P., Greenberg,
M., Hoxie, J., Kolson, D.L., & Horuk, R. (1998).
Neuronal apoptosis induced by HIV-1 gp120
and the Chemokine SDF-1alpha mediated by
the Chemokine receptor CXCR4. Curr Biol, 8,
595-598.
Kaul, M., Garden, G.A., & Lipton, S.A. (2001).
Pathways to neuronal injury and apoptosis in
HIV-associated dementia. Nature, 410, 988-994.
Kou, G., & Shi, Y. (2002). Linux-based Multiple
Linear Programming Classification Program:
(Version 1.0.) College of Information Science
and Technology, University of Nebraska-Omaha,
USA.
Kou, G., Liu, X., Peng, Y., Shi, Y., Wise, M., &
Xu, W. (2003). Multiple criteria linear
programming approach to data mining: Models, algorithm
designs and software development. Optimization
Methods and Software, 18, 453-473.
Kou, G., Peng, Y., Yan, N., Shi, Y., Chen, Z., Zhu,
Q., Huff, J., & McCartney, S. (2004a, July 19-21).
Network intrusion detection by using
multiple-criteria linear programming. In Proceedings of
the International Conference on Service Systems
and Service Management, Beijing, China.
Kou, G., Peng, Y., Chen, Z., Shi, Y., & Chen, X.
(2004b, July 12-14). A multiple-criteria quadratic
programming approach to network intrusion
detection. In Proceedings of the Chinese Academy
of Sciences Symposium on Data Mining and
Knowledge Management, Beijing, China.
Kou, G., Peng, Y., Shi, Y., & Chen, Z. (2006). A
new multi-criteria convex quadratic programming
model for credit data analysis. Working
Paper, University of Nebraska at Omaha, USA.
Kuncheva, L.I. (2000). Clustering-and-selection
model for classifier combination. In Proceedings
of the 4th International Conference on
Knowledge-Based Intelligent Engineering Systems and Allied
Technologies (KES’2000).
Kwak, W., Shi, Y., Eldridge, S., & Kou, G. (2006).
Bankruptcy prediction for Japanese firms: Using
multiple criteria linear programming data
mining approach. In Proceedings of the
International Journal of Data Mining and Business
Intelligence.
Jiang, Z., Piggee, C., Heyes, M.P., Murphy, C.,
Quearry, B., Bauer, M., Zheng, J., Gendelman,
H.E., & Markey, S.P. (2001). Glutamate is a
mediator of neurotoxicity in secretions of activated
HIV-1-infected macrophages. Journal of
Neuroimmunology, 117, 97-107.
Lam, L. (2000). Classifier combinations:
Implementations and theoretical issues. In Kittler &
Roli (Eds.), Multiple classifier systems (pp. 78-86).
Berlin: Springer-Verlag (Lecture Notes in Pattern
Recognition 1857).
Lee, S.M. (1972). Goal programming for decision
analysis. Auerbach.
Lindsay, P.H., & Norman, D.A. (1972). Human
information processing: An introduction to
psychology. New York: Academic Press.
LINDO Systems Inc. (2003). An overview of
LINGO 8.0. Retrieved from
http://www.lindo.com/cgi/frameset.cgi?leftlingo.html;lingof.html
Lopez, A., Bauer, M.A., Erichsen, D.A., Peng, H.,
Gendelman, L., Shibata, A., Gendelman, H.E., &
Zheng, J. (2001). The regulation of neurotrophic
factor activities following HIV-1 infection and
immune activation of mononuclear phagocytes.
In Proceedings of Soc. Neurosci. Abs., San
Diego, CA.
Mangasarian, O.L. (2000). Generalized support
vector machines. In A. Smola, P. Bartlett, B.
Scholkopf, & D. Schuurmans (Eds.), Advances in
large margin classifiers (pp. 135-146). Cambridge,
MA: MIT Press.
Olson, D., & Shi, Y. (2005). Introduction to
business data mining. New York:
McGraw-Hill/Irwin.
Opitz, D., & Maclin, R. (1999). Popular ensemble
methods: An empirical study. Journal of Artificial
Intelligence Research, 11, 169-198.
Parhami, B. (1994). Voting algorithms. IEEE
Transactions on Reliability, 43, 617-629.
Peng, Y., Kou, G., Chen, Z., & Shi, Y. (2004).
Cross-validation and ensemble analyses on
multiple-criteria linear programming classification
for credit cardholder behavior. In Proceedings of
ICCS 2004 (pp. 931-939). Berlin: Springer-Verlag
(LNCS 2416).
Shi, Y., & Yu, P.L. (1989). Goal setting and
compromise solutions. In B. Karpak & S. Zionts
(Eds.), Multiple criteria decision making and risk
analysis using microcomputers (pp. 165-204).
Berlin: Springer-Verlag.
Shi, Y. (2001). Multiple criteria and multiple
constraint levels linear programming: Concepts,
techniques and applications. NJ: World
Scientific.
Shi, Y., Wise, W., Luo, M., & Lin, Y. (2001).
Multiple criteria decision making in credit card
portfolio management. In M. Koksalan & S.
Zionts (Eds.), Multiple criteria decision making
in new millennium (pp. 427-436). Berlin:
Springer-Verlag.
Shi, Y., Peng, Y., Xu, W., & Tang, X. (2002). Data
mining via multiple criteria linear programming:
Applications in credit card portfolio management.
International Journal of Information Technology
and Decision Making, 1, 131-151.
Shi, Y., Peng, Y., Kou, G., & Chen, Z. (2005).
Classifying credit card accounts for business
intelligence and decision making: A multiple-criteria
quadratic programming approach. International
Journal of Information Technology and Decision
Making, 4, 581-600.
Shibata, A., Zelivyanskaya, M., Limoges, J.,
Carlson, K.A., Gorantla, S., Branecki, C., Bishu, S.,
Xiong, H., & Gendelman, H.E. (2003). Peripheral
nerve induces macrophage neurotrophic activities:
Regulation of neuronal process outgrowth,
intracellular signaling and synaptic function. Journal
of Neuroimmunology, 142, 112-129.
Stolfo, S.J., Fan, W., Lee, W., Prodromidis, A.,
& Chan, P.K. (2000). Cost-based modeling and
evaluation for data mining with application to
fraud and intrusion detection: Results from the
JAM project. In Proceedings of the DARPA
Information Survivability Conference.
Vapnik, V.N. (2000). The nature of statistical
learning theory (2nd ed.). New York: Springer.
Wang, J., & Wang, Z. (1997). Using neural
network to determine Sugeno measures by statistics.
Neural Networks, 10, 183-195.
Weingessel, A., Dimitriadou, E., & Hornik, K.
(2003, March 20-22). An ensemble method for
clustering. In Proceedings of the 3rd International
Workshop on Distributed Statistical Computing,
Vienna, Austria.
Yan, N., Wang, Z., Shi, Y., & Chen, Z. (2005).
Classification by linear programming with signed
fuzzy measures. Working Paper, University of
Nebraska at Omaha, USA.
Yu, P.L. (1985). Multiple criteria decision
making: Concepts, techniques and extensions. New
York: Plenum Press.
Zenobi, G., & Cunningham, P. (2002). An
approach to aggregating ensembles of lazy
learners that supports explanation. Lecture Notes in
Computer Science, 2416, 436-447.
Zhang, J., Shi, Y., & Zhang, P. (2005). Several
multi-criteria programming methods for
classification. Working Paper, Chinese Academy of
Sciences Research Center on Data Technology &
Knowledge Economy and Graduate University of
Chinese Academy of Sciences, China.
Zheng, J., Thylin, M., Ghorpade, A., Xiong, H.,
Persidsky, Y., Cotter, R., Niemann, D., Che, M.,
Zeng, Y., Gelbard, H. et al. (1999). Intracellular
CXCR4 signaling, neuronal apoptosis and neu-
ropathogenic mechanisms of HIV-1-associated
dementia. Journal of Neuroimmunology, 98,
185-200.
This work was previously published in Research and Trends in Data Mining Technologies and Applications, edited by D. Taniar,
pp. 242-275, copyright 2007 by IGI Publishing, formerly known as Idea Group Publishing (an imprint of IGI Global).
Zheng, J., Zhuang, W., Yan, N., Kou, G., Erichsen,
D., McNally, C., Peng, H., Cheloha, A., Shi, C., &
Shi, Y. (2004). Classification of HIV-1-mediated
neuronal dendritic and synaptic damage using
multiple criteria linear programming.
Neuroinformatics, 2, 303-326.
Zimmermann, H.-J. (1978). Fuzzy programming
and linear programming with several objective
functions. Fuzzy Sets and Systems, 1, 45-55.
the credulous, affrighted Inuit how they can escape from the
dreaded ghosts.
The hardest task, that of driving away Sedna, is reserved for the
most powerful angakoq. A rope is coiled on the floor of a large hut in
such a manner as to leave a small opening at the top, which
represents the breathing hole of a seal. Two angakut stand by the
side of it, one of them holding the seal spear in his left hand, as if he
were watching at the seal hole in the winter, the other holding the
harpoon line. Another angakoq, whose office it is to lure Sedna up
with a magic song, sits at the back of the hut. At last she comes up
through the hard rocks and the wizard hears her heavy breathing;
now she emerges from the ground and meets the angakoq waiting
at the hole. She is harpooned and sinks away in angry haste,
drawing after her the harpoon, to which the two men hold with all
their strength. Only by a desperate effort does she tear herself away
from it and return to her dwelling in Adlivun. Nothing is left with the
two men but the blood sprinkled harpoon, which they proudly show
to the Inuit.
Sedna and the other evil spirits are at last driven away, and on the
following day a great festival for young and old is celebrated in
honor of the event. But they must still be careful, for the wounded
Sedna is greatly enraged and will seize any one whom she can find
out of his hut; so on this day they all wear protecting amulets
(koukparmiutang) on the tops of their hoods. Parts of the first
garment which they wore after birth are used for this purpose.
The men assemble early in the morning in the middle of the
settlement. As soon as they have all got together they run
screaming and jumping around the houses, following the course of
the sun (nunajisartung or kaivitijung). A few, dressed in women’s
jackets, run in the opposite direction. These are those who were
born in abnormal presentations. The circuit made, they visit every
hut, and the woman of the house must always be in waiting for
them. When she hears the noise of the band she comes out and
throws a dish containing little gifts of meat, ivory trinkets, and
articles of sealskin into the yelling crowd, of which each one helps
himself to what he can get. No hut is omitted in this round
(irqatatung).
The crowd next divides itself into two parties, the ptarmigans
(aχigirn), those who were born in the winter, and the ducks (aggirn),
or the children of summer. A large rope of sealskin is stretched out.
One party takes one end of it and tries with all its might to drag the
opposite party over to its side. The others hold fast to the rope and
try as hard to make ground for themselves. If the ptarmigans give
way the summer has won the game and fine weather may be
expected to prevail through the winter (nussueraqtung).
The contest of the seasons having been decided, the women bring
out of a hut a large kettle of water and each person takes his
drinking cup. They all stand as near the kettle as possible, while the
oldest man among them steps out first. He dips a cup of water from
the vessel, sprinkles a few drops on the ground, turns his face
toward the home of his youth, and tells his name and the place of
his birth (oχsoaχsavepunga——me, I was born in ——). He is
followed by an aged woman, who announces her name and home,
and then all the others do the same, down to the young children,
who are represented by their mothers. Only the parents of children
born during the last year are forbidden to partake in this ceremony.
As the words of the old are listened to respectfully, so those of the
distinguished hunters are received with demonstrative applause and
those of the others with varying degrees of attention, in some cases
even with joking and raillery (imitijung).
Now arises a cry of surprise and all eyes are turned toward a hut out
of which stalk two gigantic figures. They wear heavy boots; their
legs are swelled out to a wonderful thickness with several pairs of
breeches; the shoulders of each are covered by a woman’s over-
jacket and the faces by tattooed masks of sealskins. In the right
hand each carries the seal spear, on the back of each is an inflated
buoy of sealskin, and in the left hand the scraper. Silently, with long
strides, the qailertetang (Fig. 535) approach the assembly, who,
screaming, press back from them. The pair solemnly lead the men to
a suitable spot and set them in a row, and the women in another
opposite them. They match the men and women in pairs and these
pairs run, pursued by the qailertetang, to the hut of the woman,
where they are for the following day and night man and wife
(nulianititijung). Having performed this duty, the qailertetang stride
down to the shore and invoke the good north wind, which brings fair
weather, while they warn off the unfavorable south wind.
As soon as the incantation is over, all the men attack the
qailertetang with great noise. They act as if they had weapons in
their hands and would kill both spirits. One pretends to probe them
with a spear, another to stab them with a knife, one to cut off their
arms and legs, another to beat them unmercifully on the head. The
buoys which they carry on their backs are ripped open and collapse
and soon they both lie as if dead beside their broken weapons
(pilektung). The Eskimo leave them to get their drinking cups and
the qailertetang awake to new life. Each man fills his sealskin with
water, passes a cup to them, and inquires about the future, about
the fortunes of the hunt and the events of life. The qailertetang
answer in murmurs which the questioner must interpret for himself.
Fig. 535. Qailertetang, a masked figure. (From a sketch by the author.)
The evening is spent in playing ball, which is whipped all around the
settlement (ajuktaqtung). (See Appendix, Note 6.)
This feast is celebrated as here described in Cumberland Sound and
Nugumiut. Hall and Kumlien make a few observations in regard to it,
but the latter has evidently misunderstood its meaning. His
description is as follows (p. 43):
An angakoq dresses himself up in the most hideous manner,
having several pairs of pants on among the rest, and a horrid
looking mask of skins. The men and women now range
themselves in separate and opposite ranks, and the angakoq
takes his place between them. He then picks out a man and
conducts him to a woman in the opposite ranks. This couple
then go to the woman’s hut and have a grand spree for a day or
two. This manner of proceeding is kept up till all the women but
one are disposed of. This one is always the angakoq’s choice,
and her he reserves for himself.
Another description by Kumlien (p. 19) evidently refers to the same
feast:
They have an interesting custom or superstition, namely, the
killing of the evil spirit of the deer; sometime during the winter
or early in spring, at any rate before they can go deer hunting,
they congregate together and dispose of this imaginary evil. The
chief ancut [angakoq], or medicine man, is the main performer.
He goes through a number of gyrations and contortions,
constantly hallooing and calling, till suddenly the imaginary deer
is among them. Now begins a lively time. Every one is
screaming, running, jumping, spearing, and stabbing at the
imaginary deer, till one would think a whole madhouse was let
loose. Often this deer proves very agile, and must be hard to
kill, for I have known them to keep this performance up for
days; in fact, till they were completely exhausted.
During one of these performances an old man speared the deer,
another knocked out an eye, a third stabbed him, and so on till
he was dead. Those who are able or fortunate enough to inflict
some injury on this bad deer, especially he who inflicts the
death blow, is considered extremely lucky, as he will have no
difficulty in procuring as many deer as he wants, for there is no
longer an evil spirit to turn his bullets or arrows from their
course.
I could not learn anything about this ceremony, though I asked all
the persons with whom Kumlien had had intercourse. Probably there
was some misunderstanding as to the meaning of their feast during
the autumn which induced him to give this report.
Hall describes the feast as celebrated by the Nugumiut (I, p. 528),
as follows:
At a time of the year apparently answering to our Christmas,
they have a general meeting in a large igdlu [snow house] on a
certain evening. There the angakoq prays on behalf of the
people for the public prosperity through the subsequent year.
Then follows something like a feast. The next day all go out into
the open air and form in a circle; in the centre is placed a vessel
of water, and each member of the company brings a piece of
meat, the kind being immaterial. The circle being formed, each
person eats his or her meat in silence, thinking of Sedna, and
wishing for good things. Then one in the circle takes a cup, dips
up some of the water, all the time thinking of Sedna, and drinks
it; and then, before passing the cup to another, states audibly
the time and the place of his or her birth. This ceremony is
performed by all in succession. Finally, presents of various
articles are thrown from one to another, with the idea that each
will receive of Sedna good things in proportion to the liberality
here shown.
Soon after this occasion, at a time which answers to our New
Year’s day, two men start out, one of them being dressed to
represent a woman, and go to every house in the village,
blowing out the light in each. The lights are afterwards
rekindled from a fresh fire. When Taqulitu [Hall’s well known
companion in his journeys] was asked the meaning of this, she
replied, “New sun—new light,” implying a belief that the sun
was at that time renewed for the year.
Inasmuch as Hall did not see the feast himself, but had only a
description by an Eskimo, into which he introduced points of
similarity with Christian feasts, it may be looked upon as fairly
agreeing with the feast of the Oqomiut. The latter part corresponds
to the celebration of the feast as it is celebrated in Akudnirn.
According to a statement in the journal of Hall’s second expedition
(II, p. 219) masks are also used on the western shore of Hudson
Bay, where it seems that all the natives disguise themselves on this
occasion.
The Akudnirmiut celebrate the feast in the following way: The
qailertetang do not act a part there, but other masks take their
place. They are called mirqussang and represent a man and his wife.
They wear masks of the skin of the ground seal, only that of the
woman being tattooed. The hair of the man is arranged in a bunch
protruding from the forehead (sulubaut), that of the woman in a
pigtail on each side and a large knot at the back of the head. Their
left legs are tied up by a thong running around the neck and the
knee, compelling them to hobble. They have neither seal float and
spear nor inflated legs, but carry the skin scraper. They must try to
enter the huts while the Inuit hold a long sealskin thong before them
to keep them off. If they fall down in the attempt to cross it they are
thoroughly beaten with a short whip or with sticks. After having
succeeded in entering the huts they blow out all the fires.
The parts of the feast already described as celebrated in Cumberland
Sound seem not to be customary in Akudnirn, the conjuration of
Sedna and the exchanges of wives excepted, which are also
practiced here. Sometimes the latter ceremony takes place the night
before the feast. It is called suluiting or quvietung.
When it is quite dark a number of Inuit come out of their huts and
run crying all round their settlements. Wherever anybody is asleep
they climb upon the roof of his hut and rouse him by screaming and
shouting until all have assembled outside. Then a woman and a man
(the mirqussang) sit down in the snow. The man holds a knife
(sulung) in his hand, from which the feast takes its name, and sings:
Oangaja jaja jajaja aja.
Pissiungmipadlo panginejernago
Qodlungutaokpan panginejerlugping
Pissiungmipadlo panginejernago.
To this song the woman keeps time by moving her body and her
arms, at the same time flinging snow on the bystanders. Then the
whole company goes into the singing house and joins in dancing and
singing. This done, the men must leave the house and stand outside
while the mirqussang watch the entrance. The women continue
singing and leave the house one by one. They are awaited by the
mirqussang, who lead every one to one of the men standing about.
The pair must re-enter the singing house and walk around the lamp,
all the men and women crying, “Hrr! hrr!” from both corners of the
mouth. Then they go to the woman’s hut, where they stay during
the ensuing night. The feast is frequently celebrated by all the tribes
of Davis and Hudson Strait, and even independently of the great
feast described above.
The day after, the men frequently join in a shooting match. A target
is set up, at which they shoot their arrows. As soon as a man hits,
the women, who stand looking on, rush forward and rub noses with
him.
If a stranger unknown to the inhabitants of a settlement arrives on a
visit he is welcomed by the celebration of a great feast. Among the
southeastern tribes the natives arrange themselves in a row, one
man standing in front of it. The stranger approaches slowly, his arms
folded and his head inclined toward the right side. Then the native
strikes him with all his strength on the right cheek and in his turn
inclines his head awaiting the stranger’s blow (tigluiqdjung). While
this is going on the other men are playing at ball and singing
(igdlukitaqtung). Thus they continue until one of the combatants is
vanquished.
The ceremonies of greeting among the western tribes are similar to
those of the eastern, but in addition “boxing, wrestling, and knife
testing” are mentioned by travelers who have visited them. In Davis
Strait and probably in all the other countries the game of “hook and
crook” is always played on the arrival of a stranger
(pakijumijartung). Two men sit down on a large skin, after having
stripped the upper part of their bodies, and each tries to stretch out
the bent arm of the other. These games are sometimes dangerous,
as the victor has the right to kill his adversary; but generally the
feast ends peaceably. The ceremonies of the western tribes in
greeting a stranger are much feared by their eastern neighbors and
therefore intercourse is somewhat restricted. The meaning of the
duel, according to the natives themselves, is “that the two men in
meeting wish to know which of them is the better man.” The
similarity of these ceremonies with those of Greenland, where the
game of hook and crook and wrestling matches have been
customary, is quite striking, as is that of the explanation of these
ceremonies.
The word for greeting on Davis Strait and Hudson Strait, is
Assojutidlin? (Are you quite well?) and the answer, Tabaujuradlu
(Very well). The word Taima! which is used in Hudson Strait, and
Mane taima! of the Netchillirmiut seem to be similar to our Halloo!
The Ukusiksalirmiut say Ilaga! (My friend!)
CUSTOMS AND REGULATIONS CONCERNING BIRTH,
SICKNESS, AND DEATH.
I have mentioned that it is extremely difficult to find out the
innumerable regulations connected with the religious ideas and
customs of the Eskimo. The difficulty is even greater in regard to the
customs which refer to birth, sickness, and death, and it is no
wonder that, while some of the accounts of different writers coincide
tolerably well, there are great discrepancies in others, particularly as
the customs vary to a great extent among the different tribes.
Before the child is born a small hut or snow house is built for the
mother, in which she awaits her delivery. Sick persons are isolated in
the same way, the reason being that in case of death everything that
had been in contact with the deceased must be destroyed. According
to Kumlien (p. 28) the woman is left with only one attendant,
a young girl appointed by the head ancut (angakoq) of the
encampment; but this, no doubt, is an error. She may be visited by
her friends, who, however, must leave her when parturition takes
place. She must cut the navel string herself, and in Davis Strait this
is done by tying it through with deer sinews; in Iglulik (Lyon,
p. 370), by cutting it with a stone spear head. The child is cleaned
with a birdskin and clothed in a small gown of the same material.
According to Lyon the Iglulirmiut swathe it with the dried intestines
of some animal.
Kumlien describes a remarkable custom of which I could find no
trace, not even upon direct inquiry (p. 281):
As soon as the mother with her new born babe is able to get up
and go out, usually but a few hours, they are taken in charge by
an aged female angakoq, who seems to have some particular
mission to perform in such cases. She conducts them to some
level spot on the ice, if near the sea, and begins a sort of march
in circles on the ice, the mother following with the child on her
back; this manœuvre is kept up for some time, the old woman
going through a number of performances the nature of which
we could not learn and continually muttering something equally
unintelligible to us. The next act is to wade through snowdrifts,
the aged angakoq leading the way. We have been informed that
it is customary for the mother to wade thus bare-legged.
Lyon says (p. 370):
After a few days, or according to the fancy of the parents, an
angakoq, who by relationship or long acquaintance is a friend of
the family, makes use of some vessel, and with the urine the
mother washes the infant, while all the gossips around pour
forth their good wishes for the little one to prove an active man,
if a boy, or, if a girl, the mother of plenty of children. This
ceremony, I believe, is never omitted, and is called qoqsiuariva.
Though I heard about the washing with urine, I did not learn
anything about the rest of the ceremony in Cumberland Sound and
Davis Strait.
A few days after birth the first dress of the child is exchanged for
another. A small hood made from the skin of a hare’s head is fitted
snugly upon the head, a jacket for the upper part of the body is
made of the skin of a fawn, and two small boots, made of the same
kind of a skin, the left one being wreathed with seaweed (Fucus),
cover the legs. While the child wears this clothing that which was
first worn is fastened to a pole which is secured to the roof of the
hut. In two months the child gets a third suit of clothes the same as
formerly described (p. 557). Then the second gown is exposed for
some time on the top of the hut, the first one being taken down,
and both are carefully preserved for a year. After this time has
expired both are once more exposed on the top of a pole and then
sunk into the sea, a portion of the birdskin dress alone being kept,
for this is considered a powerful amulet and is held in high esteem
and worn every fall at the Sedna feast on the point of the hood (see
p. 604). I have stated that those who were born in abnormal
presentations wear women’s dresses at this feast and must make
their round in a direction opposite to the movement of the sun.
Captain Spicer, of Groton, Conn., affirms that the bird used for the
first clothing is chosen according to a strict law, every month having
its own bird. So far as I know, waterfowl are used in summer and
the ptarmigan in winter, and accordingly the men are called at the
great autumn feast the ducks and ptarmigans, the former including
those who were born in summer, the latter those born in winter.
As long as any portion of the navel string remains a strip of sealskin
is worn around the belly.
After the birth of her child the mother must observe a great number
of regulations, referring particularly to food and work. She is not
allowed for a whole year to eat raw meat or a part of any animal
killed by being shot through the heart. In Cumberland Sound she
must not eat for five days anything except meat of an animal killed
by her husband or by a boy on his first hunting expedition. This
custom seems to be observed more strictly, however, and for a
longer period if the new born child dies. Two months after delivery
she must make a call at every hut, while before that time she is not
allowed to enter any but her own. At the end of this period she must
also throw away her old clothing. The same custom was observed by
Hall among the Nugumiut (I, p. 426). On the western shore of
Hudson Bay she is permitted to re-enter the hut a few days after
delivery, but must pass in by a separate entrance. An opening is cut
for the purpose through the snow wall. She must keep a little skin
bag hung up near her, into which she must put a little of her food
after each meal, having first put it up to her mouth. This is called
laying up food for the infant, although none is given to it (Hall II,
p. 173). I have already mentioned that the parents are not allowed
in the first year after the birth of a child to take part in the Sedna
feast.
The customs which are associated with the death of an infant are
very complicated. For a whole year, when outside the hut, the
mother must have her head covered with a cap, or at least with a
piece of skin. If a ground seal is caught she must throw away the
old cap and have a new one made. The boots of the deceased are
always carried about by the parents when traveling, and whenever
they stop these are buried in the snow or under stones. Neither
parent is allowed to eat raw flesh during the following year. The
woman must cook her food in a small pot which is exclusively used
by her. If she is about to enter a hut the men who may be sitting
inside must come out first, and not until they have come out is she
allowed to enter. If she wants to go out of the hut she must walk
around all the men who may happen to be there.
The child is sometimes named before it is born. Lyon says upon this
subject (p. 369):
Some relative or friend lays her hand on the mother’s stomach,
and decides what the infant is to be called, and, as the names
serve for either sex, it is of no consequence whether it proves a
girl or a boy.
On Davis Strait it is always named after the persons who have died
since the last birth took place, and therefore the number of names
of an Eskimo is sometimes rather large. If a relative dies while the
child is younger than four years or so, his name is added to the old
ones and becomes the proper name by which it is called. It is
possible that children receive the names of all the persons in the
settlement who die while the children are quite young, but of this I
am not absolutely certain. When a person falls sick the angakut
change his name in order to ward off the disease or they consecrate
him as a dog to Sedna. In the latter event he gets a dog’s name and
must wear throughout life a harness over the inner jacket. Thus it
may happen that Eskimo are known in different tribes by different
names. It may also be mentioned here that friends sometimes
exchange names and dogs are called by the name of a friend as a
token of regard.
The treatment of the sick is the task of the angakoq, whose
manipulations have been described.
If it is feared that a disease will prove fatal, a small snow house or a
hut is built, according to the season, into which the patient is carried
through an opening at the back. This opening is then closed, and
subsequently a door is cut out. A small quantity of food is placed in
the hut, but the patient is left without attendants. As long as there is
no fear of sudden death the relatives and friends may come to visit
him, but when death is impending the house is shut up and he is left
alone to die. If it should happen that a person dies in a hut among
its inmates, everything belonging to the hut must be destroyed or
thrown away, even the tools, &c., lying inside becoming useless to the
survivors, but the tent poles may be used again after a year has
elapsed. No doubt this custom explains the isolation of the sick. If a
child dies in a hut and the mother immediately rushes out with it,
the contents of the hut may be saved.
Though the Eskimo feel the greatest awe in touching a dead body,
the sick await their death with admirable coolness and without the
least sign of fear or unwillingness to die. I remember a young girl
who sent for me a few hours before her death and asked me to give
her some tobacco and bread, which she wanted to take to her
mother, who had died a few weeks before.
Only the relatives are allowed to touch the body of the deceased.
They clothe it or wrap it in deerskins and bury it at once. In former
times they always built a tomb, at least when death occurred in the
summer. From its usual dimensions one would suppose that the
body was buried with the legs doubled up, for all of them are too
short for grown persons. If the person to be buried is young, his feet
are placed in the direction of the rising sun, those of the aged in the
opposite direction. According to Lyon the Iglulirmiut bury half grown
children with the feet towards the southeast, young men and women
with the feet towards the south, and middle aged persons with the
feet towards the southwest. This agrees with the fact that the graves
in Cumberland Sound do not all lie east and west. The tomb is
always vaulted, as any stone or piece of snow resting upon the body
is believed to be a burden to the soul of the deceased. The man’s
hunting implements and other utensils are placed by the side of his
grave; the pots, the lamps, knives, &c., by the side of that of the
woman; toys, by that of a child. Hall (I, p. 103) observed in a grave
a small kettle hung up over a lamp. These objects are held in great
respect and are never removed, at least as long as it is known to
whose grave they belong. Sometimes models of implements are
used for this purpose instead of the objects themselves. Figure 536
represents a model of a lamp found in a grave of Cumberland
Sound. Nowadays the Eskimo place the body in a box, if they can
procure one, or cover it very slightly with stones or snow. It is
strange that, though the ceremonies of burying are very strictly
attended to and though they take care to give the dead their
belongings, they do not heed the opening of the graves by dogs or
wolves and the devouring of the bodies and do not attempt to
recover them when the graves are invaded by animals.
Fig. 536. Model of lamp from a grave in Cumberland Sound. (Museum für
Völkerkunde, Berlin.)
The body must be carried to the place of burial by the nearest
relatives, a few others only accompanying it. For this purpose they
rarely avail themselves of a sledge, as it cannot be used afterward,
but must be left with the deceased. Dogs are never allowed to drag
the sledge on such an occasion. After returning from the burial the
relatives must lock themselves up in the old hut for three days,
during which they mourn the loss of the deceased. During this time
they do not dress their hair and they have their nostrils closed with a
piece of deerskin. After this they leave the hut forever. The dogs are
thrown into it through the window and allowed to devour whatever
they can get at. For some time afterward the mourners must cook
their meals in a separate pot. A strange custom was observed by
Hall in Hudson Bay (II, p. 186). The mourners did not smoke. They
kept their hoods on from morning till night. To the hood the skin and
feathers of the head of Uria grylle were fastened and a feather of
the same waterfowl to each arm just above the elbow. All male
relatives of the deceased wore a belt around the waist, besides
which they constantly wore mittens. It is probable that at the
present time all Eskimo when in mourning avoid using implements of
European manufacture and suspend the use of tobacco. It has
already been stated that women who have lost a child must keep
their heads covered.
Parry, Lyon (p. 369), and Klutschak (p. 201) state that when the
Eskimo first hear of the death of a relative they throw themselves
upon the ground and cry, not for grief, but as a mourning ceremony.
For three or sometimes even four days after a death the inhabitants
of a village must not use their dogs, but must walk to the hunting
ground, and for one day at least they are not allowed to go hunting
at all. The women must stop all kinds of work.
On the third day after death the relatives visit the tomb and travel
around it three times in the same direction as the sun is moving, at
the same time talking to the deceased and promising that they will
bring him something to eat. According to Lyon the Iglulirmiut chant
forth inquiries as to the welfare of the departed soul, whether it has
reached the land Adli, if it has plenty of food, &c., at each question
stopping at the head of the grave and repeating some ceremonial
words (p. 371).
These visits to the grave are repeated a year after death and
whenever they pass it in traveling. Sometimes they carry food to the
deceased, which he is expected to return greatly increased. Hall
describes this custom as practiced by the Nugumiut (I, p. 426). He
says:
They took down small pieces of [deer] skin with the fur on, and
of [fat]. When there they stood around [the] grave [of the
woman] upon which they placed the articles they had brought.
Then one of them stepped up, took a piece of the [deer meat],
cut a slice and ate it, at the same time cutting off another slice
and placing it under a stone by the grave. Then the knife was
passed from one hand to the other, both hands being thrown
behind the person. This form of shifting the implement was
continued for perhaps a minute, the motions being accompanied
by constant talk with the dead. Then a piece of [deer] fur and
some [fat] were placed under the stone with an exclamation
signifying, “Here is something to eat and something to keep you
warm.” Each of the [natives] also went through the same forms.
They never visit the grave of a departed friend until some
months after death, and even then only when all the surviving
members of the family have removed to another place.
Whenever they return to the vicinity of their kindred’s grave,
a visit is made to it with the best of food as a present for the
departed one. Neither seal, polar bear, nor walrus, however, is
taken.
According to Klutschak (p. 154), the natives of Hudson Bay avoid
staying a long time on the salt water ice near the grave of a relative.
On the fourth day after death the relatives may go for the first time
upon the ice, but the men are not allowed to hunt; on the next day
they must go sealing, but without dogs and sledge, walking to the
hunting ground and dragging the seal home. On the sixth day they
are at liberty to use their dogs again. For a whole year they must not
join in any festival and are not allowed to sing certain songs.
If a married woman dies the widower is not permitted to keep any
part of the first seal he catches after her death except the flesh.
Skin, blubber, bones, and entrails must be sunk in the sea.
All the relatives must have new suits of clothes made and before the
others are cast away they are not allowed to enter a hut without
having asked and obtained permission. (See Appendix, Note 7.)
Lyon (p. 368) makes the following statement on the mourning
ceremonies in Iglulik:
Widows are forbidden for six months to taste of unboiled flesh;
they wear no * * * pigtails, and cut off a portion of their
long hair in token of grief, while the remaining locks hang in
loose disorder about their shoulders. * * * After six
months, the disconsolate ladies are at liberty to eat raw meat,
to dress their pigtails and to marry as fast as they please; while
in the meantime they either cohabit with their future husbands,
if they have one, or distribute their favors more generally.
A widower and his children remain during three days within the
hut where his wife died, after which it is customary to remove
to another. He is not allowed to fish or hunt for a whole season,
or in that period to marry again. During the three days of
lamentation all the relatives of the deceased are quite careless
of their dress; their hair hangs wildly about, and, if possible,
they are more than usually dirty in their persons. All visitors to a
mourning family consider it as indispensably necessary to howl
at their first entry.
I may add here that suicide is not of rare occurrence, as according
to the religious ideas of the Eskimo the souls of those who die by
violence go to Qudlivun, the happy land. For the same reason it is
considered lawful for a man to kill his aged parents. In suicide death
is generally brought about by hanging.
TALES AND TRADITIONS.
ITITAUJANG.
A long, long time ago, a young man, whose name was Ititaujang,
lived in a village with many of his friends. When he became grown
he wished to take a wife and went to a hut in which he knew an
orphan girl was living. However, as he was bashful and was afraid to
speak to the young girl himself, he called her little brother, who was
playing before the hut, and said, “Go to your sister and ask her if
she will marry me.” The boy ran to his sister and delivered the
message. The young girl sent him back and bade him ask the name
of her suitor. When she heard that his name was Ititaujang she told
him to go away and look for another wife, as she was not willing to
marry a man with such an ugly name. But Ititaujang did not submit
and sent the boy once more to his sister. “Tell her that
Nettirsuaqdjung is my other name,” said he. The boy, however, said
upon entering, “Ititaujang is standing before the doorway and wants
to marry you.” Again the sister said “I will not have a man with that
ugly name.” When the boy returned to Ititaujang and repeated his
sister’s speech, he sent him back once more and said, “Tell her that
Nettirsuaqdjung is my other name.” Again the boy entered and said,
“Ititaujang is standing before the doorway and wants to marry you.”
The sister answered, “I will not have a man with that ugly name.”
When the boy returned to Ititaujang and told him to go away, he
was sent in the third time on the same commission, but to no better
effect. Again the young girl declined his offer, and upon that
Ititaujang went away in great anger. He did not care for any other
girl of his tribe, but left the country altogether and wandered over
hills and through valleys up the country many days and many nights.
At last he arrived in the land of the birds and saw a lakelet in which
many geese were swimming. On the shore he saw a great number
of boots; cautiously he crept nearer and stole as many as he could
get hold of. A short time after the birds left the water and finding the
boots gone became greatly alarmed and flew away. Only one of the
flock remained behind, crying, “I want to have my boots; I want to
have my boots.” Ititaujang came forth now and answered, “I will
give you your boots if you will become my wife.” She objected, but
when Ititaujang turned round to go away with the boots she agreed,
though rather reluctantly.
Having put on the boots she was transformed into a woman and
they wandered down to the seaside, where they settled in a large
village. Here they lived together for some years and had a son. In
time Ititaujang became a highly respected man, as he was by far the
best whaler among the Inuit.
Once upon a time the Inuit had killed a whale and were busy cutting
it up and carrying the meat and the blubber to their huts. Though
Ititaujang was hard at work his wife stood lazily by. When he called
her and asked her to help as the other women did she objected,
crying, “My food is not from the sea; my food is from the land; I will
not eat the meat of a whale; I will not help.”
Ititaujang answered, “You must eat of the whale; that will fill your
stomach.” Then she began crying and exclaimed, “I will not eat it;
I will not soil my nice white clothing.”
She descended to the beach, eagerly looking for birds’ feathers.
Having found a few she put them between her fingers and between
those of her child; both were transformed into geese and flew away.
When the Inuit saw this they called out, “Ititaujang, your wife is
flying away.” Ititaujang became very sad; he cried for his wife and
did not care for the abundance of meat and blubber, nor for the
whales spouting near the shore. He followed his wife and ascended
the land in search of her.
After having traveled for many weary months he came to a river.
There he saw a man who was busy chopping chips from a piece of
wood with a large hatchet. As soon as the chips fell off he polished
them neatly and they were transformed into salmon, becoming so
slippery that they glided from his hands and fell into the river, which
they descended to a large lake near by. The name of the man was
Eχaluqdjung (the little salmon).
On approaching, Ititaujang was frightened almost to death, for he
saw that the back of this man was altogether hollow and that he
could look from behind right through his mouth. Cautiously he crept
back and by a circuitous way approached him from the opposite
direction.
When Eχaluqdjung saw him coming he stopped chopping and asked,
“Which way did you approach me?” Ititaujang, pointing in the
direction he had come last and from which he could not see the
hollow back of Eχaluqdjung, answered, “It is there I have come
from.” Eχaluqdjung, on hearing this, said, “That is lucky for you. If
you had come from the other side and had seen my back I should
have immediately killed you with my hatchet.” Ititaujang was very
glad that he had turned back and thus deceived the salmon maker.
He asked him, “Have you not seen my wife, who has left me, coming
this way?” Eχaluqdjung had seen her and said, “Do you see yon little
island in the large lake? There she lives now and has taken another
husband.”
When Ititaujang heard this report he almost despaired, as he did not
know how to reach the island; but Eχaluqdjung kindly promised to
help him. They descended to the beach; Eχaluqdjung gave him the
backbone of a salmon and said, “Now shut your eyes. The backbone
will turn into a kayak and carry you safely to the island. But mind
you do not open your eyes, else the boat will upset.”
Ititaujang promised to obey. He shut his eyes, the backbone became
a kayak, and away he went over the lake. As he did not hear any
splashing of water, he was anxious to see whether the boat moved
on, and opened his eyes just a little. But he had scarcely taken a
short glimpse when the kayak began to swing violently and he felt
that it became a backbone again. He quickly shut his eyes, the boat
went steadily on, and a short time after he was landed on the island.
There he saw the hut and his son playing on the beach near it. The
boy on looking up saw Ititaujang and ran to his mother crying,
“Mother, father is here and is coming to our hut.” The mother
answered, “Go, play on; your father is far away and cannot find us.”
The child obeyed; but as he saw Ititaujang approaching he
re-entered the hut and said, “Mother, father is here and is coming to
our hut.” Again the mother sent him away, but he returned very
soon, saying that Ititaujang was quite near.
Scarcely had the boy said so when Ititaujang opened the door. When
the new husband saw him he told his wife to open a box which was
in a corner of the hut. She did so, and many feathers flew out of it
and stuck to them. The woman, her new husband, and the child
were thus again transformed into geese. The hut disappeared; but
when Ititaujang saw them about to fly away he got furious and cut
open the belly of his wife before she could escape. Then many eggs
fell down.
THE EMIGRATION OF THE SAGDLIRMIUT.
In the beginning all the Inuit lived near Ussualung, in Tiniqdjuarbing
(Cumberland Sound). The Igdlumiut, the Nugumiut, and the
Talirpingmiut in the south, the Aggomiut in the far north, and the
Inuit, who tattoo rings round their eyes, in the far west, all once
lived together. There is a tradition concerning the emigration of the
Sagdlirmiut (see p. 451) who live east of Iglulik. The Akudnirmiut
say that the following events did not happen in Tiniqdjuarbing, but
in Aggo, a country where nobody lives nowadays. Ikeraping, an
Akudnirmio, heard the story related by a Tununirmio, who had seen
the place himself, but all the Oqomiut assert that Ussualung is the
place where the events in the story happened.
An old woman, the sister of Mitiq, the angakoq, told the story as
follows:
Near Ussualung there are two places, Qerniqdjuaq and Eχaluqdjuaq.
In each of these was a large house, in which many families lived
together. They used to keep company during the summer when they
went deer hunting, but returned to their separate houses in the fall.
Once upon a time it happened that the men of Qerniqdjuaq had
been very successful, while those of Eχaluqdjuaq had caught
scarcely any deer. Therefore the latter got very angry and resolved
to kill the other party, but they preferred to wait until the winter.
Later in the season many deer were caught and put up in depots.
They were to be carried down to the winter settlements by means of
sledges.
One day both parties agreed upon a journey to these depots and the
men of Eχaluqdjuaq resolved to kill their enemies on this occasion.
They set out with their dogs and sledges, and when they were fairly
inland they suddenly attacked their unsuspecting companions and
killed them. For fear that the wives and children of the murdered
men might be suspicious if the dogs returned without their masters,
they killed them too. After a short time they returned and said they
had lost the other party and did not know what had happened to
them.
A young man of Eχaluqdjuaq was the suitor of a girl of Qerniqdjuaq
and used to visit her every night. He did not stop his visits now. He
was kindly received by the woman and lay down to sleep with his
young wife.
Under the snow bench there was a little boy who had seen the
young man of Eχaluqdjuaq coming. When everybody was sleeping
he heard somebody calling and soon recognized the spirits of the
murdered men, who told him what had happened and asked him to
kill the young man in revenge. The boy crept from his place under
the bed, took a knife, and put it into the young man’s breast. As he


Data Mining Applications For Empowering Knowledge Societies 1st Edition Hakikur Rahman

  • 6. Data Mining Applications for Empowering Knowledge Societies. Hakikur Rahman, Sustainable Development Networking Foundation (SDNF), Bangladesh. Information Science Reference, Hershey • New York
  • 7. Director of Editorial Content: Kristin Klinger
    Managing Development Editor: Kristin M. Roth
    Assistant Managing Development Editor: Jessica Thompson
    Assistant Development Editor: Deborah Yahnke
    Senior Managing Editor: Jennifer Neidig
    Managing Editor: Jamie Snavely
    Assistant Managing Editor: Carole Coulson
    Copy Editor: Erin Meyer
    Typesetter: Sean Woznicki
    Cover Design: Lisa Tosheff
    Printed at: Yurchak Printing Inc.
    Published in the United States of America by Information Science Reference (an imprint of IGI Global), 701 E. Chocolate Avenue, Suite 200, Hershey PA 17033. Tel: 717-533-8845. Fax: 717-533-8661. E-mail: cust@igi-global.com. Web site: http://www.igi-global.com
    and in the United Kingdom by Information Science Reference (an imprint of IGI Global), 3 Henrietta Street, Covent Garden, London WC2E 8LU. Tel: 44 20 7240 0856. Fax: 44 20 7379 0609. Web site: http://www.eurospanbookstore.com
    Copyright © 2009 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher.
    Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.
    Library of Congress Cataloging-in-Publication Data
    Data mining applications for empowering knowledge societies / Hakikur Rahman, editor. p. cm.
    Summary: “This book presents an overview on the main issues of data mining, including its classification, regression, clustering, and ethical issues”--Provided by publisher.
    Includes bibliographical references and index.
    ISBN 978-1-59904-657-0 (hardcover) -- ISBN 978-1-59904-659-4 (ebook)
    1. Data mining. 2. Knowledge management. I. Rahman, Hakikur, 1957-
    QA76.9.D343D38226 2009 005.74--dc22 2008008466
    British Cataloguing in Publication Data
    A Cataloguing in Publication record for this book is available from the British Library.
    All work contributed to this book set is original material. The views expressed in this book are those of the authors, but not necessarily of the publisher.
    If a library purchased a print copy of this publication, please go to http://www.igi-global.com/agreement for information on activating the library's complimentary electronic access to this publication.
Table of Contents

Foreword
Preface
Acknowledgment

Section I
Education and Research

Chapter I
Introduction to Data Mining Techniques via Multiple Criteria Optimization Approaches and Applications
Yong Shi, University of the Chinese Academy of Sciences, China and University of Nebraska at Omaha, USA
Yi Peng, University of Nebraska at Omaha, USA
Gang Kou, University of Nebraska at Omaha, USA
Zhengxin Chen, University of Nebraska at Omaha, USA

Chapter II
Making Decisions with Data: Using Computational Intelligence Within a Business Environment
Kevin Swingler, University of Stirling, Scotland
David Cairns, University of Stirling, Scotland

Chapter III
Data Mining Association Rules for Making Knowledgeable Decisions
A.V. Senthil Kumar, CMS College of Science and Commerce, India
R. S. D. Wahidabanu, Govt. College of Engineering, India

Section II
Tools, Techniques, Methods

Chapter IV
Image Mining: Detecting Deforestation Patterns Through Satellites
Marcelino Pereira dos Santos Silva, Rio Grande do Norte State University, Brazil
Gilberto Câmara, National Institute for Space Research, Brazil
Maria Isabel Sobral Escada, National Institute for Space Research, Brazil

Chapter V
Machine Learning and Web Mining: Methods and Applications in Societal Benefit Areas
Georgios Lappas, Technological Educational Institution of Western Macedonia, Kastoria Campus, Greece

Chapter VI
The Importance of Data Within Contemporary CRM
Diana Luck, London Metropolitan University, UK

Chapter VII
Mining Allocating Patterns in Investment Portfolios
Yanbo J. Wang, University of Liverpool, UK
Xinwei Zheng, University of Durham, UK
Frans Coenen, University of Liverpool, UK

Chapter VIII
Application of Data Mining Algorithms for Measuring Performance Impact of Social Development Activities
Hakikur Rahman, Sustainable Development Networking Foundation (SDNF), Bangladesh

Section III
Applications of Data Mining

Chapter IX
Prospects and Scopes of Data Mining Applications in Society Development Activities
Hakikur Rahman, Sustainable Development Networking Foundation, Bangladesh

Chapter X
Business Data Warehouse: The Case of Wal-Mart
Indranil Bose, The University of Hong Kong, Hong Kong
Lam Albert Kar Chun, The University of Hong Kong, Hong Kong
Leung Vivien Wai Yue, The University of Hong Kong, Hong Kong
Li Hoi Wan Ines, The University of Hong Kong, Hong Kong
Wong Oi Ling Helen, The University of Hong Kong, Hong Kong

Chapter XI
Medical Applications of Nanotechnology in the Research Literature
Ronald N. Kostoff, Office of Naval Research, USA
Raymond G. Koytcheff, Office of Naval Research, USA
Clifford G.Y. Lau, Institute for Defense Analyses, USA

Chapter XII
Early Warning System for SMEs as a Financial Risk Detector
Ali Serhan Koyuncugil, Capital Markets Board of Turkey, Turkey
Nermin Ozgulbas, Baskent University, Turkey

Chapter XIII
What Role is “Business Intelligence” Playing in Developing Countries? A Picture of Brazilian Companies
Maira Petrini, Fundação Getulio Vargas, Brazil
Marlei Pozzebon, HEC Montreal, Canada

Chapter XIV
Building an Environmental GIS Knowledge Infrastructure
Inya Nlenanya, Center for Transportation Research and Education, Iowa State University, USA

Chapter XV
The Application of Data Mining for Drought Monitoring and Prediction
Tsegaye Tadesse, National Drought Mitigation Center, University of Nebraska, USA
Brian Wardlow, National Drought Mitigation Center, University of Nebraska, USA
Michael J. Hayes, National Drought Mitigation Center, University of Nebraska, USA

Compilation of References
About the Contributors
Index
Detailed Table of Contents

Foreword
Preface
Acknowledgment

Section I
Education and Research

Chapter I
Introduction to Data Mining Techniques via Multiple Criteria Optimization Approaches and Applications
Yong Shi, University of the Chinese Academy of Sciences, China and University of Nebraska at Omaha, USA
Yi Peng, University of Nebraska at Omaha, USA
Gang Kou, University of Nebraska at Omaha, USA
Zhengxin Chen, University of Nebraska at Omaha, USA

This chapter presents an overview of a series of multiple criteria optimization-based data mining methods that utilize multiple criteria programming to solve various data mining problems, and it outlines some research challenges. At the same time, the chapter points to several research opportunities for the data mining community.

Chapter II
Making Decisions with Data: Using Computational Intelligence Within a Business Environment
Kevin Swingler, University of Stirling, Scotland
David Cairns, University of Stirling, Scotland

This chapter identifies important barriers to the successful application of computational intelligence techniques in a commercial environment and suggests a number of ways in which they may be overcome. It further identifies a few key conceptual, cultural, and technical barriers and describes the different ways in which they affect business users and computational intelligence practitioners. The chapter aims to give its readers informed insight through the outcome of a successful computational intelligence project.
Chapter III
Data Mining Association Rules for Making Knowledgeable Decisions
A.V. Senthil Kumar, CMS College of Science and Commerce, India
R. S. D. Wahidabanu, Govt. College of Engineering, India

This chapter describes two popular data mining techniques used to explore frequent large itemsets in a database. The first is the closed directed graph approach, in which the algorithm scans the database once and counts the possible 2-itemsets; only the 2-itemsets with minimum support are used to form the closed directed graph, which is then used to explore possible frequent large itemsets in the database. The second is the dynamic hashing algorithm, in which large 3-itemsets are generated at an earlier stage; this reduces the size of the transaction database after trimming and thereby reduces the cost of later iterations. The chapter envisages that these techniques may help researchers not only to understand how frequent large itemsets are generated, but also to find association rules among transactions within relational databases and to make knowledgeable decisions.

Section II
Tools, Techniques, Methods

Chapter IV
Image Mining: Detecting Deforestation Patterns Through Satellites
Marcelino Pereira dos Santos Silva, Rio Grande do Norte State University, Brazil
Gilberto Câmara, National Institute for Space Research, Brazil
Maria Isabel Sobral Escada, National Institute for Space Research, Brazil

This chapter presents relevant definitions from the remote sensing and image mining domain, refers to related work in this field, and demonstrates the importance of appropriate tools and techniques for analyzing satellite images and extracting knowledge from this kind of data. A case study, deforestation in Amazonia, is discussed, and an effort is made to develop a strategy for dealing with challenges involving Earth observation resources. The purpose is to present new approaches and research directions in remote sensing image mining and to demonstrate how to increase the analysis potential of such huge strategic data for the benefit of researchers.

Chapter V
Machine Learning and Web Mining: Methods and Applications in Societal Benefit Areas
Georgios Lappas, Technological Educational Institution of Western Macedonia, Kastoria Campus, Greece

This chapter reviews contemporary research on machine learning and Web mining methods related to areas of social benefit. It further demonstrates that machine learning and Web mining methods may provide intelligent Web services of social interest. The chapter also discusses the growing interest of researchers in using advanced computational methods, such as machine learning and Web mining, to provide better services to the public.
Chapter VI
The Importance of Data Within Contemporary CRM
Diana Luck, London Metropolitan University, UK

This chapter examines the importance of customer relationship management (CRM) in product development and service elements as well as in organizational structure and strategies, where data is the pivotal dimension around which the contemporary concept of CRM revolves. It then tries to demonstrate how the processes associated with data management, namely data collection, data collation, data storage, and data mining, are becoming essential components of CRM in both theoretical and practical terms.

Chapter VII
Mining Allocating Patterns in Investment Portfolios
Yanbo J. Wang, University of Liverpool, UK
Xinwei Zheng, University of Durham, UK
Frans Coenen, University of Liverpool, UK

This chapter introduces the concept of “one-sum” weighted association rules (WARs) and names such WARs allocating patterns (ALPs). An algorithm is proposed to extract hidden and interesting ALPs from data. The chapter further points out that ALPs can be applied in portfolio management: by modeling a collection of investment portfolios as a one-sum weighted transaction database, ALPs can be used to guide future investment activities.

Chapter VIII
Application of Data Mining Algorithms for Measuring Performance Impact of Social Development Activities
Hakikur Rahman, Sustainable Development Networking Foundation (SDNF), Bangladesh

This chapter focuses on data mining applications and their use in devising performance-measuring tools for social development activities. It provides justifications for including data mining algorithms in establishing specifically derived monitoring and evaluation tools that may be used for various social development applications. Specifically, the chapter gives in-depth analytical observations on establishing knowledge centers with a range of approaches and puts forward a few research issues and challenges in transforming contemporary human society into a knowledge society.

Section III
Applications of Data Mining

Chapter IX
Prospects and Scopes of Data Mining Applications in Society Development Activities
Hakikur Rahman, Sustainable Development Networking Foundation, Bangladesh

Chapter IX focuses on a few areas of social development processes and puts forward hints on the application of data mining tools through which decision making would be easier. Subsequently, it puts forward potential areas of society development initiatives where data mining applications can be incorporated. The focus areas may vary from basic social services, like education, health care, general commodities, tourism, and ecosystem management, to advanced uses, like database tomography.

Chapter X
Business Data Warehouse: The Case of Wal-Mart
Indranil Bose, The University of Hong Kong, Hong Kong
Lam Albert Kar Chun, The University of Hong Kong, Hong Kong
Leung Vivien Wai Yue, The University of Hong Kong, Hong Kong
Li Hoi Wan Ines, The University of Hong Kong, Hong Kong
Wong Oi Ling Helen, The University of Hong Kong, Hong Kong

This chapter highlights the business data warehouse and discusses the retailing giant Wal-Mart. The planning and implementation of the Wal-Mart data warehouse are described, and its integration with the operational systems is discussed. The chapter also highlights some of the problems encountered during the development of the data warehouse and provides some recommendations for the future of the Wal-Mart data warehouse.

Chapter XI
Medical Applications of Nanotechnology in the Research Literature
Ronald N. Kostoff, Office of Naval Research, USA
Raymond G. Koytcheff, Office of Naval Research, USA
Clifford G.Y. Lau, Institute for Defense Analyses, USA

Chapter XI examines the medical applications literature associated with nanoscience and nanotechnology research. For this research, the authors retrieved about 65,000 nanotechnology records in 2005 from the Science Citation Index/Social Science Citation Index (SCI/SSCI) using a comprehensive 300+ term query; in this chapter they intend to facilitate the nanotechnology transition process by identifying the significant application areas. Specifically, the chapter identifies the main nanotechnology health applications from today’s vantage point, as well as the related science and infrastructure. The medical applications were ascertained through a fuzzy clustering process, and metrics were generated using text mining to extract technical intelligence for specific medical applications/application groups.

Chapter XII
Early Warning System for SMEs as a Financial Risk Detector
Ali Serhan Koyuncugil, Capital Markets Board of Turkey, Turkey
Nermin Ozgulbas, Baskent University, Turkey

This chapter introduces an early warning system for SMEs (SEWS) as a financial risk detector based on data mining. In developing the early warning system, the chapter compiles a system in which qualitative and quantitative data about the requirements of enterprises are taken into consideration. Moreover, an easy-to-understand, easy-to-interpret, and easy-to-apply utilitarian model is targeted by discovering the implicit relationships in the data and identifying the effect level of every factor related to the system. The chapter ultimately shows a way of empowering the knowledge society from the SME point of view by designing an early warning system based on data mining.
Chapter XIII
What Role is “Business Intelligence” Playing in Developing Countries? A Picture of Brazilian Companies
Maira Petrini, Fundação Getulio Vargas, Brazil
Marlei Pozzebon, HEC Montreal, Canada

Chapter XIII focuses on various business intelligence (BI) projects in developing countries and specifically highlights Brazilian BI projects. Within a broad enquiry into the role BI is playing in developing countries, two specific research questions are explored. The first tries to determine whether the approaches, models, or frameworks are tailored to the particularities and the contextually situated business strategy of each company, or whether they are “standard” and imported from “developed” contexts. The second tries to analyze what type of information is being considered for incorporation by BI systems: whether it is formal or informal in nature; whether it is gathered from internal or external sources; whether there is a trend that favors some areas, like finance or marketing, over others, or a concern with maintaining multiple perspectives; who in the firms is using BI systems; and so forth.

Chapter XIV
Building an Environmental GIS Knowledge Infrastructure
Inya Nlenanya, Center for Transportation Research and Education, Iowa State University, USA

In Chapter XIV, the author proposes a simple and accessible conceptual knowledge discovery interface, based on a geographical information system (GIS), that can be used as a decision-making tool. The chapter also addresses some issues that might make this knowledge infrastructure stimulate sustainable development, with particular emphasis on the sub-Saharan African region.

Chapter XV
The Application of Data Mining for Drought Monitoring and Prediction
Tsegaye Tadesse, National Drought Mitigation Center, University of Nebraska, USA
Brian Wardlow, National Drought Mitigation Center, University of Nebraska, USA
Michael J. Hayes, National Drought Mitigation Center, University of Nebraska, USA

Chapter XV discusses the application of data mining to develop drought monitoring utilities that enable monitoring and prediction of drought’s impact on vegetation conditions. The chapter also summarizes current research using data mining approaches to build various types of drought monitoring tools and explains how they are being integrated with decision support systems, focusing specifically on drought monitoring and prediction in the United States.

Compilation of References
About the Contributors
Index
Foreword

Advances in information technology and data collection methods have led to the availability of larger data sets in government and commercial enterprises, and in a wide variety of scientific and engineering disciplines. Consequently, researchers and practitioners have an unprecedented opportunity to analyze this data in far more analytical ways and to extract intelligent and useful information from it.

The traditional approach to data analysis for decision making has been to merge business and scientific expertise with statistical modeling techniques in order to develop experimentally verified solutions for explicit problems. In recent years, a number of trends have emerged that have started to challenge this traditional approach. One trend is the increasing accessibility of large volumes of high-dimensional data, occupying database tables with many millions of rows and many thousands of columns. Another trend is the increasingly dynamic demand for rapidly building and deploying data-driven analytics. A third trend is the increasing necessity to present analysis results to end-users in a form that can be readily understood and assimilated, so that end-users can gain the insights they need to improve the decisions they make.

Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors. Data mining algorithms embody techniques that have existed for at least 10 years, but have only recently been implemented as mature, reliable, understandable tools that consistently outperform older statistical methods.

This book focuses specifically on applying data mining techniques to design, develop, and evaluate social advancement processes that have been applied in several developing economies. It provides an overview of the main issues of data mining (including classification, regression, clustering, association rules, trend detection, feature selection, intelligent search, data cleaning, and privacy and security issues) and of knowledge-enhancing processes, as well as a wide spectrum of data mining applications such as computational natural science, e-commerce, environmental study, financial market study, network monitoring, social service analysis, and so forth.

This book will be highly useful to researchers, academics, and practitioners, including GOs and NGOs, for further research and study, especially those working on the monitoring and evaluation of projects and on follow-up activities for development projects, and it will be invaluable scholarly content for development practitioners.

Dr. Abdul Matin Patwari
Vice Chancellor, The University of Asia Pacific
Dhaka, Bangladesh
Preface

Data mining may be characterized as the process of extracting intelligent information from large amounts of raw data. It is steadily becoming a pervasive technology in activities as diverse as using historical data to predict the success of an awareness-raising campaign by looking into pattern sequence formations, a promotional operation by looking into pattern sequence transformations, a monitoring tool by looking into pattern sequence repetitions, or an analysis tool by looking into pattern sequence formations.

Theories and concepts of data mining were added to the database arena only recently, and research on the subject goes back little more than a decade. Only minor research and development activity was observed in the 1990s, alongside the immense prospects of information and communication technologies (ICTs). Organized and coordinated research on data mining started in 2001, with the advent of various workshops, seminars, promotional campaigns, and funded research projects. International conferences on data mining organized by the Institute of Electrical and Electronics Engineers, Inc. (since 2001), the Wessex Institute of Technology (since 1999), the Society for Industrial and Applied Mathematics (since 2001), the Institute of Computer Vision and applied Computer Sciences (since 1999), and the World Academy of Science are among the leaders in creating awareness of advanced research on data mining and its effective applications. Furthermore, these events reveal that the theme of research has been shifting over the years from fundamental data mining to information engineering and/or information management.

Data mining is a promising and relatively new area of research and development that can provide important advantages to its users. It can yield substantial knowledge from data primarily gathered through a wide range of applications.
Various institutions have derived considerable benefits from its application, and many other industries and disciplines are now applying the methodology to increasing effect. Collective efforts in the machine learning, artificial intelligence, statistics, and database communities have been reinforcing technologies of knowledge discovery in databases to extract valuable information from massive amounts of data in support of intelligent decision making. Data mining aims to develop algorithms for extracting new patterns from the facts recorded in a database, and to date data mining tools have adopted techniques from statistics, network modeling, and visualization to classify data and identify patterns. Ultimately, knowledge recovery aims to enable an information system to transform information into knowledge through hypothesis, testing, and theory formation. It sets new challenges for database technology: new concepts and methods are needed for basic operations, query languages, and query processing strategies (Witten & Frank, 2005; Yuan, Buttenfield, Gehagen & Miller, 2004).

However, data mining does not provide any straightforward analysis, nor does it necessarily equate with machine learning, especially in the case of relatively large databases. Furthermore, an exhaustive statistical analysis is not possible, though many data mining methods contain a degree of nondeterminism to enable them to scale to massive datasets.

At the same time, successful applications of data mining are not common, despite the vast literature now accumulating on the subject. The reason is that, although it is relatively straightforward to find pattern or structure in data, establishing its relevance and explaining its cause are both very difficult tasks. In addition, much of what has been discovered so far may well be known to the expert. Therefore, addressing these problematic issues requires the synthesis of underlying theory from databases, statistics, algorithms, machine learning, and visualization (Giudici, 2003; Hastie, Tibshirani & Friedman, 2001; Yuan, Buttenfield, Gehagen & Miller, 2004).

Along these perspectives, a complete guide is the need of the hour: one that enables practitioners to improve their research and to participate actively in solving practical problems related to data explosion, optimum searching, qualitative content management, improved decision making, and intelligent data mining. A book featuring all these aspects can fill an extremely demanding knowledge gap in the contemporary world.

Furthermore, data mining is no longer a research subject that exists independently. To understand its essential insights and effective implementations, one must open the knowledge periphery in multidimensional aspects. Therefore, in this era of information revolution, data mining should be treated as a cross-cutting and cross-sectoral feature. At the same time, data mining is becoming an interdisciplinary field of research driven by a variety of multidimensional applications. On the one hand, it entails techniques from machine learning, pattern recognition, statistics, algorithms, databases, linguistics, and visualization. On the other hand, one finds applications to understand human behavior, such as that of the end user of an enterprise. It also helps entrepreneurs to perceive the type of transactions involved, including those needed to evaluate risks or detect scams.

The reality of data explosion in multidimensional databases is a surprising and widely misunderstood phenomenon.
For those about to use an OLAP (online analytical processing) product, it is critically important to understand what data explosion is, what causes it, and how it can be avoided, because the consequences of ignoring data explosion can be very costly and, in most cases, result in project failure (Applix, 2003). Meanwhile, enterprise data requirements grow at 50-100% a year, creating a constant storage infrastructure management challenge (Intransa, 2005).

Concurrently, the database community draws much of its motivation from the vast digital datasets now available online and the computational problems involved in analyzing them. Almost without exception, current databases and database management systems are designed without regard to knowledge or content, so the access methods and query languages they provide are often inefficient or unsuitable for mining tasks. The functionality of some existing methods can be approximated either by sampling the data or by reexpressing the data in a simpler form. However, algorithms attempt to encapsulate all the important structure contained in the original data, so that information loss is minimal and mining algorithms can function more efficiently. Therefore, sampling strategies must try to avoid bias, which is difficult if the target and its explanation are unknown.

These issues relate to the core technology aspects of data mining. Apart from the intricate technology context, the applications of data mining methods lag in the development context. Lack of data has been found to inhibit the ability of organizations to fully assist clients, and lack of knowledge made the government vulnerable to the influence of outsiders who did have access to data from countries overseas. Furthermore, disparity in data collection demands coordinated data archiving and data sharing, which is extremely crucial for developing countries.
The technique of data mining enables governments, enterprises, and private organizations to carry out mass surveillance and personalized profiling, in most cases without any controls or right of access to examine this data. However, to raise human capacity and establish effective knowledge systems from the applications of data mining, the main focus should be on the sustainable use of resources and the associated systems under the specific context (ecological, climatic, social, and economic conditions) of developing countries. Research activities should also focus on the sustainable management of vulnerable resources and apply integrated management techniques, with a view to supporting the implementation of the provisions related to research and sustainable use of existing resources (EC, 2005).

To obtain the advantages of data mining applications, the scientific issues and aspects of archiving scientific and technology data can include the discipline-specific needs and practices of scientific communities as well as interdisciplinary assessments and methods. In this context, data archiving can be seen primarily as a program of practices and procedures that support the collection, long-term preservation, and low-cost access to, and dissemination of, scientific and technology data. The tasks of data archiving include digitizing data, gathering digitized data into archive collections, describing the collected data to support long-term preservation, decreasing the risks of losing data, and providing easy ways to make the data accessible. Hence, data archiving and the associated data centers need to be part of the day-to-day practice of science. This is particularly important now that much new data is collected and generated digitally, and regularly (Codata, 2002; Mohammadian, 2004).

So far, data mining has existed in the form of discrete technologies. Recently, its integration into many other forms of ICTs has become attractive as various organizations possessing huge databases began to realize the potential of the information hidden there (Hernandez, Göhring & Hopmann, 2004). The Internet can thereby be a tremendous tool for the collection and exchange of information, best practices, success cases, and vast quantities of data. But it is also becoming increasingly congested, and its popular use raises issues about the authentication and evaluation of information and data. Interoperability is another issue that poses significant challenges.
The growing number, volume, and complexity of data sources, together with the high-speed connectivity of the Internet, are making interoperability and data integration an important research and industry focus. Moreover, incompatibilities between data formats, software systems, methodologies, and analytical models are creating barriers to the easy flow and creation of data, information, and knowledge (Carty, 2002). All of this demands not only a technology revolution but also a tremendous uplift of human capacity as a whole.

Therefore, the challenge of human development, taking into account the social and economic background while protecting the environment, confronts decision makers such as national governments, local communities, and development organizations. The question arises: how can new information and communication technology be applied to fulfill this task (Hernandez, Göhring & Hopmann, 2004)?

This book gives a review of data mining and decision support techniques and their requirements for achieving sustainable outcomes. It looks into authenticated global approaches to data mining and shows its capabilities as an effective instrument on the basis of its application in real projects in developing countries. The applications cover the development of algorithms, computer security, open and distance learning, online analytical processing, scientific modeling, simple warehousing, and social and economic development processes. Applying data mining techniques in various aspects of social development processes could thereby empower society with proper knowledge and would produce economic products by raising its economic capabilities.

On the other hand, coupled with linguistic techniques, data mining has produced the new field of text mining.
This has considerably increased the applications of data mining to extract ideas and sentiment from a wide range of sources, and has opened up new possibilities for data mining to act as a bridge between technology and the physical sciences and the social sciences. Furthermore, data mining today is recognized as an important tool to analyze and understand the information collected by governments, businesses, and scientific centers. Novel data, text, and Web mining application areas are emerging fast, and these developments call for new perspectives and approaches in the form of inclusive research.

Similarly, info-miners in the distance learning community are using one or more info-mining tools. They offer high-quality open and distance learning (ODL) information retrieval and search services.
Thus, ICT-based info-mining services will likely produce huge digital libraries, such as e-books, journals, reports, and databases, on DVD and similar high-density information storage media. Most of these off-line formats are PC-accessible and can store considerably more information per unit than a CD-ROM (COL, 2003). Hence, knowledge enhancement processes can be significantly improved through proper use of data mining techniques.

Data mining techniques are gradually becoming essential components of corporate intelligence systems and are progressively evolving into a pervasive technology within activities that range from the utilization of historical data, to predicting the success of an awareness campaign or a promotional operation, to searching for succession patterns used as monitoring tools, to the analysis of genome chains or the formation of knowledge banks.

In reality, data mining is becoming an interdisciplinary field driven by various multidimensional applications. On the one hand, it involves schemes for machine learning, pattern recognition, statistics, algorithms, databases, linguistics, and visualization. On the other hand, one finds applications of it to understand human behavior, to understand the type of transactions involved, or to evaluate risks or detect fraud in an enterprise. Data mining can yield substantial knowledge from raw data that are primarily gathered for a wide range of applications. Various institutions have derived significant benefits from its application, and many other industries and disciplines are now applying the modus operandi to increasing effect for their overall management development.

This book tries to examine the meaning and role of data mining in terms of social development initiatives and their outcomes in developing economies, in terms of upholding knowledge dimensions. At the same time, it gives an in-depth look into the critical management of information in developed countries from a similar point of view.
Furthermore, this book provides an overview of the main issues of data mining (including classification, regression, clustering, association rules, trend detection, feature selection, intelligent search, data cleaning, and privacy and security issues) and knowledge-enhancing processes, as well as a wide spectrum of data mining applications, such as computational natural science, e-commerce, environmental study, business intelligence, network monitoring, and social service analysis, to empower the knowledge society.

Where the Book Stands

In the global context, a combination of continual technological innovation and increasing competitiveness makes the management of information a huge challenge and requires decision-making processes built on reliable and opportune information gathered from available internal and external sources. Although the volume of acquired information is increasing immensely, this does not mean that people are able to derive appropriate value from it (Maira & Marlei, 2003). This deserves authenticated investigation of information archival strategies and demands years of continuous investment in order to put in place a technological platform that supports all development processes and strengthens the efficiency of the operational structure. Most organizations are supposed to have reached a level where the implementation of IT solutions at strategic levels becomes achievable and essential. This context explains the emergence of the domain generally known as “intelligent data mining”, seen as an answer to the current demands for data and information for decision-making with the intensive utilization of information technology.

The objective of the book is to examine the meaning and role of data mining in a particular context (i.e., in terms of development initiatives and their outcomes), especially in developing countries and transitional economies. If the management of information is a challenge even to enterprises in developed
countries, what can be said about organizations struggling in unstable contexts such as those of developing countries? The book also tries to cover data mining applications in the context of developed countries.

With the unprecedented rate at which data is being collected today in almost all fields of human endeavor, there is an emerging demand to extract useful information from it for the economic and scientific benefit of society. Intelligent data mining enables the community to take advantage of the gathered data and information by making intelligent decisions. This increases the knowledge content of each member of the community, if it can be applied to practical usage areas. Eventually, a knowledge base is created and a knowledge-based society will be established.

However, data mining involves the automatic discovery of patterns, sequences, transformations, associations, and anomalies in massive databases, and is an enormously interdisciplinary field representing the confluence of several disciplines, including database systems, data warehousing, machine learning, statistics, algorithms, data visualization, and high-performance computing (LCPS, 2001; UN, 2004). A book of this nature, encompassing such a wide-ranging subject area, has been missing from the contemporary global market, and this book intends to fill that knowledge gap.

In this context, this book provides an overview of the main issues of data mining (including classification, regression, clustering, association rules, trend detection, feature selection, intelligent search, data cleaning, and privacy and security issues) and knowledge-enhancing processes, as well as a wide spectrum of data mining applications, such as computational natural science, e-commerce, environmental study, financial market study, machine learning, Web mining, nanotechnology, e-tourism, and social service analysis.
Apart from providing insight into the advanced context of data mining, this book emphasizes:

• Development and availability of shared data, metadata, and products commonly required across diverse societal benefit areas
• Promoting research efforts that are necessary for the development of tools required in all societal benefit areas
• Encouraging and facilitating the transition from research to operations of appropriate systems and techniques
• Facilitating partnerships between operational groups and research groups
• Developing recommended priorities for new or augmented efforts in human capacity building
• Contributing to, accessing, and retrieving data from global data systems and networks
• Encouraging the adoption of existing and new standards to support broader data and information usability
• Data management approaches that encompass a broad perspective on the observation of the data life cycle, from input through processing, archiving, and dissemination, including reprocessing, analysis, and visualization of large volumes and diverse types of data
• Facilitating recording and storage of data in clearly defined formats, with metadata and quality indications to enable search, retrieval, and archiving as easily accessible data sets
• Facilitating user involvement and conducting outreach at global, regional, national, and local levels
• Complete and open exchange of data, metadata, and products within relevant agencies and national policies and legislation
Organization of Chapters

Altogether, this book has fifteen chapters, divided into three sections: Education and Research; Tools, Techniques, Methods; and Applications of Data Mining. Section I has three chapters, which discuss policy and decision-making approaches of data mining for sociodevelopment aspects in technical and semitechnical contexts. Section II comprises five chapters, which illustrate tools, techniques, and methods of data mining applications for various human development processes and scientific research. Section III has seven chapters, which show various case studies, practical applications, and research activities on data mining applications that are being used in social development processes for empowering knowledge societies.

Chapter I provides an overview of a series of multiple criteria optimization-based data mining methods that utilize multiple criteria programming (MCP) to solve various data mining problems. The authors state that data mining is established on the basis of many disciplines, such as machine learning, databases, statistics, computer science, and operations research, and that each field comprehends data mining from its own perspective and makes distinct contributions. They further state that, due to the difficulty of accessing the accuracy of hidden data and increasing the predicting rate in a complex large-scale database, researchers and practitioners have always desired to seek new or alternative data mining techniques. The chapter therefore outlines a few research challenges and opportunities at the end.

Chapter II identifies some important barriers to the successful application of computational intelligence (CI) techniques in a commercial environment and suggests various ways in which they may be overcome. It states that CI offers new opportunities to a business that wishes to improve the efficiency of its operations.
In this context, the chapter further identifies a few key conceptual, cultural, and technical barriers and describes different ways in which they affect business users and CI practitioners. The chapter aims to provide knowledgeable insight for its readers through the outcome of a successful computational intelligence project and expects that, by enabling both parties to understand each other’s perspectives, the true potential of CI may be realized.

Chapter III describes two data mining techniques that are used to explore frequent large itemsets in a database. In the first technique, called the closed directed graph approach, the algorithm scans the database once, counting the possible 2-itemsets; only the 2-itemsets with minimum support are used to form the closed directed graph, which is then used to explore frequent large itemsets in the database. In the second technique, the dynamic hashing algorithm, large 3-itemsets are generated at an earlier stage, which reduces the size of the transaction database after trimming and thereby reduces the cost of later iterations. Furthermore, the chapter predicts that the techniques may help researchers not only to understand the generation of frequent large itemsets but also to find association rules among transactions within relational databases and to make knowledgeable decisions.

Every day, different satellites capture data of distinct contexts, among which images are processed and stored by many institutions. In Chapter IV, the authors present relevant definitions in the remote sensing and image mining domain, referring to related work in this field and indicating the importance of appropriate tools and techniques to analyze satellite images and extract knowledge from this kind of data. As a case study, the Amazonia deforestation problem is discussed, as well as INPE’s effort to develop and spread technology to deal with challenges involving Earth observation resources.
The purpose is to present relevant technologies, new approaches, and research directions in remote sensing image mining, and to demonstrate how to increase the analysis potential of such huge strategic data for the benefit of researchers.

Chapter V reviews contemporary research on machine learning and Web mining methods that are related to areas of social benefit. It demonstrates that machine learning and Web mining methods may
provide intelligent Web services of social interest. The chapter also reveals a growing interest in using advanced computational methods, such as machine learning and Web mining, for better services to the public, as most research identified in the literature has been conducted in recent years. The chapter tries to help researchers and academics from different disciplines understand how Web mining and machine learning methods are applied to Web data. Furthermore, it aims to provide the latest developments in research in this field related to societal benefit areas.

In recent times, customer relationship management (CRM) can be related to sales, marketing, and even services automation. Additionally, the concept of CRM is increasingly associated with cost savings and streamlined processes as well as with the engendering, nurturing, and tracking of relationships with customers. Chapter VI seeks to illustrate how, although the product and service elements as well as organizational structure and strategies are central to CRM, data is the pivotal dimension around which the concept revolves in contemporary terms. It subsequently tries to demonstrate how these processes are associated with data management, namely data collection, data collation, data storage, and data mining, which are becoming essential components of CRM in both theoretical and practical aspects.

In Chapter VII, the authors introduce the concept of “one-sum” weighted association rules (WARs) and name such WARs allocating patterns (ALPs). An algorithm is also proposed to extract hidden and interesting ALPs from data. The chapter further points out that ALPs can be applied in portfolio management: a collection of investment portfolios can be modeled as a one-sum weighted transaction database that contains hidden ALPs, and those ALPs, mined from the given portfolio data, can eventually be applied to guide future investment activities.
Chapter VIII focuses on data mining applications and their utilization in formulating performance-measuring tools for social development activities. In this context, the chapter provides justifications for including data mining algorithms to establish specifically derived monitoring and evaluation tools for various social development applications. In particular, the chapter gives in-depth analytical observations for establishing knowledge centers with a range of approaches, and finally it puts forward a few research issues and challenges in transforming contemporary human society into a knowledge society.

Chapter IX highlights a few areas of development and hints at applications of data mining tools through which decision-making would be easier. Subsequently, the chapter puts forward potential areas of societal development initiatives where data mining applications can be introduced. The focus area may vary from basic education, health care, general commodities, tourism, and ecosystem management to advanced uses, like database tomography. The chapter also provides some future challenges and recommendations in terms of using data mining applications for empowering the knowledge society.

Chapter X focuses on the business data warehouse and discusses the retailing giant Wal-Mart. In this chapter, the planning and implementation of the Wal-Mart data warehouse are described and its integration with the operational systems is discussed. It also highlights some of the problems that were encountered during the development of the data warehouse and provides some future recommendations.

In Chapter XI, the medical applications literature associated with nanoscience and nanotechnology research is examined. The authors retrieved about 65,000 nanotechnology records in 2005 from the Science Citation Index/Social Science Citation Index (SCI/SSCI) using a comprehensive 300+ term query.
This chapter intends to facilitate the nanotechnology transition process by identifying the significant application areas. It also identifies the main nanotechnology health applications from today’s vantage point, as well as the related science and infrastructure. The medical applications were identified through a fuzzy clustering process, and metrics were generated using text mining to extract technical intelligence for specific medical applications and application groups.
Chapter XII introduces an early warning system for SMEs (SEWS) as a financial risk detector based on data mining. Through a study, the chapter composes a system in which qualitative and quantitative data about the requirements of enterprises are taken into consideration during the development of an early warning system. Moreover, during the formation of this system, an easy-to-understand, easy-to-interpret, and easy-to-apply utilitarian model is targeted by discovering the implicit relationships between the data and identifying the effect level of every factor related to the system. The chapter also shows a way of empowering the knowledge society from the SMEs’ point of view by designing an early warning system based on data mining. Using this system, SME managers could easily obtain financial management and risk management knowledge without any prior knowledge or expertise.

Chapter XIII looks at various business intelligence (BI) projects in developing countries and focuses specifically on Brazilian BI projects. The authors pose this question: if the management of IT is a challenge for companies in developed countries, what can be said about organizations struggling in unstable contexts such as those often prevailing in developing countries? Within this broad enquiry about the role BI plays in developing countries, two specific research questions are explored in the chapter. The purpose of the first question is to determine whether those approaches, models, or frameworks are tailored to the particularities and the contextually situated business strategy of each company, or whether they are “standard” and imported from “developed” contexts.
The purpose of the second is to analyze what type of information is being considered for incorporation by BI systems; whether it is formal or informal in nature; whether it is gathered from internal or external sources; whether there is a trend that favors some areas, like finance or marketing, over others, or a concern with maintaining multiple perspectives; who in the firms is using BI systems; and so forth.

Technologies such as geographic information systems (GIS) enable geo-spatial information to be gathered, modified, integrated, and mapped easily and cost-effectively. However, these technologies generate both opportunities and challenges for achieving wider and more effective use of geo-spatial information in stimulating and maintaining sustainable development through elegant policy making. In Chapter XIV, the author proposes a simple and accessible conceptual knowledge discovery interface that can be used as a tool. Moreover, the chapter addresses some issues that might make this knowledge infrastructure stimulate sustainable development, with particular emphasis on the sub-Saharan African region.

Finally, Chapter XV discusses the application of data mining to develop drought monitoring tools that enable monitoring and prediction of drought’s impact on vegetation conditions. The chapter also summarizes current research using data mining approaches (e.g., association rules and decision-tree methods) to develop various types of drought monitoring tools and briefly explains how they are being integrated with decision support systems. It also introduces how data mining can be used to enhance drought monitoring and prediction in the United States and, at the same time, help others understand how similar tools might be developed in other parts of the world.
Conclusion

Data mining is becoming an essential tool in science, engineering, industrial processes, healthcare, and medicine. The datasets in these fields are large, complex, and often noisy. However, extracting knowledge from raw datasets requires the use of sophisticated, high-performance, and principled analysis techniques and algorithms, based on sound statistical foundations. In turn, these techniques require powerful visualization technologies; implementations that must be carefully tuned for enhanced performance; and software systems that are usable by scientists, engineers, and physicians as well as researchers.
Data mining, as stated earlier, denotes the extraction of hidden predictive information from large databases, and it is a powerful new technology with great potential to help enterprises focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing entrepreneurs to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by the retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

In effect, data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently generated technologies that allow users to navigate through their data in real time. Thus, data mining takes this evolutionary progression beyond retrospective data access and navigation to prospective and proactive information delivery. Furthermore, data mining algorithms allow researchers to devise unique decision-making tools from emancipated data varying in nature. Foremost, by applying data mining techniques, extremely valuable utilities can be devised that could raise the knowledge content at each tier of society.

However, in terms of accumulated literature and research contexts, not many publications are available in the field of data mining applications in social development, especially in the form of a book.
Taking this as a baseline, the compiled literature seems to be extremely valuable in the context of utilizing data mining and other information techniques for the improvement of skills development, knowledge management, and societal benefits. Similarly, Internet search engines do not fetch sufficient bibliographies in the field of data mining from a development perspective. Given the high demand from researchers in the field of ICTD, a book of this format stands to be unique. Moreover, utilization of new ICTs in the form of data mining deserves appropriate intervention for their diffusion at local, national, regional, and global levels.

It is assumed that numerous individuals, academics, researchers, engineers, and professionals from government and nongovernment security and development organizations will be interested in this increasingly important topic for carrying out implementation strategies towards their national development. This book will assist its readers in understanding the key practical and research issues related to applying data mining in development data analysis, cyber acclamations, digital deftness, contemporary CRM, investment portfolios, early warning systems in SMEs, business intelligence, and intrinsic nature in the context of society uplift as a whole and the use of data and information for empowering knowledge societies.

Most books on data mining deal merely with technology aspects, despite the diversified nature of its various applications along many tiers of human endeavor. There have been a few activities in recent years that are producing high-quality proceedings, but it is felt that a compilation of contents of this nature, drawn from advanced research outcomes that have been carried out globally, may produce a book in demand among researchers.

References

Applix (2003). OLAP data scalability: Ignore the OLAP data explosion at great cost. A White Paper. Westborough, MA: Applix, Inc.
Carty, A. J. (2002, September 29). Scientific and technical data: Extending the frontiers of research. In Proceedings of the Opening Address at the 18th International CODATA Conference, Montreal, Quebec.
Codata (2002, May 21-22). In Proceedings of the Workshop on Archiving Scientific and Technical Data, Committee on Data for Science and Technology (CODATA), Pretoria, South Africa.
COL (2003). Find information faster: COL’s “Info-mining” tools. Vancouver, BC: Clippings, Commonwealth of Learning.
EC (2005). Integrating and strengthening the European Research Area, 2005 Work Programme (SP1-10). European Commission.
Hernandez, V., Göhring, W., & Hopmann, C. (2004, Nov. 30-Dec. 3). Sustainable decision support for environmental problems in developing countries: Applying multicriteria spatial analysis on the Nicaragua Development Gateway niDG. In Proceedings of the Workshop on Binding EU-Latin American IST Research Initiatives for Enhancing Future Co-Operation. Santo Domingo, Costa Rica.
Giudici, P. (2003). Applied data mining: Statistical methods for business and industry. John Wiley.
Hastie, T., Tibshirani, R., & Friedman, J. (2001) (Eds.). The elements of statistical learning: Data mining, inference, and prediction. Springer Verlag.
Intransa (2005). Managing storage growth with an affordable and flexible IP SAN: A highly cost-effective storage solution that leverages existing IT resources. San Jose, CA: Intransa, Inc.
LCPS (2001, September 11-12). Draft workshop report. In Proceedings of the International Consultative Workshop, The Digital Initiative for Development Agency (DID), The Lebanese Center for Policy Studies (LCPS), Beirut.
Maira, P. & Marlei, P. (2003, June 16-21). The value of “business intelligence” in the context of developing countries. In Proceedings of the 11th European Conference on Information Systems, ECIS 2003, Naples, Italy. Retrieved April 6, 2008, http://is2.lse.ac.uk/asp/aspecis/20030119.pdf
Mohammadian, M. (2004). Intelligent agents for data mining and information retrieval. Hershey, PA: Idea Group Publishing.
UN (2004, June 16).
Draft Sao Paulo Consensus, UNCTAD XI Multi-Stakeholder Partnerships, United Nations Conference on Trade and Development, TD/L.380/Add.1, Sao Paulo.
Witten, I. H. & Frank, E. (2005). Data mining: Practical machine learning tools and techniques (2nd ed.). Morgan Kaufmann.
Yuan, M., Buttenfield, B., Gehagen, M. & Miller, H. (2004). Geospatial data mining and knowledge discovery. In R. B. McMaster & E. L. Usery (Eds.), A research agenda for geographic information science (pp. 365-388). Boca Raton, FL: CRC Press.
Acknowledgment

The editor would like to acknowledge the assistance of all involved in the entire accretion of manuscripts, painstaking review process, and methodical revision of the book, without whose support the project could not have been satisfactorily completed.

I am indebted to all the authors who provided their relentless and generous support; the reviewers who were most helpful and provided comprehensive, thorough, and creative comments are Ali Serhan Koyuncugil, Georgios Lappas, and Paul Henman. Thanks go to my close friends at UNDP, and colleagues at SDNF and ICMS, for their wholehearted encouragement during the entire process.

Special thanks also go to the dedicated publishing team at IGI Global, particularly to Kristin Roth, Jessica Thompson, and Jennifer Neidig for their continuous suggestions, support, and feedback via e-mail to keep the project on schedule, and to Mehdi Khosrow-Pour and Jan Travers for their enduring professional support.

Finally, I would like to thank all my family members for their love and support throughout this period.

Hakikur Rahman, Editor
SDNF, Bangladesh
September 2007
Chapter I
Introduction to Data Mining Techniques via Multiple Criteria Optimization Approaches and Applications

Yong Shi, University of the Chinese Academy of Sciences, China and University of Nebraska at Omaha, USA
Yi Peng, University of Nebraska at Omaha, USA
Gang Kou, University of Nebraska at Omaha, USA
Zhengxin Chen, University of Nebraska at Omaha, USA

Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.

Abstract

This chapter provides an overview of a series of multiple criteria optimization-based data mining methods, which utilize multiple criteria programming (MCP) to solve data mining problems, and outlines some research challenges and opportunities for the data mining community. To achieve these goals, this chapter first introduces the basic notions and mathematical formulations for multiple criteria optimization-based classification models, including the multiple criteria linear programming model, multiple criteria quadratic programming model, and multiple criteria fuzzy linear programming model. Then it presents the real-life applications of these models in credit card scoring management, HIV-1 associated dementia (HAD) neuronal damage and dropout, and network intrusion detection. Finally, the chapter discusses research challenges and opportunities.
Introduction

Data mining has become a powerful information technology tool in today’s competitive business world. As the sizes and varieties of electronic datasets grow, the interest in data mining is increasing rapidly. Data mining is established on the basis of many disciplines, such as machine learning, databases, statistics, computer science, and operations research. Each field comprehends data mining from its own perspective and makes its distinct contributions. It is this multidisciplinary nature that brings vitality to data mining. One of the application roots of data mining can be regarded as statistical data analysis in the pharmaceutical industry. Nowadays the financial industry, including commercial banks, has benefited from the use of data mining. In addition to statistics, decision trees, neural networks, rough sets, fuzzy sets, and support vector machines have gradually become popular data mining methods over the last 10 years. Due to the difficulty of accessing the accuracy of hidden data and increasing the predicting rate in a complex large-scale database, researchers and practitioners have always desired to seek new or alternative data mining techniques. This is a key motivation for the proposed multiple criteria optimization-based data mining methods.

The objective of this chapter is to provide an overview of a series of multiple criteria optimization-based methods, which utilize multiple criteria programming (MCP) to solve classification problems. In addition to giving an overview, this chapter lists some data mining research challenges and opportunities for the data mining community.
To achieve these goals, the next section introduces the basic notions and mathematical formulations for three multiple criteria optimization-based classification models: the multiple criteria linear programming model, multiple criteria quadratic programming model, and multiple criteria fuzzy linear programming model. The third section presents some real-life applications of these models, including credit card scoring management, classifications on HIV-1 associated dementia (HAD) neuronal damage and dropout, and network intrusion detection. The chapter then outlines research challenges and opportunities, and the conclusion is presented.

Multiple Criteria Optimization-Based Classification Models

This section explores solving classification problems, one of the major areas of data mining, through the use of multiple criteria mathematical programming-based methods (Shi, Wise, Luo, & Lin, 2001; Shi, Peng, Kou, & Chen, 2005). Such methods have shown their strong applicability in solving a variety of classification problems (e.g., Kou et al., 2005; Zheng et al., 2004).

Classification

Although the definition of classification in data mining varies, the basic idea of classification can be generally described as to "predict the most likely state of a categorical variable (the class) given the values of other variables" (Bradley, Fayyad, & Mangasarian, 1999, p. 6). Classification is a two-step process. The first step constructs a predictive model based on a training dataset. The second step applies the predictive model constructed in the first step to a testing dataset. If the classification accuracy on the testing dataset is acceptable, the model can be used to predict unknown data (Han & Kamber, 2000; Olson & Shi, 2005).
Using multiple criteria programming, the classification task can be defined as follows: for a given set of variables in the database, the boundaries between the classes are represented by scalars in the constraint availabilities. Then, the standards of classification are measured by minimizing the total overlapping of data and maximizing the distances of every data to its class boundary
simultaneously. Through the algorithms of MCP, an "optimal" solution of variables (the so-called classifier) for the data observations is determined for the separation of the given classes. Finally, the resulting classifier can be used to predict the unknown data for discovering the hidden patterns of data as possible knowledge. Note that MCP differs from the known support vector machine (SVM) (e.g., Mangasarian, 2000; Vapnik, 2000). While the former uses multiple measurements to separate each data from different classes, the latter searches the minority of the data (support vectors) to represent the majority in classifying the data. However, both can be generally regarded as in the same category of optimization approaches to data mining.

In the following, we first discuss a generalized multi-criteria programming model formulation, and then explore several variations of the model.

A Generalized Multiple Criteria Programming Model Formulation

This section introduces a generalized multi-criteria programming method for classification. Simply speaking, this method is to classify observations into distinct groups based on two criteria for data separation. The following models represent this concept mathematically:

Given an r-dimensional attribute vector $a = (a_1, \ldots, a_r)$, let $A_i = (A_{i1}, \ldots, A_{ir}) \in R^r$ be one of the sample records of these attributes, where $i = 1, \ldots, n$; $n$ represents the total number of records in the dataset. Suppose two groups $G_1$ and $G_2$ are predefined. A boundary scalar $b$ can be selected to separate these two groups. A vector $X = (x_1, \ldots, x_r)^T \in R^r$ can be identified to establish the following linear inequalities (Fisher, 1936; Shi et al., 2001):

• $A_i X < b, \ \forall A_i \in G_1$
• $A_i X \ge b, \ \forall A_i \in G_2$

To formulate the criteria and complete constraints for data separation, some variables need to be introduced. In the classification problem, $A_i X$ is the score for the ith data record.
Let $\alpha_i$ be the overlapping of the two-group boundary for record $A_i$ (external measurement) and $\beta_i$ be the distance of record $A_i$ from its adjusted boundary (internal measurement). The overlapping $\alpha_i$ means the distance of record $A_i$ to the boundary $b$ if $A_i$ is misclassified into another group. For instance, in Figure 1 the "black dot" located to the right of the boundary $b$ belongs to $G_1$, but it was misclassified by the boundary $b$ to $G_2$. Thus, the distance between $b$ and the "dot" equals $\alpha_i$. The adjusted boundary is defined as $b - \alpha^*$ or $b + \alpha^*$, where $\alpha^*$ represents the maximum of overlapping (Freed & Glover, 1981, 1986). Then, a mathematical function $f(\alpha)$ can be used to describe the relation of all overlapping $\alpha_i$, while another mathematical function $g(\beta)$ represents the aggregation of all distances $\beta_i$. The final classification accuracies depend on simultaneously minimizing $f(\alpha)$ and maximizing $g(\beta)$. Thus, a generalized bi-criteria programming method for classification can be formulated as:

(Generalized Model) Minimize $f(\alpha)$ and Maximize $g(\beta)$
Subject to:
$A_i X - \alpha_i + \beta_i - b = 0, \ \forall A_i \in G_1$,
$A_i X + \alpha_i - \beta_i - b = 0, \ \forall A_i \in G_2$,
where $A_i$, $i = 1, \ldots, n$ are given, $X$ and $b$ are unrestricted, and $\alpha = (\alpha_1, \ldots, \alpha_n)^T$, $\beta = (\beta_1, \ldots, \beta_n)^T$; $\alpha_i, \beta_i \ge 0$, $i = 1, \ldots, n$.

All variables and their relationships are represented in Figure 1. There are two groups in Figure 1: "black dots" indicate $G_1$ data objects, and "stars" indicate $G_2$ data objects. There is one misclassified data object from each group if the boundary scalar $b$ is used to classify these two groups, whereas the adjusted boundaries $b - \alpha^*$ and $b + \alpha^*$ separate the two groups without misclassification.
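The roles of the score, the overlapping, and the distance can be made concrete with a small sketch. It assumes that, for a fixed candidate classifier, each record takes the smallest slacks that satisfy its equality constraint (so at most one of $\alpha_i$, $\beta_i$ is nonzero). The function name `slacks`, the toy records, and the values of `X` and `b` are hypothetical and are not taken from the chapter.

```python
def slacks(records, labels, X, b):
    """Return (alpha, beta) for each record under a fixed classifier (X, b).

    labels: 1 for G1 (target: score < b), 2 for G2 (target: score >= b).
    Assumption: the smallest non-negative alpha_i, beta_i satisfying the
    equality constraints of the generalized model are used.
    """
    alpha, beta = [], []
    for A, group in zip(records, labels):
        score = sum(a * x for a, x in zip(A, X))  # A_i X
        d = score - b
        # G1: A_i X - alpha_i + beta_i - b = 0  =>  alpha_i - beta_i = d
        # G2: A_i X + alpha_i - beta_i - b = 0  =>  alpha_i - beta_i = -d
        over = d if group == 1 else -d
        alpha.append(max(0.0, over))   # overlapping (misclassified side)
        beta.append(max(0.0, -over))   # distance on the correct side
    return alpha, beta


# Hypothetical toy data: two attributes, two records per group.
records = [(1.0, 2.0), (3.0, 1.0), (4.0, 4.0), (0.5, 0.5)]
labels = [1, 1, 2, 2]
X = (1.0, 1.0)
b = 5.0

alpha, beta = slacks(records, labels, X, b)
alpha_star = max(alpha)  # maximum overlapping, used for the adjusted boundaries
print(alpha, beta, alpha_star)
```

In this toy case only the last record (a $G_2$ record with a score below $b$) has a positive overlapping, so it is the one record that the boundary $b$ misclassifies.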
Figure 1. Two-group classification model (groups $G_1$ and $G_2$, the boundary $A_i X = b$, and the adjusted boundaries $A_i X = b - \alpha^*$ and $A_i X = b + \alpha^*$)

Based on the above generalized model, the following subsection formulates a multiple criteria linear programming (MCLP) model and a multiple criteria quadratic programming (MCQP) model.

Multiple Criteria Linear and Quadratic Programming Model Formulation

Different forms of $f(\alpha)$ and $g(\beta)$ in the generalized model will affect the classification criteria. Commonly $f(\alpha)$ (or $g(\beta)$) can be component-wise and non-increasing (or non-decreasing) functions. For example, in order to utilize the computational power of some existing mathematical programming software packages, a sub-model can be set up by using the norm to represent $f(\alpha)$ and $g(\beta)$. This means that we can assume $f(\alpha) = \|\alpha\|_p$ and $g(\beta) = \|\beta\|_q$. To transform the bi-criteria problem of the generalized model into a single-criterion problem, we use weights $w_\alpha > 0$ and $w_\beta > 0$ for $\|\alpha\|_p$ and $\|\beta\|_q$, respectively. The values of $w_\alpha$ and $w_\beta$ can be pre-defined in the process of identifying the optimal solution. Thus, the generalized model is converted into a single-criterion mathematical programming model as:

Model 1: Minimize $w_\alpha \|\alpha\|_p - w_\beta \|\beta\|_q$
Subject to:
$A_i X - \alpha_i + \beta_i - b = 0, \ \forall A_i \in G_1$,
$A_i X + \alpha_i - \beta_i - b = 0, \ \forall A_i \in G_2$,
where $A_i$, $i = 1, \ldots, n$ are given, $X$ and $b$ are unrestricted, and $\alpha = (\alpha_1, \ldots, \alpha_n)^T$, $\beta = (\beta_1, \ldots, \beta_n)^T$; $\alpha_i, \beta_i \ge 0$, $i = 1, \ldots, n$.

Based on Model 1, mathematical programming models with any norm can be theoretically defined. This study is interested in formulating a linear and a quadratic programming model. Let $p = q = 1$; then $\|\alpha\|_1 = \sum_{i=1}^{n} \alpha_i$ and $\|\beta\|_1 = \sum_{i=1}^{n} \beta_i$. Let $p = q = 2$ and use the squared norms; then $\|\alpha\|_2^2 = \sum_{i=1}^{n} \alpha_i^2$ and $\|\beta\|_2^2 = \sum_{i=1}^{n} \beta_i^2$. The objective function in Model 1 can now be an MCLP model or an MCQP model.

Model 2: MCLP
Minimize $w_\alpha \sum_{i=1}^{n} \alpha_i - w_\beta \sum_{i=1}^{n} \beta_i$
Subject to:
$A_i X - \alpha_i + \beta_i - b = 0, \ \forall A_i \in G_1$,
$A_i X + \alpha_i - \beta_i - b = 0, \ \forall A_i \in G_2$,
where $A_i$, $i = 1, \ldots, n$ are given, $X$ and $b$ are unrestricted, and $\alpha = (\alpha_1, \ldots, \alpha_n)^T$, $\beta = (\beta_1, \ldots, \beta_n)^T$; $\alpha_i, \beta_i \ge 0$, $i = 1, \ldots, n$.

Model 3: MCQP
Minimize $w_\alpha \sum_{i=1}^{n} \alpha_i^2 - w_\beta \sum_{i=1}^{n} \beta_i^2$
Subject to:
$A_i X - \alpha_i + \beta_i - b = 0, \ \forall A_i \in G_1$,
$A_i X + \alpha_i - \beta_i - b = 0, \ \forall A_i \in G_2$,
where $A_i$, $i = 1, \ldots, n$ are given, $X$ and $b$ are unrestricted, and $\alpha = (\alpha_1, \ldots, \alpha_n)^T$, $\beta = (\beta_1, \ldots, \beta_n)^T$; $\alpha_i, \beta_i \ge 0$, $i = 1, \ldots, n$.

Remark

There are some issues related to MCLP and MCQP that can be briefly addressed here:

1. In the process of finding an optimal solution for the MCLP problem, if some $\beta_i$ is too large with given $w_\alpha > 0$ and $w_\beta > 0$ and all $\alpha_i$ relatively small, the problem may have an unbounded solution. In real applications, data with large $\beta_i$ can be detected as "outliers" or "noise" in data preprocessing and should be removed before classification.
2. Note that although variables $X$ and $b$ are unrestricted in the above models, $X = 0$ is an "insignificant case" in terms of data separation, and therefore it should be ignored in the process of solving the problem. The case $b = 0$, however, may yield a solution for the data separation, depending on the data structure. From experimental studies, a pre-defined value of $b$ can quickly lead to an optimal solution if the user fully understands the data structure.
3. Some variations of the generalized model, such as MCQP, are NP-hard problems. Developing algorithms directly to solve these models can be a challenge. Although in application we can utilize some existing commercial software, the related theoretical problem will be addressed later in this chapter.
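The unboundedness mentioned in the first remark can be illustrated with a small numerical sketch. It evaluates the Model 2 and Model 3 objectives for a fixed candidate $(X, b)$, assuming the smallest slacks that satisfy the constraints, and then scales a separating $(X, b)$ by a factor $t$. The data, the weights, and the helper names are hypothetical; this is not the solver used in the chapter.

```python
def slacks(records, labels, X, b):
    """Smallest non-negative (alpha, beta) satisfying the model constraints."""
    alpha, beta = [], []
    for A, group in zip(records, labels):
        d = sum(a * x for a, x in zip(A, X)) - b
        over = d if group == 1 else -d  # positive when misclassified
        alpha.append(max(0.0, over))
        beta.append(max(0.0, -over))
    return alpha, beta


def objective(alpha, beta, w_alpha, w_beta, p):
    """p=1 gives the MCLP objective (Model 2), p=2 the MCQP objective (Model 3)."""
    return (w_alpha * sum(a ** p for a in alpha)
            - w_beta * sum(bt ** p for bt in beta))


# Hypothetical, perfectly separable one-attribute data.
records = [(1.0,), (2.0,), (6.0,), (7.0,)]
labels = [1, 1, 2, 2]
w_alpha, w_beta = 2.0, 1.0


def scaled_objective(t, p):
    """Objective after scaling the separating classifier (X, b) = (1, 4) by t."""
    X, b = (1.0 * t,), 4.0 * t
    alpha, beta = slacks(records, labels, X, b)
    return objective(alpha, beta, w_alpha, w_beta, p)


for t in (1.0, 10.0, 100.0):
    print(t, scaled_objective(t, 1), scaled_objective(t, 2))
```

Because all $\alpha_i$ stay at zero while every $\beta_i$ grows with $t$, the objective keeps decreasing on this toy data, which is one way the unbounded case of Remark 1 can arise when neither $X$ nor $b$ is fixed or normalized.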
Multiple Criteria Fuzzy Linear Programming Model Formulation

It has been recognized that in many decision-making problems, instead of finding the existing "optimal solution" (a goal value), decision makers often approach a "satisfying solution" between upper and lower aspiration levels that can be represented by the upper and lower bounds of acceptability for objective payoffs, respectively (Charnes & Cooper, 1961; Lee, 1972; Shi & Yu, 1989; Yu, 1985). This idea, which has an important and pervasive impact on human decision making (Lindsay & Norman, 1972), is called the decision makers' goal-seeking concept. Zimmermann (1978) employed it as the basis of his pioneering work on fuzzy linear programming (FLP). When FLP is adopted to classify the 'good' and 'bad' data, a fuzzy (satisfying) solution is used to meet a threshold for the accuracy rate of classifications, although the fuzzy solution is a near-optimal solution.

According to Zimmermann (1978), in formulating an FLP problem, the objectives (Minimize $\sum_i \alpha_i$ and Maximize $\sum_i \beta_i$) and constraints ($A_i X = b + \alpha_i - \beta_i$, $A_i \in G$; $A_i X = b - \alpha_i + \beta_i$, $A_i \in B$, where $G$ and $B$ denote the 'good' and 'bad' groups) of the generalized model are redefined as fuzzy sets $F$ and $X$ with corresponding membership functions $\mu_F(x)$ and $\mu_X(x)$, respectively. In this case the fuzzy decision set $D$ is defined as $D = F \cap X$, and the membership function is defined as $\mu_D(x) = \min\{\mu_F(x), \mu_X(x)\}$. In a maximal problem, $x_1$ is a "better" decision than $x_2$ if $\mu_D(x_1) \ge \mu_D(x_2)$. Thus, it is appropriate to select $x^*$ such that

$$\max_x \mu_D(x) = \max_x \min\{\mu_F(x), \mu_X(x)\} = \min\{\mu_F(x^*), \mu_X(x^*)\}$$

is the maximized solution.
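The max-min selection rule can be sketched over a finite set of candidate decisions. The candidate grid and the two membership functions below are hypothetical placeholders chosen only to show the rule; they are not the membership functions of the FLP model.

```python
def fuzzy_decision(candidates, mu_F, mu_X):
    """Pick x* maximizing mu_D(x) = min(mu_F(x), mu_X(x)) over the candidates."""
    best = max(candidates, key=lambda x: min(mu_F(x), mu_X(x)))
    return best, min(mu_F(best), mu_X(best))


# Hypothetical memberships on [0, 1]: the objective is better satisfied as x
# grows, the constraint is better satisfied as x shrinks.
candidates = [0.0, 0.25, 0.5, 0.75, 1.0]
mu_F = lambda x: x
mu_X = lambda x: 1.0 - x

x_star, mu_D_star = fuzzy_decision(candidates, mu_F, mu_X)
print(x_star, mu_D_star)
```

The selected decision is the one where the weaker of the two memberships is as large as possible, which is the compromise the min operator encodes.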
Let $y_{1L}$ be Minimize $\sum_i \alpha_i$ and $y_{2U}$ be Maximize $\sum_i \beta_i$; then one can assume the value of Maximize $\sum_i \alpha_i$ to be $y_{1U}$ and that of Minimize $\sum_i \beta_i$ to be $y_{2L}$. If the "upper bound" $y_{1U}$ and the "lower bound" $y_{2L}$ do not exist for the formulations, they can be estimated. Let $F_1 = \{x: y_{1L} \le \sum_i \alpha_i \le y_{1U}\}$ and $F_2 = \{x: y_{2L} \le \sum_i \beta_i \le y_{2U}\}$; their membership functions can be expressed respectively by:

$$\mu_{F_1}(x) = \begin{cases} 1, & \text{if } \sum_i \alpha_i \ge y_{1U} \\ \dfrac{\sum_i \alpha_i - y_{1L}}{y_{1U} - y_{1L}}, & \text{if } y_{1L} < \sum_i \alpha_i < y_{1U} \\ 0, & \text{if } \sum_i \alpha_i \le y_{1L} \end{cases}$$

and

$$\mu_{F_2}(x) = \begin{cases} 1, & \text{if } \sum_i \beta_i \ge y_{2U} \\ \dfrac{\sum_i \beta_i - y_{2L}}{y_{2U} - y_{2L}}, & \text{if } y_{2L} < \sum_i \beta_i < y_{2U} \\ 0, & \text{if } \sum_i \beta_i \le y_{2L} \end{cases}$$

Then the fuzzy set of the objective functions is $F = F_1 \cap F_2$, and its membership function is $\mu_F(x) = \min\{\mu_{F_1}(x), \mu_{F_2}(x)\}$. Using the crisp constraint set $X = \{x: A_i X = b + \alpha_i - \beta_i, A_i \in G; \ A_i X = b - \alpha_i + \beta_i, A_i \in B\}$, the fuzzy set of the decision problem is $D = F_1 \cap F_2 \cap X$, and its membership function is $\mu_D(x) = \mu_{F_1 \cap F_2 \cap X}(x)$.

Zimmermann (1978) has shown that the "optimal solution" of

$$\max_x \mu_D(x) = \max_x \min\{\mu_{F_1}(x), \mu_{F_2}(x), \mu_X(x)\}$$

is an efficient solution of a variation of the generalized model when $f(\alpha) = \sum_i \alpha_i$ and $g(\beta) = \sum_i \beta_i$. Then, this problem is equivalent to the following linear program (He, Liu, Shi, Xu, & Yan, 2004):

Model 4: FLP
Maximize $\xi$
Subject to:
$\xi \le \dfrac{\sum_i \alpha_i - y_{1L}}{y_{1U} - y_{1L}}$,
$\xi \le \dfrac{\sum_i \beta_i - y_{2L}}{y_{2U} - y_{2L}}$,
$A_i X = b + \alpha_i - \beta_i, \ A_i \in G$,
$A_i X = b - \alpha_i + \beta_i, \ A_i \in B$,
where $A_i$, $y_{1L}$, $y_{1U}$, $y_{2L}$, and $y_{2U}$ are known, $X$ and $b$ are unrestricted, and $\alpha_i, \beta_i, \xi \ge 0$.

Note that Model 4 will produce a value of $\xi$ with $1 > \xi \ge 0$. To avoid the trivial solution, one can set up $\xi > \varepsilon \ge 0$ for a given $\varepsilon$. Therefore, seeking the maximum $\xi$ in the FLP approach becomes the standard for determining the classifications between 'good' and 'bad' records in the database.
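As a worked illustration of the membership functions and the two ratio constraints of Model 4, the sketch below computes, for fixed slack totals, the largest $\xi$ that both ratios allow. The bound values and the slack vectors are hypothetical; a real run would obtain them by solving the linear programs described above.

```python
def membership(total, y_low, y_up):
    """Piecewise-linear membership: 0 at or below y_low, 1 at or above y_up."""
    if total >= y_up:
        return 1.0
    if total <= y_low:
        return 0.0
    return (total - y_low) / (y_up - y_low)


def largest_xi(alpha, beta, y1L, y1U, y2L, y2U):
    """For fixed slacks, xi is limited by the smaller of the two memberships."""
    return min(membership(sum(alpha), y1L, y1U),
               membership(sum(beta), y2L, y2U))


# Hypothetical bounds and slacks (not taken from the chapter's experiments).
y1L, y1U = 2.0, 10.0
y2L, y2U = 1.0, 21.0
alpha = [1.0, 3.0]        # sum = 4  -> (4 - 2) / (10 - 2) = 0.25
beta = [5.0, 6.0]         # sum = 11 -> (11 - 1) / (21 - 1) = 0.5

xi = largest_xi(alpha, beta, y1L, y1U, y2L, y2U)
print(xi)
```

Model 4 searches over $X$, $b$, $\alpha$, and $\beta$ jointly; the sketch only shows how a single feasible point is scored.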
A graphical illustration of this approach can be seen from Figure 2; any point of the hyperplane $0 < \xi < 1$ over the shadow area represents the possible determination of classifications by the FLP method. Whenever Model 4 has been trained to meet the given threshold $\tau$, it is said that the better classifier has been identified.

A procedure of using the FLP method for data classifications can be captured by the flowchart of Figure 2. Note that although the boundary of two classes $b$ is the unrestricted variable in Model 4, it can be presumed by the analyst according to the structure of a particular database. First, choosing a proper value of $b$ can speed up solving Model 4. Second, given a threshold $\tau$, the best data separation can be selected from a number of results determined by different $b$ values. Therefore, the parameter $b$ plays a key role in this chapter to achieve and guarantee the desired accuracy rate $\tau$. For this reason, the FLP classification method uses $b$ as an important control parameter, as shown in Figure 2.

Real-Life Applications Using Multiple Criteria Optimization Approaches

The models of multiple criteria optimization data mining in this chapter have been applied in credit
card portfolio management (He et al., 2004; Kou, Liu, Peng, Shi, Wise, & Xu, 2003; Peng, Kou, Chen, & Shi, 2004; Shi et al., 2001; Shi, Peng, Xu, & Tang, 2002; Shi et al., 2005), HIV-1-mediated neural dendritic and synaptic damage treatment (Zheng et al., 2004), network intrusion detection (Kou et al., 2004a; Kou, Peng, Chen, Shi, & Chen, 2004b), and firm bankruptcy analyses (Kwak, Shi, Eldridge, & Kou, 2006). These approaches are also being applied in other ongoing real-life data mining projects, such as anti-gene and antibody analyses, petroleum drilling and exploration, fraud management, and financial risk evaluation. To help the reader understand the usefulness of the models, the key experiences in some applications are reported below.

Credit Card Portfolio Management

The goal of credit card accounts classification is to produce a "blacklist" of the credit cardholders; this list can help creditors to take proactive steps to minimize charge-off loss. In this study, credit card accounts are classified into two groups: 'good' or 'bad'. From the technical point of view, we first need to construct a number of classifiers and then choose one that can find more bad records. The research procedure consists of five steps. The first step is data cleaning. Within this step, missing data cells and outliers are removed from the dataset. The second step is data transformation. The dataset is transformed in accord with the format requirements of MCLP software (Kou & Shi, 2002) and LINGO 8.0, which is a software tool for solving nonlinear programming problems (LINDO Systems Inc.). The third step is datasets selection. The training dataset and the testing dataset are selected according to a heuristic process. The fourth step is model formulation and classification. The two-group MCLP and MCQP models are applied to the training dataset to obtain optimal solutions.
The solutions are then applied to the testing dataset, within which class labels are removed for validation. Based on the resulting scores, each record is predicted as either bad (bankrupt account) or good (current account). By comparing the predicted labels with the original labels of the records, the classification accuracies of the multiple-criteria models can be determined. If the classification accuracy is acceptable to data analysts, this solution will be applied to future unknown credit card records or applications to make predictions. Otherwise, data analysts can modify the boundary and attribute values to get another set of optimal solutions. The fifth step is results presentation. The acceptable classification results are summarized in tables or figures and presented to end users.

Figure 2. A flowchart of the fuzzy linear programming classification method
Credit Card Dataset

The credit card dataset used in this chapter is provided by a major U.S. bank. It contains 5,000 records and 102 variables (38 original variables and 64 derived variables). The data were collected from June 1995 to December 1995, and the cardholders were from 28 states of the United States. Each record has a class label to indicate its credit status: either 'good' or 'bad'. 'Bad' indicates a bankruptcy credit card account and 'good' indicates a good status account. Among these 5,000 records, 815 are bankruptcy accounts and 4,185 are good status accounts. The 38 original variables can be divided into four categories: balance, purchase, payment, and cash advance. The 64 derived variables are created from the original 38 variables to reinforce the comprehension of cardholders' behaviors, such as times over-limit in the last two years, calculated interest rate, cash as percentage of balance, purchase as percentage of balance, payment as percentage of balance, and purchase as percentage of payment. For the purpose of credit card classification, the 64 derived variables were chosen to compute the model since they provide more precise information about credit cardholders' behaviors.

Experimental Results of MCLP

Inspired by the k-fold cross-validation method in classification, this study proposed a heuristic process for training and testing dataset selections. Standard k-fold cross-validation is not used because the majority-vote ensemble method used later in this chapter may need hundreds of voters. If standard k-fold cross-validation were employed, k would have to be in the hundreds. The following paragraph describes the heuristic process.

First, the bankruptcy dataset (815 records) is divided into 100 intervals (each interval has eight records). Within each interval, seven records are randomly selected.
The number seven is determined according to empirical results of k-fold cross-validation. Thus 700 'bad' records are obtained. Second, the good-status dataset (4,185 records) is divided into 100 intervals (each interval has 41 records). Within each interval, seven records are randomly selected. Thus a total of 700 'good' records is obtained. Third, the 700 bankruptcy and 700 current records are combined to form a training dataset. Finally, the remaining 115 bankruptcy and 3,485 current accounts become the testing dataset. According to this procedure, the total number of possible combinations of this selection equals $(C_8^7 \times C_{41}^7)^{100}$. Thus, the possibility of getting identical training or testing datasets is approximately zero. The across-the-board thresholds of 65% and 70% are set for the 'bad' and 'good' class, respectively. The values of the thresholds are determined from previous experience. The classification results whose predictive accuracies are below these thresholds will be filtered out.

The whole research procedure can be summarized using the following algorithm:

Algorithm 1
Input: The data set $A = \{A_1, A_2, A_3, \ldots, A_n\}$, boundary $b$
Output: The optimal solution $X^* = (x_1^*, x_2^*, x_3^*, \ldots, x_{64}^*)$, the classification score $MCLP_i$
Step 1: Generate the Training set and the Testing set from the credit card data set.
Step 2: Apply the two-group MCLP model to compute the optimal solution $X^* = (x_1^*, x_2^*, \ldots, x_{64}^*)$ as the best weights of all 64 variables with given values of control parameters $(b, \alpha^*, \beta^*)$ in the Training set.
Step 3: The classification score $MCLP_i = A_i X^*$ of each observation in the Training set is compared against the boundary $b$ to check the performance measures of the classification.
Step 4: If the classification result of Step 3 is acceptable (i.e., the found performance measure is larger than or equal to the given threshold), go to the next step. Otherwise, arbitrarily choose different values of control parameters $(b, \alpha^*, \beta^*)$ and go to Step 1.
Step 5: Use $X^* = (x_1^*, x_2^*, \ldots, x_{64}^*)$ to calculate the MCLP scores for all $A_i$ in the Testing set and conduct the performance analysis. If it produces a satisfying classification result, go to the next step. Otherwise, go back to Step 1 to reformulate the Training Set and Testing Set.
Step 6: Repeat the whole process until a preset number (e.g., 999) of different $X^*$ are generated for the future ensemble method.
End.

Applying Algorithm 1 to the credit card dataset, classification results were obtained and summarized. Due to space limitations, only a part (10 out of the total 500 cross-validation results) of the results is summarized in Table 1 (Peng et al., 2004). The columns "Bad" and "Good" refer to the number of records that were correctly classified as "bad" and "good," respectively. The column "Accuracy" was calculated using correctly classified records divided by the total records in that class. For instance, the 80.43% accuracy of Dataset 1 for bad records in the training dataset was calculated using 563 divided by 700 and means that 80.43% of bad records were correctly classified. The average predictive accuracies for bad and good groups in the training dataset are 79.79% and 78.97%, and the average predictive accuracies for bad and good groups in the testing dataset are 68% and 74.39%. The results demonstrate that this method achieves a good separation of bankruptcy and good status credit card accounts.

Improvement of MCLP Experimental Results with Ensemble Method

In credit card bankruptcy predictions, even a small percentage of increase in the classification accuracy can save creditors millions of dollars.
Thus it is necessary to investigate possible techniques that can improve MCLP classification results. The technique studied in this experiment is majority-vote ensemble. An ensemble consists of two fundamental elements: a set of trained classifiers and an aggregation mechanism that organizes these classifiers into the output ensemble. The aggregation mechanism can be an average or a majority vote (Zenobi & Cunningham, 2002). Weingessel, Dimitriadou, and Hornik (2003) have reviewed a series of ensemble-related publications (Dietterich, 2000; Lam, 2000; Parhami, 1994; Bauer & Kohavi, 1999; Kuncheva, 2000). Previous research has shown that an ensemble can help to increase classification accuracy and stability (Opitz & Maclin, 1999). A part of MCLP's optimal solutions was selected to form ensembles. Each solution will have one vote for each credit card record, and the final classification result is determined by the majority votes.

Table 1. MCLP credit card accounts classification (training set: 700 bad + 700 good; testing set: 115 bad + 3,485 good)

| Cross Validation | Training Bad | Accuracy | Training Good | Accuracy | Testing Bad | Accuracy | Testing Good | Accuracy |
|---|---|---|---|---|---|---|---|---|
| DataSet 1 | 563 | 80.43% | 557 | 79.57% | 78 | 67.83% | 2575 | 73.89% |
| DataSet 2 | 546 | 78.00% | 546 | 78.00% | 75 | 65.22% | 2653 | 76.13% |
| DataSet 3 | 564 | 80.57% | 560 | 80.00% | 75 | 65.22% | 2550 | 73.17% |
| DataSet 4 | 553 | 79.00% | 553 | 79.00% | 78 | 67.83% | 2651 | 76.07% |
| DataSet 5 | 548 | 78.29% | 540 | 77.14% | 78 | 67.83% | 2630 | 75.47% |
| DataSet 6 | 567 | 81.00% | 561 | 80.14% | 79 | 68.70% | 2576 | 73.92% |
| DataSet 7 | 556 | 79.43% | 548 | 78.29% | 77 | 66.96% | 2557 | 73.37% |
| DataSet 8 | 562 | 80.29% | 552 | 78.86% | 79 | 68.70% | 2557 | 73.37% |
| DataSet 9 | 566 | 80.86% | 557 | 79.57% | 83 | 72.17% | 2588 | 74.26% |
| DataSet 10 | 560 | 80.00% | 554 | 79.14% | 80 | 69.57% | 2589 | 74.29% |

Algorithm 2 describes the ensemble process:

Algorithm 2
Input: The data set $A = \{A_1, A_2, A_3, \ldots, A_n\}$, boundary $b$, a certain number of solutions $X^* = (x_1^*, x_2^*, x_3^*, \ldots, x_{64}^*)$
Output: The classification score $MCLP_i$ and the prediction $P_i$
Step 1: A committee of a certain odd number of classifiers $X^*$ is formed.
Step 2: The classification score $MCLP_i = A_i X^*$ of each observation is compared against the boundary $b$ by every member of the committee. The performance measures of the classification are decided by the majority of the committee. If more than half of the committee members agree on the classification, then the prediction $P_i$ for this observation is successful; otherwise the prediction fails.
Step 3: The accuracy for each group is computed as the percentage of successful classifications among all observations.
End.

The results of applying Algorithm 2 are summarized in Table 2 (Peng et al., 2004). The average predictive accuracies for bad and good groups in the training dataset are 80.8% and 80.6%, and the average predictive accuracies for bad and good groups in the testing dataset are 72.17% and 76.4%. Compared with previous results, the ensemble technique improves the classification accuracies. In particular, for bad records in the testing set, the average accuracy increased by 4.17 percentage points.
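The committee vote of Algorithm 2 can be sketched as follows. The sketch assumes each committee member is a weight vector $X^*$ that labels a record as $G_1$ when its score is below $b$ and as $G_2$ otherwise, and that the committee label is the one chosen by more than half of the members. The three two-attribute weight vectors and the records are hypothetical; the chapter's classifiers have 64 weights each.

```python
def member_vote(A, X, b):
    """One classifier's label: G1 if the score A X is below b, else G2."""
    score = sum(a * x for a, x in zip(A, X))
    return "G1" if score < b else "G2"


def majority_vote(A, committee, b):
    """Label chosen by more than half of an odd-sized committee."""
    if len(committee) % 2 == 0:
        raise ValueError("committee size must be odd to avoid ties")
    votes = [member_vote(A, X, b) for X in committee]
    return "G1" if votes.count("G1") > len(votes) / 2 else "G2"


def group_accuracy(records, labels, committee, b, group):
    """Share of records of one group that the committee labels correctly."""
    members = [A for A, g in zip(records, labels) if g == group]
    hits = sum(majority_vote(A, committee, b) == group for A in members)
    return hits / len(members)


# Hypothetical committee of three weight vectors and four records.
committee = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
b = 3.0
records = [(1.0, 1.0), (5.0, 1.0), (2.0, 5.0), (4.0, 1.0)]
labels = ["G1", "G2", "G2", "G2"]

predictions = [majority_vote(A, committee, b) for A in records]
print(predictions)
print(group_accuracy(records, labels, committee, b, "G1"),
      group_accuracy(records, labels, committee, b, "G2"))
```

Requiring an odd committee size mirrors Step 1 of Algorithm 2 and rules out tied votes in the two-group case.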
Since bankruptcy accounts are the major cause of creditors' loss, predictive accuracy for bad records is considered to be more important than for good records.

Table 2. MCLP credit card accounts classification with ensemble (training set: 700 bad + 700 good; testing set: 115 bad + 3,485 good)

| No. of Voters | Training Bad | Accuracy | Training Good | Accuracy | Testing Bad | Accuracy | Testing Good | Accuracy |
|---|---|---|---|---|---|---|---|---|
| 9 | 563 | 80.43% | 561 | 80.14% | 81 | 70.43% | 2605 | 74.75% |
| 99 | 565 | 80.71% | 563 | 80.43% | 83 | 72.17% | 2665 | 76.47% |
| 199 | 565 | 80.71% | 566 | 80.86% | 83 | 72.17% | 2656 | 76.21% |
| 299 | 568 | 81.14% | 564 | 80.57% | 84 | 73.04% | 2697 | 77.39% |
| 399 | 567 | 81.00% | 567 | 81.00% | 84 | 73.04% | 2689 | 77.16% |

Experimental Results of MCQP

Based on the MCQP model and the research procedure described in previous sections, similar experiments were conducted to get MCQP results. LINGO 8.0 was used to compute the optimal solutions. The whole research procedure for MCQP is summarized in Algorithm 3:
Algorithm 3
Input: The data set $A = \{A_1, A_2, A_3, \ldots, A_n\}$, boundary $b$
Output: The optimal solution $X^* = (x_1^*, x_2^*, x_3^*, \ldots, x_{64}^*)$, the classification score $MCQP_i$
Step 1: Generate the Training set and Testing set from the credit card data set.
Step 2: Apply the two-group MCQP model to compute the compromise solution $X^* = (x_1^*, x_2^*, \ldots, x_{64}^*)$ as the best weights of all 64 variables with given values of control parameters $(b, \alpha^*, \beta^*)$ using LINGO 8.0 software.
Step 3: The classification score $MCQP_i = A_i X^*$ of each observation is compared against the boundary $b$ to check the performance measures of the classification.
Step 4: If the classification result of Step 3 is acceptable (i.e., the found performance measure is larger than or equal to the given threshold), go to the next step. Otherwise, choose different values of control parameters $(b, \alpha^*, \beta^*)$ and go to Step 1.
Step 5: Use $X^* = (x_1^*, x_2^*, \ldots, x_{64}^*)$ to calculate the MCQP scores for all $A_i$ in the test set and conduct the performance analysis. If it produces a satisfying classification result, go to the next step. Otherwise, go back to Step 1 to reformulate the Training Set and Testing Set.
Step 6: Repeat the whole process until a preset number of different $X^*$ are generated.
End.

A part (10 out of the total 38 results) of the results is summarized in Table 3. The average predictive accuracies for bad and good groups in the training dataset are 86.61% and 73.29%, and the average predictive accuracies for bad and good groups in the testing dataset are 81.22% and 68.25%. Compared with MCLP, MCQP has lower predictive accuracies for good records. Nevertheless, bad group classification accuracies of the testing set using MCQP increased from 68% to 81.22%, which is a remarkable improvement.
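The per-class accuracy figures quoted for MCLP and MCQP follow the rule stated earlier: correctly classified records divided by the total records in that class. A short check of that arithmetic, using the testing-set counts reported for DataSet 1 in Tables 1 and 3 (115 bad and 3,485 good records), is sketched below; the helper name `class_accuracy` is hypothetical.

```python
def class_accuracy(correct, total):
    """Per-class accuracy in percent: correctly classified / class size."""
    return 100.0 * correct / total


TEST_BAD, TEST_GOOD = 115, 3485  # testing-set class sizes from the chapter

# DataSet 1, testing set: correctly classified counts from Tables 1 and 3.
mclp_bad = round(class_accuracy(78, TEST_BAD), 2)
mclp_good = round(class_accuracy(2575, TEST_GOOD), 2)
mcqp_bad = round(class_accuracy(96, TEST_BAD), 2)
mcqp_good = round(class_accuracy(2383, TEST_GOOD), 2)

print("MCLP:", mclp_bad, mclp_good)
print("MCQP:", mcqp_bad, mcqp_good)
```

For this one split the pattern matches the averages in the text: MCQP identifies more bad records than MCLP, at the cost of a lower accuracy on good records.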
Table 3. MCQP credit card accounts classification (training set: 700 bad + 700 good; testing set: 115 bad + 3,485 good)

| Cross Validation | Training Bad | Accuracy | Training Good | Accuracy | Testing Bad | Accuracy | Testing Good | Accuracy |
|---|---|---|---|---|---|---|---|---|
| DataSet 1 | 602 | 86.00% | 541 | 77.29% | 96 | 83.48% | 2383 | 68.38% |
| DataSet 2 | 614 | 87.71% | 496 | 70.86% | 93 | 80.87% | 2473 | 70.96% |
| DataSet 3 | 604 | 86.29% | 530 | 75.71% | 95 | 82.61% | 2388 | 68.52% |
| DataSet 4 | 616 | 88.00% | 528 | 75.43% | 95 | 82.61% | 2408 | 69.10% |
| DataSet 5 | 604 | 86.29% | 547 | 78.14% | 90 | 78.26% | 2427 | 69.64% |
| DataSet 6 | 614 | 87.71% | 502 | 71.71% | 94 | 81.74% | 2328 | 66.80% |
| DataSet 7 | 610 | 87.14% | 514 | 73.43% | 95 | 82.61% | 2380 | 68.29% |
| DataSet 8 | 582 | 83.14% | 482 | 68.86% | 93 | 80.87% | 2354 | 67.55% |
| DataSet 9 | 614 | 87.71% | 479 | 68.43% | 90 | 78.26% | 2295 | 65.85% |
| DataSet 10 | 603 | 86.14% | 511 | 73.00% | 93 | 80.87% | 2348 | 67.37% |

Improvement of MCQP with Ensemble Method

Similar to the MCLP experiment, the majority-vote ensemble discussed previously was applied to MCQP to examine whether it can make an improvement. The results are presented in Table 4. The average predictive accuracies for bad and good groups in the training dataset are 89.18% and 74.68%, and the average predictive accuracies for bad and good groups in the testing dataset are 85.61% and 68.67%. Compared with previous MCQP results, the majority-vote ensemble improves the total classification accuracies. In particular, for bad records in the testing set, the average accuracy increased by 4.39 percentage points.

Experimental Results of Fuzzy Linear Programming

Applying the fuzzy linear programming model discussed earlier in this chapter to the same credit card dataset, we obtained some FLP classification results. These results are compared with the decision tree, MCLP, and neural networks (see Tables 5 and 6). The decision tree software is the commercial version called C5.0 (C5.0 2004), while the software for both neural network and MCLP was developed at the Data Mining Lab, University of Nebraska at Omaha, USA (Kou & Shi, 2002).

Note that in both Table 5 and Table 6, the columns Tg and Tb respectively represent the number of good and bad accounts identified by a method, while the rows of good and bad represent the actual numbers of the accounts.

Classifications on HIV-1 Mediated Neural Dendritic and Synaptic Damage Using MCLP

The ability to identify neuronal damage in the dendritic arbor during HIV-1-associated dementia (HAD) is crucial for designing specific therapies for the treatment of HAD.
A two-class model of multiple criteria linear programming (MCLP) was proposed to classify such HIV-1 mediated neuronal dendritic and synaptic damages. Given certain classes, including treatments with brain-derived neurotrophic factor (BDNF), glutamate, gp120, or non-treatment controls from our in vitro experimental systems, we used the two-class MCLP model to determine the data patterns between classes in order to gain insight about neuronal dendritic and synaptic damages under different treatments (Zheng et al., 2004). This knowledge can be applied to the design and study of specific therapies for the prevention or reversal of neuronal damage associated with HAD.

Table 4. MCQP credit card accounts classification with ensemble (training set: 700 bad + 700 good; testing set: 115 bad + 3,485 good)

| No. of Voters | Training Bad | Accuracy | Training Good | Accuracy | Testing Bad | Accuracy | Testing Good | Accuracy |
|---|---|---|---|---|---|---|---|---|
| 3 | 612 | 87.43% | 533 | 76.14% | 98 | 85.22% | 2406 | 69.04% |
| 5 | 619 | 88.43% | 525 | 75.00% | 95 | 82.61% | 2422 | 69.50% |
| 7 | 620 | 88.57% | 525 | 75.00% | 97 | 84.35% | 2412 | 69.21% |
| 9 | 624 | 89.14% | 524 | 74.86% | 100 | 86.96% | 2398 | 68.81% |
| 11 | 625 | 89.29% | 525 | 75.00% | 99 | 86.09% | 2389 | 68.55% |
| 13 | 629 | 89.86% | 517 | 73.86% | 100 | 86.96% | 2374 | 68.12% |
| 15 | 629 | 89.86% | 516 | 73.71% | 98 | 85.22% | 2372 | 68.06% |
| 17 | 632 | 90.29% | 520 | 74.29% | 99 | 86.09% | 2379 | 68.26% |
| 19 | 628 | 89.71% | 520 | 74.29% | 100 | 86.96% | 2387 | 68.49% |
Database

The data produced by laboratory experimentation and image analysis was organized into a database composed of four classes (G1-G4), each of which has nine attributes. The four classes are defined as follows:

• G1: Treatment with the neurotrophin BDNF (brain-derived neurotrophic factor; 0.5 ng/ml, 5 ng/ml, 10 ng/ml, and 50 ng/ml). This factor promotes neuronal cell survival and has been shown to enrich neuronal cell cultures (Lopez et al., 2001; Shibata et al., 2003).
• G2: Non-treatment. Neuronal cells are kept in their normal media used for culturing (Neurobasal media with B27, which is a neuronal cell culture maintenance supplement from Gibco, with glutamine and penicillin-streptomycin).
• G3: Treatment with glutamate (10, 100, and 1,000 M). At low concentrations, glutamate acts as a neurotransmitter in the brain. However, at high concentrations, it has been shown to be a neurotoxin by over-stimulating NMDA receptors. This factor has been shown to be upregulated in HIV-1-infected macrophages (Jiang et al., 2001) and thereby linked to neuronal damage by HIV-1-infected macrophages.
• G4: Treatment with gp120 (1 nanoM), an HIV-1 envelope protein. This protein could interact with receptors on neurons and interfere with cell signaling, leading to neuronal damage, or it could indirectly induce neuronal injury through the production of other neurotoxins (Hesselgesser et al., 1998; Kaul, Garden, & Lipton, 2001; Zheng et al., 1999).
Method            Actual    Tg      Tb      Total
Decision Tree     Good      138     2       140
                  Bad       13      127     140
                  Total     151     129     280
Neural Network    Good      116     24      140
                  Bad       14      126     140
                  Total     130     150     280
MCLP              Good      134     6       140
                  Bad       7       133     140
                  Total     141     139     280
FLP               Good      127     13      140
                  Bad       13      127     140
                  Total     140     140     280

Table 5. Learning comparisons on balanced 280 records

Method            Actual    Tg      Tb      Total
Decision Tree     Good      2180    2005    4185
                  Bad       141     674     815
                  Total     2321    2679    5000
Neural Network    Good      2814    1371    4185
                  Bad       176     639     815
                  Total     2990    2010    5000
MCLP              Good      3160    1025    4185
                  Bad       484     331     815
                  Total     3644    1356    5000
FLP               Good      2498    1687    4185
                  Bad       113     702     815
                  Total     2611    2389    5000

Table 6. Comparisons on prediction of 5,000 records

The nine attributes are defined as:

• x1 = The number of neurites
• x2 = The number of arbors
• x3 = The number of branch nodes
• x4 = The average length of arbors
• x5 = The ratio of neurite to arbor
• x6 = The area of cell bodies
• x7 = The maximum length of the arbors
• x8 = The culture time (during this time, the neuron grows normally and BDNF, glutamate, or gp120 have not been added to affect growth)
• x9 = The treatment time (during this time, the neuron was growing under the effects of BDNF, glutamate, or gp120)

The database used in this chapter contained 2,112 observations. Among them, 101 are in G1, 1,001 are in G2, 229 are in G3, and 781 are in G4.

Compared with the traditional mathematical tools in classification, such as neural networks, decision trees, and statistics, the two-class MCLP approach is simple and direct, free of statistical assumptions, and flexible in allowing decision makers to play an active part in the analysis (Shi, 2001).

Results of Empirical Study Using MCLP

Using the two-class model for the classifications on {G1, G2, G3, G4}, there are six possible pairings: G1 vs. G2; G1 vs. G3; G1 vs. G4; G2 vs. G3; G2 vs. G4; and G3 vs. G4. The pairings G1 vs. G3 and G1 vs. G4 are treated as redundant and are therefore not considered. G1 through G3 or G4 is a continuum: G1 represents an enrichment of neuronal cultures, G2 is basal or maintenance of neuronal culture, and G3 and G4 both represent damage of neuronal cultures. There would never be a jump from G1 to G3 or G4 without traveling through G2. So, we used the following four two-class pairs: G1 vs. G2; G2 vs. G3; G2 vs. G4; and G3 vs. G4. The meanings of these two-class pairs are:

• G1 vs. G2 shows that BDNF should enrich the neuronal cell cultures and increase neuronal network complexity, that is, more dendrites and arbors, more length to dendrites, and so forth.
• G2 vs. G3 indicates that glutamate should damage neurons and lead to a decrease in dendrite and arbor number, including dendrite length.
• G2 vs. G4 should show that gp120 causes neuronal damage leading to a decrease in dendrite and arbor number and dendrite length.
• G3 vs. G4 provides information on the possible difference between glutamate toxicity and gp120-induced neurotoxicity.

Given a threshold for the training process, which can be any performance measure, we carried out the following steps:

Algorithm 4

Step 1: For each class pair, we used the Linux code of the two-class model to compute the compromise solution X* = (x1*,..., x9*) as the best weights of all nine neuronal variables with given values of the control parameters (b, α*, β*).

Step 2: The classification score MCLPi = Ai X* of each observation is calculated and compared against the boundary b to check the performance measures of the classification.

Step 3: If the classification result of Step 2 is acceptable (i.e., the given performance measure is larger than or equal to the given threshold), go to Step 4. Otherwise, choose different values of the control parameters (b, α*, β*) and go to Step 1.
Step 4: For each class pair, use X* = (x1*,..., x9*) to calculate the MCLP scores for all Ai in the test set and conduct the performance analysis.

According to the nature of this research, we define the following terms, which have been widely used in performance analysis:

TP (True Positive) = the number of records in the first class that have been classified correctly
FP (False Positive) = the number of records in the second class that have been classified into the first class
TN (True Negative) = the number of records in the second class that have been classified correctly
FN (False Negative) = the number of records in the first class that have been classified into the second class

Then we have four different performance measures:

\[ \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Positive Predictivity} = \frac{TP}{TP + FP}, \]
\[ \text{False-Positive Rate} = \frac{FP}{TN + FP}, \qquad \text{Negative Predictivity} = \frac{TN}{FN + TN}. \]

The "positive" represents the first-class label while the "negative" represents the second-class label in the same class pair. For example, in the class pair {G1 vs. G2}, a record of G1 is "positive" while a record of G2 is "negative." Among the above four measures, more attention is paid to sensitivity and the false-positive rate because both measure the correctness of classification in class-pair data analyses. Note that in a given class pair, the sensitivity represents the correct rate of the first class, and one minus the false-positive rate is the correct rate of the second class, by the above measure definitions.

Considering the limited data availability in this pilot study, we set the across-the-board threshold of 55% for sensitivity [or 55% of (1 - false-positive rate)] to select the experimental results from the training and test processes. All 20 of the training and test sets, over the four class pairs, have been computed using the above procedure. The results against the threshold are summarized in Tables 7 to 10.
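The four performance measures and the 55% acceptance check defined above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and reading the threshold as "both sensitivity and (1 - false-positive rate) must reach 55%" is an assumption about how the rule is applied. The counts used are the training counts reported for {G1 vs. G2} in Table 7.

```python
def performance_measures(tp, fn, fp, tn):
    """Compute the four measures from the TP, FN, FP, TN counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "positive_predictivity": tp / (tp + fp),
        "false_positive_rate": fp / (tn + fp),
        "negative_predictivity": tn / (fn + tn),
    }


def passes_threshold(measures, threshold=0.55):
    """Assumed acceptance rule: both classes reach the threshold."""
    # Sensitivity is the correct rate of the first class;
    # 1 - false-positive rate is the correct rate of the second class.
    first_class_rate = measures["sensitivity"]
    second_class_rate = 1.0 - measures["false_positive_rate"]
    return first_class_rate >= threshold and second_class_rate >= threshold


# Training counts for {G1 vs. G2} as reported in Table 7.
m = performance_measures(tp=55, fn=34, fp=34, tn=55)
for name, value in m.items():
    print(f"{name}: {value:.2%}")
print("acceptable:", passes_threshold(m))
```

In Algorithm 4, a check of this kind corresponds to Step 3: when it fails, the control parameters are changed and the model is solved again.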
As seen in these tables, the sensitivities for the comparison of all four pairs are higher than 55%, indicating that good separation among individual pairs is observed with this method. The results are then analyzed in terms of both positive predictivity and negative predictivity for the prediction power of the MCLP method on neuron injuries. In Table 7, G1 is the number of observations predefined as BDNF treatment, G2 is the number of observations predefined as non-treatment, N1 is the number of observations classified as BDNF treatment, and N2 is the number of observations classified as non-treatment. The meanings of other pairs in Tables 8 to 10 can be similarly explained.

Set        Class   N1          N2          Sensitivity   Positive       False-Positive   Negative
                                                         Predictivity   Rate             Predictivity
Training   G1      55 (TP)     34 (FN)     61.80%        61.80%         38.20%           61.80%
           G2      34 (FP)     55 (TN)
Test       G1      11 (TP)     9 (FN)      55.00%        3.78%          30.70%           98.60%
           G2      280 (FP)    632 (TN)

Table 7. Classification results with G1 vs. G2

Set        Class   N2          N3          Sensitivity   Positive       False-Positive   Negative
                                                         Predictivity   Rate             Predictivity
Training   G2      126 (TP)    57 (FN)     68.85%        68.48%         31.69%           68.68%
           G3      58 (FP)     125 (TN)
Test       G2      594 (TP)    224 (FN)    72.62%        99.32%         8.70%            15.79%
           G3      4 (FP)      42 (TN)

Table 8. Classification results with G2 vs. G3

Set        Class   N2          N4          Sensitivity   Positive       False-Positive   Negative
                                                         Predictivity   Rate             Predictivity
Training   G2      419 (TP)    206 (FN)    67.04%        65.88%         34.72%           66.45%
           G4      217 (FP)    408 (TN)
Test       G2      216 (TP)    160 (FN)    57.45%        80.90%         32.90%           39.39%
           G4      51 (FP)     104 (TN)

Table 9. Classification results with G2 vs. G4

Set        Class   N3          N4          Sensitivity   Positive       False-Positive   Negative
                                                         Predictivity   Rate             Predictivity
Training   G3      120 (TP)    40 (FN)     57.45%        80.90%         24.38%           75.16%
           G4      39 (FP)     121 (TN)
Test       G3      50 (TP)     19 (FN)     72.46%        16.78%         40.00%           95.14%
           G4      248 (FP)    372 (TN)

Table 10. Classification results with G3 vs. G4

In Table 7 for {G1 vs. G2}, positive predictivity and negative predictivity are the same (61.80%) in the training set. However, the negative predictivity of the test set (98.60%) is much higher than the positive predictivity (3.78%). The prediction of G1 in the training set is better than in the test set, while the prediction of G2 in the test set outperforms that in the training set. This is due to the small size of G1. In Table 8 for {G2 vs. G3}, the positive predictivity (68.48%) is almost equal to the negative predictivity (68.68%) of the training set.
The positive predictivity (99.32%) is much higher than the negative predictivity (15.79%) of the test set. As a result, the prediction of G2 in the test set is better than in the training set, but the prediction of G3 in the training set is better than in the test set.

The case of Table 9 for {G2 vs. G4} is similar to that of Table 8 for {G2 vs. G3}. We see that the separation of G2 in test (80.90%) is better than in training (65.88%), while the separation of G4 in training (66.45%) is better than in test (39.39%). In the case of Table 10 for {G3 vs. G4}, the positive predictivity (80.90%) is higher than the negative predictivity (75.16%) of the training set, whereas the positive predictivity (16.78%) is much lower than the negative predictivity (95.14%) of the test set. The prediction of G3 in training (80.90%) is better than in test (16.78%), and the prediction of G4 in test (95.14%) is better than in training (75.16%).

In summary, we observed that the predictions of G2 in test for {G1 vs. G2}, {G2 vs. G3}, and {G2 vs. G4} are always better than those in training. The predictions of G3 in training for {G2 vs. G3} and {G3 vs. G4} are better than those in test. Finally, the prediction of G4 for {G2 vs. G4} in training reverses that of {G3 vs. G4} in test. If we emphasize the test results, these results are favorable to G2. This may be due to the size of G2 (non-treatment), which is larger than all other classes. The classification results can change if the sizes of G1, G3, and G4 increase significantly.

Network Intrusion Detection

Network intrusions are malicious activities that aim to misuse network resources. Although various approaches have been applied to network intrusion detection, such as statistical analysis, sequence analysis, neural networks, machine learning, and artificial immune systems, this field is far from maturity, and new solutions are worthy of investigation.
Since intrusion detection can be treated as a classification problem, it is feasible to apply a multiple-criterion classification model to this type of application. The objective of this experiment is to examine the applicability of MCLP and MCQP models in intrusion detection.

KDD Dataset

The KDD-99 dataset provided by DARPA was used in our intrusion detection test. The KDD-99 dataset includes a wide variety of intrusions simulated in a military network environment. It was used in the 1999 KDD-CUP intrusion detection contest. After the contest, KDD-99 became a de facto standard dataset for intrusion detection experiments. Within the KDD-99 dataset, each connection has 38 numerical variables and is labeled as normal or attack. There are four main categories of attacks: denial-of-service (DOS), unauthorized access from a remote machine (R2L), unauthorized access to local root privileges (U2R), and surveillance and other probing. The training dataset contains a total of 24 attack types, while the testing dataset contains an additional 14 types (Stolfo, Fan, Lee, Prodromidis, & Chan, 2000). Because the number of attacks for R2L, U2R, and probing is relatively small, this experiment focused on DOS.

Experimental Results of MCLP

Following the heuristic process described in this chapter, training and testing datasets were selected as follows. First, the 'normal' dataset (812,813 records) was divided into 100 intervals (each interval has 8,128 records). Within each interval, 20 records were randomly selected. Second, the 'DOS' dataset (247,267 records) was divided into 100 intervals (each interval has 2,472 records). Within each interval, 20 records were randomly selected. Third, the 2,000 normal and 2,000 DOS records were combined to form a training dataset. Because KDD-99 has over 1 million records, and 4,000 training records represent less than 0.4% of it, the whole KDD-99 dataset is used for testing. Various training and testing datasets can be obtained by repeating this process. Considering the previous high detection rates on KDD-99 by other methods, the across-the-board threshold of 95% was set for both normal and DOS. Since the training dataset classification accuracies are all 100%, only the testing dataset results (10 out of the total 300 results) are summarized in Table 11 (Kou et al., 2004a). The average predictive accuracies for the normal and DOS groups in the testing dataset are 98.94% and 99.56%.

Improvement of MCLP with Ensemble Method

The majority-vote ensemble method demonstrated its superior performance in credit card accounts classification. Can it improve the classification accuracy of network intrusion detection? To answer this question, the majority-vote ensemble was applied to the KDD-99 dataset. Ensemble results are summarized in Table 12 (Kou et al., 2004a). The average predictive accuracies for the normal and DOS groups in the testing dataset are 99.61% and 99.78%. Both normal and DOS predictive accuracies have been slightly improved.

Testing set: 812,813 normal + 247,267 DOS records.

Cross Validation   Normal    Accuracy   DOS       Accuracy
DataSet 1          804513    98.98%     246254    99.59%
DataSet 2          808016    99.41%     246339    99.62%
DataSet 3          802140    98.69%     245511    99.29%
DataSet 4          805151    99.06%     246058    99.51%
DataSet 5          805308    99.08%     246174    99.56%
DataSet 6          799135    98.32%     246769    99.80%
DataSet 7          805639    99.12%     246070    99.52%
DataSet 8          802938    98.79%     246566    99.72%
DataSet 9          805983    99.16%     245498    99.28%
DataSet 10         802765    98.76%     246641    99.75%

Table 11. MCLP KDD-99 classification results

Number of Voters   Normal    Accuracy   DOS       Accuracy
3                  809567    99.60%     246433    99.66%
5                  809197    99.56%     246640    99.75%
7                  809284    99.57%     246690    99.77%
9                  809287    99.57%     246737    99.79%
11                 809412    99.58%     246744    99.79%
13                 809863    99.64%     246794    99.81%
15                 809994    99.65%     246760    99.79%
17                 810089    99.66%     246821    99.82%
19                 810263    99.69%     246846    99.83%

Table 12. MCLP KDD-99 classification results with ensemble
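The interval-based selection of training records described for the MCLP experiment above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the use of integer ids as stand-ins for records, and the fixed random seeds are all hypothetical, and the handful of records left over after dividing into 100 equal intervals is simply ignored.

```python
import random


def interval_sample(records, n_intervals=100, per_interval=20, seed=0):
    """Split records into equal intervals and draw a random sample from each."""
    rng = random.Random(seed)
    # Integer division gives the interval size, e.g. 812,813 // 100 = 8,128.
    size = len(records) // n_intervals
    sample = []
    for k in range(n_intervals):
        interval = records[k * size:(k + 1) * size]
        # Sampling without replacement inside the interval.
        sample.extend(rng.sample(interval, per_interval))
    return sample


# Integer ids stand in for the 812,813 normal and 247,267 DOS records.
normal_ids = list(range(812_813))
dos_ids = list(range(247_267))

normal_sample = interval_sample(normal_ids, seed=0)
dos_sample = interval_sample(dos_ids, seed=1)
training_size = len(normal_sample) + len(dos_sample)
print(training_size)
```

Changing the seeds yields a different training set each time, which matches the remark that various training and testing datasets can be obtained by repeating the process.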
Experimental Results of MCQP

An MCQP procedure similar to the one used in credit card accounts classification was used to classify the KDD-99 dataset. A part of the results is summarized in Table 13 (Kou et al., 2004b). These results are slightly better than those of MCLP.

Improvement of MCQP with Ensemble Method

The majority-vote ensemble was used on the MCQP results, and a part of the outputs is summarized in Table 14 (Kou et al., 2004b). The average predictive accuracies for the normal and DOS groups in the testing dataset are 99.86% and 99.82%. Although the increase in classification accuracy is small, both normal and DOS predictive accuracies have been improved compared with the previous 99.54% and 99.64%.

Testing set: 812,813 normal + 247,267 DOS records.

Cross Validation   Normal    Accuracy   DOS       Accuracy
DataSet 1          808142    99.43%     245998    99.49%
DataSet 2          810689    99.74%     246902    99.85%
DataSet 3          807597    99.36%     246491    99.69%
DataSet 4          808410    99.46%     246256    99.59%
DataSet 5          810283    99.69%     246090    99.52%
DataSet 6          809272    99.56%     246580    99.72%
DataSet 7          806116    99.18%     246229    99.58%
DataSet 8          808143    99.43%     245998    99.49%
DataSet 9          811806    99.88%     246433    99.66%
DataSet 10         810307    99.69%     246702    99.77%

Table 13. MCQP KDD-99 classification results

No. of Voters      Normal    Accuracy   DOS       Accuracy
3                  810126    99.67%     246792    99.81%
5                  811419    99.83%     246930    99.86%
7                  811395    99.83%     246830    99.82%
9                  811486    99.84%     246795    99.81%
11                 812030    99.90%     246845    99.83%
13                 812006    99.90%     246788    99.81%
15                 812089    99.91%     246812    99.82%
17                 812045    99.91%     246821    99.82%
19                 812069    99.91%     246817    99.82%
21                 812010    99.90%     246831    99.82%
23                 812149    99.92%     246821    99.82%
25                 812018    99.90%     246822    99.82%

Table 14. MCQP KDD-99 classification results with ensemble

Research Challenges and Opportunities

Although the above multiple criteria optimization data mining methods have been applied in real-life applications, there are a number of challenging problems in mathematical modeling. While some of the problems are currently under investigation, others remain to be explored.

Variations and Algorithms of Generalized Models

Given Model 1, if p=2, q=1, it becomes a convex quadratic program, which can be solved by using a known convex quadratic programming algorithm. However, when p=1, q=2, Model 1 is a concave quadratic program; and when p=2, q=2, we have Model 3 (MCQP), which is an indefinite quadratic problem. Since both concave quadratic programming and MCQP are NP-hard problems, it is very difficult to find a global optimal solution. We are working on both cases to develop direct algorithms that can converge to local optima in classification (Zhang, Shi, & Zhang, 2005).

Kernel Functions for Data Observations

The generalized model in this chapter has a natural connection with the known support vector machines (SVM) (Mangasarian, 2000; Vapnik, 2000), since they both belong to the category of optimization-based data mining methods. However, they differ in the ways they identify the classifiers. As we mentioned before, while the multiple criteria optimization approaches in this chapter use the overlapping and interior distance as two standards to measure the separation of each observation in the dataset, SVM selects the minority of observations (support vectors) to represent the majority of the rest of the observations. Therefore, in experimental studies and real applications, SVM may have a high accuracy in the training set, but a lower accuracy in the testing result. Nevertheless, the use of kernel functions in SVM has shown its efficiency in handling nonlinear datasets.
How to adopt kernel functions into the multiple criteria optimization approaches can be an interesting research problem. Kou, Peng, Shi, and Chen (2006) explored some possibilities in this research direction. The basic idea is outlined here.

First, we can rewrite the generalized model (Model 1) in a way similar to the approach of SVM. Suppose the two classes G1 and G2 are under consideration. Then an n×n diagonal matrix Y, which contains only +1 or -1, indicates the class membership. A -1 in row i of matrix Y indicates that the corresponding record $A_i \in G_1$, and a +1 in row i of matrix Y indicates that the corresponding record $A_i \in G_2$. The constraints in Model 1, $A_i X = b + \alpha_i - \beta_i, \forall A_i \in G_1$ and $A_i X = b - \alpha_i + \beta_i, \forall A_i \in G_2$, are converted as: $Y(A \cdot X - eb) = \alpha - \beta$, where $e = (1,1,\ldots,1)^T$, $\alpha = (\alpha_1,\ldots,\alpha_n)^T$, and $\beta = (\beta_1,\ldots,\beta_n)^T$. In order to maximize the distance $\frac{2}{\|X\|_2}$ between the two adjusted bounding hyperplanes, the function $\frac{1}{2}\|X\|_2^2$ should also be minimized. Let s = 2, q = 1, and p = 1; then a simple quadratic programming (SQP) variation of Model 1 can be built as:

Model 5: SQP

\[ \text{Minimize } \frac{1}{2}\|X\|_2^2 + w_\alpha \sum_{i=1}^{n} \alpha_i - w_\beta \sum_{i=1}^{n} \beta_i \]
\[ \text{Subject to } Y(A \cdot X - eb) = \alpha - \beta, \]

where $e = (1,1,\ldots,1)^T$, $\alpha = (\alpha_1,\ldots,\alpha_n)^T$ and $\beta = (\beta_1,\ldots,\beta_n)^T \geq 0$.

Using the Lagrange function to represent Model 5, one can get an equivalent of the Wolfe dual problem of Model 5, expressed in terms of the Lagrange multipliers $\theta_i$ as:

Model 6: Dual of SQP

\[ \text{Maximize } -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \theta_i y_i \theta_j y_j (A_i \cdot A_j) + \sum_{i=1}^{n} \theta_i \]
\[ \text{Subject to } \sum_{i=1}^{n} \theta_i y_i = 0, \quad w_\beta \leq \theta_i \leq w_\alpha, \]

where $w_\beta$ and $w_\alpha$ are given, $1 \leq i \leq n$.

The global optimal solution of the primal problem of Model 5 can be obtained from the solution of the Wolfe dual problem:

\[ X^* = \sum_{i=1}^{n} \theta_i^* y_i A_i, \qquad b^* = y_j - \sum_{i=1}^{n} y_i \theta_i^* (A_i \cdot A_j). \]

As a result, the classification decision function becomes:

\[ \operatorname{sgn}\big((X^* \cdot B) - b^*\big) \begin{cases} \leq 0, & B \in G_1, \\ > 0, & B \in G_2. \end{cases} \]

We observe that because the form $(A_i \cdot A_j)$ of Model 6 is an inner product in the vector space, it can be substituted by a positive semi-definite kernel $K(A_i, A_j)$ without affecting the mathematical modeling process. In general, a kernel function refers to a real-valued function on $\chi \times \chi$ for all $A_i, A_j \in \chi$. Thus, Model 6 can be easily transformed to a nonlinear model by replacing $(A_i \cdot A_j)$ with some positive semi-definite kernel function $K(A_i, A_j)$. The use of kernel functions in multiple criteria optimization approaches can extend their applicability to linearly inseparable datasets. However, there are some theoretical difficulties in directly introducing a kernel function to Model 5. How to overcome them deserves a careful study. Future studies may be done on establishing a theoretical guideline for the selection of a kernel that is optimal in achieving a satisfactory credit analysis result. Another open problem is to study the subject of reducing computational cost and improving algorithm efficiency for high-dimensional or massive datasets.

Choquet Integrals and Non-Additive Set Function

Considering the r-dimensional attribute vector $a = (a_1,\ldots,a_r)$ in the classification problem, let P(a) denote the power set of a. We use $f(a_1),\ldots,f(a_r)$ to denote the values of each attribute in an observation.
The procedure of calculating a Choquet integral can be given as (Wang & Wang, 1997):

\[ \int f \, d\mu = \sum_{j=1}^{r} \big[ f(a'_j) - f(a'_{j-1}) \big] \times \mu(\{a'_j, a'_{j+1}, \ldots, a'_r\}), \]

where $\{a'_1, a'_2, \ldots, a'_r\}$ is a permutation of $a = (a_1,\ldots,a_r)$ such that $f(a'_0) = 0$ and $f(a'_1),\ldots,f(a'_r)$ is non-decreasingly ordered: $f(a'_1) \leq \ldots \leq f(a'_r)$. The non-additive set function is defined as $\mu: P(a) \to (-\infty, +\infty)$, where $\mu(\emptyset) = 0$. We use $\mu_i$ to denote the set function $\mu$, where $i = 1,\ldots,2^r$.

Introducing the Choquet measure into the generalized model presented earlier refers to the utilization of the Choquet integral as a representative of the left-hand side of the constraints in Model 1. This variation for the non-additive data mining problem is (Yan, Wang, Shi, & Chen, 2005):

Model 7: Choquet Form

\[ \text{Minimize } f(\alpha) \text{ and Maximize } g(\beta) \]
\[ \text{Subject to: } \int f \, d\mu - \alpha_i + \beta_i - b = 0, \; \forall A_i \in G_1, \]
\[ \int f \, d\mu + \alpha_i - \beta_i - b = 0, \; \forall A_i \in G_2, \]

where $\int f \, d\mu$ denotes the Choquet integral with respect to a signed fuzzy measure to aggregate the attributes of an observation f, b is unrestricted, and $\alpha = (\alpha_1,\ldots,\alpha_n)^T$, $\beta = (\beta_1,\ldots,\beta_n)^T$; $\alpha_i, \beta_i \geq 0$, $i = 1,\ldots,n$.

Model 7 results in the replacement of a linear combination of all the attributes $A_i X$ in the left-hand side of the constraints with the Choquet integral representation $\int f \, d\mu$. The number of parameters,
denoted by µi, increases from r to 2^r (r is the number of attributes). Determining the parameters through a linear programming framework is not easy. We are still working on this problem and shall report the significant results.

Conclusion

As Usama Fayyad pointed out at the KDD-03 Panel, data mining must attract the participation of the relevant communities to avoid re-inventing wheels and bring the field an auspicious future (Fayyad, Piatetsky-Shapiro, & Uthurusamy, 2003). One relevant field from which data mining has not attracted enough participation is optimization. This chapter summarizes a series of research activities that apply multiple criteria decision-making methods to classification problems in data mining. Specifically, this chapter describes a variation of multiple criteria optimization-based models and applies these models to credit card scoring management, HIV-1 associated dementia (HAD) neuronal damage and dropout, and network intrusion detection, as well as pointing out their potential in various real-life problems.

Acknowledgment

Since 1998, this research has been partially supported by a number of grants, including First Data Corporation, USA; DUE-9796243, the National Science Foundation of USA; U.S. Air Force Research Laboratory (PR No. E-3-1162); National Excellent Youth Fund #70028101, Key Project #70531040, #70472074, National Natural Science Foundation of China; 973 Project #2004CB720103, Ministry of Science and Technology, China; K.C. Wong Education Foundation (2001, 2003), Chinese Academy of Sciences; and BHP Billiton Co., Australia.

References

Bradley, P.S., Fayyad, U.M., & Mangasarian, O.L. (1999). Mathematical programming for data mining: Formulations and challenges. INFORMS Journal on Computing, 11, 217-238.

Bauer, E., & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36, 105-139.

C5.0. (2004). Retrieved from http://www.rulequest.com/see5-info.html

Charnes, A., & Cooper, W.W. (1961). Management models and industrial applications of linear programming (vols. 1 & 2). New York: John Wiley & Sons.

Dietterich, T. (2000). Ensemble methods in machine learning. In Kittler & Roli (Eds.), Multiple classifier systems (pp. 1-15). Berlin: Springer-Verlag (Lecture Notes in Pattern Recognition 1857).

Fayyad, U.M., Piatetsky-Shapiro, G., & Uthurusamy, R. (2003). Summary from the KDD-03 Panel: Data mining: The next 10 years. ACM SIGKDD Explorations Newsletter, 5(2), 191-196.

Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, 179-188.

Freed, N., & Glover, F. (1981). Simple but powerful goal programming models for discriminant problems. European Journal of Operational Research, 7, 44-60.

Freed, N., & Glover, F. (1986). Evaluating alternative linear programming models to solve the two-group discriminant problem. Decision Science, 17, 151-162.

Han, J.W., & Kamber, M. (2000). Data mining: Concepts and techniques. San Diego: Academic Press.
He, J., Liu, X., Shi, Y., Xu, W., & Yan, N. (2004). Classifications of credit cardholder behavior by using fuzzy linear programming. International Journal of Information Technology and Decision Making, 3, 633-650.

Hesselgesser, J., Taub, D., Baskar, P., Greenberg, M., Hoxie, J., Kolson, D.L., & Horuk, R. (1998). Neuronal apoptosis induced by HIV-1 gp120 and the Chemokine SDF-1alpha mediated by the Chemokine receptor CXCR4. Curr Biol, 8, 595-598.

Kaul, M., Garden, G.A., & Lipton, S.A. (2001). Pathways to neuronal injury and apoptosis in HIV-associated dementia. Nature, 410, 988-994.

Kou, G., & Shi, Y. (2002). Linux-based Multiple Linear Programming Classification Program (Version 1.0). College of Information Science and Technology, University of Nebraska-Omaha, USA.

Kou, G., Liu, X., Peng, Y., Shi, Y., Wise, M., & Xu, W. (2003). Multiple criteria linear programming approach to data mining: Models, algorithm designs and software development. Optimization Methods and Software, 18, 453-473.

Kou, G., Peng, Y., Yan, N., Shi, Y., Chen, Z., Zhu, Q., Huff, J., & McCartney, S. (2004a, July 19-21). Network intrusion detection by using multiple-criteria linear programming. In Proceedings of the International Conference on Service Systems and Service Management, Beijing, China.

Kou, G., Peng, Y., Chen, Z., Shi, Y., & Chen, X. (2004b, July 12-14). A multiple-criteria quadratic programming approach to network intrusion detection. In Proceedings of the Chinese Academy of Sciences Symposium on Data Mining and Knowledge Management, Beijing, China.

Kou, G., Peng, Y., Shi, Y., & Chen, Z. (2006). A new multi-criteria convex quadratic programming model for credit data analysis. Working Paper, University of Nebraska at Omaha, USA.

Kuncheva, L.I. (2000). Clustering-and-selection model for classifier combination. In Proceedings of the 4th International Conference on Knowledge-Based Intelligent Engineering Systems and Allied Technologies (KES'2000).

Kwak, W., Shi, Y., Eldridge, S., & Kou, G. (2006). Bankruptcy prediction for Japanese firms: Using multiple criteria linear programming data mining approach. In Proceedings of the International Journal of Data Mining and Business Intelligence.

Jiang, Z., Piggee, C., Heyes, M.P., Murphy, C., Quearry, B., Bauer, M., Zheng, J., Gendelman, H.E., & Markey, S.P. (2001). Glutamate is a mediator of neurotoxicity in secretions of activated HIV-1-infected macrophages. Journal of Neuroimmunology, 117, 97-107.

Lam, L. (2000). Classifier combinations: Implementations and theoretical issues. In Kittler & Roli (Eds.), Multiple classifier systems (pp. 78-86). Berlin: Springer-Verlag (Lecture Notes in Pattern Recognition 1857).

Lee, S.M. (1972). Goal programming for decision analysis. Auerbach.

Lindsay, P.H., & Norman, D.A. (1972). Human information processing: An introduction to psychology. New York: Academic Press.

LINDO Systems Inc. (2003). An overview of LINGO 8.0. Retrieved from http://www.lindo.com/cgi/frameset.cgi?leftlingo.html;lingof.html

Lopez, A., Bauer, M.A., Erichsen, D.A., Peng, H., Gendelman, L., Shibata, A., Gendelman, H.E., & Zheng, J. (2001). The regulation of neurotrophic factor activities following HIV-1 infection and immune activation of mononuclear phagocytes. In Proceedings of Soc. Neurosci. Abs., San Diego, CA.

Mangasarian, O.L. (2000). Generalized support vector machines. In A. Smola, P. Bartlett, B. Scholkopf, & D. Schuurmans (Eds.), Advances in large margin classifiers (pp. 135-146). Cambridge, MA: MIT Press.

Olson, D., & Shi, Y. (2005). Introduction to business data mining. New York: McGraw-Hill/Irwin.

Opitz, D., & Maclin, R. (1999). Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, 11, 169-198.

Parhami, B. (1994). Voting algorithms. IEEE Transactions on Reliability, 43, 617-629.

Peng, Y., Kou, G., Chen, Z., & Shi, Y. (2004). Cross-validation and ensemble analyses on multiple-criteria linear programming classification for credit cardholder behavior. In Proceedings of ICCS 2004 (pp. 931-939). Berlin: Springer-Verlag (LNCS 2416).

Shi, Y., & Yu, P.L. (1989). Goal setting and compromise solutions. In B. Karpak & S. Zionts (Eds.), Multiple criteria decision making and risk analysis using microcomputers (pp. 165-204). Berlin: Springer-Verlag.

Shi, Y. (2001). Multiple criteria and multiple constraint levels linear programming: Concepts, techniques and applications. NJ: World Scientific.

Shi, Y., Wise, W., Luo, M., & Lin, Y. (2001). Multiple criteria decision making in credit card portfolio management. In M. Koksalan & S. Zionts (Eds.), Multiple criteria decision making in new millennium (pp. 427-436). Berlin: Springer-Verlag.

Shi, Y., Peng, Y., Xu, W., & Tang, X. (2002). Data mining via multiple criteria linear programming: Applications in credit card portfolio management. International Journal of Information Technology and Decision Making, 1, 131-151.

Shi, Y., Peng, Y., Kou, G., & Chen, Z. (2005). Classifying credit card accounts for business intelligence and decision making: A multiple-criteria quadratic programming approach. International Journal of Information Technology and Decision Making, 4, 581-600.
Shibata,A.,Zelivyanskaya,M.,Limoges,J.,Carl- son, K.A., Gorantla, S., Branecki, C., Bishu, S., Xiong,H.,Gendelman,H.E.(2003).Peripheral nerveinducesmacrophageneurotrophicactivities: Regulation of neuronal process outgrowth, intra- cellular signaling and synaptic function. Journal of Neuroimmunology, 142, 112-129. Stolfo, S.J., Fan, W., Lee, W., Prodromidis, A., Chan, P.K. (2000). Cost-based modeling and evaluation for data mining with application to fraud and intrusion detection: Results from the JAM project. In Proceedings of the DARPA In- formation Survivability Conference. Vapnik, V.N. (2000). The nature of statistical learning theory (2nd ed.). New York: Springer. Wang, J., Wang, Z. (1997). Using neural net- worktodetermineSugenomeasuresbystatistics. Neural Networks, 10, 183-195. Weingessel, A., Dimitriadou, E., Hornik, K. (2003, March 20-22). An ensemble method for clustering.InProceedingsofthe3rd International Workshop on Distributed Statistical Computing, Vienna, Austria. Yan, N., Wang, Z., Shi, Y., Chen, Z. (2005). Classificationbylinearprogrammingwithsigned fuzzy measures. Working Paper, University of Nebraska at Omaha, USA. Yu, P.L. (1985). Multiple criteria decision mak- ing: Concepts, techniques and extensions. New York: Plenum Press. Zenobi, G., Cunningham, P. (2002). An ap- proach to aggregating ensembles of lazy learn- ers that supports explanation. Lecture Notes in Computer Science, 2416, 436-447. Zhang, J., Shi, Y., Zhang, P. (2005). Several multi-criteria programming methods for clas-
  • 54. Introduction to Data Mining Techniques via Multiple Criteria Optimization Approaches and Applications sification. Working Paper, Chinese Academy of Sciences Research Center on Data Technology KnowledgeEconomyandGraduateUniversityof Chinese Academy of Sciences, China. Zheng, J., Thylin, M., Ghorpade, A., Xiong, H., Persidsky, Y., Cotter, R., Niemann, D., Che, M., Zeng, Y., Gelbard, H. et al. (1999). Intracellular CXCR4 signaling, neuronal apoptosis and neu- ropathogenic mechanisms of HIV-1-associated dementia. Journal of Neuroimmunology, 98, 185-200. This work was previously published in Research and Trends in Data Mining Technologies and Applications, edited by D. Taniar, pp. 242-275, copyright 2007 by IGI Publishing, formerly known as Idea Group Publishing (an imprint of IGI Global). Zheng,J.,Zhuang,W.,Yan,N.,Kou,G.,Erichsen, D., McNally, C., Peng, H., Cheloha, A., Shi, C., Shi, Y. (2004). Classification of HIV-1-mediated neuronal dendritic and synaptic damage using multiple criteria linear programming. Neuroin- formatics, 2, 303-326. Zimmermann, H.-J. (1978). Fuzzy programming and linear programming with several objective functions. Fuzzy Sets and Systems, 1, 45-55.
the credulous, affrighted Inuit how they can escape from the dreaded ghosts. The hardest task, that of driving away Sedna, is reserved for the most powerful angakoq. A rope is coiled on the floor of a large hut in such a manner as to leave a small opening at the top, which represents the breathing hole of a seal. Two angakut stand by the side of it, one of them holding the seal spear in his left hand, as if he were watching at the seal hole in the winter, the other holding the harpoon line. Another angakoq, whose office it is to lure Sedna up with a magic song, sits at the back of the hut. At last she comes up through the hard rocks and the wizard hears her heavy breathing; now she emerges from the ground and meets the angakoq waiting at the hole. She is harpooned and sinks away in angry haste, drawing after her the harpoon, to which the two men hold with all their strength. Only by a desperate effort does she tear herself away from it and return to her dwelling in Adlivun. Nothing is left with the two men but the blood sprinkled harpoon, which they proudly show to the Inuit.

Sedna and the other evil spirits are at last driven away, and on the following day a great festival for young and old is celebrated in honor of the event. But they must still be careful, for the wounded Sedna is greatly enraged and will seize any one whom she can find out of his hut; so on this day they all wear protecting amulets (koukparmiutang) on the tops of their hoods. Parts of the first garment which they wore after birth are used for this purpose.

The men assemble early in the morning in the middle of the settlement. As soon as they have all got together they run screaming and jumping around the houses, following the course of the sun (nunajisartung or kaivitijung). A few, dressed in women’s jackets, run in the opposite direction. These are those who were born in abnormal presentations.
The circuit made, they visit every hut, and the woman of the house must always be in waiting for them. When she hears the noise of the band she comes out and throws a dish containing little gifts of meat, ivory trinkets, and
articles of sealskin into the yelling crowd, of which each one helps himself to what he can get. No hut is omitted in this round (irqatatung).

The crowd next divides itself into two parties, the ptarmigans (aχigirn), those who were born in the winter, and the ducks (aggirn), or the children of summer. A large rope of sealskin is stretched out. One party takes one end of it and tries with all its might to drag the opposite party over to its side. The others hold fast to the rope and try as hard to make ground for themselves. If the ptarmigans give way the summer has won the game and fine weather may be expected to prevail through the winter (nussueraqtung).

The contest of the seasons having been decided, the women bring out of a hut a large kettle of water and each person takes his drinking cup. They all stand as near the kettle as possible, while the oldest man among them steps out first. He dips a cup of water from the vessel, sprinkles a few drops on the ground, turns his face toward the home of his youth, and tells his name and the place of his birth (oχsoaχsavepunga —— me, I was born in ——). He is followed by an aged woman, who announces her name and home, and then all the others do the same, down to the young children, who are represented by their mothers. Only the parents of children born during the last year are forbidden to partake in this ceremony. As the words of the old are listened to respectfully, so those of the distinguished hunters are received with demonstrative applause and those of the others with varying degrees of attention, in some cases even with joking and raillery (imitijung).

Now arises a cry of surprise and all eyes are turned toward a hut out of which stalk two gigantic figures. They wear heavy boots; their legs are swelled out to a wonderful thickness with several pairs of breeches; the shoulders of each are covered by a woman’s overjacket and the faces by tattooed masks of sealskins.
In the right hand each carries the seal spear, on the back of each is an inflated buoy of sealskin, and in the left hand the scraper. Silently, with long strides, the qailertetang (Fig. 535) approach the assembly, who,
screaming, press back from them. The pair solemnly lead the men to a suitable spot and set them in a row, and the women in another opposite them. They match the men and women in pairs and these pairs run, pursued by the qailertetang, to the hut of the woman, where they are for the following day and night man and wife (nulianititijung). Having performed this duty, the qailertetang stride down to the shore and invoke the good north wind, which brings fair weather, while they warn off the unfavorable south wind.

As soon as the incantation is over, all the men attack the qailertetang with great noise. They act as if they had weapons in their hands and would kill both spirits. One pretends to probe them with a spear, another to stab them with a knife, one to cut off their arms and legs, another to beat them unmercifully on the head. The buoys which they carry on their backs are ripped open and collapse and soon they both lie as if dead beside their broken weapons (pilektung). The Eskimo leave them to get their drinking cups and the qailertetang awake to new life. Each man fills his sealskin with water, passes a cup to them, and inquires about the future, about the fortunes of the hunt and the events of life. The qailertetang answer in murmurs which the questioner must interpret for himself.
Fig. 535. Qailertetang, a masked figure. (From a sketch by the author.)

The evening is spent in playing ball, which is whipped all around the settlement (ajuktaqtung). (See Appendix, Note 6.)

This feast is celebrated as here described in Cumberland Sound and Nugumiut. Hall and Kumlien make a few observations in regard to it, but the latter has evidently misunderstood its meaning. His description is as follows (p. 43):

An angakoq dresses himself up in the most hideous manner, having several pairs of pants on among the rest, and a horrid looking mask of skins. The men and women now range themselves in separate and opposite ranks, and the angakoq
takes his place between them. He then picks out a man and conducts him to a woman in the opposite ranks. This couple then go to the woman’s hut and have a grand spree for a day or two. This manner of proceeding is kept up till all the women but one are disposed of. This one is always the angakoq’s choice, and her he reserves for himself.

Another description by Kumlien (p. 19) evidently refers to the same feast:

They have an interesting custom or superstition, namely, the killing of the evil spirit of the deer; sometime during the winter or early in spring, at any rate before they can go deer hunting, they congregate together and dispose of this imaginary evil. The chief ancut [angakoq], or medicine man, is the main performer. He goes through a number of gyrations and contortions, constantly hallooing and calling, till suddenly the imaginary deer is among them. Now begins a lively time. Every one is screaming, running, jumping, spearing, and stabbing at the imaginary deer, till one would think a whole madhouse was let loose. Often this deer proves very agile, and must be hard to kill, for I have known them to keep this performance up for days; in fact, till they were completely exhausted. During one of these performances an old man speared the deer, another knocked out an eye, a third stabbed him, and so on till he was dead. Those who are able or fortunate enough to inflict some injury on this bad deer, especially he who inflicts the death blow, is considered extremely lucky, as he will have no difficulty in procuring as many deer as he wants, for there is no longer an evil spirit to turn his bullets or arrows from their course.

I could not learn anything about this ceremony, though I asked all the persons with whom Kumlien had had intercourse. Probably there was some misunderstanding as to the meaning of their feast during the autumn which induced him to give this report.
Hall describes the feast as celebrated by the Nugumiut (I, p. 528), as follows:

At a time of the year apparently answering to our Christmas, they have a general meeting in a large igdlu [snow house] on a certain evening. There the angakoq prays on behalf of the people for the public prosperity through the subsequent year. Then follows something like a feast. The next day all go out into the open air and form in a circle; in the centre is placed a vessel of water, and each member of the company brings a piece of meat, the kind being immaterial. The circle being formed, each person eats his or her meat in silence, thinking of Sedna, and wishing for good things. Then one in the circle takes a cup, dips up some of the water, all the time thinking of Sedna, and drinks it; and then, before passing the cup to another, states audibly the time and the place of his or her birth. This ceremony is performed by all in succession. Finally, presents of various articles are thrown from one to another, with the idea that each will receive of Sedna good things in proportion to the liberality here shown. Soon after this occasion, at a time which answers to our New Year’s day, two men start out, one of them being dressed to represent a woman, and go to every house in the village, blowing out the light in each. The lights are afterwards rekindled from a fresh fire. When Taqulitu [Hall’s well known companion in his journeys] was asked the meaning of this, she replied, “New sun—new light,” implying a belief that the sun was at that time renewed for the year.

Inasmuch as Hall did not see the feast himself, but had only a description by an Eskimo, into which he introduced points of similarity with Christian feasts, it may be looked upon as fairly agreeing with the feast of the Oqomiut. The latter part corresponds to the celebration of the feast as it is celebrated in Akudnirn.8

According to a statement in the journal of Hall’s second expedition (II, p.
219) masks are also used on the western shore of Hudson
Bay, where it seems that all the natives disguise themselves on this occasion.

The Akudnirmiut celebrate the feast in the following way: The qailertetang do not act a part there, but other masks take their place. They are called mirqussang and represent a man and his wife. They wear masks of the skin of the ground seal, only that of the woman being tattooed. The hair of the man is arranged in a bunch protruding from the forehead (sulubaut), that of the woman in a pigtail on each side and a large knot at the back of the head. Their left legs are tied up by a thong running around the neck and the knee, compelling them to hobble. They have neither seal float and spear nor inflated legs, but carry the skin scraper. They must try to enter the huts while the Inuit hold a long sealskin thong before them to keep them off. If they fall down in the attempt to cross it they are thoroughly beaten with a short whip or with sticks. After having succeeded in entering the huts they blow out all the fires.

The parts of the feast already described as celebrated in Cumberland Sound seem not to be customary in Akudnirn, the conjuration of Sedna and the exchanges of wives excepted, which are also practiced here. Sometimes the latter ceremony takes place the night before the feast. It is called suluiting or quvietung. When it is quite dark a number of Inuit come out of their huts and run crying all round their settlements. Wherever anybody is asleep they climb upon the roof of his hut and rouse him by screaming and shouting until all have assembled outside. Then a woman and a man (the mirqussang) sit down in the snow. The man holds a knife (sulung) in his hand, from which the feast takes its name, and sings:

Oangaja jaja jajaja aja.
Pissiungmipadlo panginejernago
Qodlungutaokpan panginejerlugping
Pissiungmipadlo panginejernago.

To this song the woman keeps time by moving her body and her arms, at the same time flinging snow on the bystanders. Then the
whole company goes into the singing house and joins in dancing and singing. This done, the men must leave the house and stand outside while the mirqussang watch the entrance. The women continue singing and leave the house one by one. They are awaited by the mirqussang, who lead every one to one of the men standing about. The pair must re-enter the singing house and walk around the lamp, all the men and women crying, “Hrr! hrr!” from both corners of the mouth. Then they go to the woman’s hut, where they stay during the ensuing night. The feast is frequently celebrated by all the tribes of Davis and Hudson Strait, and even independently of the great feast described above.

The day after, the men frequently join in a shooting match. A target is set up, at which they shoot their arrows. As soon as a man hits, the women, who stand looking on, rush forward and rub noses with him.

If a stranger unknown to the inhabitants of a settlement arrives on a visit he is welcomed by the celebration of a great feast. Among the southeastern tribes the natives arrange themselves in a row, one man standing in front of it. The stranger approaches slowly, his arms folded and his head inclined toward the right side. Then the native strikes him with all his strength on the right cheek and in his turn inclines his head awaiting the stranger’s blow (tigluiqdjung). While this is going on the other men are playing at ball and singing (igdlukitaqtung). Thus they continue until one of the combatants is vanquished.

The ceremonies of greeting among the western tribes are similar to those of the eastern, but in addition “boxing, wrestling, and knife testing” are mentioned by travelers who have visited them. In Davis Strait and probably in all the other countries the game of “hook and crook” is always played on the arrival of a stranger (pakijumijartung).
Two men sit down on a large skin, after having stripped the upper part of their bodies, and each tries to stretch out the bent arm of the other. These games are sometimes dangerous, as the victor has the right to kill his adversary; but generally the
feast ends peaceably. The ceremonies of the western tribes in greeting a stranger are much feared by their eastern neighbors and therefore intercourse is somewhat restricted. The meaning of the duel, according to the natives themselves, is “that the two men in meeting wish to know which of them is the better man.” The similarity of these ceremonies with those of Greenland, where the game of hook and crook and wrestling matches have been customary, is quite striking, as is that of the explanation of these ceremonies.

The word for greeting on Davis Strait and Hudson Strait, is Assojutidlin? (Are you quite well?) and the answer, Tabaujuradlu (Very well). The word Taima! which is used in Hudson Strait, and Mane taima! of the Netchillirmiut seem to be similar to our Halloo! The Ukusiksalirmiut say Ilaga! (My friend!)

CUSTOMS AND REGULATIONS CONCERNING BIRTH, SICKNESS, AND DEATH.

I have mentioned that it is extremely difficult to find out the innumerable regulations connected with the religious ideas and customs of the Eskimo. The difficulty is even greater in regard to the customs which refer to birth, sickness, and death, and it is no wonder that, while some of the accounts of different writers coincide tolerably well, there are great discrepancies in others, particularly as the customs vary to a great extent among the different tribes.

Before the child is born a small hut or snow house is built for the mother, in which she awaits her delivery. Sick persons are isolated in the same way, the reason being that in case of death everything that had been in contact with the deceased must be destroyed. According to Kumlien (p. 28) the woman is left with only one attendant, a young girl appointed by the head ancut (angakoq) of the encampment; but this, no doubt, is an error. She may be visited by her friends, who, however, must leave her when parturition takes
place. She must cut the navel string herself, and in Davis Strait this is done by tying it through with deer sinews; in Iglulik (Lyon, p. 370), by cutting it with a stone spear head. The child is cleaned with a birdskin and clothed in a small gown of the same material. According to Lyon the Iglulirmiut swathe it with the dried intestines of some animal.

Kumlien describes a remarkable custom of which I could find no trace, not even upon direct inquiry (p. 281):

As soon as the mother with her new born babe is able to get up and go out, usually but a few hours, they are taken in charge by an aged female angakoq, who seems to have some particular mission to perform in such cases. She conducts them to some level spot on the ice, if near the sea, and begins a sort of march in circles on the ice, the mother following with the child on her back; this manœuvre is kept up for some time, the old woman going through a number of performances the nature of which we could not learn and continually muttering something equally unintelligible to us. The next act is to wade through snowdrifts, the aged angakoq leading the way. We have been informed that it is customary for the mother to wade thus bare-legged.

Lyon says (p. 370):

After a few days, or according to the fancy of the parents, an angakoq, who by relationship or long acquaintance is a friend of the family, makes use of some vessel, and with the urine the mother washes the infant, while all the gossips around pour forth their good wishes for the little one to prove an active man, if a boy, or, if a girl, the mother of plenty of children. This ceremony, I believe, is never omitted, and is called qoqsiuariva.

Though I heard about the washing with urine, I did not learn anything about the rest of the ceremony in Cumberland Sound and Davis Strait.
A few days after birth the first dress of the child is exchanged for another. A small hood made from the skin of a hare’s head is fitted snugly upon the head, a jacket for the upper part of the body is made of the skin of a fawn, and two small boots, made of the same kind of a skin, the left one being wreathed with seaweed (Fucus), cover the legs. While the child wears this clothing that which was first worn is fastened to a pole which is secured to the roof of the hut. In two months the child gets a third suit of clothes the same as formerly described (p. 557). Then the second gown is exposed for some time on the top of the hut, the first one being taken down, and both are carefully preserved for a year. After this time has expired both are once more exposed on the top of a pole and then sunk into the sea, a portion of the birdskin dress alone being kept, for this is considered a powerful amulet and is held in high esteem and worn every fall at the Sedna feast on the point of the hood (see p. 604). I have stated that those who were born in abnormal presentations wear women’s dresses at this feast and must make their round in a direction opposite to the movement of the sun.

Captain Spicer, of Groton, Conn., affirms that the bird used for the first clothing is chosen according to a strict law, every month having its own bird. So far as I know, waterfowl are used in summer and the ptarmigan in winter, and accordingly the men are called at the great autumn feast the ducks and ptarmigans, the former including those who were born in summer, the latter those born in winter.

As long as any portion of the navel string remains a strip of sealskin is worn around the belly.

After the birth of her child the mother must observe a great number of regulations, referring particularly to food and work. She is not allowed for a whole year to eat raw meat or a part of any animal killed by being shot through the heart.
In Cumberland Sound she must not eat for five days anything except meat of an animal killed by her husband or by a boy on his first hunting expedition. This custom seems to be observed more strictly, however, and for a longer period if the new born child dies. Two months after delivery she must make a call at every hut, while before that time she is not
allowed to enter any but her own. At the end of this period she must also throw away her old clothing. The same custom was observed by Hall among the Nugumiut (I, p. 426).

On the western shore of Hudson Bay she is permitted to re-enter the hut a few days after delivery, but must pass in by a separate entrance. An opening is cut for the purpose through the snow wall. She must keep a little skin bag hung up near her, into which she must put a little of her food after each meal, having first put it up to her mouth. This is called laying up food for the infant, although none is given to it (Hall II, p. 173).

I have already mentioned that the parents are not allowed in the first year after the birth of a child to take part in the Sedna feast.

The customs which are associated with the death of an infant are very complicated. For a whole year, when outside the hut, the mother must have her head covered with a cap, or at least with a piece of skin. If a ground seal is caught she must throw away the old cap and have a new one made. The boots of the deceased are always carried about by the parents when traveling, and whenever they stop these are buried in the snow or under stones. Neither parent is allowed to eat raw flesh during the following year. The woman must cook her food in a small pot which is exclusively used by her. If she is about to enter a hut the men who may be sitting inside must come out first, and not until they have come out is she allowed to enter. If she wants to go out of the hut she must walk around all the men who may happen to be there.

The child is sometimes named before it is born. Lyon says upon this subject (p. 369):

Some relative or friend lays her hand on the mother’s stomach, and decides what the infant is to be called, and, as the names serve for either sex, it is of no consequence whether it proves a girl or a boy.
On Davis Strait it is always named after the persons who have died since the last birth took place, and therefore the number of names of an Eskimo is sometimes rather large. If a relative dies while the
child is younger than four years or so, his name is added to the old ones and becomes the proper name by which it is called. It is possible that children receive the names of all the persons in the settlement who die while the children are quite young, but of this I am not absolutely certain.

When a person falls sick the angakut change his name in order to ward off the disease or they consecrate him as a dog to Sedna. In the latter event he gets a dog’s name and must wear throughout life a harness over the inner jacket. Thus it may happen that Eskimo are known in different tribes by different names. It may also be mentioned here that friends sometimes exchange names and dogs are called by the name of a friend as a token of regard.

The treatment of the sick is the task of the angakoq, whose manipulations have been described. If it is feared that a disease will prove fatal, a small snow house or a hut is built, according to the season, into which the patient is carried through an opening at the back. This opening is then closed, and subsequently a door is cut out. A small quantity of food is placed in the hut, but the patient is left without attendants. As long as there is no fear of sudden death the relatives and friends may come to visit him, but when death is impending the house is shut up and he is left alone to die. If it should happen that a person dies in a hut among its inmates, everything belonging to the hut must be destroyed or thrown away, even the tools &c. lying inside becoming useless to the survivors, but the tent poles may be used again after a year has elapsed. No doubt this custom explains the isolation of the sick. If a child dies in a hut and the mother immediately rushes out with it, the contents of the hut may be saved.

Though the Eskimo feel the greatest awe in touching a dead body, the sick await their death with admirable coolness and without the least sign of fear or unwillingness to die.
I remember a young girl who sent for me a few hours before her death and asked me to give her some tobacco and bread, which she wanted to take to her mother, who had died a few weeks before.
Only the relatives are allowed to touch the body of the deceased. They clothe it or wrap it in deerskins and bury it at once. In former times they always built a tomb, at least when death occurred in the summer. From its usual dimensions one would suppose that the body was buried with the legs doubled up, for all of them are too short for grown persons. If the person to be buried is young, his feet are placed in the direction of the rising sun, those of the aged in the opposite direction. According to Lyon the Iglulirmiut bury half grown children with the feet towards the southeast, young men and women with the feet towards the south, and middle aged persons with the feet towards the southwest. This agrees with the fact that the graves in Cumberland Sound do not all lie east and west. The tomb is always vaulted, as any stone or piece of snow resting upon the body is believed to be a burden to the soul of the deceased. The man’s hunting implements and other utensils are placed by the side of his grave; the pots, the lamps, knives, &c., by the side of that of the woman; toys, by that of a child. Hall (I, p. 103) observed in a grave a small kettle hung up over a lamp. These objects are held in great respect and are never removed, at least as long as it is known to whose grave they belong. Sometimes models of implements are used for this purpose instead of the objects themselves. Figure 536 represents a model of a lamp found in a grave of Cumberland Sound.

Nowadays the Eskimo place the body in a box, if they can procure one, or cover it very slightly with stones or snow. It is strange that, though the ceremonies of burying are very strictly attended to and though they take care to give the dead their belongings, they do not heed the opening of the graves by dogs or wolves and the devouring of the bodies and do not attempt to recover them when the graves are invaded by animals.

Fig. 536. Model of lamp from a grave in Cumberland Sound.
(Museum für Völkerkunde, Berlin.)
The body must be carried to the place of burial by the nearest relatives, a few others only accompanying it. For this purpose they rarely avail themselves of a sledge, as it cannot be used afterward, but must be left with the deceased. Dogs are never allowed to drag the sledge on such an occasion. After returning from the burial the relatives must lock themselves up in the old hut for three days, during which they mourn the loss of the deceased. During this time they do not dress their hair and they have their nostrils closed with a piece of deerskin. After this they leave the hut forever. The dogs are thrown into it through the window and allowed to devour whatever they can get at. For some time afterward the mourners must cook their meals in a separate pot.

A strange custom was observed by Hall in Hudson Bay (II, p. 186). The mourners did not smoke. They kept their hoods on from morning till night. To the hood the skin and feathers of the head of Uria grylle were fastened and a feather of the same waterfowl to each arm just above the elbow. All male relatives of the deceased wore a belt around the waist, besides which they constantly wore mittens.

It is probable that at the present time all Eskimo when in mourning avoid using implements of European manufacture and suspend the use of tobacco. It has already been stated that women who have lost a child must keep their heads covered. Parry, Lyon (p. 369), and Klutschak (p. 201) state that when the Eskimo first hear of the death of a relative they throw themselves upon the ground and cry, not for grief, but as a mourning ceremony.

For three or sometimes even four days after a death the inhabitants of a village must not use their dogs, but must walk to the hunting ground, and for one day at least they are not allowed to go hunting at all. The women must stop all kinds of work.
On the third day after death the relatives visit the tomb and travel around it three times in the same direction as the sun is moving, at the same time talking to the deceased and promising that they will bring him something to eat. According to Lyon the Iglulirmiut chant forth inquiries as to the welfare of the departed soul, whether it has
reached the land Adli, if it has plenty of food, &c., at each question stopping at the head of the grave and repeating some ceremonial words (p. 371).

These visits to the grave are repeated a year after death and whenever they pass it in traveling. Sometimes they carry food to the deceased, which he is expected to return greatly increased. Hall describes this custom as practiced by the Nugumiut (I, p. 426). He says:

They took down small pieces of [deer] skin with the fur on, and of [fat]. When there they stood around [the] grave [of the woman] upon which they placed the articles they had brought. Then one of them stepped up, took a piece of the [deer meat], cut a slice and ate it, at the same time cutting off another slice and placing it under a stone by the grave. Then the knife was passed from one hand to the other, both hands being thrown behind the person. This form of shifting the implement was continued for perhaps a minute, the motions being accompanied by constant talk with the dead. Then a piece of [deer] fur and some [fat] were placed under the stone with an exclamation signifying, “Here is something to eat and something to keep you warm.” Each of the [natives] also went through the same forms. They never visit the grave of a departed friend until some months after death, and even then only when all the surviving members of the family have removed to another place. Whenever they return to the vicinity of their kindred’s grave, a visit is made to it with the best of food as a present for the departed one. Neither seal, polar bear, nor walrus, however, is taken.

According to Klutschak (p. 154), the natives of Hudson Bay avoid staying a long time on the salt water ice near the grave of a relative.

On the fourth day after death the relatives may go for the first time upon the ice, but the men are not allowed to hunt; on the next day
they must go sealing, but without dogs and sledge, walking to the hunting ground and dragging the seal home. On the sixth day they are at liberty to use their dogs again. For a whole year they must not join in any festival and are not allowed to sing certain songs. If a married woman dies the widower is not permitted to keep any part of the first seal he catches after her death except the flesh. Skin, blubber, bones, and entrails must be sunk in the sea. All the relatives must have new suits of clothes made and before the others are cast away they are not allowed to enter a hut without having asked and obtained permission. (See Appendix, Note 7.)
Lyon (p. 368) makes the following statement on the mourning ceremonies in Iglulik:
Widows are forbidden for six months to taste of unboiled flesh; they wear no * * * pigtails, and cut off a portion of their long hair in token of grief, while the remaining locks hang in loose disorder about their shoulders. * * * After six months, the disconsolate ladies are at liberty to eat raw meat, to dress their pigtails and to marry as fast as they please; while in the meantime they either cohabit with their future husbands, if they have one, or distribute their favors more generally. A widower and his children remain during three days within the hut where his wife died, after which it is customary to remove to another. He is not allowed to fish or hunt for a whole season, or in that period to marry again. During the three days of lamentation all the relatives of the deceased are quite careless of their dress; their hair hangs wildly about, and, if possible, they are more than usually dirty in their persons. All visitors to a mourning family consider it as indispensably necessary to howl at their first entry.
I may add here that suicide is not of rare occurrence, as according to the religious ideas of the Eskimo the souls of those who die by violence go to Qudlivun, the happy land. For the same reason it is
considered lawful for a man to kill his aged parents. In suicide death is generally brought about by hanging.
TALES AND TRADITIONS.
ITITAUJANG.
A long, long time ago, a young man, whose name was Ititaujang, lived in a village with many of his friends. When he became grown he wished to take a wife and went to a hut in which he knew an orphan girl was living. However, as he was bashful and was afraid to speak to the young girl himself, he called her little brother, who was playing before the hut, and said, “Go to your sister and ask her if she will marry me.” The boy ran to his sister and delivered the message. The young girl sent him back and bade him ask the name of her suitor. When she heard that his name was Ititaujang she told him to go away and look for another wife, as she was not willing to marry a man with such an ugly name. But Ititaujang did not submit and sent the boy once more to his sister. “Tell her that Nettirsuaqdjung is my other name,” said he. The boy, however, said upon entering, “Ititaujang is standing before the doorway and wants to marry you.” Again the sister said “I will not have a man with that ugly name.” When the boy returned to Ititaujang and repeated his sister’s speech, he sent him back once more and said, “Tell her that Nettirsuaqdjung is my other name.” Again the boy entered and said, “Ititaujang is standing before the doorway and wants to marry you.” The sister answered, “I will not have a man with that ugly name.” When the boy returned to Ititaujang and told him to go away, he was sent in the third time on the same commission, but to no better effect. Again the young girl declined his offer, and upon that Ititaujang went away in great anger. He did not care for any other girl of his tribe, but left the country altogether and wandered over hills and through valleys up the country many days and many nights.
At last he arrived in the land of the birds and saw a lakelet in which many geese were swimming. On the shore he saw a great number of boots; cautiously he crept nearer and stole as many as he could get hold of. A short time after the birds left the water and finding the boots gone became greatly alarmed and flew away. Only one of the flock remained behind, crying, “I want to have my boots; I want to have my boots.” Ititaujang came forth now and answered, “I will give you your boots if you will become my wife.” She objected, but when Ititaujang turned round to go away with the boots she agreed, though rather reluctantly. Having put on the boots she was transformed into a woman and they wandered down to the seaside, where they settled in a large village. Here they lived together for some years and had a son. In time Ititaujang became a highly respected man, as he was by far the best whaler among the Inuit.
Once upon a time the Inuit had killed a whale and were busy cutting it up and carrying the meat and the blubber to their huts. Though Ititaujang was hard at work his wife stood lazily by. When he called her and asked her to help as the other women did she objected, crying, “My food is not from the sea; my food is from the land; I will not eat the meat of a whale; I will not help.” Ititaujang answered, “You must eat of the whale; that will fill your stomach.” Then she began crying and exclaimed, “I will not eat it; I will not soil my nice white clothing.” She descended to the beach, eagerly looking for birds’ feathers. Having found a few she put them between her fingers and between those of her child; both were transformed into geese and flew away. When the Inuit saw this they called out, “Ititaujang, your wife is flying away.” Ititaujang became very sad; he cried for his wife and did not care for the abundance of meat and blubber, nor for the whales spouting near the shore. He followed his wife and ascended the land in search of her.
After having traveled for many weary months he came to a river. There he saw a man who was busy chopping chips from a piece of wood with a large hatchet. As soon as the chips fell off he polished them neatly and they were transformed into salmon, becoming so slippery that they glided from his hands and fell into the river, which they descended to a large lake near by. The name of the man was Eχaluqdjung (the little salmon). On approaching, Ititaujang was frightened almost to death, for he saw that the back of this man was altogether hollow and that he could look from behind right through his mouth. Cautiously he crept back and by a circuitous way approached him from the opposite direction. When Eχaluqdjung saw him coming he stopped chopping and asked, “Which way did you approach me?” Ititaujang, pointing in the direction he had come last and from which he could not see the hollow back of Eχaluqdjung, answered, “It is there I have come from.” Eχaluqdjung, on hearing this, said, “That is lucky for you. If you had come from the other side and had seen my back I should have immediately killed you with my hatchet.” Ititaujang was very glad that he had turned back and thus deceived the salmon maker. He asked him, “Have you not seen my wife, who has left me, coming this way?” Eχaluqdjung had seen her and said, “Do you see yon little island in the large lake? There she lives now and has taken another husband.” When Ititaujang heard this report he almost despaired, as he did not know how to reach the island; but Eχaluqdjung kindly promised to help him. They descended to the beach; Eχaluqdjung gave him the backbone of a salmon and said, “Now shut your eyes. The backbone will turn into a kayak and carry you safely to the island. But mind you do not open your eyes, else the boat will upset.” Ititaujang promised to obey. He shut his eyes, the backbone became a kayak, and away he went over the lake.
As he did not hear any splashing of water, he was anxious to see whether the boat moved on, and opened his eyes just a little. But he had scarcely taken a
short glimpse when the kayak began to swing violently and he felt that it became a backbone again. He quickly shut his eyes, the boat went steadily on, and a short time after he was landed on the island. There he saw the hut and his son playing on the beach near it. The boy on looking up saw Ititaujang and ran to his mother crying, “Mother, father is here and is coming to our hut.” The mother answered, “Go, play on; your father is far away and cannot find us.” The child obeyed; but as he saw Ititaujang approaching he re-entered the hut and said, “Mother, father is here and is coming to our hut.” Again the mother sent him away, but he returned very soon, saying that Ititaujang was quite near. Scarcely had the boy said so when Ititaujang opened the door. When the new husband saw him he told his wife to open a box which was in a corner of the hut. She did so, and many feathers flew out of it and stuck to them. The woman, her new husband, and the child were thus again transformed into geese. The hut disappeared; but when Ititaujang saw them about to fly away he got furious and cut open the belly of his wife before she could escape. Then many eggs fell down.
THE EMIGRATION OF THE SAGDLIRMIUT.
In the beginning all the Inuit lived near Ussualung, in Tiniqdjuarbing (Cumberland Sound). The Igdlumiut, the Nugumiut, and the Talirpingmiut in the south, the Aggomiut in the far north, and the Inuit, who tattoo rings round their eyes, in the far west, all once lived together. There is a tradition concerning the emigration of the Sagdlirmiut (see p. 451) who live east of Iglulik. The Akudnirmiut say that the following events did not happen in Tiniqdjuarbing, but in Aggo, a country where nobody lives nowadays. Ikeraping, an Akudnirmio, heard the story related by a Tununirmio, who had seen the place himself, but all the Oqomiut assert that Ussualung is the place where the events in the story happened.
An old woman, the sister of Mitiq, the angakoq, told the story as follows:
Near Ussualung there are two places, Qerniqdjuaq and Eχaluqdjuaq. In each of these was a large house, in which many families lived together. They used to keep company during the summer when they went deer hunting, but returned to their separate houses in the fall. Once upon a time it happened that the men of Qerniqdjuaq had been very successful, while those of Eχaluqdjuaq had caught scarcely any deer. Therefore the latter got very angry and resolved to kill the other party, but they preferred to wait until the winter. Later in the season many deer were caught and put up in depots. They were to be carried down to the winter settlements by means of sledges. One day both parties agreed upon a journey to these depots and the men of Eχaluqdjuaq resolved to kill their enemies on this occasion. They set out with their dogs and sledges, and when they were fairly inland they suddenly attacked their unsuspecting companions and killed them. For fear that the wives and children of the murdered men might be suspicious if the dogs returned without their masters, they killed them too. After a short time they returned and said they had lost the other party and did not know what had happened to them.
A young man of Eχaluqdjuaq was the suitor of a girl of Qerniqdjuaq and used to visit her every night. He did not stop his visits now. He was kindly received by the woman and lay down to sleep with his young wife. Under the snow bench there was a little boy who had seen the young man of Eχaluqdjuaq coming. When everybody was sleeping he heard somebody calling and soon recognized the spirits of the murdered men, who told him what had happened and asked him to kill the young man in revenge. The boy crept from his place under the bed, took a knife, and put it into the young man’s breast. As he