John Wiley & Son- Visual Data Mining: Techniques and Tools for Data Visualization and Mining
-1- Present to you by: Team-Fly®
Visual Data Mining: Techniques and Tools for Data Visualization and Mining
by Tom Soukup and Ian Davidson ISBN: 0471149993
John Wiley & Sons © 2002 (382 pages)
Master the power of visual data mining tools and techniques.
Table of Contents
Visual Data Mining—Techniques and Tools for Data Visualization and Mining
Trademarks
Introduction
Part I - Introduction and Project Planning Phase
Chapter 1 - Introduction to Data Visualization and Visual Data Mining
Chapter 2 - Step 1: Justifying and Planning the Data Visualization and Data Mining Project
Chapter 3 - Step 2: Identifying the Top Business Questions
Part II - Data Preparation Phase
Chapter 4 - Step 3: Choosing the Business Data Set
Chapter 5 - Step 4: Transforming the Business Data Set
Chapter 6 - Step 5: Verify the Business Data Set
Part III - Data Analysis Phase and Beyond
Chapter 7 - Step 6: Choosing the Visualization or Data Mining Tool
Chapter 8 - Step 7: Analyzing the Visualization or Mining Tool
Chapter 9 - Step 8: Verifying and Presenting the Visualizations or Mining Models
Chapter 10 - The Future of Visual Data Mining
Appendix A - Inserts
Glossary
References
Index
List of Figures
List of Tables
List of Codes
Visual Data Mining: Techniques and Tools for Data
Visualization and Mining
Tom Soukup
Ian Davidson
Wiley Publishing, Inc.
Publisher: Robert Ipsen
Executive Editor: Robert Elliott
Assistant Editor: Emilie Herman
Associate Managing Editor: John Atkins
New Media Editor: Brian Snapp
Text Design & Composition: John Wiley Production Services
Designations used by companies to distinguish their products are often claimed as trademarks. In
all instances where John Wiley & Sons, Inc., is aware of a claim, the product names appear in
initial capital or ALL CAPITAL LETTERS. Readers, however, should contact the appropriate
companies for more complete information regarding trademarks and registration.
This book is printed on acid-free paper.
Copyright © 2002 by Tom Soukup and Ian Davidson.
All rights reserved.
Published by John Wiley & Sons, Inc.
Published simultaneously in Canada.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in
print may not be available in electronic books.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise,
except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without
either the prior written permission of the Publisher, or authorization through payment of the
appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA
01923, (978) 750-8400, fax (978) 750-4744. Requests to the Publisher for permission should be
addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York,
NY 10158-0012, (212) 850-6011, fax (212) 850-6008, email: <PERMREQ@WILEY.COM>
This publication is designed to provide accurate and authoritative information in regard to the
subject matter covered. It is sold with the understanding that the publisher is not engaged in
professional services. If professional advice or other expert assistance is required, the services of a
competent professional person should be sought.
Library of Congress Cataloging-in-Publication Data:
Soukup, Tom, 1962-
Visual data mining: techniques and tools for data visualization and
mining / Tom Soukup, Ian Davidson.
p. cm.
"Wiley Computer Publishing."
Includes bibliographical references and index.
ISBN 0-471-14999-3
1. Data mining. 2. Database searching. I. Davidson, Ian, 1971- II. Title.
QA76.9.D343 S68 2002
006.3-dc21 2002004004
Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1
To Ed and my family for their encouragement
-TOM
To my wife and parents for their support.
-IAN
ACKNOWLEDGMENTS
This book would not have been possible without the generous help of many people.
We thank the reviewers for their timely critique of our work, and our editor, Emilie Herman, who
skillfully guided us through the book-writing process.
We thank the Oracle Technology Network and SPSS Inc. for providing us with evaluation copies of
Oracle and Clementine, respectively. The use of these products helped us to demonstrate key
concepts in the book.
Finally, we both learned a great deal from our involvement in Silicon Graphics' data mining
projects. This, along with our other data mining project experience, was instrumental in
formulating and trying the visual data mining methodology we present in this book.
Tom Soukup and Ian Davidson
My sincere thanks to the people with whom I have worked on data mining projects. You have all
demonstrated and taught me many aspects of working on successful data mining projects.
Ian Davidson
To all my data mining and business intelligence colleagues, I add my thanks. Your business
acumen and insights have aided in the formulation of a successful visual data mining
methodology.
Tom Soukup
ABOUT THE AUTHORS
Tom Soukup is a data mining and data warehousing specialist with more than 15 years of
experience in database management and analysis. He currently works for Konami Gaming
Systems Division as Director of Business Intelligence and DBA.
Ian Davidson, Ph.D., has worked on a variety of commercial data-mining projects, such as cross
sell, retention, automobile claim, and credit card fraud detection. He recently joined the State
University of New York at Albany as an Assistant Professor of Computer Science.
Trademarks
Microsoft, Microsoft Excel, and PivotTable are either registered trademarks or trademarks of Microsoft
Corporation in the United States and/or other countries.
Oracle is a registered trademark of Oracle Corporation.
SPSS is a registered trademark, and Clementine and Clementine Solution Publisher are either registered
trademarks or trademarks of SPSS Inc.
MineSet is a registered trademark of Silicon Graphics, Inc.
Introduction
Business intelligence solutions transform business data into conclusive, fact-based, and actionable information and
enable businesses to spot customer trends, create customer loyalty, enhance supplier relationships, reduce financial
risk, and uncover new sales opportunities. The goal of business intelligence is to make sense of change: to
understand and even anticipate it. It furnishes you with access to current, reliable, and easily digestible information.
It provides you the flexibility to look at and model that information from all sides, and in different dimensions. A
business intelligence solution answers the question "What if ..." instead of "What happened?" In short, a business
intelligence solution is the path to gaining, and maintaining, your competitive advantage.
Data visualization and data mining are two techniques often used to create and deploy successful business
intelligence solutions. By applying visualizations and data mining techniques, businesses can fully exploit
business data to discover previously unknown trends, behaviors, and anomalies:
- Data visualization tools and techniques assist users in creating two- and three-dimensional pictures of
  business data sets that can be easily interpreted to gain knowledge and insights.
- Visual data mining tools and techniques assist users in creating visualizations of data mining models
  that detect patterns in business data sets that help with decision making and predicting new business
  opportunities.
In both cases, visualization is key in assisting business and data analysts to discover new patterns and trends from
their business data sets. Visualization is a proven method for communicating these discoveries to the decision
makers. The payoffs and return on investment (ROI) can be substantial for businesses that employ a combination
of data visualizations and visual data mining effectively. For instance, businesses can gain a greater understanding
of customer motivations to help reduce fraud, anticipate resource demand, increase acquisition, and curb customer
turnover (attrition).
Overview of the Book and Technology
This book was written to assist you to first prepare and transform your raw data into business data sets, then to
help you create and analyze the prepared business data set with data visualization and visual data mining tools and
techniques. Compared with other business intelligence techniques and tools, we have found that visualizations
help reduce your time-to-insight: the time it takes you to discover and understand previously unknown trends,
behaviors, and anomalies and communicate those findings to decision makers. It is often said that a picture paints
a thousand words. For instance, a few data visualizations can be used to quickly communicate the most important
discoveries instead of sorting through hundreds of pages of a traditional on-line analytical processing (OLAP)
report. Similarly, visual data mining tools and techniques enable you to visually inspect and interact with the
classification, association, cluster, and other data mining models for better understanding and faster
time-to-insight.
Throughout this book, we use the term visual data mining to indicate the use of visualization for inspecting,
understanding, and interacting with data mining algorithms. Finding patterns in a data visualization with your eyes
can also be considered visual data mining. In this case, the human mind acts as the pattern recognition data mining
engine. Unfortunately, not all models produced by data mining algorithms can be visualized (or a visualization of
them just wouldn't make sense). For instance, neural network models for classification, estimation, and clustering
do not lend themselves to useful visualization.
The most sophisticated pattern recognition machine in the world is the human mind. Visualization and visual data
mining tools and techniques aid in the process of pattern recognition by reducing large quantities of complicated
patterns into two- and three-dimensional pictures of data sets and data mining models. Often, these visualizations
lead to actionable business insights. Visualization helps business and data analysts to quickly and intuitively
discover interesting patterns and effectively communicate these insights to other business and data analysts, as
well as decision makers.
IDC and The Data Warehousing Institute have sampled business intelligence solutions customers. They concluded
the following:
1. Visualization is essential (Source: IDC).
Eighty percent of business intelligence solution customers find visualization to be desirable.
2. Data mining algorithms are important to over 80 percent of data warehousing users (Source: The Data
Warehousing Institute).
Visualization and data mining business intelligence solutions reach across industries and business functions. For
example, telecommunications, stock exchanges, and credit card and insurance companies use visualization and
data mining to detect fraudulent use of their services; the medical industry uses data mining to predict the
effectiveness of surgical procedures, medical tests, and medications, and to detect fraud; and retailers use data mining to assess
the effectiveness of coupons and promotional events. The Gartner Group analyst firm estimates that by 2010, the
use of data mining in targeted marketing will increase from less than 5 percent to more than 80 percent (Source:
Gartner).
In practice, visualization and data mining have been around for quite a while. However, the term data mining has
only recently earned credibility within the business world for its abilities to control costs and contribute to revenue.
You may have heard data mining referred to as knowledge discovery in databases (KDD). The formal definition of
data mining, or KDD, is the extraction of interesting (non-trivial, implicit, previously unknown, and potentially
useful) information or patterns in large databases.
The overall goal of this book is to first introduce you to data visualization and visual data mining tools and
techniques, demonstrate how to acquire and prepare your business data set, and provide you with a methodology
for using visualization and visual data mining to solve your business questions.
How This Book Is Organized
Although there are many books on data visualization and data mining theory, few present a practical methodology
for creating data visualizations and for performing visual data mining. Our book presents a proven eight-step data
visualization and visual data mining (VDM) methodology, as outlined in Figure I.1. Throughout the book, we
have stringently adhered to this eight-step VDM methodology. Each step of the methodology is explained with the
help of practical examples and then applied to a real-world business problem using a real-world data set. The data
set is available on the book's companion Web site. It is our hope that as you learn each methodology step, you will
be able to apply the methodology to your real-world data sets and begin receiving the benefits of data visualization
and visual data mining to solve your business issues.
Figure I.1: Eight-step data visualization and visual data mining methodology.
Figure I.1 depicts the methodology as a sequential series of steps; however, the process of preparing the business
data set and creating and analyzing the data visualizations and data mining models is an iterative process.
Visualization and visual data mining steps are often repeated as the data and visualizations are refined and as you
gain more understanding about the data set and the significance of one data fact (a column) to other data facts
(other columns). It is rare that data or business analysts create a production-class data visualization or data mining
model the first time through the data mining discovery process.
This book is organized into three main sections that correspond to the phases of a data visualization and visual
data mining (VDM) project:
- Project planning
- Data preparation
- Data analysis
Part 1: Introduction and Project Planning Phase
Chapter 1: "Introduction to Data Visualization and Visual Data Mining," introduces you to data visualization
and visual data mining concepts used throughout the book. It illustrates how a few data visualizations can replace
(or augment) hundreds of pages of traditional "green-bar" OLAP reports. Multidimensional, spatial (landscape),
and hierarchical analysis data visualization tools and techniques are discussed through examples. Traditional
statistical tools, such as basic statistics and histograms, are given a visual twist through statistic and histogram
visualizations. Chapter 1 also introduces you to visual data mining concepts. This chapter describes how
visualizations of data mining models assist the data and business analysts, domain experts and decision makers in
understanding and visually interacting with data mining models such as decision trees. It also discusses using
visualization tools to plot the effectiveness of data mining models, as well as to analyze the potential deployment
of the models.
Chapter 2: "Step 1: Justifying and Planning the Data Visualization and Data Mining Project," introduces
you to the first of the eight steps in the data visualization and visual data mining (VDM) methodology and
discusses the business aspects of business intelligence solutions. In most cases, the project itself needs a business
justification before you can begin (or get funding for the project). This chapter presents examples of how various
businesses have justified (and benefited from) using data visualization and visual data mining tools and techniques.
Chapter 2 also discusses planning a VDM project and provides guidance on estimating the project time and
resource requirements. It helps you to define team roles and responsibilities for the project. The customer retention
business VDM project case study is introduced, and then Step 1 is applied to the case study.
Chapter 3: "Step 2: Identifying the Top Business Questions," introduces you to the second step of the VDM
methodology. This chapter discusses how to identify and refine business questions so that they can be investigated
through data visualization and visual data mining. It also guides you through mapping the top business questions
for your VDM project into data visualization and visual data mining problem definitions. Step 2 is then applied to
the continuing customer retention VDM project case study.
Part 2: The Data Preparation Phase
Chapter 4: "Step 3: Choosing the Data," introduces you to the third step of the VDM methodology and
discusses how to select the data relating to the data visualization and visual data mining questions identified in
Chapter 3 from your operational data source. It introduces the concept of using an exploratory data mart as a
repository for building and maintaining business data sets that address the business questions under investigation.
The exploratory data mart is then used to extract, cleanse, transform, load (ECTL), and merge the raw operational
data sources into one or more production business data sets. This chapter guides you through choosing the data set
for your VDM project by presenting and discussing practical examples, and applying Step 3 to the customer
retention VDM project case study.
Chapter 5: "Step 4: Transforming the Data Set," introduces you to the fourth step of the VDM methodology.
Chapter 5 discusses how to perform logical transformations on the business data set stored in the exploratory data
mart. These logical transformations often help in augmenting the business data set to enable you to gain more
insight into the business problems under investigation. This chapter guides you through transforming the data set
for your VDM project by presenting and discussing practical examples, and applying Step 4 to the customer
retention VDM project case study.
Chapter 6: "Step 5: Verifying the Data Set," introduces you to the fifth step of the VDM methodology. Chapter
6 discusses how to verify that the production business data set contains the expected data and that all of the ECTL
steps (from Chapter 4) and logical transformations (from Chapter 5) have been applied correctly, are error free,
and did not introduce bias into your business data set. This chapter guides you through verifying the data set for
your VDM project by presenting and discussing practical examples, and applying Step 5 to the customer retention
VDM project case study.
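As a rough illustration of the kind of check Step 5 describes, the sketch below compares a merged data set against its source keys to see whether any customers were lost or duplicated along the way. The records, the field names (customer_id, total_billed), and the helper verify_join are hypothetical; they are not taken from the book's case study.

```python
from collections import Counter

def verify_join(source_ids, joined_rows, key="customer_id"):
    """Return simple integrity findings for a merged data set (hypothetical helper)."""
    joined_counts = Counter(row[key] for row in joined_rows)
    source_set = set(source_ids)
    return {
        # keys present in the source but absent after the merge
        "missing": sorted(source_set - set(joined_counts)),
        # keys that appear more than once (possible join fan-out)
        "duplicated": sorted(k for k, n in joined_counts.items() if n > 1),
        # keys that appear in the result but not in the source
        "unexpected": sorted(set(joined_counts) - source_set),
    }

# Hypothetical data: customer 3 was dropped and customer 2 was duplicated.
source_ids = [1, 2, 3]
joined_rows = [
    {"customer_id": 1, "total_billed": 120.0},
    {"customer_id": 2, "total_billed": 75.5},
    {"customer_id": 2, "total_billed": 75.5},
]
report = verify_join(source_ids, joined_rows)
print(report)
```

A non-empty "missing" or "duplicated" list would suggest that an ECTL step or a transformation changed the population of the data set and should be reviewed before analysis.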
Part 3: The Data Analysis Phase
Chapter 7: "Step 6: Choosing the Visualization or Data Mining Tool," introduces you to the sixth step of the
VDM methodology. Chapter 7 discusses how to choose and fine-tune the data visualization or data mining model
tool appropriate for investigating the business questions identified in Chapter 3. This chapter guides you through
choosing the data visualization and data mining model tools by presenting and discussing practical examples, and
applying Step 6 to the customer retention VDM project case study.
Chapter 8: "Step 7: Analyzing the Visualization or Data Mining Model," introduces you to the seventh step of
the VDM methodology. Chapter 8 discusses how to use the data visualizations and data mining models to gain
business insights in answering the business questions identified in Chapter 3. For data mining, the predictive
strength of each model can be evaluated and the models compared with one another, enabling you to decide on the best model that
addresses your business questions. Moreover, each data visualization or data mining model can be visually
investigated to discover patterns (business trends and anomalies). This chapter guides you through analyzing the
visualizations or data mining models by presenting and discussing practical examples, and applying Step 7 to the
continuing customer retention VDM project case study.
Chapter 9: "Step 8: Verifying and Presenting Analysis," introduces you to the final step of the VDM
methodology. Chapter 9 discusses the three parts of this step: verifying that the visualizations and data mining
models satisfy your business goals and objectives, presenting the visualization and data mining discoveries to the
decision makers, and, if appropriate, deploying the visualizations and mining models in a production environment.
Although this chapter discusses the implementation phase, a complete treatment of this phase is outside the scope of
this book. Step 8 is then applied to the continuing customer retention VDM project case study.
Chapter 10, "The Future of Visual Data Mining," serves as a summary of the previous chapters and discusses
the future of data visualization and visual data mining.
The Glossary provides a quick reference to definitions of commonly used data visualization and data mining
terms and algorithms.
Who Should Read This Book
A successful business intelligence solution using data visualization or visual data mining requires the participation
and cooperation of many parts of your business organization. Since this book endeavors to cover the VDM
project from the justification and planning phase up to the implementation phase, it has a wide and diverse audience.
The following definitions identify categories and roles of people in a typical business organization and list which
chapters are most advantageous for them to read. Depending on your business organization, you may be
responsible for one or more roles. (In a small organization, you may be responsible for all roles.)
Data Analysts normally interact directly with the visualization and visual data mining software to create and
evaluate the visualizations and data mining models. Data analysts collaborate with business analysts and domain
experts to identify and define the business questions and get help in understanding and selecting columns from the
raw data sources. We recommend data analysts focus on all chapters.
Business Analysts typically interact with previously created data visualizations and data mining models. Business
analysts help define the business questions and communicate the data mining discoveries to other analysts,
domain experts, and decision makers. We recommend that business analysts focus on Chapters 1 through 4 and
Chapters 8 and 9.
Domain Experts typically do not create data visualizations and data mining models, but rather, interact with the
final visualizations and models. Domain experts know the business, as well as what data the business collects.
Data analysts and business analysts draw on the domain expert to understand and select the right data from the
raw operational data sources, as well as to clarify and verify their visualization and data mining discoveries. We
recommend domain experts focus on Chapters 1 through 4 and Chapters 6 and 9.
Decision Makers typically have the power to act on the data visualization and data mining discoveries. The
visualization and visual data mining discoveries are presented to decision makers to help them make decisions
based on these discoveries. We recommend decision makers focus on Chapters 1, 2, and 9. Chapter 10 focuses on
the near future of visualization in data mining. We recommend that all individuals read it.
Table I.1: How This Book Is Organized and Who Should Read It

CHAPTER  TOPIC AND VDM STEP DISCUSSED                                                 DATA      BUSINESS  DOMAIN   DECISION
                                                                                      ANALYSTS  ANALYSTS  EXPERTS  MAKERS
1        Introduction to Data Visualization and Visual Data Mining                    √         √         √        √
2        Step 1: Justifying and Planning the Data Visualization/Data Mining Project   √         √         √        √
3        Step 2: Identifying the Top Business Questions                               √         √         √
4        Step 3: Choosing the Data Set                                                √         √         √
5        Step 4: Transforming the Data Set                                            √
6        Step 5: Verifying the Data Set                                               √                   √
7        Step 6: Choosing the Visualization or Data Mining Model                      √
8        Step 7: Analyzing the Visualization or Data Mining Model                     √         √
9        Step 8: Verifying and Presenting the Analysis                                √         √         √        √
10       The Future of Visualization and Visual Data Mining                           √         √         √        √
Software Tools Used
There are numerous software tools that you can use for data preparation, data visualization, and data mining, and
more are being developed and enhanced each year. The graphical and data mining analysis
capabilities of software tools vary from package to package. We have decided to limit our selection to four core
packages for illustrating the data preparation and data analysis phases: Oracle, Microsoft Excel, SGI MineSet, and
SPSS Clementine. These software packages are not required for reading or understanding this book, as the data
visualization and data mining techniques described in the book are similar to those available in the majority of
data visualization and data mining software packages.
Oracle
The majority of query examples in the book are written using ANSI standard structured query language (SQL)
syntax. For the data preparation extract, cleanse, transform, and load (ECTL) tasks, we chose to use Oracle
SQL*Loader syntax. For some of the logical transformation tasks, we chose to use Oracle procedural language
SQL (PL/SQL). The majority of queries, ECTL, and logical transformation tasks can be accomplished using
similar functions and tools in other popular RDBMS products, such as Microsoft SQL Server, Sybase, Informix,
DB2, and RedBrick.
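To give a feel for the style of query involved, here is a minimal sketch that merges two small tables into a single per-customer data set using standard SQL. It runs against an in-memory SQLite database through Python's sqlite3 module purely for convenience; SQLite is a stand-in here, not one of the products the book uses, and the table and column names (customer, invoice, amount, and so on) are hypothetical.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE invoice  (invoice_id  INTEGER PRIMARY KEY,
                           customer_id INTEGER,
                           amount      REAL);
    INSERT INTO customer VALUES (1, 'NY'), (2, 'CA'), (3, 'NY');
    INSERT INTO invoice  VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# The LEFT OUTER JOIN keeps customers that have no invoices (customer 3),
# and COALESCE turns their NULL total into 0.
rows = conn.execute("""
    SELECT c.customer_id,
           c.state,
           COUNT(i.invoice_id)        AS invoice_count,
           COALESCE(SUM(i.amount), 0) AS total_billed
    FROM customer c
    LEFT OUTER JOIN invoice i ON i.customer_id = c.customer_id
    GROUP BY c.customer_id, c.state
    ORDER BY c.customer_id
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

The SELECT statement uses only joins, aggregates, and COALESCE, so it should carry over to most RDBMS products with little or no change.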
Microsoft Excel
Excel is the most widely used spreadsheet and business graphics software tool. Excel provides comprehensive
tools to help you create, analyze, and share spreadsheets containing graphs. We chose to use Excel to illustrate
core data visualization types such as column, bar, pie, line, scatter, and radar graphs. These traditional graph types
are common to most visualization tool suites.
SGI MineSet
Although MineSet is no longer commercially available, we chose to use it to illustrate advanced data visualization
types, such as tree, statistics, and 3D scatter graphs. These advanced graph types are common in most data
mining software suites, such as ANGOSS Knowledge Studio, Oracle Darwin, IBM Intelligent Miner, and SAS
Enterprise Miner.
SPSS Clementine
Clementine supports a variety of data mining techniques, such as prediction, classification, segmentation, and
association detection. We chose to use Clementine to illustrate these core data mining techniques. These core data
mining techniques are common in most of the data mining software suites previously listed.
What's on the Web Site
The companion Web site (www.wiley.com/compbooks/soukup) contains Web links to the data visualization and
visual data mining software tools discussed throughout this book. It also contains Web links to the extraction,
cleansing, transformation, and loading (ECTL) tools referenced in Chapter 4, as well as other software tools
discussed in other chapters.
To demonstrate the eight-step data visualization and visual data mining methodology, we used a variety of
business data sets. One business data set we used frequently was from a home equity loan campaign. We have
included the entire home equity loan campaign prepared business data set on the Web site. For ease of transport
and download, we have saved it as an Excel spreadsheet containing 44,124 records and 20 columns.
At the end of Chapters 2 through 9, we applied each of the VDM steps to an ongoing customer retention case
study. However, the size of the operational data sources, as well as the final two business data sets, is fairly large.
For instance, the INVOICE.TXT file contains over 4.6 million rows. Therefore, we are providing the operational
data sources and business data sets as an Access database file, casestudy.mdb, which is 180 MB. In addition, we
are providing a 10 percent sample of each of the operational source files, as well as the prepared business data
sets as Excel spreadsheets, namely:
• 10 percent sample of the CUSTOMER.TXT, CONTRACT.TXT, INVOICE.TXT, and
DEMOGRAPHIC.TXT operational source files
• 10 percent sample of the untransformed business data sets, customer_join and customer_demographics
John Wiley & Son- Visual Data Mining: Techniques and Tools for Data Visualization and Mining
-13- Present to you by: Team-Fly®
• 10 percent sample of the prepared production business data sets, customer_join and
customer_demographics
Be aware that if you run the sample SQL from the Code Figures on the 10 percent sample files instead of the complete data set,
your results may not exactly match those demonstrated in the book. However, depending on the capacity of your
computer system and what database you are using, the 10 percent sample files may be easier for you to work with
than the complete files contained in the Access database file. The decision of which set of files to use is up to you;
nevertheless, we encourage you to work through the methodology steps with the customer retention operational
data source files and business data set files as you read the book.
Summary
Planning, preparing the business data set, and creating and analyzing data visualizations and data
mining models is an iterative process. Visualization and visual data mining steps as described in the visualization
and visual data mining (VDM) methodology are frequently repeated. As you gain more understanding of the data
set and the significance of one data fact (a column) to other data facts (other columns), the data and visualizations
are refined. It is rare that data or business analysts create a production-class data visualization or data mining
model the first time through the data mining discovery process. Often the data must be further transformed or
more data is necessary to answer the business question. In some cases, discoveries about the data set lead to
refining the original business questions. The power of visualization gives you the ability to quickly see and
understand the data set and data mining model so you can improve your analysis interactively.
We hope that this book helps you develop production-class visualizations and data mining models that address
your business questions. Furthermore, we hope that this book gives you the essential guidance to make your VDM
project a success. The next chapter introduces you to data visualization and visual data mining concepts used
throughout the book.
Part I: Introduction and Project Planning Phase
Chapter List
Chapter 1: Introduction to Data Visualization and Visual Data Mining
Chapter 2: Step 1: Justifying and Planning the Data Visualization and Data Mining Project
Chapter 3: Step 2: Identifying the Top Business Questions
Chapter 1: Introduction to Data Visualization and Visual Data Mining
Overview
When you read a newspaper or magazine, or watch a news or weather program on TV, you see numerous data
visualizations. For example, bar and column graphs are often used to communicate categorical and demographic
discoveries such as household or population survey results or trends, line graphs are used to communicate
financial market time-based trends, and map graphs are used to communicate geographic weather patterns. Have
you ever asked yourself why? Could it be that two- and three-dimensional data visualizations are the most
effective way of communicating large quantities of complicated data? In this book, not only do we emphasize the
benefits of data visualization to analyze business data sets and communicate your discoveries, but we also outline
a proven data visualization and visual data mining methodology that explains how to conduct successful data
mining projects within your organization.
Chapter 1 introduces you to a variety of data visualization tools and techniques that you can use to visualize
business data sets and discover previously unknown trends, behavior, and anomalies. It also introduces you to a
variety of data visualization tools and techniques for visualizing, analyzing, and evaluating popular data mining
algorithms.
This book discusses two broad classes of visualizations: (1) data visualization techniques for visualizing business
data sets and (2) visual data mining tools and techniques for visualizing and analyzing data mining algorithms and
exploring the resultant data mining models. The distinction is as follows:
• Data visualization tools and techniques help you create two- and three-dimensional pictures of
business data that can be easily interpreted to gain knowledge and insights into those data sets. With
data visualization, you act as the data mining or pattern recognition engine. By visually inspecting
and interacting with the two- or three-dimensional visualization, you can identify the interesting
(nontrivial, implicit, perhaps previously unknown and potentially useful) information or patterns in
the business data set.
• Visual data mining tools and techniques help you create visualizations of data mining models to gain
knowledge and insight into the patterns discovered by the data mining algorithms that help with
decision making and predicting new business opportunities. With visual data mining tools, you can
inspect and interact with the two- or three-dimensional visualization of the predictive or descriptive
data mining model to understand (and validate) the interesting information and patterns discovered
by the data mining algorithm. In addition, data visualization tools and techniques are used to
understand and evaluate the results of the data mining model. The output from a data mining tool is
a model of some sort. You can think of a model as a collection of generalizations or patterns found
in the business data set that is an abstraction of the task. Just as humans may use their previous
experience to develop a strategy to handle, say, difficult people, the data mining tool develops a
model to predict people who are likely to leave a service organization. Depending on the data
mining tool, an explanation of why a decision was made is possible. Some data mining tools
provide a clear set of reasons as to why a particular decision was made, while others are black
boxes, making decisions but not telling you why.
In both cases, visualization is key in helping you discover new patterns and trends and communicate these
discoveries to the decision makers. The payoffs and ROI (return-on-investment) can be substantial for businesses
that use a combination of data visualization and visual data mining effectively. A base knowledge of various types
of data visualization and visual data mining tools is required before beginning the eight-step data visualization and
visual data mining (VDM) methodology discussed in Chapters 2 through 9. A good working knowledge of the
visualization types will aid you in the project planning, data preparation, and data analysis phases of your VDM
project.
Visualization Data Sets
The majority of business data sets are stored as a single table of information composed of a finite number of
columns and one or more rows of data. Chapter 4 discusses how to choose the data from your operational data
warehouse or other business data sources. However, before we begin introducing you to the visualization tools and
techniques, a brief explanation of the business data set is necessary. Table 1.1 shows an example of a simple
business data set with information (data) about weather.
Table 1.1: Business Data Set Weather
CITY DATE TEMPERATURE HUMIDITY CONDITION
Athens 01-MAY-2001 97.1 89.2 Sunny
Chicago 01-MAY-2001 66.5 100.0 Rainy
Paris 01-MAY-2001 71.3 62.3 Cloudy
The information (data facts) about the WEATHER subject data set is interpreted as follows:
• WEATHER is the file, table, or data set name. A city's weather on a particular day is the subject under
investigation.
• CITY, DATE, TEMPERATURE, HUMIDITY, and CONDITION are the five columns of the data set.
These columns describe the kind of information kept in the data set, that is, attributes about the
weather for each city.
• ATHENS, 01-MAY-2001, 97.1, 89.2, SUNNY is a particular record or row in the data set. Each
unique set of data (data fact) should have its own record (row). For this row, the data value
"Athens" identifies the CITY, "01-MAY-2001" identifies the DATE the measurement was taken,
"97.1" identifies TEMPERATURE in degrees Fahrenheit, "89.2" identifies the HUMIDITY in
percent, and "Sunny" identifies the CONDITION.
• The level of detail or granularity of data facts (experimental unit) is at the city level.
Data visualization tools and techniques are used to graphically display the data facts as a 2-D or 3-D picture
(representation) of the columns and rows contained in the business data sets.
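As a minimal sketch of this structure, the WEATHER data set from Table 1.1 can be written as a list of rows, where each row maps a column name to a data value. Python is used here only for illustration (the book's own examples rely on SQL and commercial tools), and the variable names are assumptions.

```python
# Table 1.1 as a list of rows; each row maps a column name to a data value.
WEATHER = [
    {"CITY": "Athens", "DATE": "01-MAY-2001", "TEMPERATURE": 97.1,
     "HUMIDITY": 89.2, "CONDITION": "Sunny"},
    {"CITY": "Chicago", "DATE": "01-MAY-2001", "TEMPERATURE": 66.5,
     "HUMIDITY": 100.0, "CONDITION": "Rainy"},
    {"CITY": "Paris", "DATE": "01-MAY-2001", "TEMPERATURE": 71.3,
     "HUMIDITY": 62.3, "CONDITION": "Cloudy"},
]

columns = list(WEATHER[0])   # the column names (data dimensions)
athens = WEATHER[0]          # one record (row) of the data set

print(columns)
print(len(WEATHER), "rows")
print(athens["CITY"], athens["TEMPERATURE"], athens["CONDITION"])
```

Each dictionary is one record, and the dictionary keys play the role of the column names.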
Visualization Data Types
Columns in a business data set (table or file) contain either discrete or continuous data values. A discrete column,
also known as a categorical variable, is defined as a column of the table whose corresponding data values (record
or row values) have a finite number of distinct values. For instance, discrete data type columns are those that
contain a character string, an integer, or a finite number of grouped ranges of continuous data values. The possible
data values for a discrete column normally range from one to a few hundred unique values. If there is an inherent
order to the discrete column, it is also referred to as an ordinal variable. For instance, a discrete column whose
unique values are SMALL, MEDIUM, or LARGE is considered an ordinal variable.
A continuous column, also known as a numeric variable or date variable, is defined as a column of a table whose
corresponding data values (record or row values) can take on a full range (potentially an infinite number) of
numeric values. For instance, continuous data type columns are those that contain dates, double-precision numbers,
or floating-point numbers. The possible unique data values for a continuous column normally range from a few
thousand to an infinite number of unique values. Table 1.2 shows examples of the discrete and continuous
columns.
Table 1.2: Discrete and Continuous Column Examples
COLUMN DATA TYPE   COLUMN NAME      EXAMPLE ROW VALUES         DATA VALUE RANGE
Discrete           CITY             Athens, Chicago, Paris     Finite number of cities in the world
Discrete           CONDITION        Sunny, Rainy               Finite number of weather conditions, such as Sunny, Partly Cloudy, Cloudy, Rainy
Ordinal            EDUCATION        Unknown, High School       Finite number of educational degree categories, such as High School, Bachelor, Master, Doctorate
Discrete           GENDER           M, F, U                    Finite number of values, such as M for male, F for female, U for unknown
Ordinal            AGE_GROUPS       0-21, 22-35                Finite number of age range groups
Discrete           PURCHASE_MONTH   January, February          Finite number of months
Continuous         DATE             01-MAY-2001, 02-MAY-2001   All possible dates
Continuous         TEMPERATURE      97.1, 66.2, 71.3           All possible numeric temperatures in degrees Fahrenheit
Continuous         HUMIDITY         89.1, 100.0, 62.3          All numbers between 0 and 100 percent
Continuous         TOTAL_SALES      1.00, $1,000,000.00        All possible total sales amounts
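The distinction between discrete and continuous columns can be approximated with a simple heuristic. The sketch below is an illustration only: the function name `classify_column` and the cutoff of 300 distinct values (standing in for "a few hundred") are assumptions, not rules from the methodology.

```python
from datetime import date

def classify_column(values, max_discrete=300):
    """Heuristically label a column as 'discrete' or 'continuous'."""
    distinct = set(values)
    if not distinct:
        raise ValueError("column has no values")
    # Floating-point numbers and dates can take on a full range of values.
    if all(isinstance(v, (float, date)) for v in distinct):
        return "continuous"
    # Strings, integers, and grouped ranges with few distinct values.
    if len(distinct) <= max_discrete:
        return "discrete"
    return "continuous"

print(classify_column(["Athens", "Chicago", "Paris"]))
print(classify_column(["0-21", "22-35"]))
print(classify_column([97.1, 66.5, 71.3]))
print(classify_column([date(2001, 5, 1), date(2001, 5, 2)]))
```

A heuristic like this cannot tell whether a discrete column is ordinal; that depends on what the values mean (SMALL, MEDIUM, LARGE) and has to be decided by the analyst.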
Visual versus Data Dimensions
Take care not to confuse the terms visual dimension and data dimension. Visual dimension relates to the spatial
coordinate system. Data dimension, on the other hand, relates to the number of columns in a business data set.
Visual dimensions are the graphical x-, y-, and z-axes of the spatial coordinate system or the color, opacity, height,
or size of the graphical object. Data dimensions are the discrete or continuous columns or variables contained
within the business data set.
If we use the business data set from Table 1.1, the data dimensions of the weather data set are the columns CITY,
DATE, TEMPERATURE, HUMIDITY, and CONDITION. To create a two- or three-dimensional visualization of
the weather data set, the columns under investigation are selected from the business data set to create a graphical
data table. The graphical data table is used to map the column values of the business data set to corresponding data
points in an x-, y-, or z-axis coordinate system.
Figure 1.1 illustrates a column graph visualization comparing the TEMPERATURE and HUMIDITY continuous
data dimensions by the CITY discrete data dimension for the weather data set. The corresponding graphical data
table values for the TEMPERATURE and HUMIDITY columns are represented by the height of the bars. A pair
of bars is drawn for each corresponding CITY value. Normally, the graphical data table is not part of the
visualization; however, in this example, the table is included to illustrate how the column graph was created.
Figure 1.1: Column graph comparing temperature and humidity by city.
Because the WEATHER data set contains only summer temperatures, ranging from 32 to 120 degrees Fahrenheit,
the same y-axis scale can be used for both HUMIDITY and TEMPERATURE. For a data set with different
HUMIDITY and TEMPERATURE ranges, two y-axes would be required: one for the HUMIDITY scale (0 to 100
percent) and one for the TEMPERATURE scale (-65 to 150 degrees Fahrenheit).
Data Visualization Tools
Data visualization tools are used to create two- and three-dimensional pictures of business data sets. Some tools
even allow you to animate the picture through one or more data dimensions. Simple visualization tools such as line,
column, bar, and pie graphs have been used for centuries. However, most businesses still rely on the traditional
"green-bar" tabular report for the bulk of their information and communication needs. Recently, with the advance of
new visualization techniques, businesses are finding they can rapidly employ a few visualizations to replace
hundreds of pages of tabular reports. Other businesses use these visualizations to augment and summarize their
traditional reports. Using visualization tools and techniques can lead to quicker deployment, result in faster
business insights, and enable you to easily communicate those insights to others.
The data visualization tool used depends on the nature of the business data set and its underlying structure. Data
visualization tools can be classified into two main categories:
• Multidimensional visualizations
• Specialized hierarchical and landscape visualizations
Choosing which visualization technique or tool to use to address your business questions is discussed in Chapter 7.
Using and analyzing the visualization to discover previously unknown trends, behaviors, and anomalies in your
business data set is covered in Chapter 8.
Multidimensional Data Visualization Tools
The most commonly used data visualization tools are those that graph multidimensional data sets.
Multidimensional data visualization tools enable users to visually compare data dimensions (column values) with
other data dimensions using a spatial coordinate system. Figure 1.2 shows examples of the most common
visualization graph types. Other common multidimensional graph types not shown in Figure 1.2 include contour,
histogram, error, Westinghouse, and box graphs. For more information on these and other graph types refer to
Information Graphics: A Comprehensive Illustrated Reference, by R. Harris (Oxford: Oxford University Press,
1999).
Figure 1.2: Multidimensional data visualization graph types.
Most multidimensional visualizations are used to compare and contrast the values of one column (data dimension)
to the values of other columns (data dimensions) in the prepared business data set. They are also used to
investigate the relationships between two or more continuous or discrete columns in the business data set. Table
1.3 lists some common multidimensional graph types and the types of column values they can compare or the
kinds of relationships they can investigate.
Table 1.3: Graph Types and Column Types
GRAPH TYPE                                                     TYPE OF COLUMN VALUES TO COMPARE
Column and bar                                                 Used to compare discrete (categorical) column values to continuous column values
Area, stacked column or bar, line, high-low-close, and radar   Used to compare discrete (categorical) column values over a continuous column
Pie, doughnut, histogram, distribution, and box                Used to compare the distribution of distinct values for one or more discrete columns
Scatter                                                        Used to investigate the relationship between two or more continuous columns
Column and Bar Graphs
Column and bar graphs, such as clustered column and clustered bar graphs, compare continuous data dimensions
across discrete data dimensions in an x- and y-coordinate system. Column graphs plot data dimensions much like a
line graph, except that a vertical column is drawn from the x-axis up to the y-axis value of the data dimension.
Bar graphs are identical to column graphs, except the x-axis and y-axis are switched so that the bar graphical
entities are drawn horizontally instead of vertically. In either case, the data values associated with different sets of
data are grouped by their x-axis label to permit easy comparison between groups. Each set of data can be
represented by a different color or pattern. Stacked column and bar graphs work exactly like the non-stacked
versions, except that the y-axis data dimension values from previous data sets are accumulated as each column is
plotted. Thus, bar graphical entities appear to be stacked upon each other rather than being placed side by side.
Figure 1.1 illustrates a multidimensional column graph visualization comparing the TEMPERATURE and
HUMIDITY data dimensions by the CITY data dimension for the weather data set from Table 1.1. The
interpretation of the column graph in Figure 1.1 is left to the viewer, who possesses perhaps the most sophisticated
pattern recognition machine ever created. What conclusions can be drawn from the column graph illustrated
in Figure 1.1? You may conclude the rule is that (in most cases) temperature tends to be higher than humidity.
However, in the case of Chicago, the rule is broken. If you also take into consideration the
CONDITION column, you can refine the rule to be that temperature tends to be higher than humidity unless it is
raining. Now the rule would be true for all rows in the data set. Obtaining more records for the data set and
plotting them would help you visually test and refine your rule.
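The visual test of the rule can also be written as a small check against the rows of Table 1.1. This is only an illustrative sketch in Python; the function names are assumptions, and the literal condition value "Rainy" is taken from Table 1.1.

```python
# Rows from Table 1.1: (CITY, TEMPERATURE, HUMIDITY, CONDITION)
WEATHER = [
    ("Athens", 97.1, 89.2, "Sunny"),
    ("Chicago", 66.5, 100.0, "Rainy"),
    ("Paris", 71.3, 62.3, "Cloudy"),
]

def simple_rule(temperature, humidity, condition):
    """Temperature is higher than humidity."""
    return temperature > humidity

def refined_rule(temperature, humidity, condition):
    """Temperature is higher than humidity unless it is raining."""
    return condition == "Rainy" or temperature > humidity

def exceptions(rule, rows):
    """Return the cities whose row breaks the rule."""
    return [city for city, *facts in rows if not rule(*facts)]

print(exceptions(simple_rule, WEATHER))   # ['Chicago']
print(exceptions(refined_rule, WEATHER))  # []
```

With only three rows, an empty exception list says little; rerunning the same check as more records are added is the programmatic counterpart of replotting the graph.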
Distribution and Histogram Graphs
An extremely useful analytical technique is to use basic bar and column graphs to display the distribution of
values for a data dimension (column). Distribution and histogram graphs display the proportion of the values for
discrete (nonnumeric) and continuous (numeric) columns as specialized bar and column graphs. A distribution
graph shows the occurrence of discrete, non-numeric column values in a data set. A typical use of the distribution
graph is to show imbalances in the data. A histogram, also referred to as a frequency graph, plots the number of
occurrences of each distinct value, or range of values, in the data set. Histograms are also used to reveal imbalances in the data. Chapters
4, 5, and 6 use distribution and histogram graphs to initially explore the data set, detect imbalances, and verify the
correction of these imbalances. Chapters 7 and 8 use distribution and histogram graphs to discover and evaluate
key business indicators.
Figure 1.3 shows a distribution graph of the INVOICE DATE data dimension for 2,333 billing records for the first
four months of 2000. From the distribution graph, you can visually see that the month of February 2000 had the
most invoices. Since you can verify the number of records by month against the original operational data source,
the distribution graph provides you a method for verifying whether there are missing records in your business data
set.
Figure 1.3: Distribution graph of invoices for the first four months of 2000.
Figure 1.4a shows a histogram graph of the number of invoices by REGION and Figure 1.4b shows a histogram
graph of the number of invoices by BILLING RATE groupings for the first four months of 2000 from the same
accounting business data set. In both of these graphs, you can visually see the skewness (lack of symmetry in a
frequency distribution) in the column value distribution. For instance, the histogram graph of invoices by
REGION (Figure 1.4a) is skewed toward the Eastern region while the histogram graph of invoices by BILLING
RATE (Figure 1.4b) is skewed toward billing rates of $15.00 an hour or less.
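The counts behind distribution and histogram graphs are simple to compute. The following Python sketch uses a handful of hypothetical invoice rows (not the book's accounting data set); the 15-dollar grouping width and the helper name `rate_group` are likewise assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical invoice rows: (invoice month, region, billing rate in dollars per hour).
INVOICES = [
    ("2000-01", "East", 12.0),
    ("2000-02", "East", 15.0),
    ("2000-02", "West", 35.0),
    ("2000-02", "East", 9.5),
    ("2000-03", "South", 22.0),
    ("2000-04", "East", 14.0),
]

def rate_group(rate, width=15.0):
    """Return the upper edge of the grouped range that contains rate."""
    return math.ceil(rate / width) * width

# Distribution graph data: occurrences of each discrete value.
by_month = Counter(month for month, _, _ in INVOICES)
by_region = Counter(region for _, region, _ in INVOICES)

# Histogram data: occurrences per grouped range of a continuous column.
by_rate_group = Counter(rate_group(rate) for _, _, rate in INVOICES)

print(by_month.most_common(1))
print(by_region.most_common(1))
print(sorted(by_rate_group.items()))
```

Comparing `by_month` against counts taken from the operational source is one way to spot missing records, and a single dominant entry in `by_region` or `by_rate_group` is the numeric form of the skew visible in the graphs.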
Figure 1.4: Histogram graphs of invoices by region and by billing rate groupings.
Box Graphs
Understanding descriptive statistical information about the column's values has typically been accomplished by
analyzing measurements of central tendency (such as mean, median, and mode), measurements of variability (such
as standard deviation and variance), and measures of distribution (such as kurtosis and skewness). For more
information about central tendency, variability, and distribution measurements, refer to Statistics for the Utterly
Confused by L. Jaisingh (New York: McGraw-Hill, 2000). Table 1.4 shows some of the common descriptive
statistics derived from the values of the continuous column BILLING RATE.
Table 1.4: Descriptive Statistics for BILLING RATE
BILLING_RATE
Mean 19.59751
Standard error 0.271229
Median 15
Mode 12
Standard deviation 13.10066
Sample variance 171.6274
Kurtosis 16.48715
Skewness 3.196885
Range 159
Minimum 7
Maximum 166
Sum 45721
Count 2333
Confidence level (95.0%) 0.531874
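Several entries in Table 1.4 are tied together by standard statistical definitions, so they can be cross-checked against one another. The Python sketch below copies the relevant values from the table and recomputes the mean (sum divided by count), the sample variance (standard deviation squared), the standard error (standard deviation divided by the square root of the count), and the range. The variable names are assumptions.

```python
import math

# Values copied from Table 1.4.
total, count = 45721, 2333
std_dev = 13.10066
minimum, maximum = 7, 166

mean = total / count                      # compare with Mean
variance = std_dev ** 2                   # compare with Sample variance
std_error = std_dev / math.sqrt(count)    # compare with Standard error
value_range = maximum - minimum           # compare with Range

print(round(mean, 5), round(variance, 4), round(std_error, 6), value_range)
```

The recomputed values agree with the table to within rounding of the published standard deviation.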
A variation on the histogram graph is the box plot graph. It visually displays statistics about a continuous column
(numeric and date data types). Figure 1.5 shows two box plots, one for BILLING RATE and one for INVOICE DATE.
Figure 1.5: Box graph of BILLING RATE and INVOICE DATE.
The box graphs display the following for each continuous column in the data set:
• The two quartiles (25th and 75th percentiles) of the column's values. The quartiles are shown as
lines across a vertical colored bar. The length of the bar represents the difference between the
25th and 75th percentiles. From the length of the bar you can determine the variability of the
continuous column. The longer the bar, the greater the spread in the data.
• The minimum, maximum, median, and mean of the column's values. The horizontal line inside
the bar represents the median. If the median is not in the center of the bar, the distribution is
skewed.
• The standard deviation of the column's values. The standard deviation is shown at plus and minus one
standard deviation from the column's mean value.
The box plots visually reveal statistical information about the central tendency, variance, and distribution of the
continuous column values in the data set. The statistics graphs in Figure 1.5 show the position of the descriptive
statistics on a scale ranging from the minimum to the maximum value for numeric columns. They are often used to
explore the data in preparation for transformations and model building. Similar to the distribution and histogram
graph, statistics graphs are frequently used to reveal imbalances in the data. Chapters 4, 5, and 6 use statistics
graphs to initially explore the data set, detect imbalances, and verify the correction of these imbalances.
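The statistics drawn in a box graph can be collected with Python's standard `statistics` module, as in the sketch below. The billing rates are hypothetical, the function name `box_graph_stats` is an assumption, and quartile values depend on the interpolation method (here the module's default), so a given visualization tool may report slightly different quartiles.

```python
import statistics

def box_graph_stats(values):
    """Collect the statistics a box graph displays for one continuous column."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": mean,
        "std_low": mean - std_dev,    # one standard deviation below the mean
        "std_high": mean + std_dev,   # one standard deviation above the mean
        # A median away from the middle of the box hints at skew.
        "median_off_center": median != (q1 + q3) / 2,
    }

# Hypothetical billing rates in dollars per hour.
rates = [7, 10, 12, 12, 15, 18, 22, 30, 45]
stats = box_graph_stats(rates)
print(stats)
```

For these values the box spans 11 to 26 with the median at 15, below the middle of the box, which matches the long upper tail in the data.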
Line Graphs
In its simplest form, a line graph (chart) is nothing more than a set of data points plotted in an x- and y-coordinate
system, possibly connected by line segments. Line graphs normally show how the values of one column (data
dimension) compare to another column (data dimension) within an x- and y-coordinate system. Line and spline
segments will connect adjacent points from the values of the data column.
The data values for the x-axis can be either discrete or continuous. If the data values are discrete, the discrete
values become the labels for successive locations on the axis. The data values for the y-axis must be continuous.
Often line graphs are used to demonstrate time series trends. Figure 1.6 shows a line graph visualization
comparing the 1-, 3-, 6-, and 12-month bond yield indices from 1/17/1996 to 6/23/2000. The time series data
dimension (date) is plotted on the x-axis. The corresponding data values for the 1-, 3-, 6-, and 12-month yields are
plotted on the y-axis. The corresponding column data values are shown as points connected by a line within the
x-y coordinate system.
Figure 1.6: Line graph of bond yield indices.
Figure 1.6 is the compilation of four individual line graphs. It allows you to quickly see how the yield indices
compare to one another over the time dimension by the positions of the lines in the x- and y-coordinate system. In
this single data visualization, over 4,500 pieces of information are communicated (1,136 individual daily readings
of 4 values). Various trends may have been missed if you were only looking at column after column of numbers
from a green-bar report.
A high-low-close graph is a variation on the line graph. Instead of a single x-y data point, the high, low, and close
column values are displayed as hash markers on a floating column (the floating column being defined by the high
and low values) within the x- and y- coordinate system. A typical use of high-low-close graphs is to show stock
trends. Another variation on the line graph is the radar graph, which plots a marker at each data point
in a 360-degree coordinate system instead of the traditional 90-degree x-y coordinate system. Figure 1.7 shows a
radar graph of the bond yield indices comparing the 1- and 6-month bond yields. In Chapters 7 and 8, line and
radar graphs are used to discover and analyze time-based trends.
Figure 1.7: Radar graph of bond yield indices.
Scatter Graphs
Scatter graphs (sometimes referred to as scatter plots) are typically used to compare pairs of values. A scatter
graph enables you to visualize the business data set by mapping each row or record in the data set to a graphical
entity within a two- or three-dimensional graph. In contrast to the line graph, a scatter graph displays unconnected
points in an x- and y-coordinate system (2-D) or an x-, y-, and z-coordinate system (3-D). In its simplest mode, data dimensions from the data set are
mapped to the corresponding points in an x- and y-coordinate system (2-D). The bubble graph is a variation of a simple
scatter graph that allows you to display another data dimension of the data set as the size of the graphical entity, as
well as its position within the x- and y-coordinate system. Figure 1.8 illustrates how you can use a scatter graph to
investigate the relationship between the number of store promotions and the weekly profit. In Chapters 7 and 8,
scatter graphs are used to discover and evaluate cause and effect relationships.
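The mapping from rows to graphical entities can be sketched in a few lines of Python. The store rows, the third column (store size), and the function name `to_bubble_points` are all hypothetical; the point is only that two data dimensions become the x and y positions while a third becomes the marker size of a bubble graph.

```python
# Hypothetical weekly store rows: (number of promotions, weekly profit, store size).
ROWS = [
    (0, 1200.0, 800),
    (2, 1850.0, 1200),
    (5, 2600.0, 1500),
]

def to_bubble_points(rows, max_marker=40.0):
    """Map each row to one graphical entity: x, y, and a scaled marker size."""
    largest = max(size for _, _, size in rows)
    return [
        {"x": promotions, "y": profit, "size": max_marker * size / largest}
        for promotions, profit, size in rows
    ]

points = to_bubble_points(ROWS)
for point in points:
    print(point)
```

Dropping the `size` entry gives the data for a plain two-dimensional scatter graph.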
Figure 1.8: Scatter graph of weekly profit by number of promotions.
Pie Graphs
Pie graphs display the contribution of each value to the sum of the values for a particular column. Discrete column
values become the labels for the slices of the pie, while the continuous column values are summarized into
contribution per the discrete column value. Figure 1.9a shows a pie graph comparing the percent contribution of
the total votes cast for each candidate in the state of Florida during the 2000 U.S. presidential race. Pie graphs are
also very useful in showing column value distributions. In Chapters 4, 5, and 6, they are used to compare column
value distributions before and after data preparation steps.
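The slice sizes of a pie graph follow directly from this description: sum the continuous column per discrete value, then divide by the grand total. The Python sketch below uses hypothetical candidates and vote counts (not actual election results), and the function name `pie_slices` is an assumption.

```python
from collections import defaultdict

# Hypothetical rows: (candidate, votes), for example one row per county.
ROWS = [
    ("Candidate A", 400),
    ("Candidate B", 350),
    ("Candidate A", 100),
    ("Candidate C", 150),
]

def pie_slices(rows):
    """Return each discrete value's percent contribution to the column total."""
    totals = defaultdict(float)
    for label, value in rows:
        totals[label] += value
    grand_total = sum(totals.values())
    return {label: 100.0 * subtotal / grand_total
            for label, subtotal in totals.items()}

slices = pie_slices(ROWS)
print(slices)
```

The percentages always sum to 100, which is why a pie graph suits part-of-a-whole comparisons for a single column.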
Figure 1.9: Pie and doughnut graphs of the presidential vote in Florida.
The doughnut graph is a variation on the pie graph. It can be used to compare and contrast multiple continuous
columns at the same time. For instance, using a doughnut graph, you could show the voting percentages per U.S.
presidential candidate in Florida, Wisconsin, and other states within the same visualization. This allows you to not
only compare the vote percentages per candidate in Florida but also to compare those percentages against the other
states that were visualized. Figure 1.9b shows a doughnut graph of the presidential vote in Florida.
Hierarchical and Landscape Data Visualization Tools
Hierarchical, landscape, and other specialized data visualization tools differ from normal multidimensional tools in
that they exploit or enhance the underlying structure of the business data set itself. You are most likely familiar
with an organizational chart or a family tree. Some business data sets possess an inherent hierarchical structure.
Tree visualizations can be useful for exploring the relationships between the hierarchy levels. Other business data
sets have an inherent geographical or spatial structure. For instance, data sets that contain addresses have a
geographical structure component. Map visualization can be useful for exploring the geographical relationships in
the data set. In other cases, the data set may have a spatial rather than a geographical structure component. For instance,
a data set that contains car part failures inherently has spatial information about the location of the failure within
the car. The failures can be "mapped" to a diagram of a car (a car landscape). Another data set may contain where
in the factory the failing part was manufactured. The failure can be "mapped" to a diagram of the factory (a factory
landscape) to explore whether the failed part has any significance to the location where it was manufactured.
Tree Visualizations
The tree graph presents a data set in the form of a tree. Each level of the tree branches (or splits) based upon the
values of a different attribute (hierarchy in the data set). Each node in the tree shows a graph representing all the
data in the sub-tree below it. The tree graph displays quantitative and relational characteristics of a data set by
showing them as hierarchically connected nodes. Each node contains information usually in the form of bars or
disks whose height and color correspond to aggregations of data values (usually sums, averages, or counts). The
lines (called edges) connect the nodes together and show the relationship of one set of data to its subsets.
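The aggregation behind a tree graph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular tool: the records, the attribute names (family_type, region, on_medicaid), and the helpers build_tree and print_tree are all hypothetical. Each level splits on a different attribute, and each node stores counts for every record in the sub-tree below it.

```python
from collections import Counter

# Hypothetical records; attribute names and values are illustrative only.
records = [
    {"family_type": "primary family", "region": "West", "on_medicaid": False},
    {"family_type": "primary family", "region": "South", "on_medicaid": True},
    {"family_type": "related subfamily", "region": "West", "on_medicaid": True},
    {"family_type": "related subfamily", "region": "South", "on_medicaid": True},
    {"family_type": "primary family", "region": "West", "on_medicaid": False},
]


def build_tree(rows, split_attributes, target):
    """Return a nested dict in which each node aggregates its whole sub-tree."""
    node = {
        # Counts of the target attribute's values, e.g. {True: 3, False: 2}.
        "counts": Counter(row[target] for row in rows),
        "children": {},
    }
    if split_attributes:
        first, rest = split_attributes[0], split_attributes[1:]
        # One child per distinct value of the attribute for this level.
        for value in sorted({row[first] for row in rows}):
            subset = [row for row in rows if row[first] == value]
            node["children"][value] = build_tree(subset, rest, target)
    return node


def print_tree(node, label="root", depth=0):
    """Print one line per node, indented by its level in the hierarchy."""
    print("  " * depth + f"{label}: {dict(node['counts'])}")
    for value, child in node["children"].items():
        print_tree(child, value, depth + 1)


tree = build_tree(records, ["family_type", "region"], "on_medicaid")
print_tree(tree)
```

A visualization tool would draw each node's counts as bars or disks and connect parent and child nodes with edges; the text output here only shows the hierarchy of aggregates.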
Figure 1.10 illustrates the number of families on Medicaid from a 1995 Census data set using a tree graph. The
"root" node, or start of the tree, shows the total number of families on Medicaid (the small, darker colored column
on the right) and not on Medicaid (the taller, lighter colored column on the left) that occur in the entire data set.
You can see that the number of families on Medicaid is very small, because the lighter column is much taller
than the darker column. The second level of the tree represents the number of families on Medicaid by the various
family types. By visualizing the data in this way, you may be able to find some combination of attributes and values
that is indicative of families having a higher than normal chance of being on Medicaid. As you can see from the tree
visualization, some types of families have a significantly higher chance of being on Medicaid than others (related
subfamily and second individual family types versus non-family householders).
Figure 1.10: Tree visualization of proportion of families on Medicaid by family type and region.
Map Visualizations
To explore business data sets for strong spatial (typically geographical) relationships, you can use a map
visualization. The corresponding column values are displayed as graphical elements on a visual map based on a
spatial key. For instance, you can plot your total sales by state, by state and county, or by zip code. Although the
data set contains a geographic data dimension, it does not contain the information that there are 50 states in the
United States, that California and New York are 3,000 miles apart, that California is south of Oregon, or what the
latitude and longitude coordinates of the states are. That geographic knowledge is supplied by the map visualization tool.
Figure 1.11 is a map visualization of a business data set that contains information about the number of new
account registrations by state. Using a corresponding color key, the states are colored based on the number of
registrations by state. You can quickly determine from the map which sales locations (states and regions) are
signing up more new customers than others. You can also see the geographic significance of the best-producing
states or regions compared with other states and regions.
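The coloring step of such a map can be sketched as follows. This is a minimal example under stated assumptions: the registration records, the class breaks, and the color names are hypothetical, and the sketch stops at assigning a color-key class to each state. Drawing the state outlines requires the geographic information that a mapping tool provides.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical registration records, one two-letter state code per new account.
registrations = ["CA", "CA", "NY", "CA", "TX", "NY", "CA", "OR", "TX", "CA"]

# Hypothetical color key. A count c falls in class i when
# breaks[i - 1] <= c < breaks[i]; counts below breaks[0] fall in class 0.
breaks = [2, 4]
colors = ["light", "medium", "dark"]


def color_for(count):
    """Map a registration count to its color-key class."""
    return colors[bisect_right(breaks, count)]


# Aggregate by the spatial key (the state), then assign a color to each state.
counts = Counter(registrations)
state_colors = {state: color_for(n) for state, n in counts.items()}

for state in sorted(state_colors):
    print(state, counts[state], state_colors[state])
```

The state code plays the role of the spatial key: it is the only link between the business data and the shapes on the map.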
Figure 1.11: Map visualization of new account registrations by state.
Visual Data Mining Tools
Visual data mining tools can be used to create two- and three-dimensional pictures of how the data mining
model is making its decisions. The visualization tool used depends on the nature of the data set and the underlying
structure of the resulting model. For example, in Figure 1.12 a decision tree model is visualized using a
hierarchical tree graph. From this visualization you can more easily see the structure of the model.
Figure 1.12: Tree visualization of a decision tree to predict potential salary.
Unfortunately, not all data mining algorithms can be readily visualized with commercially available software. For
instance, neural network data mining models simulate a large number of interconnected simple processing units
segmented into input, hidden, and output layers. Visualizing the entire network with its inputs, connections,
weights, and outputs as a two- or three-dimensional picture is an active research question.
Visualization tools are also used to plot the effectiveness of the data mining model, as well as to analyze the
potential deployment of the model. A gains chart is a line graph that compares a model's performance at
predicting a target event with that of a random-guess baseline. The cumulative gain is the proportion of all
the target events captured up to a specific percentile of the population. Figure 1.13 illustrates a cumulative gains chart. The
population series refers to our random-guess model. From this line graph, you can compare and contrast the
performance of different data mining models. You can also use these visualizations to compare and contrast the
performance of the models at the time they are built and once they are deployed. You can quickly visually inspect
the performance of the model to see if it is performing as expected or becoming stale and out-of-date. Other
multidimensional data visualization tools are useful in analyzing the data mining model results, as well as
comparing and contrasting multiple data mining models.
Figure 1.13: Evaluation line graph.
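The points plotted on a cumulative gains chart can be computed with a short Python sketch. It assumes the common convention of ranking cases from the highest model score to the lowest before accumulating; the function name cumulative_gains, the scores, and the outcomes are hypothetical and are not taken from Figure 1.13.

```python
def cumulative_gains(scores, outcomes, n_bins=10):
    """Return (fraction_of_population, fraction_of_targets_captured) pairs."""
    # Rank the actual outcomes by model score, highest score first.
    ranked = [y for _, y in sorted(zip(scores, outcomes),
                                   key=lambda pair: pair[0], reverse=True)]
    total_targets = sum(ranked)
    if total_targets == 0:
        raise ValueError("no target events in the data")
    n = len(ranked)
    points = []
    for b in range(1, n_bins + 1):
        cutoff = (b * n) // n_bins       # number of cases up to this bin
        captured = sum(ranked[:cutoff])  # target events found so far
        points.append((cutoff / n, captured / total_targets))
    return points


# Hypothetical model scores and actual outcomes (1 = target event occurred).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

gains = cumulative_gains(scores, outcomes, n_bins=5)
for population, gain in gains:
    # A random-guess model captures targets in proportion to the population.
    print(f"{population:.0%} of population: model {gain:.0%}, random {population:.0%}")
```

In this made-up data, contacting the top-scored 20 percent of cases captures half of the target events, whereas a random guess would be expected to capture 20 percent. The gap between the two series is what the chart makes visible.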
The tree visualization in Figure 1.12 and the line visualization in Figure 1.13 are just two examples of how you
can use data visualization to explore how data mining models make their decisions and evaluate multiple data
mining models. Choosing which visual data mining tool to use to address your business questions is discussed in
Chapter 7. Analyzing the visualization of the data mining model to discover previously unknown trends, behaviors,
and anomalies in business data sets is discussed in Chapter 8.
Summary
Chapter 1 summarized data visualization and visual data mining tools and techniques that can be used to discover
previously unknown trends, behaviors, and anomalies in business data. In the next chapter, we help you justify and
plan a data visualization and data mining project so you can begin to exploit your business data with data
visualization and visual data mining to gain knowledge and insights into business data sets and communicate those
discoveries to the decision makers. Chapters 2 through 9 present and teach you a proven eight-step VDM
methodology that we have used to create successful business intelligence solutions with data visualization and
visual data mining tools and techniques.
Chapter 2: Step 1: Justifying and Planning the Data
Visualization and Data Mining Project
Overview
Step 1 of the eight-step data visualization and data mining (VDM) methodology is composed of both the project
justification and the project plan. Chapter 1 provided you with an introduction to visualization and data mining
tools and techniques. This chapter shows you how to justify and plan the VDM project. Before the first row of
data is visualized or mined, a project justification and plan need to be developed to ensure the success of the
project. The purpose of the project justification is to identify quantitative project objectives and develop a sound
business case for performing the project, and to gain executive support and funding from the decision makers for
the project. The project justification defines the overall business stimulus, return-on-investment (ROI) targets, and
visualization and data mining goals for the project. The purpose of a project plan is to define the scope, high-level
tasks, roles, and responsibilities for the project. The project plan establishes a roadmap and project time-line. It
defines the roles and responsibilities of all participants who will be involved in the project and serves as an
"agreement" of individual responsibilities among the operations and data warehousing, the data and business
analyst, the domain expert, and the decision maker teams.
A closed-loop business model is often helpful in modeling the business aspects of the project. The closed-loop
model ensures the resulting visualizations or data mining models feed back into the initial data set sources. This
feedback loop enables you to refine, improve, and correct your production visualizations or data mining models
through time. Other feedback loops within the business model ensure your project stays focused, makes business
sense, and remains within the scope of the project.
This chapter begins by discussing three types of projects:
• Proof-of-concept
• Pilot
• Production
We then introduce the closed-loop business model, provide guidance on estimating the project timeline and
resources, and define team roles and responsibilities for the project. At the end of this chapter, we introduce the
case study of a customer retention business problem. We then apply the concepts discussed in this chapter to the
case study to illustrate Step 1 of the VDM methodology.
Classes of Projects
The overall scope of your VDM project can be categorized into three classes of projects: proof-of-concept, pilot,
or production. Often a successful proof-of-concept or pilot project later leads to a production project. Therefore,
no matter which type of project is planned, it helps to keep the overall structure of the project justification and plan
consistent. This enables you to quickly turn a proof-of-concept project justification and plan into a pilot or
production project without starting over from scratch or wasting time and resources. Among other factors, the type
of project will determine the following:
• The difficulty and number of the business questions investigated
• The complexity and amount of data analyzed
• The quality and completeness of the data
• The project costs (personnel, software, and hardware cost)
• The duration of the project
• The complexity and number of resultant visualizations and models created
A proof-of-concept VDM project has a limited scope. The overall scope of a proof-of-concept project is to
determine whether visualization and data mining will be beneficial to your business, to prove to the decision
makers the value of visualization and data mining, and to give your organization experience with visualization and
data mining concepts. Typically, one or two relatively trivial business questions are investigated. The data set
analyzed is limited to a small sample of existing data. The average duration of a proof-of-concept project normally
is a few weeks.
A pilot VDM project also has a limited scope. The overall scope of the pilot project is to investigate, analyze, and
answer one or more business questions to determine if the ROI of the discoveries warrants a production project.
The data set analyzed is limited to representative samples from the real data sources. Often you will need to
purchase limited copies of the visualization and data mining tools. However, since the pilot project may not be
implemented, you may not have to purchase the production hardware or copies of the visualization and data
mining tools for everyone. The average duration of a pilot project is normally a few months.
A production VDM project is similar to the pilot project in scope; however, the resulting visualizations and data
mining models are implemented into a production environment. The overall scope of the production project is to
fully investigate, analyze, and answer the business questions and then to implement an action plan and measure the
results of the production visualizations and data mining models created. You will need to purchase licenses for the
visualization and data mining tools for all production users and buy the production hardware. The average duration
of a production project ranges from a few months to a year. The actual project deployment may last many years.
Depending on the visualization and data mining experience level of your staff, you may need to augment your team. For
production projects, you will need a dedicated and trained staff to maintain the production environment. Many
times after you see the benefits and ROI from the project, you will want to use visualization and data mining to
answer other business questions or use VDM in other departments in your organization.
Project Justifications
After you have decided which class of project to do, you next need to create a project justification. The project
justification defines the overall business stimulus, ROI targets, and visualization and data mining goals for the
project. Developing a project justification begins by identifying a high-level business issue your business needs to
address. Table 2.1 lists a few of the business issues that can be addressed by VDM projects.
Table 2.1: Business Issues Addressed by Visualizations or Visual Data Mining Projects
BUSINESS ISSUE | VDM PROJECT OBJECTIVES
Target marketing | To discover segments of "ideal" customers who share the same characteristics, such as income level and spending habits, and who are the best candidates for a specific product or service
Cross-marketing | To discover correlations and associations between product sales and make predictions based on these associations to facilitate cross-marketing
Customer profiling | To create models to determine what types of customers buy which products
Identification of customer requirements | To discover the best product matches for different segments of customers and use predictions to find what factors will attract new customers
Financial planning and asset evaluation | To create descriptive or predictive models to aid in cash flow analysis and predictions, contingent claim analysis, and trend analysis to evaluate assets
Resource planning | To create descriptive or predictive models to aid in analyzing and comparing resources and spending
Competitive analysis | To segment customers into classes for class-based pricing structures and set pricing strategies for highly competitive markets
Fraud detection | To create descriptive or predictive models to aid in analyzing historical data to detect fraudulent behaviors in such industries as medical, retail, banking, credit card, telephone, and insurance
Attrition modeling and analysis | To create descriptive or predictive models to aid in the analysis of customer attrition
Chemical and pharmaceutical analysis | To create descriptive or predictive models to aid in molecular pattern modeling and analysis, as well as drug discovery and clinical trial modeling and analysis
Attempt to state your overall project goal in a single statement, for instance, "To discover segments of ideal
customers who share the same characteristics and who are the best candidates for our new cable modem service
offering." You may need to interview various departments within your organization before deciding on your
project goal. For proof-of-concept projects, keep the overall project goal simple. For pilot or production projects,
the overall project goal may be more complex. Use the examples in Table 2.1 to establish your own project
objective.
Perhaps the most difficult part of the project's business justification is determining realistic ROI objectives and
expected outcomes. You will often need the assistance of the business analysts or line-of-business manager to help
quantify the cost of continuing to do business "status quo." Your aim should be to create a document that contains
the project ROI objectives; describes the content, form, access, and owners of the data sources; summarizes the
previous research; explains the proposed methodology; and forecasts the anticipated outcome. When preparing the
justification document, keep in mind the class of project you are planning, as well as your target audience: the
decision makers and business experts.
As reference material for your business justification, include industry examples of visualization and data mining
success stories. Choose those success stories that relate to the business issues you are trying to address. Our
companion Web site (www.wiley.com/compbooks/soukup) has links to the majority of the commercially available
data visualization and visual data mining software providers. For example, you can find the following success
stories on the SPSS, SAS, and Oracle Web sites.
Dayton Hudson Corp. Success Story
Retail is a very competitive industry. The Dayton Hudson Corp. (DHC) success story highlights how they use data
mining to grow their business and improve customer satisfaction.
For instance, the DHC research and planning department uses data mining to help select new store sites. By
analyzing trade and demographic data for 200 to 300 potential new sites with descriptive, correlation, and
regression data mining models, the research group can quantitatively determine which sites have the best potential
market success for each of its store lines: Target, Mervyn's, Dayton's, Hudson's, and Marshall Field's.
The DHC consumer research department also uses data mining to target customer satisfaction issues. Often
respondent surveys include data files with several hundred thousand cases from DHC stores, as well as from
competing stores. These surveys are analyzed with data mining to gain knowledge about what is most important
to customers and to identify those stores with customer satisfaction problems. The data mining results are used to
help management better allocate store resources and technology, as well as improve training.
For more information on the DHC success story, refer to the SPSS Web site at
www.spss.com/spssatwork/template_view.cfm?Story_ID=4 (SPSS, 2002).
Marketing Dynamics Success Story
Customer direct marketing is another industry that benefits from data visualization and data mining. The
Marketing Dynamics success story highlights how they use visual data mining to develop more profitable direct
marketing programs for their clients.
Marketing Dynamics has access to large amounts of customer marketing data; however, the trick is to turn that
data into insights. Through the use of data mining analysis, Marketing Dynamics is able to develop more
profitable target marketing programs for their clients, such as Cartier, Benjamin Moore & Company, SmithKline
Beecham, American Express Publishing, and several prominent catalog companies.
Marketing Dynamics uses analysis tools such as list analysis, data aggregation, cluster analysis, and other data
mining techniques to deliver predictive models to their clients who then use these models to better understand their
customers, discover new markets, and deploy successful direct marketing campaigns to reach those new markets.
For more information on the Marketing Dynamics success story, refer to the SPSS Web site at
www.spss.com/spssatwork/template_view.cfm?Story_ID=25 (SPSS, 2002).
Sprint Success Story
Telecommunications is yet another fiercely competitive industry that is benefiting from data visualization and data
mining. The Sprint success story highlights how they use visual data mining for customer relationship
management (CRM).
Within the sphere of CRM, Sprint not only uses data mining to improve customer satisfaction, but also uses data
mining for cross-selling, customer retention, and new customer acquisition. Sprint uses SAS to provide their
marketing departments with a central analytic repository. Internal sales and marketing groups access this
repository to create better target marketing programs, improve customer relationships, and cross-sell to existing
customers. The central repository enables them to integrate multiple legacy systems and incorporate feedback
loops into their CRM system.
For more information on the Sprint success story, refer to the SAS Web site at
www.sas.com/news/success/sprint.html (SAS, 2002).
Lowestfare.com Success Story
Like traditional retail, the online travel industry is brutally competitive, perhaps even more so.
The Lowestfare.com success story highlights how they used data mining to target those customers most likely to
purchase over the Internet.
Lowestfare.com built a data warehouse with the most important facts about customers. By analyzing these data
sets, they were able to better understand their customers in order to sell them the right products through the best
channels, thus increasing customer loyalty. Developing successful target-marketing models helped
Lowestfare.com increase profits for each ticket sold.
Lowestfare.com augmented their customer data warehouse with 650 pieces of demographic information purchased
from Acxiom. This not only enabled them to better understand who their customers were but also helped them
to build predictive cross-selling models. Through data mining, they were able to identify the top 87 pieces of
demographic information that profiled their customers. Then they were able to build data mining C&RT models
that produced customer profiles based on purchase behavior and deploy these models into their Internet site.
For more information on the Lowestfare.com success story, refer to the Oracle Web site at
http://otn.oracle.com/products/datamining/pdf/lowestfare.pdf, "Lowestfare.com Targeting Likely Internet
Purchasers."
Challenges to Visual Data Mining
Many challenges exist for justifying your VDM project. The various stakeholders in the organization may not
understand data mining and what it can do. Following are some common objections to visual data mining
approaches.
Data Visualization, Analysis, and Statistics Are Meaningless
This objection is often due to a lack of familiarity with the process and benefits that visual data mining can
provide. The objection can be overcome by explaining that data analysis is part of most decision-making processes.
Whether consciously or subconsciously, individuals, teams, and organizations make decisions based on historical
experience every day. Data mining can be easily compared to this decision-making process. For instance, if you
view all your previous experiences as a large data set that can be investigated and analyzed, then the processes of
drawing actionable conclusions from this data set can be likened to the task of data mining. A critical aspect of the
VDM methodology is validation (discussed fully in Chapter 9). VDM tools and techniques only find the
interesting patterns and insights. It is the various stakeholders, such as the decision makers and domain experts,
who validate whether or not these discoveries are actionable, pragmatic, and worth implementing.
Why Are the Predictions Not 100 Percent Accurate?
One of the benefits of data mining is that it provides you with quantification of error. To some, the very fact that
an insight or model has error at all is cause to discount the benefits of visual data mining. After all, shouldn't the
model be 100 percent accurate before it is deployed? The accuracy of a model is only one measure that can be
used to value its worth. The ability to easily explain the model to regulators and domain experts and the ease of
implementation and maintenance are other important factors. Often, analyzing the errors or false prediction cases
leads to greater insight into the business problem as a whole. Similarly, visually comparing the model with line
graphs (discussed in Chapter 8) assists you in evaluating and selecting the "best" models based on your project
objectives.
Our Data Can't Be Visualized or Mined
Data integrity is very important for building useful visualizations and data mining models. How does an
organization determine that its data has the level of integrity needed to make a positive impact for the firm? At
what point is the data good enough?
The issue of data integrity unfortunately prevents many companies that would benefit from data mining
capabilities from getting started on building what is potentially a valuable future core competency. Very few
organizations possess data that is immediately suitable for mining unless it was collected for that purpose. A key
part of the VDM methodology is data preparation (fully discussed in Chapters 4, 5, and 6), which explicitly
involves making the data good enough to work with. Furthermore, it is quite feasible to measure the potential
financial success of a visual data mining project by working with historical data. Often the VDM data preparation
steps can help your organization pinpoint integrity problems with your existing historical data, as well as
implement new standards to ensure the integrity of new business data before and as it is gathered.
Closed-Loop Business Model
Whether you are planning a proof-of-concept, pilot, or production project, consider using a closed-loop business
model. A business model is considered closed-loop when the output of the final stage feeds back into the initial
step. The interactions among and between stages reveal the iterative nature of the business model.
Most VDM projects can be diagrammed as a closed-loop business model. Figure 2.1 shows the business stages
and interactions of a closed-loop business model for a VDM project. This model may be applied to a multitude of
visualization and data mining projects, such as projects that:
• Prevent customer attrition
• Cross-sell to existing customers
• Acquire new customers
• Detect fraud
• Identify most profitable customers
• Profile customers with greater accuracy
Figure 2.1: Closed-loop business model.
The business model can also be used for VDM projects that detect hidden patterns in scientific, government,
manufacturing, medical, and other applications, such as:
• Predicting the quality of a manufactured part
• Finding associations between patients, drugs, and outcomes
• Identifying possible network intrusions
As illustrated in Figure 2.1, the closed-loop business model contains the data preparation and data analysis phases
of the eight-step VDM methodology described throughout this book. However, the implementation phase is
outside the book's scope. We have included the entire closed-loop business model to provide you with a business
framework for justifying and planning your VDM project. Our companion Web site has links to the majority of the
commercially available data visualization and "visual" data mining software providers where you can find
information on the implementation phase of a VDM project.
The following section discusses how to use the closed-loop business model for a customer attrition VDM project.
The overall business goal of the customer attrition project was to reduce customer attrition from 30 percent to 25
percent. You may be saying to yourself that a 5-percentage-point improvement doesn't seem to be a very valuable goal.
However, in this particular case, 5 percent of approximately 4 million customers equates to 200,000 customers.
Given that the average customer represents $240.00 a year in sales, retaining those customers equates to approximately
$48 million in sales a year.
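The arithmetic behind this goal is easy to reproduce, which helps when you adapt it to your own figures. The short Python sketch below uses only the numbers quoted above; the variable names are illustrative.

```python
customers = 4_000_000            # approximate customer base
annual_sales_per_customer = 240  # average yearly sales per customer, in dollars
attrition_before_pct = 30        # current attrition rate, in percent
attrition_goal_pct = 25          # target attrition rate, in percent

# Customers retained each year if the attrition goal is met.
retained_customers = customers * (attrition_before_pct - attrition_goal_pct) // 100

# Annual sales preserved by retaining those customers.
added_sales = retained_customers * annual_sales_per_customer

print(f"{retained_customers:,} customers retained")
print(f"${added_sales:,} in annual sales")
```

Working in whole percentage points and integer dollars keeps the calculation free of floating-point rounding.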
The overall business strategy of the customer attrition project was to create, analyze, and deploy visualization and
data mining models to discover profiles of customers who switched services to a competitor, to understand why
they switched services, and to find current customers who have similar profiles and then to take corrective action
to keep them from switching to the competition. The process of developing the business strategy and identifying
the business questions is the second step of the VDM methodology, which we discuss in Chapter 3.
Using the Closed-Loop Business Model
The first stage in the business model is to obtain and select the raw data from the data warehouse and business data
repositories pertaining to customers. In the customer retention project, it was discovered that a "customer" was
defined differently in the multiple databases. In addition, "customer attrition" was defined differently by different
organizations. These types of data issues need to be resolved to ensure the proper data is selected. Unless they are
resolved, the resulting analysis may be faulty. The process of obtaining and selecting the data is covered by Steps 3, 4, and 5
of the VDM methodology, which are discussed in Chapters 4 through 6.
Identifying the key business indicators is the next stage in the business model. Once all the project teams agreed
on the business rules (definitions) of who constitutes a "customer," and what constitutes "customer attrition,"
visualization and data mining tools were used to begin the process of identifying the key business indicators for
classifying satisfied versus lost customers. In the customer retention project, common indicators or profiles that
define a satisfied customer as compared to a lost customer were discovered from the historical data. After the key
indicators were discovered, the investigation and drill-down stage started. In this stage the data set is further
investigated to gain business insights and understanding of behavior (patterns and trends) of lost customers. As
shown in Figure 2.1, these business stages feed back into one another. If a key business indicator cannot be
substantiated or doesn't make good business sense, other indicators need to be identified. Sometimes a key
business indicator looks promising on the surface, but upon further investigation, it doesn't really help in revealing
insightful customer behaviors. During the customer retention project, it was discovered that customers who had
originally selected a particular service rate plan were extremely likely to begin shopping around for a better rate
after about 9 months of service. In addition, after a year of service, customers wanted new equipment. The process
of identifying and analyzing the key business indicators and drilling down into the data is Step 7 of the VDM
methodology, which we discuss in Chapter 8.
The development of visual and analytical models for different business scenarios is the next stage. For instance,
acting on a model that identifies which customers are most likely to switch to the competition unless they are sent
updated equipment may be cost-prohibitive, whereas acting on a model for changing a customer from one service plan
to another may be more cost-effective. In this stage, the visualization and data mining models are used to help develop
different business scenarios. The process of developing the visualizations and analytical models is also part of Step
7 of the VDM methodology, discussed in Chapter 8.
Creating an action plan and gaining approval for the "best" strategic use of the models, visualizations, and insights
that produce the best ROI based on the business goals is the next stage in the business model. In this stage, the
visualizations and data mining models are used to communicate the findings to the decision makers and other
business analysts. These business stages feed back into one another, as shown in Figure 2.1. There may be
high-level business reasons for choosing one scenario over another. For instance, during the presentation of the
customer attrition project findings to the decision makers, the vice president of finance suggested that upgrading
the customers to newer equipment would not be as cost-prohibitive as originally thought (or modeled). The VP of
finance had just renegotiated a contract with the equipment manufacturer that greatly reduced the cost of the
equipment. The process of evaluating the "best" models is discussed in Chapter 8. Creating a presentation of the
analysis is Step 8 of the VDM methodology, which we discuss in Chapter 9.
Implementing the action plan once it has been approved is the next stage in the business model. In this stage, the
visualizations or data mining models are prepared for production. For example, during the customer retention
project, the rules from a data mining model were coded in the C language, and weekly batch procedures flagged
customers who had a high probability of switching to the competition. The customer support center was given this
list of customers at the beginning of each week. The customer support center then contacted each customer on the
list throughout the week and either offered to upgrade them to newer equipment or to switch them to a different
rate plan.
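The weekly batch step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the C code the project used. The field names, the plan label, and both rules are hypothetical stand-ins, loosely based on the 9-month and 12-month findings mentioned earlier; in a real project the rules would come from the data mining model itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Customer:
    customer_id: str
    rate_plan: str              # hypothetical plan label
    months_of_service: int
    equipment_age_months: int


# Hypothetical name: the text only says "a particular service rate plan".
AT_RISK_PLAN = "PLAN_A"


def suggested_offer(c: Customer) -> Optional[str]:
    """Return the offer to make to a customer, or None if no rule fires."""
    # Assumed rule 1: equipment at least a year old, so offer an upgrade.
    if c.equipment_age_months >= 12:
        return "equipment_upgrade"
    # Assumed rule 2: at-risk plan and 9 or more months of service,
    # so offer a different rate plan.
    if c.rate_plan == AT_RISK_PLAN and c.months_of_service >= 9:
        return "rate_plan_change"
    return None


def weekly_call_list(customers: List[Customer]) -> List[Tuple[str, str]]:
    """Build the list handed to the customer support center each week."""
    flagged = []
    for c in customers:
        offer = suggested_offer(c)
        if offer is not None:
            flagged.append((c.customer_id, offer))
    return flagged
```

Keeping the rules in one small function makes it easier to swap in refined rules when the models are retrained on the measured results.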
Measuring the results of the action plan against the model is the next stage in the business model. For example,
during the customer retention project, those customers offered and upgraded to the newer equipment were
monitored for a full year to determine the actual ROI of the project. The decision tree model had estimated that the
"upgrade" campaign would reduce customer attrition by 2.5 percent ($24 million). However, after 6 months, the
actual measured results were only around 2 percent ($20 million). The initial data set was augmented with the
results, and more refined data mining models were developed and put into production, resulting in a 3 percent
reduction in customer attrition ($29 million). In addition to the "upgrade" campaign, the customer attrition project
implemented a different data mining model to identify customers who should be offered a different plan before
they switched to a competitor; this reduced customer attrition by another 2 percent.
Overall, the customer attrition project was deemed a success and added over $40 million to the bottom line. The
customer retention project costs (personnel, software, and hardware) were approximately $800,000, resulting in a
profit of $31 million for the first year. Using a closed-loop business model helped make the customer retention
project a success. The feedback loops enabled the data and business analysts to focus and improve their data
mining models to glean a higher rate of return.
Project Timeline
The project timeline will depend on the type of VDM project you are planning. Table 2.2 lists the average
workdays per task for typical proof-of-concept, pilot, and production projects. We have compiled this list based on
different real-world projects that we have completed. Of course, your "mileage" may vary depending on the
business issues investigated, the complexity of the data, the skill level of your teams, and the complexity of the
implementation, among other factors. Table 2.2 should give you a general guideline for estimating the project plan
timeline for proof-of-concept, pilot, and production projects.
Table 2.2: Estimating the Project Duration (average workdays per task)
PROJECT PHASE     VDM STEP  TASK NAME                                              PROOF-OF-CONCEPT  PILOT    PRODUCTION
Planning          1         Justify and Plan Project                               5                 5        5
                  2         Identify the Top Business Questions                    3                 5        10
                            Estimated Project Planning Phase Days                  8                 10       15
Data Preparation  3         Choose the Data Set                                    1                 2        5
                  4         Transform the Data Set                                 3                 10       15
                  5         Verify the Data Set                                    1                 2        5
                            Estimated Data Preparation Phase Days                  5                 14       25
Data Analysis     6         Choose the Visualization or Mining Tools               3                 10       15
                  7         Analyze the Visualization or Mining Models             5                 10       15
                  8         Verify and Present the Visualization or Mining Models  2                 10       15
                            Estimated Data Analysis Phase Days                     11                30       45
Implementation    -         Create Action Plan                                     -                 -        10
                  -         Approve Action Plan                                    -                 -        5
                  -         Implement Action Plan                                  -                 -        20
                  -         Measure Results                                        -                 -        30
                            Estimated Implementation Phase Days                    -                 -        65
TOTAL PROJECT DURATION                                                             24 days           54 days  150 days
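The totals in Table 2.2 are simple sums of the phase subtotals, so they are easy to recompute when you adjust an estimate for your own project. The following Python sketch is illustrative only: the dictionary layout and function names are hypothetical, and the conversion to calendar weeks assumes a five-day work week with phases run back to back, neither of which is stated in the table.

```python
# Phase subtotals (workdays) as stated in Table 2.2.
PHASE_DAYS = {
    "proof_of_concept": {
        "planning": 8, "data_preparation": 5, "data_analysis": 11,
    },
    "pilot": {
        "planning": 10, "data_preparation": 14, "data_analysis": 30,
    },
    "production": {
        "planning": 15, "data_preparation": 25, "data_analysis": 45,
        "implementation": 65,
    },
}


def total_days(project_type: str) -> int:
    """Sum the phase subtotals for one project type."""
    return sum(PHASE_DAYS[project_type].values())


def calendar_weeks(workdays: int, workdays_per_week: int = 5) -> int:
    """Round workdays up to whole weeks (assumes sequential phases)."""
    return -(-workdays // workdays_per_week)


for project_type in PHASE_DAYS:
    days = total_days(project_type)
    print(f"{project_type}: {days} workdays, about {calendar_weeks(days)} weeks")
```

Under these assumptions the three project types come to 24, 54, and 150 workdays, matching the last row of the table.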
Project Resources and Roles
As illustrated in Table 2.2, schedule a few weeks if you are planning a proof-of-concept project, a month or more
for a pilot project, and a few months to a year for a production project. When allocating your project resources, be
sure to reach agreement with all teams. The project consists of multiple teams: operations and data warehousing,
data and business analysts, domain experts, and decision makers. In the following sections, we will define each
team and their responsibilities.
The time and resource demands for each team will depend on the type of VDM project you are planning. A
successful business intelligence solution using data visualization or visual data mining requires participation
and cooperation from many parts of a business organization. Depending on the size of your business organization,
you may be responsible for one or more roles. (In small organizations, you may be responsible for all roles.)
Tables 2.3 through 2.6 list the average workdays per resource for typical proof-of-concept, pilot, and production
projects. We have compiled this list based on different real-world projects that we have completed. As with the
project timeline, your "mileage" may vary depending on the business issues investigated, the complexity of the
data, the skill level of your teams, and the complexity of the implementation, among other factors.
Data and Business Analyst Team
The data and business analyst team is involved in all phases of the project; therefore, you can use Table 2.2 as a
guideline for estimating the average workdays for typical proof-of-concept, pilot, and production projects.
For proof-of-concept, pilot, and production projects, the data and business analyst team is often responsible for the
following:
- Justifying the project to the decision makers, planning it, and creating the project justification
  and planning document
- Identifying the top business questions to be investigated
- Mapping the top business questions into questions that can be investigated through visualization
  and data mining
- Creating extract procedures for historical and demographic data with the guidance of domain
  experts and the data warehousing team
- Creating, analyzing, and evaluating the visualizations and data mining models with the guidance
  of domain experts
- Presenting the solution to the decision makers and helping to create an action plan
During the implementation phase of a production project, the data and business analyst team is often responsible
for the following:
- Implementing the solution's production environment and maintaining it until the operations team
  is trained
- Measuring the results of the solution and using the results to further refine, enhance, and correct
  the production visualizations and data mining models
Domain Expert Team
The role of the domain expert team is to act as consultants to the data and business analysts to ensure the correct
data is obtained and valid business indicators are discovered. They also act as consultants to the decision maker
team to ensure the solution makes sound business sense. Table 2.3 lists the average workdays for typical
proof-of-concept, pilot, and production projects.
Table 2.3: Domain Expert Team Roles and Responsibilities (average workdays)
VDM STEP  TASKS                                                  PROOF-OF-CONCEPT  PILOT    PRODUCTION
1         Justify and Plan Project                               5                 5        5
2         Identify the Top Business Questions                    3                 3        9
3         Choose the Data Set                                    1                 2        5
4         Transform the Data Set                                 -                 -        -
5         Verify the Data Set                                    1                 2        5
6         Choose the Visualization or Mining Tools               -                 -        -
7         Analyze the Visualization or Mining Models             1                 2        4
8         Verify and Present the Visualization or Mining Models  1                 2        4
-         Create Action Plan                                     -                 -        5
-         Implement Action Plan                                  -                 -        10
-         Measure Results                                        -                 -        15
ESTIMATED DAYS                                                   12 days           16 days  62 days
For proof-of-concept, pilot, and production projects, the domain expert team is often responsible for the following:
- Helping the data and business analysts justify and plan the project
- Helping the data and business analysts identify the top business questions to be investigated
- Validating the data obtained by operations and the data and business analysts
- Validating the key business indicators discovered by the data and business analysts
In addition, for the implementation phase of a production project, the domain expert team often has the following
responsibilities:
- Helping the data and business analysts and decision makers to create a valid action plan
- Assisting in measuring the results of the project
Decision Maker Team
The role of the decision maker team is to evaluate the business scenarios and potentially approve the solution.
Table 2.4 lists the average workdays for typical proof-of-concept, pilot, and production projects.
Table 2.4: Decision Maker Team Roles and Responsibilities (average workdays)
VDM STEP  TASKS                                                  PROOF-OF-CONCEPT  PILOT    PRODUCTION
1         Justify and Plan Project                               2                 5        5
2         Identify the Top Business Questions                    1                 3        3
3         Choose the Data Set                                    -                 -        -
4         Transform the Data Set                                 -                 -        -
5         Verify the Data Set                                    -                 -        -
6         Choose the Visualization or Mining Tools               -                 -        -
7         Analyze the Visualization or Mining Models             -                 -        -
8         Verify and Present the Visualization or Mining Models  3                 3        3
-         Create Action Plan                                     -                 -        3
-         Implement Action Plan                                  -                 -        2
-         Measure Results                                        -                 -        3
ESTIMATED DAYS                                                   6 days            11 days  19 days
For proof-of-concept, pilot, and production projects, the decision maker team is often responsible for the
following:
- Evaluating and approving the business justification and plan
- Championing the project to the rest of the organization at a high level
- Allocating the project funds
In addition, for the implementation phase of a production project, the decision maker team often has the following
responsibilities:
- Providing feedback to the data and business analysts during the action plan creation
- Providing feedback on the measured results of the project
Operations Team
The role of the operations team is to provide network, database administration, and system administration
assistance to the data and business analyst team. They help in obtaining the project data, implementing the
production system, as well as measuring the results. Table 2.5 lists the average workdays for typical
proof-of-concept, pilot, and production projects.
Table 2.5: Operations Team Roles and Responsibilities (average workdays)
VDM STEP  TASKS                                                  PROOF-OF-CONCEPT  PILOT    PRODUCTION
1         Justify and Plan Project                               2                 3        3
2         Identify the Top Business Questions                    -                 -        -
3         Choose the Data Set                                    1                 2        5
4         Transform the Data Set                                 -                 -        -
5         Verify the Data Set                                    -                 -        -
6         Choose the Visualization or Mining Tools               -                 -        -
7         Analyze the Visualization or Mining Models             -                 -        -
8         Verify and Present the Visualization or Mining Models  -                 -        -
-         Create Action Plan                                     -                 -        5
-         Implement Action Plan                                  -                 -        20
-         Measure Results                                        -                 -        15
ESTIMATED DAYS                                                   3 days            5 days   48 days
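When you adapt Tables 2.3 through 2.5 to your own organization, it helps to recheck that each team's task rows still add up to its estimated days. The Python sketch below does this for the figures as printed. The data layout and function names are hypothetical, and it assumes that a "-" or empty cell means zero workdays, which is why rows with no entries are left out.

```python
# Task rows from Tables 2.3-2.5 as (proof-of-concept, pilot, production)
# workdays. Assumption: "-" or empty cells count as zero.
TEAM_TASK_DAYS = {
    "domain_expert": [
        (5, 5, 5), (3, 3, 9), (1, 2, 5), (1, 2, 5), (1, 2, 4), (1, 2, 4),
        (0, 0, 5), (0, 0, 10), (0, 0, 15),
    ],
    "decision_maker": [
        (2, 5, 5), (1, 3, 3), (3, 3, 3), (0, 0, 3), (0, 0, 2), (0, 0, 3),
    ],
    "operations": [
        (2, 3, 3), (1, 2, 5), (0, 0, 5), (0, 0, 20), (0, 0, 15),
    ],
}

# The ESTIMATED DAYS row of each table.
STATED_TOTALS = {
    "domain_expert": (12, 16, 62),
    "decision_maker": (6, 11, 19),
    "operations": (3, 5, 48),
}


def column_totals(rows):
    """Sum each project-type column across all task rows."""
    return tuple(sum(column) for column in zip(*rows))


def check_totals():
    """Map each team to True if its rows sum to the stated totals."""
    return {
        team: column_totals(rows) == STATED_TOTALS[team]
        for team, rows in TEAM_TASK_DAYS.items()
    }
```

Calling `check_totals()` returns one flag per team, so a changed estimate that was not carried through to the total shows up immediately.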
For proof-of-concept, pilot, and production projects, the operations team is often responsible for the following:
The Project Gutenberg eBook of The Little
Indian Weaver
This ebook is for the use of anyone anywhere in the United States
and most other parts of the world at no cost and with almost no
restrictions whatsoever. You may copy it, give it away or re-use it
under the terms of the Project Gutenberg License included with this
ebook or online at www.gutenberg.org. If you are not located in the
United States, you will have to check the laws of the country where
you are located before using this eBook.
Title: The Little Indian Weaver
Author: Madeline Brandeis
Release date: July 19, 2012 [eBook #40277]
Most recently updated: October 23, 2024
Language: English
Credits: Produced by Juliet Sutherland, Diane Monico, and the Online
Distributed Proofreading Team at http://www.pgdp.net
*** START OF THE PROJECT GUTENBERG EBOOK THE LITTLE
INDIAN WEAVER ***
[Illustration: BAH, THE LITTLE INDIAN WEAVER]
The LITTLE
INDIAN WEAVER
BY
MADELINE BRANDEIS
Producer of the Motion Pictures
"The Little Indian Weaver"
"The Wee Scotch Piper"
"The Little Dutch Tulip Girl"
"The Little Swiss Wood-Carver"
Distributed by Pathè Exchange, Inc., New York City
Photographic Illustrations by the Author
GROSSET & DUNLAP
PUBLISHERS NEW YORK
by arrangement with the A. Flanagan Company
COPYRIGHT, 1928, BY A. FLANAGAN COMPANY
PRINTED IN THE UNITED STATES OF AMERICA
To every child of every land,
Little sister, little brother,
As in this book your lives unfold,
May you learn to love each other.
CONTENTS
Chapter I     The Corn Ear Doll
Chapter II    Something Terrible Happens
Chapter III   At the Trading Post
Chapter IV    The Prayer Stick
Chapter V     At Bah's Hogan
Chapter VI    Billy Starts His Story
Chapter VII   All About the Indians
Chapter VIII  Who Wins the Radio?
[Illustration: BAH AND CORNELIA]
The Little Indian Weaver
CHAPTER I
THE CORN EAR DOLL
How would you like to have a doll made from a corn ear? That is the
only kind of doll that Bah ever thought of having. Bah was only five
years old and she had never been away from her home, so of course
she couldn't know very much.
But she knew a bit about weaving blankets, and she was learning
more each day from her mother, who made beautiful ones and sold
them.
You see, Bah and her mother were American Indians, and they
belonged to the Navajo tribe. Their home was on the Navajo
Reservation in Arizona, and they called it an Indian village. But if you
went there you would not think it very much of a village in
comparison to the villages you know.
As a matter of fact, all you could see was a row of funny little round
houses, looking very much like large beehives, put together with
mud and sticks and called hogans. A street of hogans in each of
which lived a whole family of Indians, a few goats and sheep, a stray
dog or two, an Indian woman sitting outside her hogan weaving a
blanket, perhaps a child running with a dog—this, then, was a
Navajo village.
[Illustration: THE LITTLE INDIAN WEAVER]
How different from your villages with their smooth stone buildings,
their stores and gasoline stations, and pretty shrub-covered
bungalows!
Most Indian women have many babies, and the whole family lives
together in one room which is the living room, bedroom, kitchen and
dining room all rolled into one. In the top of the hogan is a hole, so
that the smoke from the cooking fire in the middle of the room can
go out.
Bah did not spend much time in her hogan. No sooner was she up in
the morning than she was outside gathering sticks for the breakfast
fire. From the time she put her little brown face outside the hogan
door, bright and early in the morning, until nightfall when she
cuddled down in her warm Navajo blanket, she was out in the air—
and the air is so fresh out there in the desert; so much fresher than
it is in the big smoky cities.
Bah was a bright-eyed, healthy little girl, and the way she dressed
will sound queer to you, for her clothes were made just like her
mother's. On rainy days you have no doubt "dressed up" in mother's
clothes and thought it quite a lark. But when the game was over,
how glad you were to come back to your own little dresses and short
socks.
But Bah had always dressed in the same way—and that is, in a long
full cotton skirt, a calico waist with long sleeves, and many strings of
bright beads about her neck. Her hair was long, black and shiny, and
her mother tied it up in a knot at the back of her neck with a white
cloth.
Every morning Bah had a lesson in weaving, just as you have a
drawing lesson or a sewing lesson. Her father had made her a tiny
loom which stood outside the hogan door next to her mother's big
loom.
The morning when Bah planned the corn ear doll she was in the
midst of her weaving lesson. Mother's fingers were flying in and out,
and Bah's fingers were slow—oh, so slow, but her mind was not. Her
mind was at work on a doll. She had once seen the picture of a doll,
a real one. It was such a lovely doll! She wanted to cuddle it. How
she would love to hug a doll close to her and rock it to sleep!
The corn was ripe in the field which was not far away. After the
lesson she would pick an ear of corn, dry it nicely and dress it in a
wee Indian blanket. She would make some beads for its neck. She
would stick in two black beads for eyes. She would—
"Bah! you do not heed the lesson!"
It was Mother. And Mother was scolding. There were few times in
Bah's life when she could remember Mother having been cross. Bah
was at once attentive.
"I am sorry, Ma Shima (my mother)," she said, in the Navajo
language. "I was dreaming of something sweet."
"It is bad medicine to dream when one is awake, Bah," said Mother.
"You will never learn to weave—and a Navajo woman who cannot
weave blankets is indeed a useless one."
Bah hung her head in shame. But Mother laughed.
"Do not look that way, my little one, but try now to make the little
pattern which I teach you."
Bah did try. She had to rip out several rows of bad weaving caused
by her dreams of her corn ear doll. But not once, until the lesson
was over, did Bah think again of the doll.
The weaving lesson was at last over, and Bah ran quickly to the
cornfield, where she began to look eagerly for a proper ear of corn
with which to make a proper Indian doll.
As she was looking through the many waving stalks, she thought she
heard her name being called. But was it her name, and was it being
called? It sounded more like singing than like calling—and Mother
did not sing.
"Bah, Bah, Black Sheep
Have you any wool?"
This is what Bah heard.
She stopped in her search and looked around. There, a few yards
away, was some one coming towards her on a pony. Bah's first
thought was to run. She did not want to meet a stranger. So few
came here to her home, where the only people the little girl ever
saw were Mother, Father, and the few Indians who lived nearby.
White people were mysterious to Bah, and yet she often wondered
about the white children and how they played and worked and what
they did all day in school. Bah would go to school next year—to the
big new school just built on the Reservation for Indian children.
White people built it, and so it must be like the white children's
school. Sometimes she longed to go—and other times she was just a
little bit afraid.
"Yes, sir, yes, sir,
Three bags full."
The pony which Bah had seen from a distance was now standing
beside her, and she could see the rider, although he could not see
her, for she had hidden and was crouching between the cornstalks.
[Illustration: BAH'S HOME]
The rider was a very small person—a boy—a white boy. Bah really
didn't feel as though he should be classified as white, for his skin
was a mixture of orange and brown—orange where the sun had
burned him, and over that a pattern of vivid brown freckles. Bah had
never before seen anything like him, and it is no wonder that the
timid little Indian hid herself.
The speckled boy took off his large cowboy hat and wiped his hot
brow with a cowboy's handkerchief.
"Gee, it's hot, Peanuts," he said aloud to the pony. "And I'd like to
know the way back—but looks as if we're lost."
Peanuts was presumably bored, for he let his head sink slowly,
closed his eyes and patiently waited for the next move. None came.
Bah, in her hiding place, was as dumb, if not as bored, as Peanuts.
She was tense with excitement, which obviously Peanuts was not,
and did not take her eyes from the boy's face. His every move very
much interested her. Here, then, was a white boy. He must be white,
for he was not an Indian and he spoke English.
Bah understood English, and of that she was very proud. Her mother
and father had always traded with the white man, so they had
learned to speak English, and had wisely taught their little girl. Now
how much easier it would be for Bah when she started to school.
But her knowledge did not help her at the moment when she looked
up from her cornstalk hiding place into the face of a live white boy.
Indeed she had even decided to run away, and was crawling
noiselessly through the corn.
"Baa, Baa, Black Sheep,"
again the boy began to sing as he started to turn away. Bah stopped
crawling. He did sing her name. He wanted her to come back. Maybe
she could help him find his way. And Oh! the pony was stepping all
over the corn. Didn't he know better than to do that?
The cornstalks rustled. The pony jumped to the side, and the boy
turned in his saddle and saw Bah standing.
"Oh, hello!" he said and turned back—the pony trampling upon a
beautiful stalk of corn. "I didn't see you before. Where were you?"
Bah couldn't speak. She tried ever so hard, but the English words
she knew so well would not come.
The boy jumped down from his pony and went up to her. There was
a smile on his face and as he came closer she saw that his eyes
were as blue as the sky. That part of him was pretty, thought Bah,
even if his skin was not—and the smile was friendly. So she gained
courage.
"You call my name?" she ventured.
The boy looked puzzled.
"No," he said, "I don't know your name, but I'm glad I've found
you."
Again he smiled, and this time Bah smiled too.
"My name Bah," she said, "and you say 'Bah, Bah, back skip'—I think
you call me come back to you."
When it suddenly dawned upon the boy what she meant he opened
his mouth very wide indeed and laughed so hard that Bah again
began to be afraid. But he stopped suddenly, realizing perhaps that
he had frightened her, and said:
"Oh, no. That is a song we sing about 'black sheep' that goes 'bah
bah'! I didn't know you heard me singing it."
Bah looked a bit ashamed, and did not offer a reply. The boy kept on
talking—
"But, gee, where do you come from, Bah? Is your house around
here?"
"Yes," said Bah. "Hogan over way, Bah come to find corn in
cornfield."
"Oh, I see," said the boy, "for dinner, I guess."
"No," replied the Indian girl, looking up into his face, "Bah make so
pretty doll from corn ear. Will dress in blanket and beads. You ever
see little girl's doll?"
She looked so intent and innocent that the boy could not scoff at
what would have been, among members of his own group at home,
a subject entirely forbidden in the presence of growing gentlemen.
Dolls! What interest had he in dolls! But as he looked into the
upturned face of the little brown maiden, he suddenly realized that
she had never heard of a boy's dislike for dolls; in fact, she had
probably never before met a white boy nor seen a white doll.
"Oh, yes, plenty of 'em," answered the white boy, "but never made
of an ear of corn—"
Then, seeing a shadow pass over her face he resumed gallantly, "But
it ought to make a peach of a doll. Maybe I could help you make it."
Now Bah was certain that she would like the white boy. She had
never before had a human playmate, and the feeling was a pleasant
one. But she remembered that her new friend was lost.
"You no can find way home?" she asked.
The boy laughed.
"I guess you want to get rid of me," he said. Then, sobering, he
resumed. "Yes, really, I'm lost. Peanuts and I have been wandering
all morning. You see, we started from Tuba early and we just didn't
watch the trails, so here we are."
"Oh, Tuba," said Bah, "not so very far. I show you how to go."
"But first I'll help you fix up a corn doll," said the boy. "We'll first
have to find a good fat corn ear. Nice fat dolls are the best, don't you
think so?"
As he talked he began looking through the cornstalks, and Bah
watched him. He finally found what he considered to be an ideal ear,
and together the two children made it into a doll, black bead eyes,
cornsilk hair, blanket, and all.
"I have just the name for her," said the boy. "We'll call her 'Cornelia!'
Shall we?"
Bah nodded happily. The name was a new one to her and she did
not catch its meaning in relation to her beautiful new doll, but it
pleased her nevertheless. In fact, everything about the boy pleased
her, and she was sorry when at last he said:
[Illustration: BAH AND CORNELIA]
"It must be getting late. You'd better tell me how to get home.
Mother will wonder what happened."
Bah pointed out directions and the boy, thanking her, held out his
hand and said: "You never even asked my name. Don't you want to
know?"
Bah drooped her head shyly as she replied: "Indian never ask name.
Very bad manner."
The white boy's eyes opened wide.
"That's funny," he said. "Then how do you get to know people's
names?"
"When one people like other people, they tell name. No ask," said
Bah seriously.
"Oh, then I'll tell you quick 'cause I like you. My name's Billy."
Bah did not reply, but stood watching Billy as he swung himself onto
his pony. Then, when he was seated and smiled down at her, she
smiled up sweetly and said:
"We have cow named Billy."
[Illustration: BILLY]
CHAPTER II
SOMETHING TERRIBLE HAPPENS
For days Bah's chief delight was her new corn ear doll. She kept it
with her constantly. It went to bed with her, sat at meals with her,
and watched the daily weaving lesson.
But one day a terrible thing happened. She was sitting by her
mother's side outside the hogan, her little fingers flying through the
strings of her loom, and one eye watching Mother's more
experienced fingers as they made a beautiful new pattern.
Cornelia had been carefully dressed in her blanket, her beads hung
about her neck and fondly kissed by her devoted parent, and was
now lying at Bah's feet while the little girl worked hard at her lesson.
[Illustration: THE WEAVING LESSON]
"Pull your wool tighter, Bah," said Mother, in Navajo.
Bah's fingers and tongue worked together. Children's tongues have a
habit of moving with whatever else is in motion.
And as Bah worked, some sheep came wandering in from the field.
They were tame sheep and often nosed about the hogan for a bit of
human company or food, as the case might be, and this morning I
fear the reason was food.
Father sheep was very large and therefore hungrier than the rest.
His hunger made him bold. But Bah was a particular friend of his,
and I doubt whether even his appetite could have driven him to do
what he did that morning, had he been able to guess the great
sorrow he was to cause.
"You have left out a stitch, my child, and there will be a hole in the
work."
Bah's fingers stopped and so did her tongue.
"Oh dear, must I do that all over again, Mother?" she asked.
"If you wish to weave perfectly so that you may some day sell your
work, then you must learn to rip and go over many times."
Ripping is deadly work, as everyone who has ever ripped knows.
And Bah was not as interested in ripping as she had been in making
her pattern. So her thoughts naturally turned to her precious
Cornelia lying at her feet.
Her eyes turned at the same time, and horror upon horrors, what
did she see? The big black sheep was there chewing contentedly,
but Cornelia was gone. The little blanket was there—so were the
beads and some of the cornsilk hair. But Cornelia was gone. The
sheep went on chewing and couldn't understand why Bah did not
caress him as usual.
"Bah, do pay attention to your work!"
Mother was annoyed. Bah turned around and Mother saw a very sad
sight. She saw before her another mother—a stricken little mother
whose child had just provided a meal for a hungry animal. She
rocked an empty blanket back and forth, and the tears were
beginning to gather. Mother understood what had happened, and
now her voice sounded soft and kind.
[Illustration: "GO AWAY, MR. SHEEP!"]
"Poor Bah! Your doll is gone!"
The little girl was crying as she continued to hug the empty blanket.
"Do not cry, my little one," said Mother. "Are there not many more
corn ears in the field?"
"Yes, my Mother," sobbed the child, "but no more Cornelias!"
And that was final. Never again could Bah go back to the cornfield.
Never again! How could Mother even have suggested such a thing!
Didn't she know that Cornelia, since the day of her birth, had been
different from all other ears of corn?
Why, Cornelia was a doll—she and Billy had decided that—and the
rest were vegetables! Oh, didn't Mother understand? Perhaps Mother
did, for her next remark showed it.
"One day, Bah, when I went to the Trading Post near Tuba I saw a
most beautiful doll. She was an Indian baby—a papoose—and she
was strapped upon the prettiest little laced baby cradle you ever
saw. She was dressed in a bright blanket and she had real hair and
such lovely beads around her neck."
A smile was trying to chase away the tears on the face of the little
mother as she listened to her own mother's recital of something too
wonderful to imagine. She said sorrowfully: "Some white child will
buy her, and how happy she will be. Ah, how I should like to have
her."
Mother said: "And so you shall, if you will work to have her."
Bah's eyes asked the question: "How?" and her mother went on:
"You know, Bah, that Mother sells or trades blankets, and that Father
sells or trades his beautiful silver and matrix jewelry to the Trading
Post. We do this so that we may have, in return, things which we
want and need. Now, you want and need a little doll. Why not sell
your work? Bah must weave a little blanket and take it to the store
where they will perhaps trade with you for the papoose doll."
"Do you really think they will, Ma Shima?" asked Bah as if she could
hardly believe it, and she wiped away her tears.
[Illustration: HOW BAH LONGED FOR THE PAPOOSE DOLL!]
"Yes, I do," answered Mother. "But your blanket must be well made
and of a pretty pattern—else they will not take it, for they, in turn,
must sell it to the tourists."
"Then I shall make the most beautiful blanket which has ever been
made," laughed Bah, now thoroughly interested in her new task with
its wonderful object.
She worked all through the morning on her little blanket, with happy
thoughts of a real-haired Indian doll flying through her mind as her
fingers flew through her work. It was not until she heard Mother
grinding the corn for lunch that she looked up, and not until then
that she thought again of the morning's sorrow. But then she did
think of it, and her parents wondered why she could not eat her
corn bread.
CHAPTER III
AT THE TRADING POST
Billy's mother and father had come to Arizona for a special reason.
Billy's father was a writer, and he had come for information on the
Navajo Indians for a new book he was writing. Every day he would
go to the Indian villages, sit among the big chiefs and medicine men
(who are the wise ones among the Indians and are supposed to
work charms which cure the sick) and he would jot down in his
notebook many things which they told him.
Billy went with his father the first few days, but he didn't care much
for the way they sat around and did nothing but talk. Billy was a
very active boy and he soon grew tired of listening to the droning
voices of the Indian men, and the scratching of Father's pencil. At
last he told Father how it was, and Father laughed.
"I thought you were going to write, too, Billy," he said. "You'll never
find out about the Indians if you don't take the trouble to listen—and
then you'll never win that composition contest you've been dreaming
about."
It was true that Billy, since he had left New York, had dreamed of
nothing else but the composition contest. Many of his friends at
home were already struggling with their compositions, for the prize
was worth striving for—a wonderful radio set, the very latest model.
"I TRADE MY BLANKET FOR PAPOOSE DOLL!"
And how the others had envied him, for he was to go to Arizona and
live among the Indians where he would be sure to learn so much of
interest and send in a true account of the lives of American Indians.
The contest was open to any composition dealing with children of
any particular race or country, and was to reveal their habits and
customs.
"Oh! You'll win it easily, Bill," his chum had said. "Indians are such
interesting people, and you'll find out all about them if you stick to
your dad."
And Billy had been fired with ambition, when he had left, and when
he had first arrived. But the novelty of the idea was gradually
wearing off and he seemed to like far more to gallop over the
country on his pony, Peanuts, than to glean knowledge. Especially
since his meeting with Bah did he look forward each morning to his
ride. And each day he tried to find the Indian girl and went many
times to the cornfield. But she was never there and, try as he might,
Billy could not find her village.
Father did not wait for Billy to answer him, but said: "Well, old man,
I can see the radio set gradually taking wings and broadcasting
itself! You'll never win it this way, you know—and you'd have a good
chance, too, if you'd come along and listen to some of the old
fellows I'm chumming with each day."
"Oh, I'll come along tomorrow, Dad," said Billy carelessly. "Today I'm
going to the Trading Post and see the Indian stuff there."
"Well, do as you like, Son," said his father, "but don't be annoyed if
you don't win the contest."
"I'll write something yet, Dad, you'll see."
Peanuts and Billy found themselves at the Trading Post in the heat of
the day. Billy tied the pony in the shade and went into the store. It
was filled with a mixed assortment of objects. On one side of the
room were groceries, pots and pans, cigarettes, in fact a little bit of
everything necessary for housekeeping. On the other side were the
Indian curios—silver and matrix jewelry, beautifully fashioned with
blue stones set in, handsome Navajo blankets hanging on the wall,
pottery of all kinds, and beads, beads, beads.
Billy wandered about the store and he thought of his mother, and
how she would like something to take home as a souvenir. The
beads looked hopeful, as he could carry them, while a pottery jar or
blanket would be big and heavy. Taking from his pocket his two
dollars and some few cents, he selected the string of beads which
looked most likely.
One string in particular very much pleased him. It was delicately
made, but looked simple enough to be within reach of his two

Visual Data Mining Techniques And Tools For Data Visualization And Mining 1st Tom Soukup

John Wiley & Son- Visual Data Mining: Techniques and Tools for Data Visualization and Mining
Present to you by: Team-Fly®

Visual Data Mining: Techniques and Tools for Data Visualization and Mining
by Tom Soukup and Ian Davidson
ISBN: 0471149993
John Wiley & Sons © 2002 (382 pages)
Master the power of visual data mining tools and techniques.

Table of Contents
Visual Data Mining: Techniques and Tools for Data Visualization and Mining
Trademarks
Introduction
Part I - Introduction and Project Planning Phase
  Chapter 1 - Introduction to Data Visualization and Visual Data Mining
  Chapter 2 - Step 1: Justifying and Planning the Data Visualization and Data Mining Project
  Chapter 3 - Step 2: Identifying the Top Business Questions
Part II - Data Preparation Phase
  Chapter 4 - Step 3: Choosing the Business Data Set
  Chapter 5 - Step 4: Transforming the Business Data Set
  Chapter 6 - Step 5: Verify the Business Data Set
Part III - Data Analysis Phase and Beyond
  Chapter 7 - Step 6: Choosing the Visualization or Data Mining Tool
  Chapter 8 - Step 7: Analyzing the Visualization or Mining Tool
  Chapter 9 - Step 8: Verifying and Presenting the Visualizations or Mining Models
  Chapter 10 - The Future of Visual Data Mining
Appendix A - Inserts
Glossary
References
Index
List of Figures
List of Tables
List of Codes
Visual Data Mining: Techniques and Tools for Data Visualization and Mining
Tom Soukup
Ian Davidson
Wiley Publishing, Inc.

Publisher: Robert Ipsen
Executive Editor: Robert Elliott
Assistant Editor: Emilie Herman
Associate Managing Editor: John Atkins
New Media Editor: Brian Snapp
Text Design & Composition: John Wiley Production Services

Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where John Wiley & Sons, Inc., is aware of a claim, the product names appear in initial capital or ALL CAPITAL LETTERS. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

This book is printed on acid-free paper.

Copyright © 2002 by Tom Soukup and Ian Davidson. All rights reserved.
Published by John Wiley & Sons, Inc.
Published simultaneously in Canada.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4744.
Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008, email: <PERMREQ@WILEY.COM>.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold with the understanding that the publisher is not engaged in professional services. If professional advice or other expert assistance is required, the services of a competent professional person should be sought.

Library of Congress Cataloging-in-Publication Data:
Soukup, Tom, 1962-
Visual data mining: techniques and tools for data visualization and mining / Tom Soukup, Ian Davidson.
p. cm.
"Wiley Computer Publishing."
Includes bibliographical references and index.
ISBN 0-471-14999-3
1. Data mining. 2. Database searching. I. Davidson, Ian, 1971- II. Title.
QA76.9.D343 S68 2002
006.3-dc21
2002004004

Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1

To Ed and my family for their encouragement
-TOM

To my wife and parents for their support.
-IAN

ACKNOWLEDGMENTS

This book would not have been possible without the generous help of many people. We thank the reviewers for their timely critique of our work, and our editor, Emilie Herman, who skillfully guided us through the book-writing process.

We thank the Oracle Technology Network and SPSS Inc., for providing us evaluation copies of Oracle and Clementine, respectively. The use of these products helped us to demonstrate key concepts in the book.

Finally, we both learned a great deal from our involvement in Silicon Graphics' data mining projects. This, along with our other data mining project experience, was instrumental in formulating and trying the visual data mining methodology we present in this book.
Tom Soukup and Ian Davidson

My sincere thanks to the people with whom I have worked on data mining projects. You have all demonstrated and taught me many aspects of working on successful data mining projects.
Ian Davidson

To all my data mining and business intelligence colleagues, I add my thanks. Your business acumen and insights have aided in the formulation of a successful visual data mining methodology.
Tom Soukup
ABOUT THE AUTHORS

Tom Soukup is a data mining and data warehousing specialist with more than 15 years of experience in database management and analysis. He currently works for Konami Gaming Systems Division as Director of Business Intelligence and DBA.

Ian Davidson, Ph.D., has worked on a variety of commercial data-mining projects, such as cross sell, retention, automobile claim, and credit card fraud detection. He recently joined the State University of New York at Albany as an Assistant Professor of Computer Science.

Trademarks

Microsoft, Microsoft Excel, and PivotTable are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
Oracle is a registered trademark of Oracle Corporation.
SPSS is a registered trademark, and Clementine and Clementine Solution Publisher are either registered trademarks or trademarks of SPSS Inc.
MineSet is a registered trademark of Silicon Graphics, Inc.
Introduction

Business intelligence solutions transform business data into conclusive, fact-based, and actionable information and enable businesses to spot customer trends, create customer loyalty, enhance supplier relationships, reduce financial risk, and uncover new sales opportunities. The goal of business intelligence is to make sense of change: to understand and even anticipate it. It furnishes you with access to current, reliable, and easily digestible information. It provides you the flexibility to look at and model that information from all sides, and in different dimensions. A business intelligence solution answers the question "What if ..." instead of "What happened?" In short, a business intelligence solution is the path to gaining, and maintaining, your competitive advantage.

Data visualization and data mining are two techniques often used to create and deploy successful business intelligence solutions. By applying visualizations and data mining techniques, businesses can fully exploit business data to discover previously unknown trends, behaviors, and anomalies:

- Data visualization tools and techniques assist users in creating two- and three-dimensional pictures of business data sets that can be easily interpreted to gain knowledge and insights.
- Visual data mining tools and techniques assist users in creating visualizations of data mining models that detect patterns in business data sets that help with decision making and predicting new business opportunities.

In both cases, visualization is key in assisting business and data analysts to discover new patterns and trends from their business data sets. Visualization is a proven method for communicating these discoveries to the decision makers.
The payoffs and return on investment (ROI) can be substantial for businesses that employ a combination of data visualizations and visual data mining effectively. For instance, businesses can gain a greater understanding of customer motivations to help reduce fraud, anticipate resource demand, increase acquisition, and curb customer turnover (attrition).

Overview of the Book and Technology

This book was written to help you first prepare and transform your raw data into business data sets, and then create and analyze the prepared business data set with data visualization and visual data mining tools and techniques. Compared with other business intelligence techniques and tools, we have found that visualizations help reduce your time-to-insight, the time it takes you to discover and understand previously unknown trends, behaviors, and anomalies and communicate those findings to decision makers. It is often said that a picture paints a thousand words. For instance, a few data visualizations can be used to quickly communicate the most important discoveries instead of sorting through hundreds of pages of a traditional on-line analytical processing (OLAP) report. Similarly, visual data mining tools and techniques enable you to visually inspect and interact with the classification, association, cluster, and other data mining models for better understanding and faster time-to-insight.

Throughout this book, we use the term visual data mining to indicate the use of visualization for inspecting, understanding, and interacting with data mining algorithms. Finding patterns in a data visualization with your eyes can also be considered visual data mining. In this case, the human mind acts as the pattern recognition data mining engine. Unfortunately, not all models produced by data mining algorithms can be visualized (or a visualization of
them just wouldn't make sense). For instance, neural network models for classification, estimation, and clustering do not lend themselves to useful visualization.

The most sophisticated pattern recognition machine in the world is the human mind. Visualization and visual data mining tools and techniques aid in the process of pattern recognition by reducing large quantities of complicated patterns into two- and three-dimensional pictures of data sets and data mining models. Often, these visualizations lead to actionable business insights. Visualization helps business and data analysts to quickly and intuitively discover interesting patterns and effectively communicate these insights to other business and data analysts, as well as decision makers.

IDC and The Data Warehousing Institute have sampled business intelligence solutions customers. They concluded the following:

1. Visualization is essential (Source: IDC). Eighty percent of business intelligence solution customers find visualization to be desirable.
2. Data mining algorithms are important to over 80 percent of data warehousing users (Source: The Data Warehousing Institute).

Visualization and data mining business intelligence solutions reach across industries and business functions. For example, telecommunications, stock exchanges, and credit card and insurance companies use visualization and data mining to detect fraudulent use of their services; the medical industry uses data mining to predict the effectiveness of surgical procedures, medical tests, and medications, and to detect fraud; and retailers use data mining to assess the effectiveness of coupons and promotional events. The Gartner Group analyst firm estimates that by 2010, the use of data mining in targeted marketing will increase from less than 5 percent to more than 80 percent (Source: Gartner).
In practice, visualization and data mining have been around for quite a while. However, the term data mining has only recently earned credibility within the business world for its abilities to control costs and contribute to revenue. You may have heard data mining referred to as knowledge discovery in databases (KDD). The formal definition of data mining, or KDD, is the extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) information or patterns in large databases.

The overall goal of this book is to first introduce you to data visualization and visual data mining tools and techniques, demonstrate how to acquire and prepare your business data set, and provide you with a methodology for using visualization and visual data mining to solve your business questions.

How This Book Is Organized

Although there are many books on data visualization and data mining theory, few present a practical methodology for creating data visualizations and for performing visual data mining. Our book presents a proven eight-step data visualization and visual data mining (VDM) methodology, as outlined in Figure I.1. Throughout the book, we have stringently adhered to this eight-step VDM methodology. Each step of the methodology is explained with the help of practical examples and then applied to a real-world business problem using a real-world data set. The data set is available on the book's companion Web site. It is our hope that as you learn each methodology step, you will be able to apply the methodology to your real-world data sets and begin receiving the benefits of data visualization and visual data mining to solve your business issues.
Figure I.1: Eight-step data visualization and visual data mining methodology.

Figure I.1 depicts the methodology as a sequential series of steps; however, the process of preparing the business data set and creating and analyzing the data visualizations and data mining models is an iterative process. Visualization and visual data mining steps are often repeated as the data and visualizations are refined and as you gain more understanding about the data set and the significance of one data fact (a column) to other data facts (other columns). It is rare that data or business analysts create a production-class data visualization or data mining model the first time through the data mining discovery process.

This book is organized into three main sections that correspond to the phases of a data visualization and visual data mining (VDM) project:

- Project planning
- Data preparation
- Data analysis
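The phase grouping and the iterative character of the methodology can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the book: the step names are paraphrased from the table of contents, the step-to-phase grouping follows the table of contents, and the names `Step`, `VDM_STEPS`, `steps_by_phase`, `run_project`, and the `loop_back_to` parameter are hypothetical. Looping back to Step 3 is an assumption chosen for illustration; in practice, any earlier step may be revisited.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int
    name: str
    phase: str


# Step names paraphrased from the table of contents (hypothetical structure).
VDM_STEPS = [
    Step(1, "Justify and plan the project", "Project planning"),
    Step(2, "Identify the top business questions", "Project planning"),
    Step(3, "Choose the business data set", "Data preparation"),
    Step(4, "Transform the business data set", "Data preparation"),
    Step(5, "Verify the business data set", "Data preparation"),
    Step(6, "Choose the visualization or data mining tool", "Data analysis"),
    Step(7, "Analyze the visualization or mining model", "Data analysis"),
    Step(8, "Verify and present the results", "Data analysis"),
]


def steps_by_phase(steps):
    """Group step numbers under their phase, preserving order."""
    grouped = {}
    for step in steps:
        grouped.setdefault(step.phase, []).append(step.number)
    return grouped


def run_project(steps, is_satisfied, loop_back_to=3, max_passes=5):
    """Return the order in which step numbers are visited.

    The first pass visits every step once. While is_satisfied(passes)
    is False, the steps from loop_back_to onward are repeated, up to
    max_passes passes in total.
    """
    visited = [s.number for s in steps]
    passes = 1
    while not is_satisfied(passes) and passes < max_passes:
        visited += [s.number for s in steps if s.number >= loop_back_to]
        passes += 1
    return visited


if __name__ == "__main__":
    print(steps_by_phase(VDM_STEPS))
    # Pretend the analysts are satisfied after the second pass.
    print(run_project(VDM_STEPS, lambda passes: passes >= 2))
```

Returning the visit order as a plain list keeps the sketch easy to inspect; a real project plan would attach deliverables and owners to each step rather than just a number.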
Part 1: Introduction and Project Planning Phase

Chapter 1: "Introduction to Data Visualization and Visual Data Mining," introduces you to data visualization and visual data mining concepts used throughout the book. It illustrates how a few data visualizations can replace (or augment) hundreds of pages of traditional "green-bar" OLAP reports. Multidimensional, spatial (landscape), and hierarchical analysis data visualization tools and techniques are discussed through examples. Traditional statistical tools, such as basic statistics and histograms, are given a visual twist through statistic and histogram visualizations. Chapter 1 also introduces you to visual data mining concepts. This chapter describes how visualizations of data mining models assist the data and business analysts, domain experts, and decision makers in understanding and visually interacting with data mining models such as decision trees. It also discusses using visualization tools to plot the effectiveness of data mining models, as well as to analyze the potential deployment of the models.

Chapter 2: "Step 1: Justifying and Planning the Data Visualization and Data Mining Project," introduces you to the first of the eight steps in the data visualization and visual data mining (VDM) methodology and discusses the business aspects of business intelligence solutions. In most cases, the project itself needs a business justification before you can begin (or get funding for the project). This chapter presents examples of how various businesses have justified (and benefited from) using data visualization and visual data mining tools and techniques. Chapter 2 also discusses planning a VDM project and provides guidance on estimating the project time and resource requirements. It helps you to define team roles and responsibilities for the project.
The customer retention business VDM project case study is introduced, and then Step 1 is applied to the case study.

Chapter 3: "Step 2: Identifying the Top Business Questions," introduces you to the second step of the VDM methodology. This chapter discusses how to identify and refine business questions so that they can be investigated through data visualization and visual data mining. It also guides you through mapping the top business questions for your VDM project into data visualization and visual data mining problem definitions. Step 2 is then applied to the continuing customer retention VDM project case study.

Part 2: The Data Preparation Phase

Chapter 4: "Step 3: Choosing the Data," introduces you to the third step of the VDM methodology and discusses how to select the data relating to the data visualization and visual data mining questions identified in Chapter 3 from your operational data source. It introduces the concept of using an exploratory data mart as a repository for building and maintaining business data sets that address the business questions under investigation. The exploratory data mart is then used to extract, cleanse, transform, load (ECTL), and merge the raw operational data sources into one or more production business data sets. This chapter guides you through choosing the data set for your VDM project by presenting and discussing practical examples, and applying Step 3 to the customer retention VDM project case study.

Chapter 5: "Step 4: Transforming the Data Set," introduces you to the fourth step of the VDM methodology. Chapter 5 discusses how to perform logical transformations on the business data set stored in the exploratory data mart. These logical transformations often help in augmenting the business data set to enable you to gain more insight into the business problems under investigation. This chapter guides you through transforming the data set
for your VDM project by presenting and discussing practical examples, and applying Step 4 to the customer retention VDM project case study.

Chapter 6: "Step 5: Verifying the Data Set," introduces you to the fifth step of the VDM methodology. Chapter 6 discusses how to verify that the production business data set contains the expected data and that all of the ECTL steps (from Chapter 4) and logical transformations (from Chapter 5) have been applied correctly, are error free, and did not introduce bias into your business data set. This chapter guides you through verifying the data set for your VDM project by presenting and discussing practical examples, and applying Step 5 to the customer retention VDM project case study.

Chapter 7: "Step 6: Choosing the Visualization or Data Mining Tool," introduces you to the sixth step of the VDM methodology. Chapter 7 discusses how to choose and fine-tune the data visualization or data mining model tool appropriate for investigating the business questions identified in Chapter 3. This chapter guides you through choosing the data visualization and data mining model tools by presenting and discussing practical examples, and applying Step 6 to the customer retention VDM project case study.

Part 3: The Data Analysis Phase

Chapter 8: "Step 7: Analyzing the Visualization or Data Mining Model," introduces you to the seventh step of the VDM methodology. Chapter 8 discusses how to use the data visualizations and data mining models to gain business insights in answering the business questions identified in Chapter 3. For data mining, the predictive strength of each model can be evaluated and the models compared with one another, enabling you to decide on the model that best addresses your business questions.
Moreover, each data visualization or data mining model can be visually investigated to discover patterns (business trends and anomalies). This chapter guides you through analyzing the visualizations or data mining models by presenting and discussing practical examples, and applying Step 7 to the continuing customer retention VDM project case study.

Chapter 9: "Step 8: Verifying and Presenting Analysis," introduces you to the final step of the VDM methodology. Chapter 9 discusses the three parts of this step: verifying that the visualizations and data mining models satisfy your business goals and objectives, presenting the visualization and data mining discoveries to the decision makers, and, if appropriate, deploying the visualizations and mining models in a production environment. Although this chapter discusses the implementation phase, a complete treatment of this phase is outside the scope of this book. Step 8 is then applied to the continuing customer retention VDM project case study.

Chapter 10, "The Future of Visual Data Mining," serves as a summary of the previous chapters and discusses the future of data visualization and visual data mining.

The Glossary provides a quick reference to definitions of commonly used data visualization and data mining terms and algorithms.

Who Should Read This Book

A successful business intelligence solution using data visualization or visual data mining requires the participation and cooperation of many parts of your business organization. Since this book endeavors to cover the VDM project from the justification and planning phase up to the implementation phase, it has a wide and diverse audience.
The following definitions identify categories and roles of people in a typical business organization and list which chapters are most advantageous for them to read. Depending on your business organization, you may be responsible for one or more roles. (In a small organization, you may be responsible for all roles.)

Data Analysts normally interact directly with the visualization and visual data mining software to create and evaluate the visualizations and data mining models. Data analysts collaborate with business analysts and domain experts to identify and define the business questions and get help in understanding and selecting columns from the raw data sources. We recommend data analysts focus on all chapters.

Business Analysts typically interact with previously created data visualizations and data mining models. Business analysts help define the business questions and communicate the data mining discoveries to other analysts, domain experts, and decision makers. We recommend that business analysts focus on Chapters 1 through 4 and Chapters 8 and 9.

Domain Experts typically do not create data visualizations and data mining models, but rather interact with the final visualizations and models. Domain experts know the business, as well as what data the business collects. Data analysts and business analysts draw on the domain expert to understand and select the right data from the raw operational data sources, as well as to clarify and verify their visualization and data mining discoveries. We recommend domain experts focus on Chapters 1 through 4 and Chapters 6 and 9.

Decision Makers typically have the power to act on the data visualization and data mining discoveries. The visualization and visual data mining discoveries are presented to decision makers to help them make decisions based on these discoveries.
We recommend decision makers focus on Chapters 1, 2, and 9.

Chapter 10 focuses on the near future of visualization in data mining. We recommend that all individuals read it.

Table I.1: How This Book Is Organized and Who Should Read It

CHAPTER | TOPIC AND VDM STEP DISCUSSED | DATA ANALYSTS | BUSINESS ANALYSTS | DOMAIN EXPERTS | DECISION MAKERS
------- | ---------------------------- | ------------- | ----------------- | -------------- | ---------------
1 | Introduction to Data Visualization and Visual Data Mining | √ | √ | √ | √
2 | Step 1: Justifying and Planning the Data Visualization/Data Mining Project | √ | √ | √ | √
3 | Step 2: Identifying the Top Business Questions | √ | √ | √ |
4 | Step 3: Choosing the Data Set | √ | √ | √ |
5 | Step 4: Transforming the Data Set | √ | | |
6 | Step 5: Verifying the Data Set | √ | | √ |
7 | Step 6: Choosing the Visualization or Data Mining Model | √ | | |
8 | Step 7: Analyzing the Visualization or Data Mining Model | √ | √ | |
9 | Step 8: Verifying and Presenting the Analysis | √ | √ | √ | √
10 | The Future of Visualization and Visual Data Mining | √ | √ | √ | √

Software Tools Used

There are numerous visualization software tools, and more are being developed and enhanced each year that you can use for data preparation, data visualization, and data mining. The graphical and data mining analysis capabilities of software tools vary from package to package. We have decided to limit our selection to four core packages for illustrating the data preparation and data analysis phases: Oracle, Microsoft Excel, SGI MineSet, and SPSS Clementine. These software packages are not required for reading or understanding this book, as the data visualization and data mining techniques described in the book are similar to those available in the majority of data visualization and data mining software packages.

Oracle

The majority of query examples in the book are written using ANSI standard structured query language (SQL) syntax. For the data preparation extraction, cleanse, transform, and load (ECTL) tasks, we chose to use Oracle SQL*Loader syntax. For some of the logical transformation tasks, we chose to use Oracle procedural language SQL (PL/SQL). The majority of queries, ECTL, and logical transformation tasks can be accomplished using similar functions and tools in other popular RDBMS products, such as Microsoft SQL Server, Sybase, Informix, DB2, and RedBrick.
Microsoft Excel

Excel is the most widely used spreadsheet and business graphics software tool. Excel provides comprehensive tools to help you create, analyze, and share spreadsheets containing graphs. We chose to use Excel to illustrate core data visualization types such as column, bar, pie, line, scatter, and radar graphs. These traditional graph types are common to most visualization tool suites.

SGI MineSet

Although MineSet is no longer commercially available, we chose to use it to illustrate advanced data visualization types, such as tree, statistics, and 3D scatter graphs. These advanced graph types are common in most data mining software suites, such as ANGOSS Knowledge Studio, Oracle Darwin, IBM Intelligent Miner, and SAS Enterprise Miner.

SPSS Clementine

Clementine supports a variety of data mining techniques, such as prediction, classification, segmentation, and association detection. We chose to use Clementine to illustrate these core data mining techniques, which are common in most of the data mining software suites previously listed.

What's on the Web Site

The companion Web site (www.wiley.com/compbooks/soukup) contains Web links to the data visualization and visual data mining software tools discussed throughout this book. It also contains Web links to the extraction, cleansing, transformation, and loading (ECTL) tools referenced in Chapter 4, as well as other software tools discussed in other chapters.

To demonstrate the eight-step data visualization and visual data mining methodology, we used a variety of business data sets. One business data set we used frequently was from a home equity loan campaign. We have included the entire home equity loan campaign prepared business data set on the Web site.
For ease of transport and download, we have saved it as an Excel spreadsheet containing 44,124 records and 20 columns.

At the end of Chapters 2 through 9, we applied each of the VDM steps to an ongoing customer retention case study. However, the operational data sources, as well as the final two business data sets, are fairly large. For instance, the INVOICE.TXT file contains over 4.6 million rows. Therefore, we are providing the operational data sources and business data sets as an Access database file, casestudy.mdb, which is 180 MB. In addition, we are providing a 10 percent sample of each of the operational source files, as well as the prepared business data sets, as Excel spreadsheets, namely:

• 10 percent sample of the CUSTOMER.TXT, CONTRACT.TXT, INVOICE.TXT, and DEMOGRAPHIC.TXT operational source files
• 10 percent sample of the untransformed business data sets, customer_join and customer_demographics
• 10 percent sample of the prepared production business data sets, customer_join and customer_demographics

Be aware that if you run the sample Code Figure SQL on the 10 percent sample files instead of the complete data set, your results may not exactly match those demonstrated in the book. However, depending on the capacity of your computer system and what database you are using, the 10 percent sample files may be easier for you to work with than the complete files contained in the Access database file. The decision of which set of files to use is up to you; nevertheless, we encourage you to work through the methodology steps with the customer retention operational data source files and business data set files as you read the book.

Summary

The process of planning, preparing the business data set, and creating and analyzing data visualizations and data mining models is iterative. Visualization and visual data mining steps as described in the visualization and visual data mining (VDM) methodology are frequently repeated. As you gain more understanding of the data set and the significance of one data fact (a column) to other data facts (other columns), the data and visualizations are refined. It is rare that data or business analysts create a production-class data visualization or data mining model the first time through the data mining discovery process. Often the data must be further transformed or more data is necessary to answer the business question. In some cases, discoveries about the data set lead to refining the original business questions. The power of visualization gives you the ability to quickly see and understand the data set and data mining model so you can improve your analysis interactively.
We hope that this book helps you develop production-class visualizations and data mining models that address your business questions. Furthermore, we hope that this book gives you the essential guidance to make your VDM project a success. The next chapter introduces you to data visualization and visual data mining concepts used throughout the book.
Part I: Introduction and Project Planning Phase

Chapter List
Chapter 1: Introduction to Data Visualization and Visual Data Mining
Chapter 2: Step 1: Justifying and Planning the Data Visualization and Data Mining Project
Chapter 3: Step 2: Identifying the Top Business Questions
Chapter 1: Introduction to Data Visualization and Visual Data Mining

Overview

When you read a newspaper or magazine, or watch a news or weather program on TV, you see numerous data visualizations. For example, bar and column graphs are often used to communicate categorical and demographic discoveries such as household or population survey results or trends, line graphs are used to communicate financial market time-based trends, and map graphs are used to communicate geographic weather patterns. Have you ever asked yourself why? Could it be that two- and three-dimensional data visualizations are the most effective way of communicating large quantities of complicated data?

In this book, not only do we emphasize the benefits of data visualization to analyze business data sets and communicate your discoveries, but we also outline a proven data visualization and visual data mining methodology that explains how to conduct successful data mining projects within your organization.

Chapter 1 introduces you to a variety of data visualization tools and techniques that you can use to visualize business data sets and discover previously unknown trends, behavior, and anomalies. It also introduces you to a variety of data visualization tools and techniques for visualizing, analyzing, and evaluating popular data mining algorithms.

This book discusses two broad classes of visualizations: (1) data visualization techniques for visualizing business data sets and (2) visual data mining tools and techniques for visualizing and analyzing data mining algorithms and exploring the resultant data mining models. The distinction is as follows:

• Data visualization tools and techniques help you create two- and three-dimensional pictures of business data that can be easily interpreted to gain knowledge and insights into those data sets.
With data visualization, you act as the data mining or pattern recognition engine. By visually inspecting and interacting with the two- or three-dimensional visualization, you can identify the interesting (nontrivial, implicit, perhaps previously unknown and potentially useful) information or patterns in the business data set.

• Visual data mining tools and techniques help you create visualizations of data mining models to gain knowledge and insight into the patterns discovered by the data mining algorithms that help with decision making and predicting new business opportunities. With visual data mining tools, you can inspect and interact with the two- or three-dimensional visualization of the predictive or descriptive data mining model to understand (and validate) the interesting information and patterns discovered by the data mining algorithm. In addition, data visualization tools and techniques are used to understand and evaluate the results of the data mining model.

The output from a data mining tool is a model of some sort. You can think of a model as a collection of generalizations or patterns found in the business data set that is an abstraction of the task. Just as humans may use their previous experience to develop a strategy to handle, say, difficult people, the data mining tool develops a model to predict people who are likely to leave a service organization. Depending on the data
mining tool, an explanation of why a decision was made is possible. Some data mining tools provide a clear set of reasons as to why a particular decision was made, while others are black boxes, making decisions but not telling you why. In both cases, visualization is key in helping you discover new patterns and trends and communicate these discoveries to the decision makers. The payoffs and ROI (return-on-investment) can be substantial for businesses that use a combination of data visualization and visual data mining effectively.

A base knowledge of various types of data visualization and visual data mining tools is required before beginning the eight-step data visualization and data mining (VDM) methodology discussed in Chapters 2 through 9. A good working knowledge of the visualization types will aid you in the project planning, data preparation, and data analysis phases of your VDM project.

Visualization Data Sets

The majority of business data sets are stored as a single table of information composed of a finite number of columns and one or more rows of data. Chapter 4 discusses how to choose the data from your operational data warehouse or other business data sources. However, before we begin introducing you to the visualization tools and techniques, a brief explanation of the business data set is necessary. Table 1.1 shows an example of a simple business data set with information (data) about weather.

Table 1.1: Business Data Set Weather

CITY | DATE | TEMPERATURE | HUMIDITY | CONDITION
---- | ---- | ----------- | -------- | ---------
Athens | 01-MAY-2001 | 97.1 | 89.2 | Sunny
Chicago | 01-MAY-2001 | 66.5 | 100.0 | Rainy
Paris | 01-MAY-2001 | 71.3 | 62.3 | Cloudy

The information (data facts) about the WEATHER subject data set is interpreted as follows:

• WEATHER is the file, table, or data set name. A city's weather on a particular day is the subject under investigation.
• CITY, DATE, TEMPERATURE, HUMIDITY, and CONDITION are the five columns of the data set. These columns describe the kind of information kept in the data set, that is, attributes of the weather for each city.
• ATHENS, 01-MAY-2001, 97.1, 89.2, SUNNY is a particular record or row in the data set. Each unique set of data (data fact) should have its own record (row). For this row, the data value "Athens" identifies the CITY, "01-MAY-2001" identifies the DATE the measurement was taken, "97.1" identifies the TEMPERATURE in degrees Fahrenheit, "89.2" identifies the HUMIDITY in percent, and "Sunny" identifies the CONDITION.
• The level of detail or granularity of data facts (experimental unit) is at the city level.
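As a minimal sketch of the structure just described, the WEATHER data set from Table 1.1 can be held as a list of column names plus one tuple per row. The helper name `as_records` is hypothetical and Python is used only for illustration; the book itself works with SQL and spreadsheet tools.

```python
# Illustrative only: the WEATHER data set from Table 1.1 as columns and rows.
COLUMNS = ["CITY", "DATE", "TEMPERATURE", "HUMIDITY", "CONDITION"]

ROWS = [
    ("Athens", "01-MAY-2001", 97.1, 89.2, "Sunny"),
    ("Chicago", "01-MAY-2001", 66.5, 100.0, "Rainy"),
    ("Paris", "01-MAY-2001", 71.3, 62.3, "Cloudy"),
]


def as_records(columns, rows):
    """Pair each row's values with the column names (hypothetical helper)."""
    return [dict(zip(columns, row)) for row in rows]


weather = as_records(COLUMNS, ROWS)

# Each record is one data fact: one city's weather on one day.
print(len(COLUMNS), "columns,", len(weather), "rows")
for record in weather:
    print(record["CITY"], record["TEMPERATURE"], record["HUMIDITY"])
```

Keeping the column names separate from the rows mirrors the distinction drawn above between the attributes of the data set and its individual records.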
Data visualization tools and techniques are used to graphically display the data facts as a 2-D or 3-D picture (representation) of the columns and rows contained in the business data sets.

Visualization Data Types

Columns in a business data set (table or file) contain either discrete or continuous data values. A discrete column, also known as a categorical variable, is defined as a column of the table whose corresponding data values (record or row values) have a finite number of distinct values. For instance, discrete data type columns are those that contain a character string, an integer, or a finite number of grouped ranges of continuous data values. The possible data values for a discrete column normally range from one to a few hundred unique values. If there is an inherent order to the discrete column, it is also referred to as an ordinal variable. For instance, a discrete column whose unique values are SMALL, MEDIUM, or LARGE is considered an ordinal variable.

A continuous column, also known as a numeric variable or date variable, is defined as a column of a table whose corresponding data values (record or row values) can take on a full range (potentially an infinite number) of numeric values. For instance, continuous data type columns are those that contain dates, double-precision numbers, or floating-point numbers. The possible unique data values for a continuous column normally range from a few thousand to an infinite number of unique values. Table 1.2 shows examples of the discrete and continuous columns.
Table 1.2: Discrete and Continuous Column Examples

COLUMN DATA TYPE | COLUMN NAME | EXAMPLE ROW VALUES | DATA VALUE RANGE
---------------- | ----------- | ------------------ | ----------------
Discrete | CITY | Athens, Chicago, Paris | Finite number of cities in the world
Discrete | CONDITION | Sunny, Rainy | Finite number of weather conditions, such as Sunny, Partly Cloudy, Cloudy, Rainy
Ordinal | EDUCATION | Unknown, High School | Finite number of educational degree categories, such as High School, Bachelor, Master, Doctorate
Discrete | GENDER | M, F, U | Finite number of values, such as M for male, F for female, U for unknown
Ordinal | AGE_GROUPS | 0-21, 22-35 | Finite number of age range groups
Discrete | PURCHASE_MONTH | January, February | Finite number of months
Continuous | DATE | 01-MAY-2001, 02-MAY-2001 | All possible dates
Continuous | TEMPERATURE | 97.1, 66.2, 71.3 | All possible numeric temperatures in degrees Fahrenheit
Continuous | HUMIDITY | 89.1, 100.0, 62.3 | All numbers between 0 and 100 percent
Continuous | TOTAL_SALES | $1.00, $1,000,000.00 | All possible total sales amounts

Visual versus Data Dimensions

Take care not to confuse the terms visual dimension and data dimension. Visual dimension relates to the spatial coordinate system. Data dimension, on the other hand, relates to the number of columns in a business data set. Visual dimensions are the graphical x-, y-, and z-axes of the spatial coordinate system or the color, opacity, height, or size of the graphical object. Data dimensions are the discrete or continuous columns or variables contained within the business data set.

If we use the business data set from Table 1.1, the data dimensions of the weather data set are the columns CITY, DATE, TEMPERATURE, HUMIDITY, and CONDITION. To create a two- or three-dimensional visualization of the weather data set, the columns under investigation are selected from the business data set to create a graphical data table. The graphical data table is used to map the column values of the business data set to corresponding data points in an x-, y-, or z-axis coordinate system.

Figure 1.1 illustrates a column graph visualization comparing the TEMPERATURE and HUMIDITY continuous data dimensions by the CITY discrete data dimension for the weather data set. The corresponding graphical data table values for the TEMPERATURE and HUMIDITY columns are represented by the height of the bars.
A pair of bars is drawn for each corresponding CITY value. Normally, the graphical data table is not part of the visualization; however, in this example, the table is included to illustrate how the column graph was created.
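The mapping from data dimensions to visual dimensions described above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used by any particular tool: the function names `graphical_data_table` and `column_positions`, and the bar width and gap values, are hypothetical. The data values come from Table 1.1.

```python
# Illustrative only: build a graphical data table and map it to bar geometry.
weather = [
    {"CITY": "Athens", "TEMPERATURE": 97.1, "HUMIDITY": 89.2},
    {"CITY": "Chicago", "TEMPERATURE": 66.5, "HUMIDITY": 100.0},
    {"CITY": "Paris", "TEMPERATURE": 71.3, "HUMIDITY": 62.3},
]


def graphical_data_table(records, category, measures):
    """Select the discrete column and the continuous columns under investigation."""
    return [(r[category], [r[m] for m in measures]) for r in records]


def column_positions(table, bar_width=1.0, gap=1.0):
    """Map each value to (label, x position, bar height) for a clustered column graph."""
    points = []
    x = 0.0
    for label, values in table:
        for value in values:
            points.append((label, x, value))  # height is the data value itself
            x += bar_width
        x += gap  # leave space between the groups of bars for each city
    return points


table = graphical_data_table(weather, "CITY", ["TEMPERATURE", "HUMIDITY"])
points = column_positions(table)
for label, x, height in points:
    print(f"{label:8s} x={x:4.1f} height={height}")
```

Here the discrete CITY dimension drives the position along the x-axis, and each continuous dimension becomes a bar height on the y-axis, which is the pairing of bars per city that the text describes.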
Figure 1.1: Column graph comparing temperature and humidity by city.

Since the WEATHER data set only contained summer temperatures ranging from 32 to 120 degrees Fahrenheit, the same y-axis scale can be used for both HUMIDITY and TEMPERATURE. For a data set with different HUMIDITY and TEMPERATURE ranges, two y-axes would be required: one for the HUMIDITY scale (0 to 100 percent) and one for the TEMPERATURE scale (-65 to 150 degrees Fahrenheit).

Data Visualization Tools

Data visualization tools are used to create two- and three-dimensional pictures of business data sets. Some tools even allow you to animate the picture through one or more data dimensions. Simple visualization tools such as line, column, bar, and pie graphs have been used for centuries. However, most businesses still rely on the traditional "green-bar" tabular report for the bulk of their information and communication needs. Recently, with the advance of new visualization techniques, businesses are finding they can rapidly employ a few visualizations to replace hundreds of pages of tabular reports. Other businesses use these visualizations to augment and summarize their traditional reports. Using visualization tools and techniques can lead to quicker deployment, result in faster business insights, and enable you to easily communicate those insights to others.

The data visualization tool used depends on the nature of the business data set and its underlying structure. Data visualization tools can be classified into two main categories:

• Multidimensional visualizations
• Specialized hierarchical and landscape visualizations
Choosing which visualization technique or tool to use to address your business questions is discussed in Chapter 7. Using and analyzing the visualization to discover previously unknown trends, behaviors, and anomalies in your business data set is covered in Chapter 8.

Multidimensional Data Visualization Tools

The most commonly used data visualization tools are those that graph multidimensional data sets. Multidimensional data visualization tools enable users to visually compare data dimensions (column values) with other data dimensions using a spatial coordinate system. Figure 1.2 shows examples of the most common visualization graph types. Other common multidimensional graph types not shown in Figure 1.2 include contour, histogram, error, Westinghouse, and box graphs. For more information on these and other graph types refer to Information Graphics: A Comprehensive Illustrated Reference, by R. Harris (Oxford: Oxford University Press, 1999).

Figure 1.2: Multidimensional data visualization graph types.

Most multidimensional visualizations are used to compare and contrast the values of one column (data dimension) to the values of other columns (data dimensions) in the prepared business data set. They are also used to investigate the relationships between two or more continuous or discrete columns in the business data set. Table 1.3 lists some common multidimensional graph types and the types of column values they can compare or the kinds of relationships they can investigate.
Table 1.3: Graph Types and Column Types

GRAPH TYPE | TYPE OF COLUMN VALUES TO COMPARE
---------- | --------------------------------
Column and bar | Used to compare discrete (categorical) column values to continuous column values
Area, stacked column or bar, line, high-low-close, and radar | Used to compare discrete (categorical) column values over a continuous column
Pie, doughnut, histogram, distribution, and box | Used to compare the distribution of distinct values for one or more discrete columns
Scatter | Used to investigate the relationship between two or more continuous columns

Column and Bar Graphs

Column and bar graphs, such as clustered column and clustered bar graphs, compare continuous data dimensions across discrete data dimensions in an x- and y-coordinate system. Column graphs plot data dimensions much like a line graph, except that a vertical column is drawn from the x-axis to the y-axis for the value of the data dimension. Bar graphs are identical to column graphs, except the x-axis and y-axis are switched so that the bar graphical entities are drawn horizontally instead of vertically. In either case, the data values associated with different sets of data are grouped by their x-axis label to permit easy comparison between groups. Each set of data can be represented by a different color or pattern. Stacked column and bar graphs work exactly like the non-stacked version, except that the y-axis data dimension values from previous data sets are accumulated as each column is plotted. Thus, bar graphical entities appear to be stacked upon each other rather than being placed side by side.

Figure 1.1 illustrates a multidimensional column graph visualization comparing the TEMPERATURE and HUMIDITY data dimensions by the CITY data dimension for the weather data set from Table 1.1.
The interpretation of the column graph in Figure 1.1 is left to the viewer, who possesses perhaps the most sophisticated pattern recognition machine ever created. What conclusions can be drawn from the column graph illustrated in Figure 1.1? You may conclude the rule is that (in most cases) temperature tends to be higher than humidity. However, in the case of Chicago, the rule is broken. If you also take into consideration the CONDITION column, you can refine the rule to be that temperature tends to be higher than humidity unless it is raining. Now the rule is true for all rows in the data set. Obtaining more records for the data set and plotting them would help you visually test and refine your rule.

Distribution and Histogram Graphs

An extremely useful analytical technique is to use basic bar and column graphs to display the distribution of values for a data dimension (column). Distribution and histogram graphs display the proportion of the values for discrete (nonnumeric) and continuous (numeric) columns as specialized bar and column graphs. A distribution graph shows the occurrence of discrete, nonnumeric column values in a data set. A typical use of the distribution graph is to show imbalances in the data. A histogram, also referred to as a frequency graph, plots the number of occurrences of the same or distinct values in the data set. It is also used to reveal imbalances in the data. Chapters 4, 5, and 6 use distribution and histogram graphs to initially explore the data set, detect imbalances, and verify the
correction of these imbalances. Chapters 7 and 8 use distribution and histogram graphs to discover and evaluate key business indicators.

Figure 1.3 shows a distribution graph of the INVOICE DATE data dimension for 2,333 billing records for the first four months of 2000. From the distribution graph, you can see that the month of February 2000 had the most invoices. Since you can verify the number of records by month against the original operational data source, the distribution graph provides you a method for verifying whether there are missing records in your business data set.

Figure 1.3: Distribution graph of invoices for the first four months of 2000.

Figure 1.4a shows a histogram graph of the number of invoices by REGION and Figure 1.4b shows a histogram graph of the number of invoices by BILLING RATE groupings for the first four months of 2000 from the same accounting business data set. In both of these graphs, you can see the skewness (lack of symmetry in a frequency distribution) in the column value distribution. For instance, the histogram graph of invoices by REGION (Figure 1.4a) is skewed toward the Eastern region, while the histogram graph of invoices by BILLING RATE (Figure 1.4b) is skewed toward billing rates of $15.00 an hour or less.
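The counting that sits behind a distribution graph, and the check against the operational source that the text mentions, can be sketched as follows. The invoice dates and source counts below are made up for illustration and are not the book's billing records; the function names are likewise hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical invoice dates; the book's 2,333 billing records are not reproduced here.
invoice_dates = [
    date(2000, 1, 5), date(2000, 1, 20),
    date(2000, 2, 3), date(2000, 2, 14), date(2000, 2, 28),
    date(2000, 3, 9),
    date(2000, 4, 1),
]


def distribution_by_month(dates):
    """Count the records falling in each (year, month) group."""
    return Counter((d.year, d.month) for d in dates)


def text_bars(counts):
    """Render the counts as a crude text distribution graph."""
    return [
        f"{year}-{month:02d} {'#' * n} {n}"
        for (year, month), n in sorted(counts.items())
    ]


def mismatched_months(counts, source_counts):
    """List months whose count differs from the operational data source."""
    return sorted(k for k, n in source_counts.items() if counts.get(k, 0) != n)


counts = distribution_by_month(invoice_dates)
print("\n".join(text_bars(counts)))

# Assumed counts from the operational source, used only to show the comparison.
source = {(2000, 1): 2, (2000, 2): 4, (2000, 3): 1, (2000, 4): 1}
print("Months to investigate:", mismatched_months(counts, source))
```

A month that shows fewer records in the business data set than in the source is a candidate for missing records, which is the verification use the text describes.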
Figure 1.4: Histogram graphs of invoices by region and by billing rate regions.

Box Graphs

Understanding descriptive statistical information about the column's values has typically been accomplished by analyzing measurements of central tendency (such as mean, median, and mode), measurements of variability (such as standard deviation and variance), and measures of distribution (such as kurtosis and skewness). For more information about central tendency, variability, and distribution measurements, refer to Statistics for the Utterly Confused by L. Jaisingh (New York: McGraw-Hill, 2000). Table 1.4 shows some of the common descriptive statistics derived from the values of the continuous column BILLING RATE.

Table 1.4: Descriptive Statistics for BILLING RATE

STATISTIC | BILLING_RATE
--------- | ------------
Mean | 19.59751
Standard error | 0.271229
Median | 15
Mode | 12
Standard deviation | 13.10066
Sample variance | 171.6274
Kurtosis | 16.48715
Skewness | 3.196885
Range | 159
Minimum | 7
Maximum | 166
Sum | 45721
Count | 2333
Confidence level (95.0%) | 0.531874

A variation on the histogram graph is the box plot graph. It visually displays statistics about a continuous column (numeric and date data types). Figure 1.5 shows two box plots for the BILLING RATE and INVOICE DATE.
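Several entries in Table 1.4 are related by standard formulas, which gives a quick consistency check on a table of descriptive statistics. The sketch below recomputes four of them from the others; the dictionary keys and the function name `derived` are hypothetical, and only the numbers are taken from Table 1.4.

```python
import math

# Values copied from Table 1.4 (BILLING_RATE).
table_1_4 = {
    "mean": 19.59751,
    "standard_error": 0.271229,
    "standard_deviation": 13.10066,
    "sample_variance": 171.6274,
    "range": 159,
    "minimum": 7,
    "maximum": 166,
    "sum": 45721,
    "count": 2333,
}


def derived(stats):
    """Recompute statistics that follow from other entries in the table."""
    n = stats["count"]
    sd = stats["standard_deviation"]
    return {
        "mean": stats["sum"] / n,                       # mean = sum / count
        "standard_error": sd / math.sqrt(n),            # standard error = sd / sqrt(count)
        "sample_variance": sd ** 2,                     # variance = sd squared
        "range": stats["maximum"] - stats["minimum"],   # range = max - min
    }


for name, value in derived(table_1_4).items():
    reported = table_1_4[name]
    agrees = math.isclose(value, reported, abs_tol=1e-3)
    print(f"{name:16s} reported={reported:<10} recomputed={value:.6f} agrees={agrees}")
```

The recomputed values agree with the reported ones to within rounding. Note also that the mean (about 19.6) is above the median (15), which is in line with the positive skewness reported in the table.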
Figure 1.5: Box graph of BILLING RATE and INVOICE DATE.

The box graphs display the following for each continuous column in the data set:

• The two quartiles (25th and 75th percentiles) of the column's values. The quartiles are shown as lines across a vertical colored bar. The length of the bar represents the difference between the 25th and 75th percentiles. From the length of the bar you can determine the variability of the continuous column. The longer the bar, the greater the spread in the data.
• The minimum, maximum, median, and mean of the column's values. The horizontal line inside the bar represents the median. If the median is not in the center of the bar, the distribution is skewed.
• The standard deviation of the column's values. The standard deviation is shown as plus and minus one standard deviation from the column's mean value.

The box plots visually reveal statistical information about the central tendency, variance, and distribution of the continuous column values in the data set. The statistics graphs in Figure 1.5 show the position of the descriptive statistics on a scale ranging from the minimum to the maximum value for numeric columns. They are often used to explore the data in preparation for transformations and model building. Like distribution and histogram graphs, statistics graphs are frequently used to reveal imbalances in the data. Chapters 4, 5, and 6 use statistics graphs to initially explore the data set, detect imbalances, and verify the correction of these imbalances.

Line Graphs

In its simplest form, a line graph (chart) is nothing more than a set of data points plotted in an x- and y-coordinate system, possibly connected by line segments. Line graphs normally show how the values of one column (data
dimension) compare to another column (data dimension) within an x- and y-coordinate system. Line and spline segments will connect adjacent points from the values of the data column. The data values for the x-axis can be either discrete or continuous. If the data values are discrete, the discrete values become the labels for successive locations on the axis. The data values for the y-axis must be continuous. Often line graphs are used to demonstrate time series trends.

Figure 1.6 shows a line graph visualization comparing the 1-, 3-, 6-, and 12-month bond yield indices from 1/17/1996 to 6/23/2000. The time series data dimension (date) is plotted on the x-axis. The corresponding data values for the 1-, 3-, 6-, and 12-month yields are plotted on the y-axis. The corresponding column data values are shown as points connected by a line within the x-y coordinate system.

Figure 1.6: Line graph of bond yield indices.

Figure 1.6 is the compilation of four individual line graphs. It allows you to quickly see how the yield indices compare to one another over the time dimension by the positions of the lines in the x- and y-coordinate system. In this single data visualization, over 4,500 pieces of information are communicated (1,136 individual daily readings of 4 values). Various trends may have been missed if you were only looking at column after column of numbers from a green-bar report.

A high-low-close graph is a variation on the line graph. Instead of a single x-y data point, the high, low, and close column values are displayed as hash markers on a floating column (the floating column being defined by the high and low values) within the x- and y-coordinate system. A typical use of high-low-close graphs is to show stock trends.
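To make the high-low-close idea concrete, the following Python sketch derives the three values for each period from a chronological list of readings. The function name, the (period, value) input layout, and the sample readings are all hypothetical and are not taken from the bond yield data set.

```python
def high_low_close(readings):
    """Summarize chronological (period, value) readings per period.

    Returns a dict mapping each period to its high, low, and close,
    where close is the last reading seen for that period.
    """
    summary = {}
    for period, value in readings:
        if period not in summary:
            # First reading of the period sets all three values.
            summary[period] = {"high": value, "low": value, "close": value}
        else:
            bar = summary[period]
            bar["high"] = max(bar["high"], value)
            bar["low"] = min(bar["low"], value)
            bar["close"] = value  # later readings overwrite the close
    return summary


# Hypothetical readings, invented for illustration only.
sample_readings = [
    ("Mon", 5.1), ("Mon", 5.4), ("Mon", 5.2),
    ("Tue", 5.0), ("Tue", 4.8),
]
bars = high_low_close(sample_readings)
for period, bar in bars.items():
    print(period, bar)
```

Each resulting entry corresponds to one floating column: the high and low define the column's extent, and the close is the hash marker drawn on it.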
Another variation on the line graph is the radar graph, which shows radars with markers at each data point in a 360-degree coordinate system instead of the traditional 90-degree x-y coordinate system. Figure 1.7 shows a
radar graph of the bond yield indices comparing the 1- and 6-month bond yields. In Chapters 7 and 8, line and radar graphs are used to discover and analyze time-based trends.

Figure 1.7: Radar graph of bond yield indices.

Scatter Graphs

Scatter graphs (sometimes referred to as scatter plots) are typically used to compare pairs of values. A scatter graph enables you to visualize the business data set by mapping each row or record in the data set to a graphical entity within a two- or three-dimensional graph. In contrast to the line graph, a scatter graph displays unconnected points in an x- and y-coordinate system (2-D) or an x-, y-, and z-coordinate system (3-D). In its simplest mode, data dimensions from the data set are mapped to the corresponding points in an x- and y-coordinate system (2-D). The bubble graph is a variation of a simple scatter graph that allows you to display another data dimension of the data set as the size of the graphical entity, as well as its position within the x- and y-coordinate system.

Figure 1.8 illustrates how you can use a scatter graph to investigate the relationship between the number of store promotions and the weekly profit. In Chapters 7 and 8, scatter graphs are used to discover and evaluate cause and effect relationships.
Figure 1.8: Scatter graph of weekly profit by number of promotions.

Pie Graphs

Pie graphs display the contribution of each value to the sum of the values for a particular column. Discrete column values become the labels for the slices of the pie, while the continuous column values are summarized into a contribution per discrete column value. Figure 1.9a shows a pie graph comparing the percent contribution of the total votes cast for each candidate in the state of Florida during the 2000 U.S. presidential race. Pie graphs are also very useful in showing column value distributions. In Chapters 4, 5, and 6, they are used to compare column value distributions before and after data preparation steps.
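The summarization behind a pie graph can be sketched in a few lines of Python: group a continuous column by a discrete column, sum each group, and express each sum as a percentage of the grand total. The function name and the sample rows below are hypothetical; no actual vote counts are used.

```python
from collections import defaultdict


def pie_contributions(rows):
    """Return each label's percentage share of the total.

    rows is an iterable of (label, value) pairs, where label is the
    discrete column value and value is the continuous column value.
    """
    totals = defaultdict(float)
    for label, value in rows:
        totals[label] += value  # summarize per discrete value
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("cannot compute shares of a zero total")
    return {label: 100.0 * subtotal / grand_total
            for label, subtotal in totals.items()}


# Hypothetical rows, invented for illustration only.
sample_rows = [("A", 30), ("B", 50), ("A", 20), ("C", 100)]
shares = pie_contributions(sample_rows)
print(shares)
```

Each entry in the result is the size of one slice, and the slices always sum to 100 percent.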
Figure 1.9: Pie and doughnut graphs of the presidential vote in Florida.

The doughnut graph is a variation on the pie graph. It can be used to compare and contrast multiple continuous columns at the same time. For instance, using a doughnut graph, you could show the voting percentages per U.S. presidential candidate in Florida, Wisconsin, and other states within the same visualization. This allows you to not only compare the vote percentages per candidate in Florida but also to compare those percentages against the other states that were visualized. Figure 1.9b shows a doughnut graph of the presidential vote in Florida.
Hierarchical and Landscape Data Visualization Tools

Hierarchical, landscape, and other specialized data visualization tools differ from normal multidimensional tools in that they exploit or enhance the underlying structure of the business data set itself. You are most likely familiar with an organizational chart or a family tree. Some business data sets possess an inherent hierarchical structure. Tree visualizations can be useful for exploring the relationships between the hierarchy levels. Other business data sets have an inherent geographical or spatial structure. For instance, data sets that contain addresses have a geographical structure component. Map visualization can be useful for exploring the geographical relationships in the data set. In other cases, the data set may have a spatial rather than geographical structure component. For instance, a data set that contains car part failures inherently has spatial information about the location of the failure within the car. The failures can be "mapped" to a diagram of a car (a car landscape). Another data set may contain where in the factory the failing part was manufactured. The failure can be "mapped" to a diagram of the factory (a factory landscape) to explore whether the location where a part was manufactured has any bearing on its failure.

Tree Visualizations

The tree graph presents a data set in the form of a tree. Each level of the tree branches (or splits) based upon the values of a different attribute (hierarchy in the data set). Each node in the tree shows a graph representing all the data in the sub-tree below it. The tree graph displays quantitative and relational characteristics of a data set by showing them as hierarchically connected nodes.
Each node contains information usually in the form of bars or disks whose height and color correspond to aggregations of data values (usually sums, averages, or counts). The lines (called edges) connect the nodes together and show the relationship of one set of data to its subsets. Figure 1.10 illustrates the number of families on Medicaid from a 1995 Census data set using a tree graph. The "root" node, or start of the tree, shows the total number of families on Medicaid (the small, darker colored column on the right) and not on Medicaid (the taller, lighter colored column on the left) that occur in the entire data set. You can see the number of families on Medicaid is very small, as the height of the lighter column is much greater than that of the darker column. The second level of the tree represents the number of families on Medicaid by the various family types. By visualizing the data in this way, you may be able to find some combination of attributes and values that is indicative of families having a higher than normal chance of being on Medicaid. As you can see from the tree visualization, some types of families have a significantly higher chance of being on Medicaid than others (related subfamily and second individual family types versus non-family householders).
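The aggregation behind a tree graph can be sketched as a recursive grouping: each node counts the target values for all rows beneath it, and each level splits the rows on the next attribute. The Python sketch below is one minimal way to build such a structure. The function name, the column names (family_type, region, on_medicaid), and the sample rows are hypothetical and do not come from the Census data set.

```python
from collections import Counter, defaultdict


def build_tree(rows, split_columns, target):
    """Build a nested dict of counts, one level per split column.

    Every node stores the counts of the target column for all rows in
    its sub-tree, plus child nodes keyed by the next column's values.
    """
    node = {
        "counts": Counter(row[target] for row in rows),
        "children": {},
    }
    if split_columns:
        column, remaining = split_columns[0], split_columns[1:]
        groups = defaultdict(list)
        for row in rows:
            groups[row[column]].append(row)
        for value, subset in groups.items():
            node["children"][value] = build_tree(subset, remaining, target)
    return node


# Hypothetical rows, invented for illustration only.
sample_rows = [
    {"family_type": "primary family", "region": "west", "on_medicaid": False},
    {"family_type": "primary family", "region": "east", "on_medicaid": False},
    {"family_type": "primary family", "region": "east", "on_medicaid": True},
    {"family_type": "related subfamily", "region": "west", "on_medicaid": True},
]

tree = build_tree(sample_rows, ["family_type", "region"], "on_medicaid")
print("root:", dict(tree["counts"]))
for family_type, child in tree["children"].items():
    print(family_type, dict(child["counts"]))
```

The root counts play the role of the two columns in the root node, and each child's counts would drive the bar heights drawn at the second level.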
Figure 1.10: Tree visualization of proportion of families on Medicaid by family type and region.

Map Visualizations

To explore business data sets for strong spatial (typically geographical) relationships, you can use a map visualization. The corresponding column values are displayed as graphical elements on a visual map based on a spatial key. Although the data set contains a geographic data dimension, it does not contain the information that there are 50 states in the United States, that California and New York are 3,000 miles apart, that California is south of Oregon, or what the latitude and longitude coordinates are for the states. For instance, you can plot your total sales by state, by state and county, or by zip code.

Figure 1.11 is a map visualization of a business data set that contains information about the number of new account registrations by state. Using a corresponding color key, the states are colored based on the number of registrations by state. You can quickly determine from the map which sales locations (states and regions) are signing up more new customers than others. You can also see the geographic significance of the best-producing states or regions compared with other states and regions.
Figure 1.11: Map visualization of new account registrations by state.

Visual Data Mining Tools

Visual data mining tools can be used to create two- and three-dimensional pictures of how the data mining model is making its decisions. The visualization tool used depends on the nature of the data set and the underlying structure of the resulting model. For example, in Figure 1.12 a decision tree model is visualized using a hierarchical tree graph. From this visualization you can more easily see the structure of the model.
Figure 1.12: Tree visualization of a decision tree to predict potential salary.

Unfortunately, not all data mining algorithms can be readily visualized with commercially available software. For instance, neural network data mining models simulate a large number of interconnected simple processing units segmented into input, hidden, and output layers. Visualizing the entire network with its inputs, connections, weights, and outputs as a two- or three-dimensional picture is an active research question.

Visualization tools are also used to plot the effectiveness of the data mining model, as well as to analyze the potential deployment of the model. A gains chart is a line graph that directly compares a model's performance at predicting a target event with that of a random-guess baseline. The cumulative gain is the proportion of all the target events that occur up to a specific percentile. Figure 1.13 illustrates a cumulative gains chart. The population series refers to our random-guess model. From this line graph, you can compare and contrast the performance of different data mining models. You can also use these visualizations to compare and contrast the performance of the models at the time they are built and once they are deployed. You can quickly inspect the performance of the model visually to see if it is performing as expected or becoming stale and out-of-date. Other multidimensional data visualization tools are useful in analyzing the data mining model results, as well as comparing and contrasting multiple data mining models.
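The cumulative gain described above can be computed directly from a model's scores and the actual outcomes. The following Python sketch is one minimal way to do so, assuming each record has a numeric score (higher means more likely to be a target event) and a 0/1 actual outcome. The function name, the binning choice, and the sample data are hypothetical.

```python
def cumulative_gains(scores, actuals, n_bins=10):
    """Return (population fraction, fraction of events captured) pairs.

    Records are ranked by descending score; for each cut-off the gain is
    the share of all target events found among the records ranked so far.
    """
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    total_events = sum(actual for _, actual in ranked)
    if total_events == 0:
        raise ValueError("no target events to capture")
    n = len(ranked)
    points = []
    for b in range(1, n_bins + 1):
        cutoff = round(b * n / n_bins)
        captured = sum(actual for _, actual in ranked[:cutoff])
        points.append((cutoff / n, captured / total_events))
    return points


# Hypothetical scores and outcomes, invented for illustration only.
sample_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
sample_actuals = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

gains = cumulative_gains(sample_scores, sample_actuals, n_bins=5)
for population, gain in gains:
    # A random-guess baseline captures events in proportion to population.
    print(f"top {population:.0%}: model {gain:.0%}, baseline {population:.0%}")
```

Plotting the second value of each pair against the first gives the model's line; the baseline is the diagonal, where the fraction of events captured equals the fraction of the population examined.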
Figure 1.13: Evaluation line graph.

The tree visualization in Figure 1.12 and the line visualization in Figure 1.13 are just two examples of how you can use data visualization to explore how data mining models make their decisions and evaluate multiple data mining models. Choosing which visual data mining tool to use to address your business questions is discussed in Chapter 7. Analyzing the visualization of the data mining model to discover previously unknown trends, behaviors, and anomalies in business data sets is discussed in Chapter 8.

Summary

Chapter 1 summarized data visualization and visual data mining tools and techniques that can be used to discover previously unknown trends, behaviors, and anomalies in business data. In the next chapter, we help you justify and plan a data visualization and data mining project so you can begin to exploit your business data with data visualization and visual data mining to gain knowledge and insights into business data sets and communicate those discoveries to the decision makers. Chapters 2 through 9 present and teach you a proven eight-step VDM methodology that we have used to create successful business intelligence solutions with data visualization and visual data mining tools and techniques.
Chapter 2: Step 1: Justifying and Planning the Data Visualization and Data Mining Project

Overview

Step 1 of the eight-step data visualization and data mining (VDM) methodology is composed of both the project justification and the project plan. Chapter 1 provided you with an introduction to visualization and data mining tools and techniques. This chapter shows you how to justify and plan the VDM project.

Before the first row of data is visualized or mined, a project justification and plan need to be developed to ensure the success of the project. The purpose of the project justification is to identify quantitative project objectives, develop a sound business case for performing the project, and gain executive support and funding from the decision makers for the project. The project justification defines the overall business stimulus, return-on-investment (ROI) targets, and visualization and data mining goals for the project.

The purpose of a project plan is to define the scope, high-level tasks, roles, and responsibilities for the project. The project plan establishes a roadmap and project timeline. It defines the roles and responsibilities of all participants who will be involved in the project and serves as an "agreement" of individual responsibilities among the operations and data warehousing, the data and business analyst, the domain expert, and the decision maker teams.

A closed-loop business model is often helpful in modeling the business aspects of the project. The closed-loop model ensures the resulting visualizations or data mining models feed back into the initial data set sources. This feedback loop enables you to refine, improve, and correct your production visualizations or data mining models through time.
Other feedback loops within the business model ensure your project stays focused, makes business sense, and remains within the scope of the project.

This chapter begins by discussing three types of projects:

- Proof-of-concept
- Pilot
- Production

We then introduce using a closed-loop business model, provide guidance on estimating the project timeline and resources, and define team roles and responsibilities for the project. At the end of this chapter, we introduce the case study of a customer retention business problem. We then apply the concepts discussed in this chapter to the case study to illustrate Step 1 of the VDM methodology.

Classes of Projects

The overall scope of your VDM project can be categorized into three classes of projects: proof-of-concept, pilot, or production. Often a successful proof-of-concept or pilot project later leads to a production project. Therefore, no matter which type of project is planned, it helps to keep the overall structure of the project justification and plan consistent. This enables you to quickly turn a proof-of-concept project justification and plan into a pilot or
production project without starting over from scratch or wasting time and resources.

Among other factors, the type of project will determine the following:

- The difficulty and number of the business questions investigated
- The complexity and amount of data analyzed
- The quality and completeness of the data
- The project costs (personnel, software, and hardware cost)
- The duration of the project
- The complexity and number of resultant visualizations and models created

A proof-of-concept VDM project has a limited scope. The overall scope of a proof-of-concept project is to determine whether visualization and data mining will be beneficial to your business, to prove to the decision makers the value of visualization and data mining, and to give your organization experience with visualization and data mining concepts. Typically, one or two relatively trivial business questions are investigated. The data set analyzed is limited to a small sample of existing data. The average duration of a proof-of-concept project normally is a few weeks.

A pilot VDM project also has a limited scope. The overall scope of the pilot project is to investigate, analyze, and answer one or more business questions to determine if the ROI of the discoveries warrants a production project. The data set analyzed is limited to representative samples from the real data sources. Often you will need to purchase limited copies of the visualization and data mining tools. However, since the pilot project may not be implemented, you may not have to purchase the production hardware or copies of the visualization and data mining tools for everyone. The average duration of a pilot project is normally a few months.
A production VDM project is similar to the pilot project in scope; however, the resulting visualizations and data mining models are implemented into a production environment. The overall scope of the production project is to fully investigate, analyze, and answer the business questions and then to implement an action plan and measure the results of the production visualizations and data mining models created. You will need to purchase licenses for the visualization and data mining tools for all production users and buy the production hardware. The average duration of a production project ranges from a few months to a year. The actual project deployment may last many years. Depending on the visualization and data mining experience level of your staff, you may need to augment it. For production projects, you will need a dedicated and trained staff to maintain the production environment. Many times after you see the benefits and ROI from the project, you will want to use visualization and data mining to answer other business questions or use VDM in other departments in your organization.

Project Justifications

After you have decided which class of project to do, you next need to create a project justification. The project justification defines the overall business stimulus, ROI targets, and visualization and data mining goals for the
project. Developing a project justification begins by identifying a high-level business issue your business needs to address. Table 2.1 lists a few of the business issues that can be addressed by VDM projects.

Table 2.1: Business Issues Addressed by Visualizations or Visual Data Mining Projects

BUSINESS ISSUE | VDM PROJECT OBJECTIVES
Target marketing | To discover segments of "ideal" customers who share the same characteristics, such as income level and spending habits, with the best candidates for a specific product or service
Cross-marketing | To discover correlations and associations between product sales and make predictions based on these associations to facilitate cross-marketing
Customer profiling | To create models to determine what types of customers buy which products
Identification of customer requirements | To discover the best product matches for different segments of customers and use predictions to find what factors will attract new customers
Financial planning and asset evaluation | To create descriptive or predictive models to aid in cash flow analysis and predictions, contingent claim analysis, and trend analysis to evaluate assets
Resource planning | To create descriptive or predictive models to aid in analyzing and comparing resources and spending
Competitive analysis | To segment customers into classes for class-based pricing structures and set pricing strategies for highly competitive markets
Fraud detection | To create descriptive or predictive models to aid in analyzing historical data to detect fraudulent behaviors in such industries as medical, retail, banking, credit card, telephone, and insurance
Attrition modeling and analysis | To create descriptive or predictive models to aid in the analysis of customer attrition
Chemical and pharmaceutical analysis | To create descriptive or predictive models to aid in molecular pattern
modeling and analysis, as well as drug discovery and clinical trial modeling and analysis

Attempt to state your overall project goal in a single statement, for instance, "To discover segments of ideal customers who share the same characteristics and who are the best candidates for our new cable modem service offering." You may need to interview various departments within your organization before deciding on your project goal. For proof-of-concept projects, keep the overall project goal simple. For pilot or production projects, the overall project goal may be more complex. Use the examples in Table 2.1 to establish your own project objective.

Perhaps the most difficult part of the project's business justification is determining realistic ROI objectives and expected outcomes. You will often need the assistance of the business analysts or line-of-business manager to help quantify the cost of continuing to do business "status quo." Your aim should be to create a document that contains the project ROI objectives; describes the content, form, access, and owners of the data sources; summarizes the
previous research; explains the proposed methodology; and forecasts the anticipated outcome. When preparing the justification document, keep in mind the class of project you are planning, as well as your target audience: the decision makers and business experts.

As reference material for your business justification, include industry examples of visualization and data mining success stories. Choose those success stories that relate to the business issues you are trying to address. Our companion Web site (www.wiley.com/compbooks/soukup) has links to the majority of the commercially available data visualization and visual data mining software providers. For example, you can find the following success stories on the SPSS, SAS, and Oracle Web sites.

Dayton Hudson Corp. Success Story

Retail is a very competitive industry. The Dayton Hudson Corp. (DHC) success story highlights how they use data mining to grow their business and improve customer satisfaction. For instance, the DHC research and planning department uses data mining to help select new store sites. By analyzing trade and demographic data for 200 to 300 potential new sites with descriptive, correlation, and regression data mining models, the research group can quantitatively determine which sites have the best potential market success for each of its store lines: Target, Mervyn's, Dayton's, Hudson's, and Marshall Field's.

The DHC consumer research department also uses data mining to target customer satisfaction issues. Often respondent surveys include data files with several hundred thousand cases from DHC stores, as well as from competing stores. These surveys are analyzed with data mining to gain knowledge about what is most important to customers and to identify those stores with customer satisfaction problems.
The data mining results are used to help management better allocate store resources and technology, as well as improve training. For more information on the DHC success story, refer to the SPSS Web site at www.spss.com/spssatwork/template_view.cfm?Story_ID=4 (SPSS, 2002).

Marketing Dynamics Success Story

Customer direct marketing is another industry that benefits from data visualization and data mining. The Marketing Dynamics success story highlights how they use visual data mining to develop more profitable direct marketing programs for their clients. Marketing Dynamics has access to large amounts of customer marketing data; however, the trick is to turn that data into insights.

Through the use of data mining analysis, Marketing Dynamics is able to develop more profitable target marketing programs for their clients, such as Cartier, Benjamin Moore & Company, SmithKline Beecham, American Express Publishing, and several prominent catalog companies. Marketing Dynamics uses analysis tools such as list analysis, data aggregation, cluster analysis, and other data mining techniques to deliver predictive models to their clients, who then use these models to better understand their customers, discover new markets, and deploy successful direct marketing campaigns to reach those new markets.
For more information on the Marketing Dynamics success story, refer to the SPSS Web site at www.spss.com/spssatwork/template_view.cfm?Story_ID=25 (SPSS, 2002).

Sprint Success Story

Telecommunications is yet another fiercely competitive industry that is benefiting from data visualization and data mining. The Sprint success story highlights how they use visual data mining for customer relationship management (CRM). Within the sphere of CRM, Sprint not only uses data mining to improve customer satisfaction, but also uses data mining for cross-selling, customer retention, and new customer acquisition.

Sprint uses SAS to provide their marketing departments with a central analytic repository. Internal sales and marketing groups access this repository to create better target marketing programs, improve customer relationships, and cross-sell to existing customers. The central repository enables them to integrate multiple legacy systems and incorporate feedback loops into their CRM system. For more information on the Sprint success story, refer to the SAS Web site at www.sas.com/news/success/sprint.html (SAS, 2002).

Lowestfare.com Success Story

Compared with the traditional retail industry, the online travel industry may be even more brutally competitive. The Lowestfare.com success story highlights how they used data mining to target those customers most likely to purchase over the Internet. Lowestfare.com built a data warehouse with the most important facts about customers. By analyzing these data sets, they were able to better understand their customers in order to sell them the right products through the best channels, thus increasing customer loyalty. Developing successful target-marketing models helped Lowestfare.com increase profits for each ticket sold.
Lowestfare.com augmented their customer data warehouse with 650 pieces of demographic information purchased from Acxiom. This not only enabled them to better understand who their customers were, but also helped them to build predictive cross-selling models. Through data mining, they were able to identify the top 87 pieces of demographic information that profiled their customers. Then they were able to build data mining C&RT models that produced customer profiles based on purchase behavior and deploy these models into their Internet site. For more information on the Lowestfare.com success story, refer to the Oracle Web site at http://otn.oracle.com/products/datamining/pdf/lowestfare.pdf, "Lowestfare.com Targeting Likely Internet Purchasers."
Challenges to Visual Data Mining

Many challenges exist for justifying your VDM project. The various stakeholders in the organization may not understand data mining and what it can do. Following are some common objections to visual data mining approaches.

Data Visualization, Analysis, and Statistics are Meaningless

This objection is often due to a lack of familiarity with the process and benefits that visual data mining can provide. The objection can be overcome by explaining that data analysis is part of most decision-making processes. Whether consciously or subconsciously, individuals, teams, and organizations make decisions based on historical experience every day. Data mining can be easily compared to this decision-making process. For instance, if you view all your previous experiences as a large data set that can be investigated and analyzed, then the process of drawing actionable conclusions from this data set can be likened to the task of data mining.

A critical aspect of the VDM methodology is validation (discussed fully in Chapter 9). VDM tools and techniques only find the interesting patterns and insights. It is the various stakeholders, such as the decision makers and domain experts, who validate whether or not these discoveries are actionable, pragmatic, and worth implementing.

Why Are the Predictions Not 100 Percent Accurate?

One of the benefits of data mining is that it provides you with quantification of error. To some, the very fact that an insight or model has error at all is cause to discount the benefits of visual data mining. After all, shouldn't the model be 100 percent accurate before it is deployed? The accuracy of a model is only one measure that can be used to value its worth.
The ability to easily explain the model to regulators and domain experts and the ease of implementation and maintenance are other important factors. Often, analyzing the errors or false prediction cases leads to greater insight into the business problem as a whole. Similarly, visually comparing the models with line graphs (discussed in Chapter 8) assists you in evaluating and selecting the "best" models based on your project objectives.

Our Data Can't Be Visualized or Mined

Data integrity is very important for building useful visualizations and data mining models. How does an organization determine that its data has the level of integrity needed to make a positive impact for the firm? At what point is the data good enough? The issue of data integrity unfortunately prevents many companies that would benefit from data mining capabilities from getting started on building what is potentially a valuable future core competency. Very few organizations possess data that is immediately suitable for mining unless it was collected for that purpose.

A key part of the VDM methodology is data preparation (fully discussed in Chapters 4, 5, and 6), which explicitly involves making the data good enough to work with. Furthermore, it is quite feasible to measure the potential financial success of a visual data mining project by working with historical data. Often the VDM data preparation steps can help your organization pinpoint integrity problems with your existing historical data, as well as implement new standards to ensure the integrity of new business data before and as it is gathered.
  • 45. Closed-Loop Business Model

Whether you are planning a proof-of-concept, pilot, or production project, consider using a closed-loop business model. A business model is considered closed-loop when the output of the final stage feeds back into the initial step. The interactions among and between stages reveal the iterative nature of the business model. Most VDM projects can be diagrammed as a closed-loop business model. Figure 2.1 shows the business stages and interactions of a closed-loop business model for a VDM project. This model may be applied to a multitude of visualization and data mining projects, such as projects that:

- Prevent customer attrition
- Cross-sell to existing customers
- Acquire new customers
- Detect fraud
- Identify most profitable customers
- Profile customers with greater accuracy

Figure 2.1: Closed-loop business model.

The business model can also be used for VDM projects that detect hidden patterns in scientific, government, manufacturing, medical, and other applications, such as:
  • 46.
- Predicting the quality of a manufactured part
- Finding associations between patients, drugs, and outcomes
- Identifying possible network intrusions

As illustrated in Figure 2.1, the closed-loop business model contains the data preparation and data analysis phases of the eight-step VDM methodology described throughout this book. However, the implementation phase is outside the book's scope. We have included the entire closed-loop business model to provide you with a business framework for justifying and planning your VDM project. Our companion Web site has links to the majority of the commercially available data visualization and "visual" data mining software providers where you can find information on the implementation phase of a VDM project.

The following section discusses how to use the closed-loop business model for a customer attrition VDM project. The overall business goal of a customer attrition project was to reduce customer attrition from 30 percent to 25 percent. You may be saying to yourself that a 5 percent improvement doesn't seem to be a very valuable goal. However, in this particular case, 5 percent of approximately 4 million customers equates to 200,000 customers. Given the average customer represents $240.00 a year in sales, a 5 percent improvement equates to approximately $48 million in sales a year.

The overall business strategy of the customer attrition project was to create, analyze, and deploy visualization and data mining models to discover profiles of customers who switched services to a competitor, to understand why they switched services, and to find current customers who have similar profiles and then to take corrective action to keep them from switching to the competition.
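The arithmetic behind the business goal can be checked with a few lines of Python. This is only an illustrative sketch: the function name and parameters are assumptions introduced here, and the figures are the approximate ones quoted in the text (4 million customers, attrition falling from 30 to 25 percent, $240 a year per customer).

```python
def annual_sales_retained(customers, attrition_before_pct, attrition_after_pct,
                          sales_per_customer):
    """Return (customers retained, annual sales retained) for a drop in attrition.

    Attrition rates are whole percentages so the customer count stays an integer.
    """
    points_saved = attrition_before_pct - attrition_after_pct
    retained = customers * points_saved // 100
    return retained, retained * sales_per_customer


# Figures quoted in the text (the customer count is approximate).
retained, sales = annual_sales_retained(4_000_000, 30, 25, 240.00)
print(f"{retained:,} customers retained, ${sales:,.0f} in sales per year")
```

Changing any one input (for example, a smaller customer base or a lower average sale) shows quickly whether a similar goal would justify a project in your own organization.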
The process of developing the business strategy and identifying the business questions is the second step of the VDM methodology, which we discuss in Chapter 3.

Using the Closed-Loop Business Model

The first stage in the business model is to obtain and select the raw data from the data warehouse and business data repositories pertaining to customers. In the customer retention project, it was discovered that a "customer" was defined differently in the multiple databases. In addition, "customer attrition" was defined differently by different organizations. These types of data issues need to be resolved to ensure the proper data is selected. Unless they are resolved, the resulting analysis may be faulty. Obtaining and selecting the data corresponds to Steps 3, 4, and 5 of the VDM methodology, which are discussed in Chapters 4 through 6.

Identifying the key business indicators is the next stage in the business model. Once all the project teams agreed on the business rules (definitions) of who constitutes a "customer," and what constitutes "customer attrition," visualization and data mining tools were used to begin the process of identifying the key business indicators for classifying satisfied versus lost customers. In the customer retention project, common indicators or profiles that define a satisfied customer as compared to a lost customer were discovered from the historical data.

After the key indicators were discovered, the investigation and drill-down stage started. In this stage the data set is further investigated to gain business insights and understanding of behavior (patterns and trends) of lost customers. As shown in Figure 2.1, these business stages feed back into one another. If a key business indicator cannot be substantiated or doesn't make good business sense, other indicators need to be identified. Sometimes a key
  • 47. business indicator looks promising on the surface, but upon further investigation, it doesn't really help in revealing insightful customer behaviors. During the customer retention project, it was discovered that customers who had originally selected a particular service rate plan were extremely likely to begin shopping around for a better rate after about 9 months of service. In addition, after a year of service, customers wanted new equipment. The process of identifying and analyzing the key business indicators and drilling down into the data is Step 7 of the VDM methodology, which we discuss in Chapter 8.

The development of visual and analytical models for different business scenarios is the next stage. For instance, a model that identifies which customers are most likely to switch to the competition unless they are sent updated equipment may be cost-prohibitive, whereas a model for changing a customer from one service plan to another may be more cost-effective. In this stage, the visualization and data mining models are used to help develop different business scenarios. The process of developing the visualizations and analytical models is also part of Step 7 of the VDM methodology, discussed in Chapter 8.

Creating an action plan and gaining approval for the "best" strategic use of the models, visualizations, and insights that produce the best ROI based on the business goals is the next stage in the business model. In this stage, the visualizations and data mining models are used to communicate the findings to the decision makers and other business analysts. These business stages feed back into one another, as shown in Figure 2.1. There may be high-level business reasons for choosing one scenario over another.
For instance, during the presentation of the customer attrition project findings to the decision makers, the vice president of finance suggested that upgrading the customers to newer equipment would not be as cost-prohibitive as originally thought (or modeled). The VP of finance had just renegotiated a contract with the equipment manufacturer that greatly reduced the cost of the equipment. The process of evaluating the "best" models is discussed in Chapter 8. Creating a presentation of the analysis is Step 8 of the VDM methodology, which we discuss in Chapter 9.

Implementing the action plan once it has been approved is the next stage in the business model. In this stage, the visualizations or data mining models are prepared for production. For example, during the customer retention project, the rules from a data mining model were coded in the C language, and weekly batch procedures flagged customers who had a high probability of switching to the competition. The customer support center was given this list of customers at the beginning of each week. The customer support center then contacted each customer on the list throughout the week and either offered to upgrade them to newer equipment or to switch them to a different rate plan.

Measuring the results of the action plan against the model is the next stage in the business model. For example, during the customer retention project, those customers offered and upgraded to the newer equipment were monitored for a full year to determine the actual ROI of the project. The decision tree model had estimated that the "upgrade" campaign would reduce customer attrition by 2.5 percent, or $24 million. However, after 6 months, the actual measured results were only around 2 percent, or $20 million. The initial data set was augmented with the results, and more refined data mining models were developed and put into production that resulted in a 3 percent reduction in customer attrition, or $29 million.
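The weekly batch step described above can be pictured with a small Python sketch. It is not the project's actual model: the text says only that rules from a data mining model were coded in C, so the rules, field names, plan name, and thresholds below are hypothetical, loosely based on the two indicators mentioned earlier (rate shopping after about 9 months on a particular plan, and wanting new equipment after a year).

```python
from dataclasses import dataclass

# Hypothetical values for illustration; the source does not name the plan
# or give the exact rules produced by the data mining model.
AT_RISK_PLAN = "basic"
RATE_SHOPPING_MONTHS = 9
EQUIPMENT_MONTHS = 12


@dataclass
class Customer:
    customer_id: int
    rate_plan: str
    months_of_service: int
    has_new_equipment: bool


def retention_offer(c):
    """Return the suggested offer for an at-risk customer, or None."""
    if c.months_of_service >= EQUIPMENT_MONTHS and not c.has_new_equipment:
        return "equipment upgrade"
    if c.rate_plan == AT_RISK_PLAN and c.months_of_service >= RATE_SHOPPING_MONTHS:
        return "rate plan change"
    return None


def weekly_contact_list(customers):
    """Build the list for the support center as (customer_id, offer) pairs."""
    flagged = []
    for c in customers:
        offer = retention_offer(c)
        if offer is not None:
            flagged.append((c.customer_id, offer))
    return flagged


customers = [
    Customer(1, "basic", 10, True),
    Customer(2, "premium", 14, False),
    Customer(3, "premium", 3, True),
]
print(weekly_contact_list(customers))
# prints [(1, 'rate plan change'), (2, 'equipment upgrade')]
```

Keeping the rules in one small function makes it easier to replace them when the measured results feed back into a refined model, which is the point of the closed loop.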
In addition to the "upgrade" campaign, the customer attrition project implemented a different data mining model to identify customers who should be offered a different plan before they switched to a competitor; this reduced customer attrition by another 2 percent. Overall, the customer attrition project was deemed a success and added over $40 million to the bottom line. The customer retention project costs (personnel, software, and hardware) were approximately $800,000, resulting in a profit of $31 million for the first year. Using a closed-loop business model helped make the customer retention
  • 48. project a success. The feedback loops enabled the data and business analysts to focus and improve their data mining models to glean a higher rate of return.

Project Timeline

The project timeline will depend on the type of VDM project you are planning. Table 2.2 lists the average workdays per task for typical proof-of-concept, pilot, and production projects. We have compiled this list based on different real-world projects that we have completed. Of course, your "mileage" may vary depending on the business issues investigated, the complexity of the data, the skill level of your teams, and the complexity of the implementation, among other factors. Table 2.2 should give you a general guideline for estimating the project plan timeline for proof-of-concept, pilot, and production projects.

Table 2.2: Estimating the Project Duration

| PROJECT PHASE | VDM METHODOLOGY STEP | TASK NAME | PROOF-OF-CONCEPT | PILOT | PRODUCTION |
|---|---|---|---|---|---|
| Planning | 1 | Justify and Plan Project | 5 | 5 | 5 |
| | 2 | Identify the Top Business Questions | 3 | 5 | 10 |
| | | Estimated Project Planning Phase Days | 8 | 10 | 15 |
| Data Preparation | 3 | Choose the Data Set | 1 | 2 | 5 |
| | 4 | Transform the Data Set | 3 | 10 | 15 |
| | 5 | Verify the Data Set | 1 | 2 | 5 |
| | | Estimated Data Preparation Phase Days | 5 | 14 | 25 |
| Data Analysis | 6 | Choose the Visualization or Mining Tools | 3 | 10 | 15 |
| | 7 | Analyze the Visualization or Mining Models | 5 | 10 | 15 |
| | 8 | Verify and Present the Visualization or Mining Models | 2 | 10 | 15 |
| | | Estimated Data Analysis Phase Days | 11 | 30 | 45 |
| Implementation | | Create Action Plan | | | 10 |
| | | Approve Action Plan | | | 5 |
| | | Implement Action Plan | | | 20 |
| | | Measure Results | | | 30 |
| | | Estimated Implementation Phase Days | | | 65 |
| TOTAL PROJECT DURATION | | | 24 days | 54 days | 150 days |
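As a rough planning aid, the phase subtotals from Table 2.2 can be added up in code. The sketch below copies the estimated phase days from the table; the `scale` parameter is an assumption added here to reflect the authors' caveat that your "mileage" may vary, and it is not part of the VDM methodology.

```python
# Estimated phase days as listed in Table 2.2 (workdays).
PHASE_DAYS = {
    "proof-of-concept": {"planning": 8, "data preparation": 5, "data analysis": 11},
    "pilot": {"planning": 10, "data preparation": 14, "data analysis": 30},
    "production": {
        "planning": 15,
        "data preparation": 25,
        "data analysis": 45,
        "implementation": 65,
    },
}


def total_days(project_type, scale=1.0):
    """Sum the phase estimates for a project type.

    `scale` is a hypothetical adjustment factor (for example, 1.5 for
    unusually complex data); 1.0 reproduces the table's totals.
    """
    return round(sum(PHASE_DAYS[project_type].values()) * scale)


for project_type in PHASE_DAYS:
    print(f"{project_type}: {total_days(project_type)} days")
```

With the default scale, the three totals match the last row of the table: 24, 54, and 150 days.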
  • 50. Project Resources and Roles

As illustrated in Table 2.2, schedule a few weeks if you are planning a proof-of-concept project, a month or more for a pilot project, and a few months to a year for a production project. When allocating your project resources, be sure to reach agreement with all teams. The project consists of multiple teams: operations and data warehousing, data and business analysts, domain experts, and decision makers. In the following sections, we will define each team and their responsibilities. The time and resource demands for each team will depend on the type of VDM project you are planning.

A successful business intelligence solution using data visualization or visual data mining requires the participation and cooperation of many parts of a business organization. Depending on the size of your business organization, you may be responsible for one or more roles. (In small organizations, you may be responsible for all roles.) Tables 2.3 through 2.6 list the average workdays per resource for typical proof-of-concept, pilot, and production projects. We have compiled this list based on different real-world projects that we have completed. As with the project timeline, your "mileage" may vary depending on the business issues investigated, the complexity of the data, the skill level of your teams, and the complexity of the implementation, among other factors.

Data and Business Analyst Team

The data and business analyst team is involved in all phases of the project; therefore, you can use Table 2.2 as a guideline for estimating the average workdays for typical proof-of-concept, pilot, and production projects.
For proof-of-concept, pilot, and production projects, the data and business analyst team is often responsible for the following:

- Justifying and planning the project to the decision makers and creating the project justification and planning document
- Identifying the top business questions to be investigated
- Mapping the top business questions into questions that can be investigated through visualization and data mining
- Creating extract procedures for historical and demographic data with the guidance of domain experts and data warehousing team
- Creating, analyzing, and evaluating the visualizations and data mining models with the guidance of domain experts
- Presenting the solution to the decision makers and helping to create an action plan

During the implementation phase of a production project, the data and business analyst team is often responsible for the following:
  • 51.
- Implementing the solution's production environment and maintaining the solution's production environment until the operations team is trained
- Measuring the results of the solution and using the results to further refine, enhance, and correct the production visualizations and data mining models

Domain Expert Team

The role of the domain expert team is to act as consultants to the data and business analysts to ensure the correct data is obtained and valid business indicators are discovered. They also act as consultants to the decision maker team to ensure the solution makes sound business sense. Table 2.3 lists the average workdays for typical proof-of-concept, pilot, and production projects.

Table 2.3: Domain Expert Team Roles and Responsibilities

| VDM METHODOLOGY STEP | TASKS | PROOF-OF-CONCEPT | PILOT | PRODUCTION |
|---|---|---|---|---|
| 1 | Justify and Plan Project | 5 | 5 | 5 |
| 2 | Identify the Top Business Questions | 3 | 3 | 9 |
| 3 | Choose the Data Set | 1 | 2 | 5 |
| 4 | Transform the Data Set | - | - | - |
| 5 | Verify the Data Set | 1 | 2 | 5 |
| 6 | Choose the Visualization or Mining Tools | - | - | - |
| 7 | Analyze the Visualization or Mining Models | 1 | 2 | 4 |
| 8 | Verify and Present the Visualization or Mining Models | 1 | 2 | 4 |
| | Create Action Plan | | | 5 |
| | Implement Action Plan | | | 10 |
| | Measure Results | | | 15 |
| | ESTIMATED DAYS | 12 days | 16 days | 62 days |

  • 52. For proof-of-concept, pilot, and production projects, the domain expert team is often responsible for the following:

- Helping the data and business analysts justify and plan the project
- Helping the data and business analysts identify the top business questions to be investigated
- Validating the data obtained by operations and the data and business analysts
- Validating the key business indicators discovered by the data and business analysts

In addition, for the implementation phase of a production project, the domain expert team often has the following responsibilities:

- Helping the data and business analysts and decision makers to create a valid action plan
- Assisting in measuring the results of the project

Decision Maker Team

The role of the decision maker team is to evaluate the business scenarios and potentially approve the solution. Table 2.4 lists the average workdays for typical proof-of-concept, pilot, and production projects.

Table 2.4: Decision Maker Team Roles and Responsibilities

| VDM METHODOLOGY STEP | TASKS | PROOF-OF-CONCEPT | PILOT | PRODUCTION |
|---|---|---|---|---|
| 1 | Justify and Plan Project | 2 | 5 | 5 |
| 2 | Identify the Top Business Questions | 1 | 3 | 3 |
| 3 | Choose the Data Set | - | - | - |
| 4 | Transform the Data Set | - | - | - |
| 5 | Verify the Data Set | - | - | - |
| 6 | Choose the Visualization or Mining Tools | - | - | - |
| 7 | Analyze the Visualization or Mining Models | - | - | - |
| 8 | Verify and Present the Visualization or Mining Models | 3 | 3 | 3 |
| | Create Action Plan | | | 3 |
| | Implement Action Plan | | | 2 |
| | Measure Results | | | 3 |
| | ESTIMATED DAYS | 6 days | 11 days | 19 days |

  • 53. For proof-of-concept, pilot, and production projects, the decision maker team is often responsible for the following:

- Evaluating and approving the business justification and plan
- Championing the project to the rest of the organization at a high level
- Allocating the project funds

In addition, for the implementation phase of a production project, the decision maker team often has the following responsibilities:

- Providing feedback to the data and business analysts during the action plan creation
- Providing feedback to the measured results of the project
  • 54. Operations Team

The role of the operations team is to provide network, database administration, and system administration assistance to the data and business analyst team. They help in obtaining the project data, implementing the production system, as well as measuring the results. Table 2.5 lists the average workdays for typical proof-of-concept, pilot, and production projects.

Table 2.5: Operations Team Roles and Responsibilities

| VDM METHODOLOGY STEP | TASKS | PROOF-OF-CONCEPT | PILOT | PRODUCTION |
|---|---|---|---|---|
| 1 | Justify and Plan Project | 2 | 3 | 3 |
| 2 | Identify the Top Business Questions | | | |
| 3 | Choose the Data Set | 1 | 2 | 5 |
| 4 | Transform the Data Set | | | |
| 5 | Verify the Data Set | | | |
| 6 | Choose the Visualization or Mining Tools | | | |
| 7 | Analyze the Visualization or Mining Models | | | |
| 8 | Verify and Present the Visualization or Mining Models | | | |
| | Create Action Plan | | | 5 |
| | Implement Action Plan | | | 20 |
| | Measure Results | | | 15 |
| | ESTIMATED DAYS | 3 days | 5 days | 48 days |

For proof-of-concept, pilot, and production projects, the operations team is often responsible for the following:
  • 59. The Project Gutenberg eBook of The Little Indian Weaver
  • 60. This ebook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this ebook or online at www.gutenberg.org. If you are not located in the United States, you will have to check the laws of the country where you are located before using this eBook. Title: The Little Indian Weaver Author: Madeline Brandeis Release date: July 19, 2012 [eBook #40277] Most recently updated: October 23, 2024 Language: English Credits: Produced by Juliet Sutherland, Diane Monico, and the Online Distributed Proofreading Team at http://www.pgdp.net *** START OF THE PROJECT GUTENBERG EBOOK THE LITTLE INDIAN WEAVER ***
  • 63. BAH, THE LITTLE INDIAN WEAVER

The LITTLE INDIAN WEAVER
BY MADELINE BRANDEIS

Producer of the Motion Pictures
"The Little Indian Weaver"
"The Wee Scotch Piper"
"The Little Dutch Tulip Girl"
"The Little Swiss Wood-Carver"
Distributed by Pathè Exchange, Inc., New York City

Photographic Illustrations by the Author

GROSSET & DUNLAP
PUBLISHERS NEW YORK
by arrangement with the A. Flanagan Company

COPYRIGHT, 1928, BY A. FLANAGAN COMPANY
PRINTED IN THE UNITED STATES OF AMERICA

To every child of every land,
Little sister, little brother,
As in this book your lives unfold,
May you learn to love each other.
  • 65. CONTENTS

Chapter I: The Corn Ear Doll
Chapter II: Something Terrible Happens
Chapter III: At the Trading Post
Chapter IV: The Prayer Stick
Chapter V: At Bah's Hogan
Chapter VI: Billy Starts His Story
Chapter VII: All About the Indians
Chapter VIII: Who Wins the Radio?
  • 68. CHAPTER I THE CORN EAR DOLL How would you like to have a doll made from a corn ear? That is the only kind of doll that Bah ever thought of having. Bah was only five years old and she had never been away from her home, so of course she couldn't know very much. But she knew a bit about weaving blankets, and she was learning more each day from her mother, who made beautiful ones and sold them. You see, Bah and her mother were American Indians, and they belonged to the Navajo tribe. Their home was on the Navajo Reservation in Arizona, and they called it an Indian village. But if you went there you would not think it very much of a village in comparison to the villages you know. As a matter of fact, all you could see was a row of funny little round houses, looking very much like large beehives, put together with mud and sticks and called hogans. A street of hogans in each of which lived a whole family of Indians, a few goats and sheep, a stray dog or two, an Indian woman sitting outside her hogan weaving a blanket, perhaps a child running with a dog—this, then, was a Navajo village.
  • 69. THE LITTLE INDIAN WEAVER How different from your villages with their smooth stone buildings, their stores and gasoline stations, and pretty shrub-covered bungalows! Most Indian women have many babies, and the whole family lives together in one room which is the living room, bedroom, kitchen and dining room all rolled into one. In the top of the hogan is a hole, so that the smoke from the cooking fire in the middle of the room can go out. Bah did not spend much time in her hogan. No sooner was she up in the morning than she was outside gathering sticks for the breakfast fire. From the time she put her little brown face outside the hogan door, bright and early in the morning, until nightfall when she cuddled down in her warm Navajo blanket, she was out in the air—
  • 70. and the air is so fresh out there in the desert; so much fresher than it is in the big smoky cities. Bah was a bright-eyed, healthy little girl, and the way she dressed will sound queer to you, for her clothes were made just like her mother's. On rainy days you have no doubt "dressed up" in mother's clothes and thought it quite a lark. But when the game was over, how glad you were to come back to your own little dresses and short socks. But Bah had always dressed in the same way—and that is, in a long full cotton skirt, a calico waist with long sleeves, and many strings of bright beads about her neck. Her hair was long, black and shiny, and her mother tied it up in a knot at the back of her neck with a white cloth. Every morning Bah had a lesson in weaving, just as you have a drawing lesson or a sewing lesson. Her father had made her a tiny loom which stood outside the hogan door next to her mother's big loom. The morning when Bah planned the corn ear doll she was in the midst of her weaving lesson. Mother's fingers were flying in and out, and Bah's fingers were slow—oh, so slow, but her mind was not. Her mind was at work on a doll. She had once seen the picture of a doll, a real one. It was such a lovely doll! She wanted to cuddle it. How she would love to hug a doll close to her and rock it to sleep! The corn was ripe in the field which was not far away. After the lesson she would pick an ear of corn, dry it nicely and dress it in a wee Indian blanket. She would make some beads for its neck. She would stick in two black beads for eyes. She would— "Bah! you do not heed the lesson!" It was Mother. And Mother was scolding. There were few times in Bah's life when she could remember Mother having been cross. Bah was at once attentive.
  • 71. "I am sorry, Ma Shima (my mother)," she said, in the Navajo language. "I was dreaming of something sweet." "It is bad medicine to dream when one is awake, Bah," said Mother. "You will never learn to weave—and a Navajo woman who cannot weave blankets is indeed a useless one." Bah hung her head in shame. But Mother laughed. "Do not look that way, my little one, but try now to make the little pattern which I teach you." Bah did try. She had to rip out several rows of bad weaving caused by her dreams of her corn ear doll. But not once, until the lesson was over, did Bah think again of the doll. The weaving lesson was at last over, and Bah ran quickly to the cornfield, where she began to look eagerly for a proper ear of corn with which to make a proper Indian doll. As she was looking through the many waving stalks, she thought she heard her name being called. But was it her name, and was it being called? It sounded more like singing than like calling—and Mother did not sing. "Bah, Bah, Black Sheep Have you any wool?" This is what Bah heard. She stopped in her search and looked around. There, a few yards away, was some one coming towards her on a pony. Bah's first thought was to run. She did not want to meet a stranger. So few came here to her home, where the only people the little girl ever saw were Mother, Father, and the few Indians who lived nearby. White people were mysterious to Bah, and yet she often wondered about the white children and how they played and worked and what they did all day in school. Bah would go to school next year—to the
  • 72. big new school just built on the Reservation for Indian children. White people built it, and so it must be like the white children's school. Sometimes she longed to go—and other times she was just a little bit afraid. "Yes, sir, yes, sir, Three bags full." The pony which Bah had seen from a distance was now standing beside her, and she could see the rider, although he could not see her, for she had hidden and was crouching between the cornstalks. BAH'S HOME The rider was a very small person—a boy—a white boy. Bah really didn't feel as though he should be classified as white, for his skin was a mixture of orange and brown—orange where the sun had
  • 73. burned him, and over that a pattern of vivid brown freckles. Bah had never before seen anything like him, and it is no wonder that the timid little Indian hid herself. The speckled boy took off his large cowboy hat and wiped his hot brow with a cowboy's handkerchief. "Gee, it's hot, Peanuts," he said aloud to the pony. "And I'd like to know the way back—but looks as if we're lost." Peanuts was presumably bored, for he let his head sink slowly, closed his eyes and patiently waited for the next move. None came. Bah, in her hiding place, was as dumb, if not as bored, as Peanuts. She was tense with excitement, which obviously Peanuts was not, and did not take her eyes from the boy's face. His every move very much interested her. Here, then, was a white boy. He must be white, for he was not an Indian and he spoke English. Bah understood English, and of that she was very proud. Her mother and father had always traded with the white man, so they had learned to speak English, and had wisely taught their little girl. Now how much easier it would be for Bah when she started to school. But her knowledge did not help her at the moment when she looked up from her cornstalk hiding place into the face of a live white boy. Indeed she had even decided to run away, and was crawling noiselessly through the corn. "Baa, Baa, Black Sheep," again the boy began to sing as he started to turn away. Bah stopped crawling. He did sing her name. He wanted her to come back. Maybe she could help him find his way. And Oh! the pony was stepping all over the corn. Didn't he know better than to do that? The cornstalks rustled. The pony jumped to the side, and the boy turned in his saddle and saw Bah standing.
  • 74. "Oh, hello!" he said and turned back—the pony trampling upon a beautiful stalk of corn. "I didn't see you before. Where were you?" Bah couldn't speak. She tried ever so hard, but the English words she knew so well would not come. The boy jumped down from his pony and went up to her. There was a smile on his face and as he came closer she saw that his eyes were as blue as the sky. That part of him was pretty, thought Bah, even if his skin was not—and the smile was friendly. So she gained courage. "You call my name?" she ventured. The boy looked puzzled. "No," he said, "I don't know your name, but I'm glad I've found you." Again he smiled, and this time Bah smiled too. "My name Bah," she said, "and you say 'Bah, Bah, back skip'—I think you call me come back to you." When it suddenly dawned upon the boy what she meant he opened his mouth very wide indeed and laughed so hard that Bah again began to be afraid. But he stopped suddenly, realizing perhaps that he had frightened her, and said: "Oh, no. That is a song we sing about 'black sheep' that goes 'bah bah'! I didn't know you heard me singing it." Bah looked a bit ashamed, and did not offer a reply. The boy kept on talking— "But, gee, where do you come from, Bah? Is your house around here?" "Yes," said Bah. "Hogan over way, Bah come to find corn in cornfield."
  • 75. "Oh, I see," said the boy, "for dinner, I guess." "No," replied the Indian girl, looking up into his face, "Bah make so pretty doll from corn ear. Will dress in blanket and beads. You ever see little girl's doll?" She looked so intent and innocent that the boy could not scoff at what would have been, among members of his own group at home, a subject entirely forbidden in the presence of growing gentlemen. Dolls! What interest had he in dolls! But as he looked into the upturned face of the little brown maiden, he suddenly realized that she had never heard of a boy's dislike for dolls; in fact, she had probably never before met a white boy nor seen a white doll. "Oh, yes, plenty of 'em," answered the white boy, "but never made of an ear of corn—" Then, seeing a shadow pass over her face he resumed gallantly, "But it ought to make a peach of a doll. Maybe I could help you make it." Now Bah was certain that she would like the white boy. She had never before had a human playmate, and the feeling was a pleasant one. But she remembered that her new friend was lost. "You no can find way home?" she asked. The boy laughed. "I guess you want to get rid of me," he said. Then, sobering, he resumed. "Yes, really, I'm lost. Peanuts and I have been wandering all morning. You see, we started from Tuba early and we just didn't watch the trails, so here we are." "Oh, Tuba," said Bah, "not so very far. I show you how to go." "But first I'll help you fix up a corn doll," said the boy. "We'll first have to find a good fat corn ear. Nice fat dolls are the best, don't you think so?"
  • 76. As he talked he began looking through the cornstalks, and Bah watched him. He finally found what he considered to be an ideal ear, and together the two children made it into a doll, black bead eyes, cornsilk hair, blanket, and all. "I have just the name for her," said the boy. "We'll call her 'Cornelia!' Shall we?" Bah nodded happily. The name was a new one to her and she did not catch its meaning in relation to her beautiful new doll, but it pleased her nevertheless. In fact, everything about the boy pleased her, and she was sorry when at last he said:
  • 77. BAH AND CORNELIA
    "It must be getting late. You'd better tell me how to get home. Mother will wonder what happened." Bah pointed out directions and the boy, thanking her, held out his hand and said: "You never even asked my name. Don't you want to know?" Bah drooped her head shyly as she replied: "Indian never ask name. Very bad manner."
  • 78. The white boy's eyes opened wide. "That's funny," he said. "Then how do you get to know people's names?" "When one people like other people, they tell name. No ask," said Bah seriously. "Oh, then I'll tell you quick 'cause I like you. My name's Billy." Bah did not reply, but stood watching Billy as he swung himself onto his pony. Then, when he was seated and smiled down at her, she smiled up sweetly and said: "We have cow named Billy."
    BILLY
  • 80. CHAPTER II
    SOMETHING TERRIBLE HAPPENS
    For days Bah's chief delight was her new corn ear doll. She kept it with her constantly. It went to bed with her, sat at meals with her, and watched the daily weaving lesson. But one day a terrible thing happened. She was sitting by her mother's side outside the hogan, her little fingers flying through the strings of her loom, and one eye watching Mother's more experienced fingers as they made a beautiful new pattern. Cornelia had been carefully dressed in her blanket, her beads hung about her neck and fondly kissed by her devoted parent, and was now lying at Bah's feet while the little girl worked hard at her lesson.
  • 81. THE WEAVING LESSON "Pull your wool tighter, Bah," said Mother, in Navajo. Bah's fingers and tongue worked together. Children's tongues have a habit of moving with whatever else is in motion. And as Bah worked, some sheep came wandering in from the field. They were tame sheep and often nosed about the hogan for a bit of human company or food, as the case might be, and this morning I fear the reason was food. Father sheep was very large and therefore hungrier than the rest. His hunger made him bold. But Bah was a particular friend of his, and I doubt whether even his appetite could have driven him to do what he did that morning, had he been able to guess the great sorrow he was to cause.
  • 82. "You have left out a stitch, my child, and there will be a hole in the work." Bah's fingers stopped and so did her tongue. "Oh dear, must I do that all over again, Mother?" she asked. "If you wish to weave perfectly so that you may some day sell your work, then you must learn to rip and go over many times." Ripping is deadly work, as everyone who has ever ripped knows. And Bah was not as interested in ripping as she had been in making her pattern. So her thoughts naturally turned to her precious Cornelia lying at her feet. Her eyes turned at the same time, and horror upon horrors, what did she see? The big black sheep was there chewing contentedly, but Cornelia was gone. The little blanket was there—so were the beads and some of the cornsilk hair. But Cornelia was gone. The sheep went on chewing and couldn't understand why Bah did not caress him as usual. "Bah, do pay attention to your work!" Mother was annoyed. Bah turned around and Mother saw a very sad sight. She saw before her another mother—a stricken little mother whose child had just provided a meal for a hungry animal. She rocked an empty blanket back and forth, and the tears were beginning to gather. Mother understood what had happened, and now her voice sounded soft and kind.
  • 83. "GO AWAY, MR. SHEEP!" "Poor Bah! Your doll is gone!" The little girl was crying as she continued to hug the empty blanket. "Do not cry, my little one," said Mother. "Are there not many more corn ears in the field?" "Yes, my Mother," sobbed the child, "but no more Cornelias!" And that was final. Never again could Bah go back to the cornfield. Never again! How could Mother even have suggested such a thing! Didn't she know that Cornelia, since the day of her birth, had been different from all other ears of corn? Why, Cornelia was a doll—she and Billy had decided that—and the rest were vegetables! Oh, didn't Mother understand? Perhaps Mother did, for her next remark showed it.
  • 84. "One day, Bah, when I went to the Trading Post near Tuba I saw a most beautiful doll. She was an Indian baby—a papoose—and she was strapped upon the prettiest little laced baby cradle you ever saw. She was dressed in a bright blanket and she had real hair and such lovely beads around her neck." A smile was trying to chase away the tears on the face of the little mother as she listened to her own mother's recital of something too wonderful to imagine. She said sorrowfully: "Some white child will buy her, and how happy she will be. Ah, how I should like to have her." Mother said: "And so you shall, if you will work to have her." Bah's eyes asked the question: "How?" and her mother went on: "You know, Bah, that Mother sells or trades blankets, and that Father sells or trades his beautiful silver and matrix jewelry to the Trading Post. We do this so that we may have, in return, things which we want and need. Now, you want and need a little doll. Why not sell your work? Bah must weave a little blanket and take it to the store where they will perhaps trade with you for the papoose doll." "Do you really think they will, Ma Shima?" asked Bah as if she could hardly believe it, and she wiped away her tears.
  • 85. HOW BAH LONGED FOR THE PAPOOSE DOLL! "Yes, I do," answered Mother. "But your blanket must be well made and of a pretty pattern—else they will not take it, for they, in turn, must sell it to the tourists." "Then I shall make the most beautiful blanket which has ever been made," laughed Bah, now thoroughly interested in her new task with its wonderful object. She worked all through the morning on her little blanket, with happy thoughts of a real-haired Indian doll flying through her mind as her fingers flew through her work. It was not until she heard Mother grinding the corn for lunch that she looked up, and not until then that she thought again of the morning's sorrow. But then she did think of it, and her parents wondered why she could not eat her corn bread.
  • 87. CHAPTER III
    AT THE TRADING POST
    Billy's mother and father had come to Arizona for a special reason. Billy's father was a writer, and he had come for information on the Navajo Indians for a new book he was writing. Every day he would go to the Indian villages, sit among the big chiefs and medicine men (who are the wise ones among the Indians and are supposed to work charms which cure the sick) and he would jot down in his notebook many things which they told him. Billy went with his father the first few days, but he didn't care much for the way they sat around and did nothing but talk. Billy was a very active boy and he soon grew tired of listening to the droning voices of the Indian men, and the scratching of Father's pencil. At last he told Father how it was, and Father laughed. "I thought you were going to write, too, Billy," he said. "You'll never find out about the Indians if you don't take the trouble to listen—and then you'll never win that composition contest you've been dreaming about." It was true that Billy, since he had left New York, had dreamed of nothing else but the composition contest. Many of his friends at home were already struggling with their compositions, for the prize was worth striving for—a wonderful radio set, the very latest model.
  • 88. "I TRADE MY BLANKET FOR PAPOOSE DOLL!" And how the others had envied him, for he was to go to Arizona and live among the Indians where he would be sure to learn so much of interest and send in a true account of the lives of American Indians. The contest was open to any composition dealing with children of any particular race or country, and was to reveal their habits and customs. "Oh! You'll win it easily, Bill," his chum had said. "Indians are such interesting people, and you'll find out all about them if you stick to your dad." And Billy had been fired with ambition, when he had left, and when he had first arrived. But the novelty of the idea was gradually wearing off and he seemed to like far more to gallop over the country on his pony, Peanuts, than to glean knowledge. Especially
  • 89. since his meeting with Bah did he look forward each morning to his ride. And each day he tried to find the Indian girl and went many times to the cornfield. But she was never there and, try as he might, Billy could not find her village. Father did not wait for Billy to answer him, but said: "Well, old man, I can see the radio set gradually taking wings and broadcasting itself! You'll never win it this way, you know—and you'd have a good chance, too, if you'd come along and listen to some of the old fellows I'm chumming with each day." "Oh, I'll come along tomorrow, Dad," said Billy carelessly. "Today I'm going to the Trading Post and see the Indian stuff there." "Well, do as you like, Son," said his father, "but don't be annoyed if you don't win the contest." "I'll write something yet, Dad, you'll see." Peanuts and Billy found themselves at the Trading Post in the heat of the day. Billy tied the pony in the shade and went into the store. It was filled with a mixed assortment of objects. On one side of the room were groceries, pots and pans, cigarettes, in fact a little bit of everything necessary for housekeeping. On the other side were the Indian curios—silver and matrix jewelry, beautifully fashioned with blue stones set in, handsome Navajo blankets hanging on the wall, pottery of all kinds, and beads, beads, beads. Billy wandered about the store and he thought of his mother, and how she would like something to take home as a souvenir. The beads looked hopeful, as he could carry them, while a pottery jar or blanket would be big and heavy. Taking from his pocket his two dollars and some few cents, he selected the string of beads which looked most likely. One string in particular very much pleased him. It was delicately made, but looked simple enough to be within reach of his two