Accelerating Discovery Mining Unstructured
Information For Hypothesis Generation Spangler
ACCELERATING DISCOVERY
MINING UNSTRUCTURED
INFORMATION FOR HYPOTHESIS
GENERATION
Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series
SERIES EDITOR
Vipin Kumar
University of Minnesota
Department of Computer Science and Engineering
Minneapolis, Minnesota, U.S.A.
AIMS AND SCOPE
This series aims to capture new developments and applications in data mining and knowledge
discovery, while summarizing the computational tools and techniques useful in data analysis. This
series encourages the integration of mathematical, statistical, and computational methods and
techniques through the publication of a broad range of textbooks, reference works, and
handbooks. The inclusion of concrete examples and applications is highly encouraged. The scope of the
series includes, but is not limited to, titles in the areas of data mining and knowledge discovery
methods and applications, modeling, algorithms, theory and foundations, data and knowledge
visualization, data mining systems and tools, and privacy and security issues.
PUBLISHED TITLES
ACCELERATING DISCOVERY: MINING UNSTRUCTURED INFORMATION FOR
HYPOTHESIS GENERATION
Scott Spangler
ADVANCES IN MACHINE LEARNING AND DATA MINING FOR ASTRONOMY
Michael J. Way, Jeffrey D. Scargle, Kamal M. Ali, and Ashok N. Srivastava
BIOLOGICAL DATA MINING
Jake Y. Chen and Stefano Lonardi
COMPUTATIONAL BUSINESS ANALYTICS
Subrata Das
COMPUTATIONAL INTELLIGENT DATA ANALYSIS FOR SUSTAINABLE
DEVELOPMENT
Ting Yu, Nitesh V. Chawla, and Simeon Simoff
COMPUTATIONAL METHODS OF FEATURE SELECTION
Huan Liu and Hiroshi Motoda
CONSTRAINED CLUSTERING: ADVANCES IN ALGORITHMS, THEORY,
AND APPLICATIONS
Sugato Basu, Ian Davidson, and Kiri L. Wagstaff
CONTRAST DATA MINING: CONCEPTS, ALGORITHMS, AND APPLICATIONS
Guozhu Dong and James Bailey
DATA CLASSIFICATION: ALGORITHMS AND APPLICATIONS
Charu C. Aggarwal
DATA CLUSTERING: ALGORITHMS AND APPLICATIONS
Charu C. Aggarwal and Chandan K. Reddy
DATA CLUSTERING IN C++: AN OBJECT-ORIENTED APPROACH
Guojun Gan
DATA MINING FOR DESIGN AND MARKETING
Yukio Ohsawa and Katsutoshi Yada
DATA MINING WITH R: LEARNING WITH CASE STUDIES
Luís Torgo
FOUNDATIONS OF PREDICTIVE ANALYTICS
James Wu and Stephen Coggeshall
GEOGRAPHIC DATA MINING AND KNOWLEDGE DISCOVERY,
SECOND EDITION
Harvey J. Miller and Jiawei Han
HANDBOOK OF EDUCATIONAL DATA MINING
Cristóbal Romero, Sebastian Ventura, Mykola Pechenizkiy, and Ryan S.J.d. Baker
HEALTHCARE DATA ANALYTICS
Chandan K. Reddy and Charu C. Aggarwal
INFORMATION DISCOVERY ON ELECTRONIC HEALTH RECORDS
Vagelis Hristidis
INTELLIGENT TECHNOLOGIES FOR WEB APPLICATIONS
Priti Srinivas Sajja and Rajendra Akerkar
INTRODUCTION TO PRIVACY-PRESERVING DATA PUBLISHING: CONCEPTS
AND TECHNIQUES
Benjamin C. M. Fung, Ke Wang, Ada Wai-Chee Fu, and Philip S. Yu
KNOWLEDGE DISCOVERY FOR COUNTERTERRORISM AND
LAW ENFORCEMENT
David Skillicorn
KNOWLEDGE DISCOVERY FROM DATA STREAMS
João Gama
MACHINE LEARNING AND KNOWLEDGE DISCOVERY FOR
ENGINEERING SYSTEMS HEALTH MANAGEMENT
Ashok N. Srivastava and Jiawei Han
MINING SOFTWARE SPECIFICATIONS: METHODOLOGIES AND APPLICATIONS
David Lo, Siau-Cheng Khoo, Jiawei Han, and Chao Liu
MULTIMEDIA DATA MINING: A SYSTEMATIC INTRODUCTION TO
CONCEPTS AND THEORY
Zhongfei Zhang and Ruofei Zhang
MUSIC DATA MINING
Tao Li, Mitsunori Ogihara, and George Tzanetakis
NEXT GENERATION OF DATA MINING
Hillol Kargupta, Jiawei Han, Philip S. Yu, Rajeev Motwani, and Vipin Kumar
RAPIDMINER: DATA MINING USE CASES AND BUSINESS ANALYTICS
APPLICATIONS
Markus Hofmann and Ralf Klinkenberg
RELATIONAL DATA CLUSTERING: MODELS, ALGORITHMS,
AND APPLICATIONS
Bo Long, Zhongfei Zhang, and Philip S. Yu
SERVICE-ORIENTED DISTRIBUTED KNOWLEDGE DISCOVERY
Domenico Talia and Paolo Trunfio
SPECTRAL FEATURE SELECTION FOR DATA MINING
Zheng Alan Zhao and Huan Liu
STATISTICAL DATA MINING USING SAS APPLICATIONS, SECOND EDITION
George Fernandez
SUPPORT VECTOR MACHINES: OPTIMIZATION BASED THEORY,
ALGORITHMS, AND EXTENSIONS
Naiyang Deng, Yingjie Tian, and Chunhua Zhang
TEMPORAL DATA MINING
Theophano Mitsa
TEXT MINING: CLASSIFICATION, CLUSTERING, AND APPLICATIONS
Ashok N. Srivastava and Mehran Sahami
THE TOP TEN ALGORITHMS IN DATA MINING
Xindong Wu and Vipin Kumar
UNDERSTANDING COMPLEX DATASETS: DATA MINING WITH MATRIX
DECOMPOSITIONS
David Skillicorn
ACCELERATING DISCOVERY
MINING UNSTRUCTURED
INFORMATION FOR HYPOTHESIS
GENERATION
Scott Spangler
IBM Research
San Jose, California, USA
The views expressed here are solely those of the author in his private capacity and do not in any way represent
the views of the IBM Corporation.
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2016 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Version Date: 20150721
International Standard Book Number-13: 978-1-4822-3914-0 (eBook - PDF)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any copyright material has
not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted,
or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying, microfilming, and recording, or in any information storage or retrieval system,
without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the CCC,
a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used
only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
To Karon, my love
Contents
Preface, xvii
Acknowledgments, xxi
Chapter 1 ◾ 
Introduction 1
Chapter 2 ◾ 
Why Accelerate Discovery? 9
Scott Spangler and Ying Chen
THE PROBLEM OF SYNTHESIS 11
THE PROBLEM OF FORMULATION 11
WHAT WOULD DARWIN DO? 13
THE POTENTIAL FOR ACCELERATED DISCOVERY:
USING COMPUTERS TO MAP THE KNOWLEDGE SPACE 14
WHY ACCELERATE DISCOVERY: THE BUSINESS
PERSPECTIVE 15
COMPUTATIONAL TOOLS THAT ENABLE ACCELERATED
DISCOVERY 16
Search 16
Business Intelligence and Data Warehousing 17
Massive Parallelization 17
Unstructured Information Mining 17
Natural Language Processing 17
Machine Learning 18
Collaborative Filtering/Matrix Factorization 18
Modeling and Simulation 18
Service-Oriented Architectures 19
Ontological Representation Schemes 19
DeepQA 19
Reasoning under Uncertainty 20
ACCELERATED DISCOVERY FROM A SYSTEM PERSPECTIVE 20
Content Curator 21
Domain-pedia 21
Annotators 23
Normalizers 23
BigInsights Framework 23
Query Services 23
Analytics Services 23
User Interface 23
Catalogue 24
ACCELERATED DISCOVERY FROM A DATA PERSPECTIVE 24
Initial Domain Content and Knowledge Collection 24
Content Comprehension and Semantic Knowledge Extraction 26
Complex and High-Level Knowledge Composition and
Representation 26
New Hypothesis and Discovery Creation 27
ACCELERATED DISCOVERY IN THE ORGANIZATION 28
CHALLENGE (AND OPPORTUNITY) OF ACCELERATED
DISCOVERY 29
REFERENCES 30
Chapter 3 ◾ 
Form and Function 33
THE PROCESS OF ACCELERATED DISCOVERY 34
CONCLUSION 40
REFERENCE 40
Chapter 4 ◾ 
Exploring Content to Find Entities 41
SEARCHING FOR RELEVANT CONTENT 42
HOW MUCH DATA IS ENOUGH? WHAT IS TOO MUCH? 42
HOW COMPUTERS READ DOCUMENTS 43
EXTRACTING FEATURES 43
Editing the Feature Space 46
FEATURE SPACES: DOCUMENTS AS VECTORS 47
CLUSTERING 48
DOMAIN CONCEPT REFINEMENT 50
Category Level 50
Document Level 51
MODELING APPROACHES 51
Classification Approaches 52
Centroid 52
Decision Tree 52
Naïve Bayes 52
Numeric Features 52
Binary Features 53
Rule Based 53
Statistical 53
DICTIONARIES AND NORMALIZATION 54
COHESION AND DISTINCTNESS 54
Cohesion 55
Distinctness 56
SINGLE AND MULTIMEMBERSHIP TAXONOMIES 56
SUBCLASSING AREAS OF INTEREST 57
GENERATING NEW QUERIES TO FIND ADDITIONAL
RELEVANT CONTENT 57
VALIDATION 58
SUMMARY 58
REFERENCES 58
Chapter 5 ◾ 
Organization 61
DOMAIN-SPECIFIC ONTOLOGIES AND DICTIONARIES 61
SIMILARITY TREES 62
USING SIMILARITY TREES TO INTERACT WITH DOMAIN
EXPERTS 65
SCATTER-PLOT VISUALIZATIONS 65
USING SCATTER PLOTS TO FIND OVERLAPS BETWEEN
NEARBY ENTITIES OF DIFFERENT TYPES 67
DISCOVERY THROUGH VISUALIZATION OF TYPE SPACE 69
REFERENCES 69
Chapter 6 ◾ 
Relationships 71
WHAT DO RELATIONSHIPS LOOK LIKE? 71
HOW CAN WE DETECT RELATIONSHIPS? 72
REGULAR EXPRESSION PATTERNS FOR EXTRACTING
RELATIONSHIPS 72
NATURAL LANGUAGE PARSING 73
COMPLEX RELATIONSHIPS 74
EXAMPLE: P53 PHOSPHORYLATION EVENTS 74
PUTTING IT ALL TOGETHER 75
EXAMPLE: DRUG/TARGET/DISEASE RELATIONSHIP
NETWORKS 75
CONCLUSION 79
Chapter 7 ◾ 
Inference 81
CO-OCCURRENCE TABLES 81
CO-OCCURRENCE NETWORKS 83
RELATIONSHIP SUMMARIZATION GRAPHS 83
HOMOGENEOUS RELATIONSHIP NETWORKS 83
HETEROGENEOUS RELATIONSHIP NETWORKS 86
NETWORK-BASED REASONING APPROACHES 86
GRAPH DIFFUSION 87
MATRIX FACTORIZATION 87
CONCLUSION 88
REFERENCES 89
Chapter 8 ◾ 
Taxonomies 91
TAXONOMY GENERATION METHODS 91
SNIPPETS 92
TEXT CLUSTERING 92
TIME-BASED TAXONOMIES 94
Partitions Based on the Calendar 94
Partitions Based on Sample Size 95
Partitions on Known Events 95
KEYWORD TAXONOMIES 95
Regular Expression Patterns 96
NUMERICAL VALUE TAXONOMIES 97
Turning Numbers into X-Tiles 98
EMPLOYING TAXONOMIES 98
Understanding Categories 98
Feature Bar Charts 98
Sorting of Examples 99
Category/Category Co-Occurrence 99
Dictionary/Category Co-Occurrence 100
REFERENCES 101
Chapter 9 ◾ 
Orthogonal Comparison 103
AFFINITY 104
COTABLE DIMENSIONS 105
COTABLE LAYOUT AND SORTING 106
FEATURE-BASED COTABLES 107
COTABLE APPLICATIONS 109
EXAMPLE: MICROBES AND THEIR PROPERTIES 109
ORTHOGONAL FILTERING 111
CONCLUSION 114
REFERENCE 115
Chapter 10 ◾ 
Visualizing the Data Plane 117
ENTITY SIMILARITY NETWORKS 117
USING COLOR TO SPOT POTENTIAL NEW
HYPOTHESES 119
VISUALIZATION OF CENTROIDS 123
EXAMPLE: THREE MICROBES 125
CONCLUSION 127
REFERENCE 127
Chapter 11 ◾ 
Networks 129
PROTEIN NETWORKS 130
MULTIPLE SCLEROSIS AND IL7R 130
EXAMPLE: NEW DRUGS FOR OBESITY 134
CONCLUSION 136
REFERENCE 136
Chapter 12 ◾ 
Examples and Problems 139
PROBLEM CATALOGUE 139
EXAMPLE CATALOGUE 140
Chapter 13 ◾ 
Problem: Discovery of Novel Properties
of Known Entities 141
ANTIBIOTICS AND ANTI-INFLAMMATORIES 141
SOS PATHWAY FOR ESCHERICHIA COLI 146
CONCLUSIONS 149
REFERENCES 150
Chapter 14 ◾ 
Problem: Finding New Treatments for Orphan
Diseases from Existing Drugs 151
IC50:IC50 152
REFERENCES 158
Chapter 15 ◾ 
Example: Target Selection Based on Protein
Network Analysis 159
TYPE 2 DIABETES PROTEIN ANALYSIS 159
Chapter 16 ◾ 
Example: Gene Expression Analysis for
Alternative Indications 165
Scott Spangler, Ignacio Terrizzano, and Jeffrey Kreulen
NCBI GEO DATA 165
CONCLUSION 173
REFERENCES 174
Chapter 17 ◾ 
Example: Side Effects 175
Chapter 18 ◾ 
Example: Protein Viscosity Analysis Using
Medline Abstracts 183
DISCOVERY OF ONTOLOGIES 184
USING ORTHOGONAL FILTERING TO DISCOVER
IMPORTANT RELATIONSHIPS 187
REFERENCE 194
Chapter 19 ◾ 
Example: Finding Microbes to Clean Up Oil
Spills 195
Scott Spangler, Zarath Summers, and Adam Usadi
ENTITIES 196
USING COTABLES TO FIND THE RIGHT COMBINATION
OF FEATURES 199
DISCOVERING NEW SPECIES 202
ORGANISM RANKING STRATEGY 205
CHARACTERIZING ORGANISMS 206
Respiration 209
Environment 215
Substrate 215
CONCLUSION 216
Chapter 20 ◾ 
Example: Drug Repurposing 225
COMPOUND 1: A PDE5 INHIBITOR 226
PPARα/γ AGONIST 228
Chapter 21 ◾ 
Example: Adverse Events 231
FENOFIBRATE 231
PROCESS 232
CONCLUSION 237
REFERENCES 239
Chapter 22 ◾ 
Example: P53 Kinases 241
AN ACCELERATED DISCOVERY APPROACH BASED ON
ENTITY SIMILARITY 243
RETROSPECTIVE STUDY 246
EXPERIMENTAL VALIDATION 248
CONCLUSION 250
REFERENCE 251
Chapter 23 ◾ 
Conclusion and Future Work 253
ARCHITECTURE 254
FUTURE WORK 255
ASSIGNING CONFIDENCE AND PROBABILITIES TO
ENTITIES, RELATIONSHIPS, AND INFERENCES 255
DEALING WITH CONTRADICTORY EVIDENCE 259
UNDERSTANDING INTENTIONALITY 259
ASSIGNING VALUE TO HYPOTHESES 261
TOOLS AND TECHNIQUES FOR AUTOMATING THE
DISCOVERY PROCESS 261
CROWD SOURCING DOMAIN ONTOLOGY CURATION 262
FINAL WORDS 262
REFERENCE 262
INDEX, 263
Preface
A few years ago, having spent more than a decade doing
unstructured data mining of one form or another, in domains
spanning helpdesk problem tickets, social media, and patents, I thought
I fully understood the potential range of problems and likely areas of
applicability of this mature technology. Then, something happened that
completely changed how I thought about what I was doing and what its
potential really was.
The change in my outlook began with the Watson Jeopardy challenge.
Seeing a computer learn from text to play a game I had thought was far
beyond the capabilities of any artificial intelligence opened my eyes to new
possibilities. And I was not alone. Soon many customers were coming forward
with their own unique problems, problems I would have said a few
years ago were just too hard to solve with existing techniques. And now, I
said, let’s give it a try.
This wasn’t simply a straightforward application of the algorithms
used to win Jeopardy in a different context. Most of the problems I was
being asked to solve weren’t really even question-answering problems
at all. But they all had a similar quality in that they forced us to digest
all of the information in a given area and find a way to synthesize a new
kind of meaning out of it. This time the problem was not to win a game
show, but (to put it bluntly) to advance human scientific knowledge. More
than once in the early going, before we had any results to show for our
efforts, I wondered if I was out of my mind for even trying this. Early on, I
remember making more than one presentation to senior executives at my
company, describing what I was doing, half expecting they would tell me
to cease immediately, because I was attempting something way too hard
for current technology and far outside the bounds of a reasonable business
opportunity. But fortunately, no one ever said that, and I kept on going.
Somewhere along the way (I can’t say just when), I lost all doubt that I
was really onto something very important. This was more than another
new application of unstructured data mining. And as each new scientific
area came forward for analysis, the approach we were using began to
solidify into a kind of methodology. And then, just as this was happening,
I attended the ACM Knowledge Discovery and Data Mining Conference
in 2013 (KDD13), where I met with CRC Press at their booth and
told them about my idea. Shortly thereafter, I was signed up to write a
book.
I knew at the time that what would make this book especially challenging
was that I was still proving the methodology and tweaking it, even as
I was writing out the description of that method. This was not ideal, but
neither could it be avoided if I wanted to broaden the application of the
method beyond a small team of people working in my own group. And if
it could not be broadened, it would never realize its full potential.
Data science is a new discipline. It lacks a curriculum, a set of textbooks,
a fundamental theory, and a set of guiding principles. This is both
regrettable and exciting. It must be rectified if the discipline is to become
established. Since I greatly desire that end, I write this book in the hopes
of furthering it.
Many years ago, I remember stumbling across a book by the statistician
John Tukey called Exploratory Data Analysis. It was written 30 years before
I read it, and the author was no longer living; yet it spoke to me as if I were
his research collaborator. It showed me how the ideas I had been grappling
with in the area of unstructured text had been similarly addressed in the
realm of structured data. Reading that book gave me renewed confidence
in the direction I was taking and a larger vision of what that direction
might one day accomplish.
This book is one more step on that journey. The journey is essentially
my life’s work, and this book is that work’s synthesis thus far. It is far from
perfect, as anything that is real will always be a diminishment from what
is merely imagined. But I hope even in its imperfect state it will communicate
some part of what I experience on a daily basis working through
accelerated discovery problems as a data scientist. I think it is unquestionably
the most rewarding and exciting job in the world. And I dare to hope
that 30 years from now, or maybe even sooner, someone will pick up this
book and see their own ideas and challenges reflected in its pages and feel
renewed confidence in the direction they are heading.
At the same time, I fear that one or two readers (at least) will buy this
book and immediately be disappointed because it is not at all what they
were expecting it would be. It’s not a textbook. It’s not a business book. It’s
not a popular science book. It doesn’t fit well in any classification. I can
only say this: I wrote it for the person I was a few years back. Read it with
an open mind: you might find you get something useful out of it, regardless
of the failure to meet your initial expectations. It took me a long time
to get to this level of proficiency in knowing how to address accelerated
discovery problems. I’ve tried my best to capture exactly how I go about
it, both from a systematic perspective and from a practical point of view.
This book provides motivation, strategy, tactics, and a heterogeneous set
of comprehensive examples to illustrate all the points I make. If it works
as I have intended it to, it will fill an important gap left by the other types
of books I have mentioned…the ones you thought this might be. You can
still buy those other books as well, but keep this one. Come back to it later
when you have started to put into practice the theories that you learned in
school to solve real-world problems. You may find then that the book
has more to say to you than you first thought.
Today, I go into each new problem domain with complete confidence
that I know how to get started; I know the major steps I need to accomplish,
and I have a pretty good idea what the final solution will look like (or
at least I know the range of things it might look like when we first deliver
a useful prototype). It wasn’t always that way. In those first few customer
engagements of this type, I was mostly winging it. It was exciting,
no doubt, but I would have really loved to have this book on my desk
(or in my e-reader) to look over after each meeting and help me figure out
what I should do next. If you are fortunate enough to do what I do for a
living, I think you will (eventually) find this book worthwhile.
Acknowledgments
There were many people who were instrumental in the creation of
this methodology and in the process of writing the book that explains
it. First, the team at IBM Watson Innovations, who made it all possible:
Ying Chen, Meena Nagarajan, Qi He, Linda Kato, Ana Lelescu, Jacques
LaBrie, Cartic Ramakrishnan, Sheng Hua Boa, Steven Boyer, Eric Louie,
Anshu Jain, Isaac Cheng, Griff Weber, Su Yan, and Roxana Stanoi. Also
instrumental in realizing the vision was the team at Baylor College of
Medicine, led by Olivier Lichtarge, with Larry Donehower, Angela Dawn
Wilkins, Sam Regenbogen, Curtis Pickering, and Ben Bachman.
Jeff Kreulen has been a collaborator for many years now and continues
to be a big supporter of and contributor to the ideas described here.
Michael Karasick and Laura Haas have been instrumental in consistently
supporting and encouraging this work from a management perspective at
IBM. John Richter, Meena Nagarajan, and Peter Haas were early reviewers
of my first draft, and I appreciate their input. Ying Chen helped write the
chapter on Why Accelerate Discovery?, for which I am most grateful. Pat
Langley provided some very good advice during the planning phase for
the book, which I profited from.
Finally, and most importantly, I thank my wife, Karon Barber, who insisted
that I finish this project, at the expense of time that I would rather have
spent with her. Nothing I’ve accomplished in life would have happened
without her steadfast faith and love.
Chapter 1
Introduction
This book is about discovery in science and the importance of
heterogeneous data analytics in aiding that discovery. As the volume
of scientific data and literature increases exponentially, scientists
need ever more powerful tools to process and synthesize that information
in a practical and meaningful way. But in addition, scientists
need a methodology that takes all the relevant information in a given
problem area—all the available evidence—and processes it in order to
propose the set of potential new hypotheses that are most likely to be
both true and important. This book describes a method for achieving
this goal.
But first, I owe the reader a short introduction and an explanation of
why I am the one writing this book. The short answer is a lucky accident
(lucky for me anyway; for you it remains to be seen). I stumbled into a
career doing the most exciting and rewarding work I can imagine. I do
not know it for a fact, but I suspect that I have done more of this kind
of work and for a longer period of time than anyone else now alive. It is
this experience that I now feel compelled to share, and it is that experience
that should make the book interesting reading for those who also
see the potential of the approach but do not know how to get started
applying it.
It all started out with a love of mathematics, in particular discrete
mathematics, and to be even more specific: combinatorics. Basically this is
the study of how to count things. I was never happier than when I found
this course in college. The discovery of a discipline devoted precisely to
what one instinctively loves is one of life’s greatest joys. I was equally
disappointed to find there was no such thing as a career in combinatorics,
outside of academia—at least, not at that time.
But I wandered into computer science, and from there into machine
learning and from there into text mining, and suddenly I became aware
that the skill and practice of knowing how to count things had a great deal
of practical application after all. And now 30 years have passed since I finished
that combinatorics course, and with every passing year the number,
variety, importance, and fascination of the problems I work on are still
increasing.
Nothing thrills me more than to have a new data set land in my inbox.
This is especially so if it is some kind of data I have never seen before, better
still if analyzing it requires me to learn about a whole new field of knowledge,
and best yet if the result will somehow make a difference in the world.
I can honestly say I have had the privilege of working on such problems,
not once, not twice, but more times than I can reckon. I do not always succeed,
but I do make progress often enough that more and more of these
problems seem to find their way to me. At some level I wish I could do them
all, but that would be selfish (and not very practical). So I am writing this
book instead. If you love working with unstructured, heterogeneous data
the way I do, I believe this book will have a positive impact on your career,
and that you will in turn have a positive impact on society.
This book is an attempt to document and teach Accelerated Discovery
to the next generation of data scientists. Once you have learned these
techniques and practiced them in the lab, your mission will be to find
a scientist or an engineer struggling with a big data challenge and help
them to make a better world. I know these scientists and engineers exist,
and I know they have these challenges, because I have talked to them
and corresponded with them. I have wished there were more of me to go
around so that I could help with every one of them, because they are all
fascinating and all incredibly promising and worthy efforts. But there are
only so many hours in a week, and I can only pick a few of the most promising
to pursue, and every one of these has been a rewarding endeavor.
For the rest and for those problems that will come, I have written this
book.
This book is not a data-mining manual. It does not discuss how to build
a text-classification engine or the ins and outs of writing an unsupervised
clustering implementation. Other books already do this, and I could not
surpass these. This book assumes you already know how to process data
using the basic bag of tools that are now taught in any good data-mining
or machine-learning course. Where those courses leave off, this book
begins. The question this book answers is how to use unstructured mining
approaches to solve a really complex problem in a given scientific domain.
How do you create a system that can reason in a sophisticated way about a
complex problem and come up with solutions that are profound, nonobvious,
and original?
From here on, this book is organized in a more or less top-down fashion.
The next chapter discusses the importance of the Accelerated Discovery
problem space and why the time has come to tackle it with the tools we
currently have available. Even if you are already motivated to read the
book, do not skip this chapter, because it contains some important mate-
rial about how flexibly the technology can be applied across a wide swath
of problems.
What follows immediately thereafter is a set of five chapters that
describe the method at a fairly high level. These are the most important
chapters of the book, because they should be in the front of your mind
each time you face a new analytics challenge in science. First there is a
high-level description of our method for tackling these problems, followed
by four detailed chapters giving a general approach to arriving at a solution.
When put together, these five chapters essentially cover our method
for accelerating discovery. Not every problem you encounter will use every
step of this method in its solution, but the basic approach can be applied in
a more or less universal way.
The next section brings the level of detail down to specific technologies
for implementing the method. These are less universal in character but
hopefully will make the method more concrete. This set of four chapters
goes into greater detail about the tools and algorithms I use to help realize
the approach in practice. It is not complete, but hopefully it will be
illustrative of the kinds of techniques that can make the abstract process
a reality.
The rest of the book is made up of sample problems or examples of how
this really works in practice. I included ten such examples because it was
a nice round number, and I think the examples I have selected do provide
a good representative sample of this kind of engagement. All of these
examples are from real scientists, are based on real data, and are focused
on important problems in discovery. The examples all come from the life-sciences
area, but that is not meant to be the only area where these techniques
would apply; in fact, I have applied them in several other sciences,
including materials and chemistry. But my best physical science examples
are not publishable due to proprietary concerns, so for this book I have
chosen to focus on the science of biology.
That is how the book is organized, but do not feel you have to read it
this way. You could just as well start with the examples and work your
way back to the earlier chapters when you want to understand the method
in more detail. You will quickly notice that not every problem solution
employs every step of the methodology anyway. The methodology is a
flexible framework on which to assemble the components of your solution,
as you need them, and where they make sense. And it is meant to be
iterative and to evolve as you get deeper into the information complexity
of each new domain.
As you read the book, I hope that certain core principles of how to be
a good data scientist will naturally become apparent. Here is a brief catalogue
of those principles to keep in mind each time you are faced with a
new problem.
• The whole is greater than the sum of the parts: As scientists we naturally
tend toward reductionism when we think about how to solve a
problem. But in data science, it is frequently the case that, by considering
all the relevant data at once, we can learn something that
we cannot see by looking at each piece of data in isolation. Consider
ways to unify everything we know about an individual entity as a
complete picture. What you learn is frequently surprising.
• More X is not always better: There is a wishful tendency among those
less familiar with the problems of data science to imagine that every
problem can be solved with more data, no matter how irrelevant that
data happens to be; or that, if we have run out of data, then adding
more features to the feature space ought to help; or, if that fails, that
adding more categories to our taxonomy should help, and so on. The
operative concept is “more is always better.” And certainly, one is supposed
to assume that at least more stuff can never hurt; the solution
must be in there somewhere. But the problem is, if you add mostly
more noise, the signal gets harder to find, not easier. Careful selection
of the right data, the right features, and the right categories is
always preferable to indiscriminate addition.
• Compare and contrast: Measuring something in isolation does not
tell you very much. Only when you compare the value to some other
related thing does it begin to have meaning. If I tell you a certain
Introduction   ◾    5
baseball player hit 50 home runs last season, this will not mean much
if you know nothing about the game. But if you know what percentile
that puts him in compared to other players, that tells you something,
especially if you also take into account plate appearances, difficulty
of pitchers faced, and the ball parks he played in. The point is that too
often in data science, we are tempted to look too narrowly at only one
aspect of a domain in order to get the precise number we are looking
for. We also need to look more broadly in order to understand
the implications of that value: to know whether it means anything of
importance.
• Divide and conquer: When you have a lot of data you are trying to
make sense of, the best strategy for doing this is to divide it into
smaller and smaller chunks that you can more easily comprehend.
But this only works if you divide up the data in a way that you can
make sense of when you put it all back together again. For example,
one way to divide up data is by letter of the alphabet, but this is
unlikely to make any one category much different from any other,
and thus the problem has not become any easier within each subcategory.
But if I focus on concepts rather than syntax, I stand a much
better chance of being enlightened at the end.
• “There’s more than one way to…”: Being a cat lover, I shy away from
completing that statement, but the sentiment is no less true for being
illustrated in such an unpleasant way. Once we find a solution or
approach, our brains seem to naturally just turn off. We have to avoid
this trap and keep looking for other ways to arrive at the result. If we
apply these additional ways and get the same answer, we can be far
more confident than we were that the answer is correct. If we apply
the additional approaches and get a different answer, that opens up
whole new areas for analysis that were closed to us before. Either
way, we win.
• Use your whole brain (and its visual cortex): Find a way to make the
data draw a picture, and you will see something new and important
that was hidden before. Of course, the challenge is to draw the picture
that illustrates the key elements of importance across the most
visible dimensions. Our brains have evolved over time to take in
vast amounts of data through the eyes and convert it effortlessly into
a reasonably accurate view of what is going on around us. Find a
way to put that powerful specialized processor to work on your data
problem and you will inevitably be astounded at what you can see.
• Everything is a taxonomy/feature vector/network: At the risk of
oversimplifying things a bit, there are really only three basic things
you need to know to make sense of data: What the entities are that
you care about and how they relate to each other (the taxonomy),
how you can describe those entities as features (feature vector), and
how you can represent the way those entities interact (network).
Every problem involves some subset or combination of these ideas. It
really is that simple.
• Time is not a magazine: The data we take to begin our investigation
with is usually static, meaning it sits in a file that we have downloaded
from somewhere, and we make sure that file does not change
over time (we may even back it up to be absolutely sure). This often
leads us to forget that change is the only constant in the universe,
and over time we will find that our file bears less and less relation
to the new reality of now. Find a way to account for time and to use
time recorded in data to learn how things evolve.
• All data is local: A corollary to the problem of time is the problem
of localization. Most data files we work with are subsets of the larger
data universe, and thus we have to generalize what we learn from
them to make them applicable to the real universe. That generalization
problem is going to be much harder than you realize. Prejudice
toward what we know and ignorance of what we do not is the bane
of all future predictions. Be humble in the face of your own limited
awareness.
• Prepare for surprise: If you are not constantly amazed by what you
find in data, you are doing something wrong.
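The three basic representations named in the taxonomy/feature vector/network principle can be sketched in a few lines of Python. This is only an illustration under our own assumptions. The entity names, features and edges are hypothetical. Plain dictionaries and sets stand in for whatever storage a real system would use.

```python
import math

# Taxonomy: each entity or category points to its parent category (hypothetical labels).
taxonomy = {
    "kinase_A": "kinase",
    "kinase_B": "kinase",
    "receptor_C": "receptor",
    "kinase": "protein",
    "receptor": "protein",
}

# Feature vectors: properties that describe each entity (hypothetical values).
features = {
    "kinase_A": {"phosphorylation": 1.0, "membrane": 0.0, "nucleus": 1.0},
    "kinase_B": {"phosphorylation": 1.0, "membrane": 1.0, "nucleus": 0.0},
    "receptor_C": {"phosphorylation": 0.0, "membrane": 1.0, "nucleus": 0.0},
}

# Network: undirected interactions between entities (hypothetical edges).
network = {("kinase_A", "kinase_B"), ("kinase_B", "receptor_C")}


def ancestors(entity):
    """Walk up the taxonomy from an entity to the root."""
    chain = []
    while entity in taxonomy:
        entity = taxonomy[entity]
        chain.append(entity)
    return chain


def cosine(u, v):
    """Cosine similarity between two sparse feature vectors."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def neighbors(entity):
    """Entities that share an interaction edge with the given entity."""
    return sorted(b if a == entity else a for a, b in network if entity in (a, b))
```

For instance, ancestors("kinase_A") walks up to ["kinase", "protein"]. The cosine similarity between kinase_A and kinase_B is 0.5, because each has two active features and they share one of them.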
Hopefully this brief introduction gives you some sense of the ideas to
keep in mind as you begin to master this discipline. Discovery is always
hard and always involves synthesizing different kinds of data and analytics.
The crucial step is to make all those moving parts work together
constructively to illuminate what lies just beyond the known. The key
ingredient is figuring out what to count and how to count it. In the end,
everything is just combinatorics all the way down!
I hope this book helps you to solve complex and important problems
of this type. More than that, I encourage you to develop your own methods
for accelerating discovery and publish them as I have done mine. This
problem is too important for one small group of data scientists in one
organization to have all the fun. Come join us.
Chapter 2
Why Accelerate Discovery?
Scott Spangler and Ying Chen
There is a crisis emerging in science due to too much data. On the
surface, this sounds like an odd problem for a scientist to have. After
all, science is all about data, and the more the better. Scientists crave data;
they spend time and resources collecting it. How can there be too much
data? Why can scientists not simply ignore the data they do not
need and keep the data they find useful?
But therein lies the problem. Which data do they need? What data will
end up proving useful? Answering this question grows more difficult with
increasing data availability. And if data grows exponentially, the problem
may reach the point where individual scientists cannot make optimal
decisions based on their own limited knowledge of what the data contains.
I believe we have reached this situation in nearly all sciences today. We
have certainly reached it in some sciences.
So by accelerating discovery, I do not simply mean doing science the
way we do it today, only faster; I really mean doing science in a profoundly
new way, using data in a new way, and generating hypotheses in a new
way. But before getting into all that, I want to present some historical context
in order to show why science the way it has always been practiced is
becoming less and less viable over time.
To illustrate what I mean, consider the discovery of evolution by
Charles Darwin. This is one of the most studied examples of great science
in history, and in many ways his example provided the template for all
scientific practice for the next 150 years. On the surface, the story is
remarkably elegant and straightforward. Darwin travels to the Galapagos
Islands, where he discovers a number of new and unique species. When
he gets back from his trip, he notices a pattern of species changing over
time. From this comes the idea of species “evolution,” which he publishes
to much acclaim and significant controversy. Of course, what really happened
is far more complex and illuminating from the standpoint of how
science was, and for the most part is still, actually practiced.
First of all, as inhabitants of the twenty-first century, we may forget
how difficult world travel was back in Darwin’s day. Darwin’s voyage was
a survey mission that took him around the world on a sailing ship, the
Beagle. He made many stops and had many adventures. Darwin left on
his trip in 1831 and returned five years later, in 1836. During that time,
he collected many samples from numerous locations and he took copious
notes on everything he did and saw. After he got back to England, he
then spent many years collating, organizing, and systematically cataloguing
his specimens and notes. In 1839, he published, to much acclaim, a
book describing the incidents of this voyage (probably not the one you are
thinking of; that one came much later): Journal and Remarks, Voyage of
the Beagle. Darwin then spent the next 20 years doing research and collecting
evidence on plants and animals and their tendency to change over
time. But though he was convinced the phenomenon was real, he still did
not have a mechanism by which this change occurred.
Then Darwin happened upon Essay on the Principle of Population (1798)
by Thomas Malthus. It introduced the idea that animals produce more
offspring than typically survive. This created a “struggle for existence”
among competing offspring. This led Darwin directly to the idea of
“natural selection.” The Origin of Species was published in 1859, 28 years
after the Beagle left on its voyage. And of course, it was many decades later
before Darwin’s theory would be generally accepted.
There are certain key aspects of this story that I want to highlight as
particularly relevant to the question “Why Accelerate Discovery?” The
first has to do with the 20 years it took Darwin to collect and analyze the
data he felt was necessary to develop and validate his theory. The second
is related to the connection that Darwin made between his own work and
that of Malthus. I think both of these phenomena are characteristic of the
big data issue facing scientists both then and now. And if we think about
them carefully in the context of their time and ours, we can see how it
Why Accelerate Discovery?   ◾    11
becomes imperative that scientists working today use methods and tools
that are far more powerful than those of Darwin and his contemporaries.
THE PROBLEM OF SYNTHESIS
When Darwin returned from his 5-year voyage, he had a formidable collection
of notes and specimens to organize and catalogue. This step took
him many years (longer, in fact, than it took him to collect the data in the
first place), but it was crucial to the discovery process. We often think of
scientific discovery as a Eureka moment—a bolt from the blue. But in reality,
it is much more frequently the result of painstaking labor carried out
to collect, organize, and catalogue all the relevant information. In essence,
this is a problem of synthesis. Data hardly ever comes in a form that can be
readily processed and interpreted. In nearly every case, the genius lies not
in the finding and collecting of the data but in the organization scheme
that makes the interpretation possible. In a very real sense, all significant
scientific discoveries are about overcoming the data-synthesis problem.
Clearly, data synthesis is hard (because otherwise everyone would do
it), but what makes it so? It is often not easy to see the effort required if
only the result is observed. This is because the most difficult step, the part
that requires the real genius, is almost invisible. It is hidden within the
structure of the catalogue itself. Let us look at the catalogue of specimens
Darwin created.
Darwin’s task in organizing and cataloguing his specimens was not just to
record the species and physical characteristics of each specimen—it was
to find the hidden relationships between them, as illustrated in Figure 2.1.
Organizing data into networks of entities and relationships is a recurring
theme in science. Taxonomies and ontologies are another manifestation of
this. Taxonomies break entities down into classes and subclasses based on
some measure of similarity, so that the further down the tree you go, the
more alike things are within the same class. Ontologies represent a more
general kind of entity network that expresses how entities relate to each
other in the world. Creating such networks from raw data is the problem
of synthesis. The more data there is, and in particular the more heterogeneous
the forms of that data, the more challenging synthesis becomes.
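Agglomerative clustering is one way to let a taxonomy emerge from the data rather than imposing it. The Python sketch below is our own assumption. It is not Darwin's procedure, nor necessarily the method used later in this book. Each specimen is described by a hypothetical set of traits, and similarity is measured with the Jaccard index. The code repeatedly merges the two most similar groups using single linkage, so the nesting of the result mirrors classes and subclasses.

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of traits two specimens share (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def linkage(left, right, specimens):
    """Single linkage: similarity of the most similar pair across two groups."""
    return max(jaccard(specimens[x], specimens[y]) for x in left for y in right)


def build_taxonomy(specimens):
    """Agglomerative clustering: repeatedly merge the two most similar groups.
    Groups merged early end up nested deepest, like subclasses in a taxonomy."""
    # Each cluster is (tree, member_names); a leaf's tree is just its name.
    clusters = [(name, {name}) for name in sorted(specimens)]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]][1], clusters[ij[1]][1], specimens),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical specimens described by sets of observed traits.
specimens = {
    "finch_1": {"short_beak", "seed_eater", "ground"},
    "finch_2": {"short_beak", "seed_eater", "tree"},
    "finch_3": {"long_beak", "insect_eater", "tree"},
    "mockingbird": {"long_beak", "insect_eater", "ground", "song"},
}
```

With these traits, build_taxonomy(specimens) returns (("finch_1", "finch_2"), ("finch_3", "mockingbird")). The two seed eaters form one subclass, and the two long-beaked insect eaters form another.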
THE PROBLEM OF FORMULATION
Once Darwin had synthesized his data, it became clear to him that species
did indeed change over time. But merely to observe this phenomenon was
not enough. To complete his theory, he needed a mechanism by which
this change takes place. He needed to create a model that could explain
how the data points (i.e., the species) connected to each other; otherwise
all he would have is a way to organize the data, without having any additional
insight into what the data meant. Creating this additional insight
that emerges from synthesis is the problem of formulation.
Formulation requires the creation of an equation or algorithm that
explains a process or at least simulates or approximates mathematically
how that process behaves in the physical world.
From a data-science perspective, formulation requires extracting
patterns that may appear across many disparate, heterogeneous data
collections. Going beyond synthesis to explanation may require data visualization
and sometimes even analogy. It requires pattern matching and
being able to draw from a wide array of related data. This is what Darwin
was able to do when he drew on the writings of Thomas Malthus to discover
the driving mechanism behind species change. Darwin reused an
existing formulation, the struggle for existence among competing offspring
(i.e., “survival of the fittest”), and applied it to competition among
all living things in order to arrive at a formulation of how species evolve.
FIGURE 2.1 Illustrations from The Origin of Species. [Branching diagram: species A to L along the base; horizontal levels I to XIV marking successive generations.]
The process of formulation begins with observation. The scientist
observes how entities change and interact over time. She observes which
properties of entities tend to occur together and which tend to be independent.
Often, data visualization—charts or graphs, for example—is used to
summarize large tables of numbers in a way that the human visual cortex
can digest and make sense of. The synthesis of data is one of the key steps
in discovery—one that often looks obvious in retrospect but, at the beginning
of research, is far from being so in most cases.
WHAT WOULD DARWIN DO?
The process of synthesis and formulation used by Darwin and other
scientists worked well in the past, but this process is increasingly problematic.
To put it bluntly, the amount and variety of data that needs to be
synthesized and the complexity of the models that need to be formulated
have begun to exceed the capacity of individual human intelligence. To see
this, let us compare the entities and relationships described in The Origin
of Species to the ones that today’s biologists need to grapple with.
Today’s biologists need to go well beyond the species and physical anatomy
of organisms. Today, biology probes life at the molecular level. The
number of different proteins that compose the cells in the human organism
is estimated to be over a million. Each of these proteins has different functions in the
cell. Individual proteins work in concert with other proteins to create
additional functionality. The complexity and richness of all these interactions
and functions are awe-inspiring. They are also clearly beyond the capability
of a single human mind to grasp. And all of this was entirely unknown to
Darwin.
To look at this in a different way, the number of scholarly publications
available for Darwin to read in his field might have been on the order
of 10,000–100,000 at most. Today, that number would be on the
order of fifty million [1].
How do the scientists of today even hope to fathom such complexity and
scale of knowledge? There are two strategies that every scientist employs
to one degree or another: specialization and consensus. Each scientist
chooses an area of specialization that is narrow enough to encompass a
field wherein they can be familiar with all the important published literature
and findings. Of course, this implies that as time goes on and more
and more publications occur, specialization must grow more and more
intense. This has the obvious drawback of narrowing the scope of each
scientist’s knowledge and the application of their research. In addition,
each scientist will read only the publications in the most prestigious, high-profile
journals. These will represent the best consensus opinion of the
most important research in their area. The drawback is that consensus in
science is frequently wrong. Also, if the majority of scientists are pursuing
the same line of inquiry, the space of possible hypotheses is very inefficiently
and incompletely explored.
THE POTENTIAL FOR ACCELERATED DISCOVERY: USING
COMPUTERS TO MAP THE KNOWLEDGE SPACE
But all is not lost for the scientists of today, for the very tools that help
generate the exponentially increasing amounts of data can also help to
synthesize and formulate that data. Due to Moore’s Law, scientists have,
and will most likely continue to have, exponentially increasing amounts
of computational power available to them. What is needed is a way to
harness that computational power to carry out better synthesis and formulation—to
help the scientist see the space of possibilities and explore
that space much more effectively than they can today. What is needed is
a methodology that can be repeatedly employed to apply computation to
any scientific domain in such a way as to make the knowledge space comprehensible
to the scientist’s brain.
The purpose of this book is to present one such methodology and to
describe exactly how to carry it out, with specific examples from biology
and elsewhere. We have shown this method to be an effective way to synthesize
all published literature in a given subject area and to formulate
new properties of entities based on everything that we know about those
entities from previous results. This leads us to conclude that the methodology
is an effective tool for accelerating scientific discovery. Since the
methods we use are in no way specific to these examples, we think there
is a strong possibility that they may be effective in many other scientific
domains as well.
Moreover, regardless of whether our particular methodology is optimal
or effective for any particular scientific domain, the fact remains
that all scientific disciplines that are pursued by ever-increasing numbers
of investigators must ultimately address this fundamental challenge:
Eventually, the rate of data publication will exceed the individual human
capacity to process it in a timely fashion. Only with the aid of computation
can the brain hope to keep pace. The challenges we address here and the
method we employ to meet those challenges will continue to be relevant
and essential for science for the foreseeable future.
So clearly the need to aid scientific discovery with computational tools
exists and will continue to grow. But some would argue that no
such tools exist, or that if they do exist, they are still too rudimentary to
really create value on a consistent basis. Computers can do computation
and information retrieval, but scientific discovery requires creativity and
thinking “outside the box,” which is just what computers cannot do. A few
years ago, the authors would have been largely in agreement with this
viewpoint, but something has changed in the field of computer science
that makes us believe that accelerating scientific discovery is no longer a
distant dream but is actually well within current capability. Later in this
chapter, we will describe these recent developments and preview some of
the implications of these emerging capabilities.
WHY ACCELERATE DISCOVERY: THE BUSINESS PERSPECTIVE
Discovery is central and critical to the whole of humanity and to many
of the world’s most significant challenges. Discovery represents an ability
to uncover things that were not previously known. It underpins all innovations
(Figure 2.2).
Looking at what we human beings consume—for example, consumer
goods such as food, clothing, household items, and energy—we would
quickly realize that we need significant innovations across the board.
We need to discover new ways to generate and store energy, new water
filtration methods, and new product formulations for food and other
goods so that they are more sustainable for our environments and
healthier for human beings. We need these innovations more desperately
than ever.
FIGURE 2.2 Example application areas for Accelerated Discovery. [Major discoveries and innovation are critical to many world challenges and the success of many companies across industries. Panels: Smarter planet/consumer goods (water filtration, product innovation); Information technology (nanotechnologies, mobile); Life sciences (drug discovery, biomedical research); Energy storage and generation (batteries, solar, CO2).]
Looking at what we make and build—for example, new computer and
mobile devices, infrastructures, and machines—again, the need for discovery
and innovation is at the center of all these. We need new kinds of
nanotechnologies that can scale human capacity to unimaginable limits,
new materials that have all the desired properties that will lower energy
consumption while sustaining their intended functions, and new designs
that can take a wide variety of factors into consideration.
Looking at ourselves, human beings, our own wellbeing depends heavily
on discovery and innovation in healthcare, life sciences, and a wide
range of social services. We need a much better understanding of human
biology. We need to discover new drugs and new therapies that can target
diseases much more effectively and efficiently.
Yet today, the discovery processes in many industries are slow, manual,
expensive, and ad hoc. For example, in drug discovery, it takes on average
10–15 years to develop one drug, at a cost of hundreds of millions of dollars
per drug. The attrition rate of drug development today is over 90%.
Similarly, new energy forms, such as the lithium battery, take tens of years
to discover. New consumer product formulations are mostly developed on a
trial-and-error basis. There is a need across all industries for a reliable,
repeatable process to make discovery more cost-effective.
COMPUTATIONAL TOOLS THAT ENABLE
ACCELERATED DISCOVERY
Accelerated Discovery is not just one capability or algorithm but a combination
of many complementary approaches and strategies for synthesizing
information and formulating hypotheses. The following existing
technologies represent the most important enablers of better hypothesis
generation in science.
Search
The ability to index and retrieve documents based on the words they contain
across all relevant content in a given scientific field is a primary enabler
of all the technologies that are involved in Accelerated Discovery. Being
able to selectively and rapidly acquire all the relevant content concerning
a given subject of interest is the first step in synthesizing the meaning of
that content. The easy availability, scalability, and application of search to
this problem space have made everything else possible.
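Word-based retrieval rests on an inverted index, which maps each word to the documents that contain it. The following Python sketch is a toy version under simplifying assumptions:

- naive tokenization;
- boolean AND queries;
- no ranking;
- hypothetical abstracts.

Production search engines add stemming, relevance scoring and distributed storage on top of this basic structure.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase alphanumeric tokens; real systems also stem and drop stop words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index: map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every query word (boolean AND)."""
    postings = [index.get(word, set()) for word in tokenize(query)]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical abstracts standing in for a scientific corpus.
docs = {
    "doc1": "p53 regulates apoptosis in tumor cells",
    "doc2": "Kinase inhibitors block tumor growth",
    "doc3": "p53 mutations alter kinase signaling",
}
index = build_index(docs)
```

Here search(index, "p53 kinase") returns ["doc3"], the only abstract mentioning both terms.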
Business Intelligence and Data Warehousing
The ability to store, retrieve, and summarize large amounts of structured
data (i.e., numeric and categorical data in tables) allows us to deal with
all kinds of information in heterogeneous formats. This gives us the critical
ability to survey scientific discoveries over time and space or to compare
similar treatments on different populations. The ability to aggregate
data and then accurately compare different subsets is a technology that
we apply over and over again in our approach as we seek to determine the
credibility and reliability of each fact and conclusion.
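A few lines of Python can illustrate the aggregate-then-compare pattern. The records and field names below are hypothetical. A data warehouse would express the same idea as a GROUP BY query over much larger tables.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical extracted facts: (treatment, population, reported response rate).
records = [
    ("drug_X", "adults", 0.62),
    ("drug_X", "adults", 0.58),
    ("drug_X", "children", 0.35),
    ("drug_Y", "adults", 0.40),
    ("drug_Y", "children", 0.45),
]


def aggregate(rows):
    """Group rows by (treatment, population) and average the response rate."""
    groups = defaultdict(list)
    for treatment, population, rate in rows:
        groups[(treatment, population)].append(rate)
    return {key: mean(values) for key, values in groups.items()}


def compare(summary, treatment, population_a, population_b):
    """Difference in mean response for one treatment across two populations."""
    return summary[(treatment, population_a)] - summary[(treatment, population_b)]


summary = aggregate(records)
```

With these records, compare(summary, "drug_X", "adults", "children") is about 0.25. This kind of gap should prompt a closer look before any single reported rate is trusted.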
Massive Parallelization
In recent years, Hadoop and MapReduce frameworks [2] have made parallelization
approaches much more applicable to real-world computing
problems. This gives us the ability to attack hard problems involving large
amounts of data in entirely new ways. In short, we can build up a number
of simple strategies to mine and annotate data that can, in aggregate, add
up to a very sophisticated model of what the data actually means. Massive
parallelization also allows us to try out thousands of approaches and combinations
in real time before selecting the few candidates that are most
likely to succeed based on our models and predictions.
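The MapReduce programming model splits a job into three parts:

- a map step that runs independently on each piece of input;
- a shuffle that groups intermediate results by key;
- a reduce step that combines each group.

The sketch below runs all three phases sequentially in plain Python to show the data flow. It is not Hadoop's API. A real framework would distribute the map and reduce calls across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit a (term, 1) pair for every term in one document."""
    return [(term, 1) for term in document.lower().split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework does between stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(documents):
    # Each map_phase call depends only on its own document, so a framework
    # could run these calls on different machines; here they run sequentially.
    mapped = chain.from_iterable(map(map_phase, documents))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sentences standing in for a large corpus.
documents = ["p53 binds DNA", "p53 regulates apoptosis", "DNA repair"]
counts = map_reduce(documents)
```

Counting terms is the simplest case. The same mapper/reducer split applies to any annotation that can be computed per document and then combined.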
Unstructured Information Mining
Most of the critical information in science is unstructured. In other words,
it comes in the form of words, not numbers. Unstructured information
mining provides the ability to reliably and accurately convert words into
other kinds of structures that computers can more readily deal with. As
we will see in this book, this is a key element of the accelerated discovery
process. This allows us to go beyond retrieving the right document, to
actually discovering hidden relationships between the elements described
by those documents.
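One simple way to turn words into structure has two steps:

1. Recognize entity names using a dictionary of synonyms.
2. Count which entities are mentioned in the same sentence.

The result is a weighted entity-entity network. The Python sketch below relies on hypothetical simplifications: the dictionary, the sentences and the sentence-splitting rule are all illustrative. Real systems need much richer dictionaries and must handle ambiguity.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical dictionary mapping surface forms (synonyms) to canonical entity names.
DICTIONARY = {
    "p53": "TP53",
    "tp53": "TP53",
    "mdm2": "MDM2",
    "apoptosis": "apoptosis",
    "programmed cell death": "apoptosis",
}


def find_entities(sentence):
    """Return the canonical entities whose surface forms appear as whole words."""
    text = sentence.lower()
    return {canon for form, canon in DICTIONARY.items()
            if re.search(r"\b" + re.escape(form) + r"\b", text)}


def cooccurrence_network(text):
    """Count how often each pair of entities is mentioned in the same sentence."""
    edges = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for pair in combinations(sorted(find_entities(sentence)), 2):
            edges[pair] += 1
    return edges


text = ("MDM2 binds p53. Loss of TP53 blocks programmed cell death. "
        "p53 triggers apoptosis in damaged cells.")
network = cooccurrence_network(text)
```

In this example the edge ("TP53", "apoptosis") gets weight 2. The synonym dictionary merges "programmed cell death" with "apoptosis" and "p53" with "TP53". Without that normalization, the two mentions would be counted as unrelated pairs.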
Natural Language Processing
The ability to recognize entities, relationships or transitions, and features
and properties and to attribute them appropriately requires natural language
processing. This technology allows us to parse the individual elements
of the sentence, identify their part of speech, and determine to what
they refer. It can also allow us to discover the intentionality of the author.
These natural language processing abilities enable the precise determination
of what is known, what is hypothesized, and what is still to be determined
through experimentation. They create the underlying fact-based
framework from which our hypotheses can be generated.
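As a rough illustration of separating what is known from what is hypothesized, the Python sketch below labels a sentence by looking for hedging and open-question cue phrases. The cue lists are our own hypothetical examples. Cue matching is a deliberately crude stand-in for full parsing and will misfire on many real sentences.

```python
import re

# Hypothetical cue phrases; a production system would learn or curate far richer lists.
HYPOTHESIS_CUES = ["may", "might", "suggest", "suggests", "we hypothesize", "possibly", "could"]
OPEN_QUESTION_CUES = ["remains unclear", "remains unknown", "is not known", "further studies"]


def _contains(text, phrase):
    """True if the phrase occurs in the text as whole words."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def epistemic_status(sentence):
    """Label a sentence as 'open', 'hypothesized', or 'asserted' from cue phrases."""
    text = sentence.lower()
    if any(_contains(text, cue) for cue in OPEN_QUESTION_CUES):
        return "open"
    if any(_contains(text, cue) for cue in HYPOTHESIS_CUES):
        return "hypothesized"
    return "asserted"
```

For example, "p53 binds MDM2." is labeled "asserted". "These results suggest that p53 may regulate autophagy." is labeled "hypothesized". "Whether MDM2 affects autophagy remains unclear." is labeled "open".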
Machine Learning
To do Accelerated Discovery in complex domains requires more than
just establishing the factual statements in literature. Not all literature is
equally trustworthy, and the trustworthiness may differ depending on
the scope and context. Acquiring the level of sophistication and nuance
needed to make these determinations will require more than human programming
can adequately provide. It will require learning from mistakes
and past examples in order to get better and better over time at deciding
which information is credible and which is suspect. Machine learning is
the technology that provides this type of capability. In fact, it has proven
remarkably adept at tackling even heretofore intractable problems where
there is sufficient training data to be had [3]. Machine learning will enable
our Accelerated Discovery approach to apply sophisticated judgment at
each decision point and to improve that judgment over time.
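A multinomial naive Bayes classifier, one of the simplest machine learning models, can illustrate learning credibility from past examples. Everything here is an assumption for illustration. The feature tokens describing each finding and their labels are hypothetical. A real system would use far more data and probably a stronger model.

```python
import math
from collections import Counter


def train(examples):
    """Fit multinomial naive Bayes counts.
    examples: list of (tokens, label) pairs."""
    label_counts = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in label_counts}
    for tokens, label in examples:
        word_counts[label].update(tokens)
    vocab = set().union(*word_counts.values())
    return label_counts, word_counts, vocab


def predict(model, tokens):
    """Pick the label with the highest log posterior, using add-one smoothing."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokens:
            if token in vocab:  # ignore tokens never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled findings, each described by a few feature tokens.
training = [
    (["replicated", "large_cohort", "peer_reviewed"], "credible"),
    (["replicated", "peer_reviewed"], "credible"),
    (["single_study", "small_sample"], "suspect"),
    (["small_sample", "retracted"], "suspect"),
]
model = train(training)
```

With this training set, predict(model, ["replicated", "large_cohort"]) returns "credible". Adding newly labeled examples and retraining is the sense in which such a model improves its judgment over time.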
Collaborative Filtering/Matrix Factorization
Collaborative filtering is a technique made famous by Amazon and Netflix
[4], where it is used to accurately identify the best movie or book for a given
customer based on the purchase history of that customer and other similar
customers (customers who buy similar things). Customer purchases
are a special kind of entity-entity network. Other kinds of entity-entity
networks are similarly amenable to this kind of link prediction. We can
use a generalization of this approach as a way to predict new links in an
existing entity-entity network, which can be considered to be a hypothesis
of a new connection between entities (or a new property of an entity) that is
not currently known but is very likely based on everything we know about
the relevant entities.
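A neighborhood form of collaborative filtering can illustrate the link-prediction idea. Each link an entity lacks is scored by how strongly similar entities exhibit it. Note that this Python sketch swaps in the simpler neighborhood method in place of matrix factorization. The entities and properties are hypothetical. Each high-scoring pair is a candidate hypothesis, not a finding.

```python
import math

# Hypothetical known links: entity -> set of properties it is known to have.
known = {
    "gene_A": {"p1", "p2", "p3"},
    "gene_B": {"p1", "p2"},
    "gene_C": {"p2", "p3", "p4"},
    "gene_D": {"p4"},
}


def similarity(a, b):
    """Cosine similarity between two binary link sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def predict_links(entity, data, top_n=3):
    """Score each property the entity lacks by summing the similarity of
    other entities that have it (neighborhood collaborative filtering)."""
    scores = {}
    for other, props in data.items():
        if other == entity:
            continue
        weight = similarity(data[entity], props)
        for prop in props - data[entity]:
            scores[prop] = scores.get(prop, 0.0) + weight
    # Highest score first; ties broken alphabetically for reproducibility.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(prop, score) for prop, score in ranked[:top_n] if score > 0]
```

With these data, predict_links("gene_B", known) ranks p3 first, because gene_A and gene_C, the two entities that resemble gene_B, both have it.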
Modeling and Simulation
In order to reason accurately about the physical world we have to be able to
simulate its processes in silico and predict what would happen if an experiment
were tried or a new property or relationship were actually found to
exist. These types of simulations will help reveal potential downstream
problems or contradictions that might occur if we were to hypothesize a
physically unrealizable condition or some impossible connection between
entities. Moreover, modeling and simulation can help determine what the
likely impact of any newly discovered property or relationship would be
on the physical system as a whole, in order to foresee whether such a
discovery would be likely to be uninteresting or quite valuable because it
would imply a favorable outcome or have a wide impact in the field.
Service-Oriented Architectures
Clearly, doing Accelerated Discovery requires a large array of heterogeneous
software components providing a wide variety of features and
functions across multiple platforms. Service-oriented architectures (SOA)
provide a uniform communication protocol that allows the components
to communicate across the network using complex and evolving data representations.
They also allow new implementations and algorithms to be easily
swapped in as they become available. SOAs represent an indispensable
tool for enabling the large, sophisticated, and distributed systems needed
for accelerated discovery applications to emerge from components that
can largely be found on the shelf or in open-source libraries.
Ontological Representation Schemes
In addition to being able to extract entities and relationships from unstructured
content, we also need powerful ways to represent those entities and
their features and connections in a persistent fashion, and in a way that
makes it possible to do reasoning over these objects. Existing ontological
representation schemes (e.g., OWL [5]) make it possible to store entities
in a way that retains all the pertinent contextual information about those
entities while still maintaining a degree of homogeneity. This homogeneity
makes it possible to design algorithms that can discover or infer new
properties based on all known existing patterns. The ability to store such
representations in a scalable database and/or index provides the capability
of growing the stored version of what is known to the level necessary to
comprehend an entire scientific domain.
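The minimal Python sketch below shows how a uniform representation enables inference. It stores facts as subject-predicate-object triples and applies one hand-written rule, the transitivity of is_a. It does not use OWL or any RDF library. The entity names are hypothetical, and real ontology reasoners support far richer rules.

```python
# Facts stored as (subject, predicate, object) triples; names are hypothetical.
triples = {
    ("compound_X", "inhibits", "protein_P"),
    ("protein_P", "is_a", "tyrosine_kinase"),
    ("tyrosine_kinase", "is_a", "kinase"),
    ("kinase", "is_a", "enzyme"),
}


def objects(store, subject, predicate):
    """All objects linked to a subject by a given predicate."""
    return {o for s, p, o in store if s == subject and p == predicate}


def infer_is_a(store):
    """Add every is_a triple implied by transitivity until nothing new appears."""
    inferred = set(store)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != "is_a":
                continue
            for parent in objects(inferred, o, "is_a"):
                triple = (s, "is_a", parent)
                if triple not in inferred:
                    inferred.add(triple)
                    changed = True
    return inferred


def inhibitors_of_class(store, cls):
    """Subjects that inhibit an entity with an explicit is_a cls triple in store."""
    return sorted(s for s, p, o in store if p == "inhibits" and (o, "is_a", cls) in store)


full = infer_is_a(triples)
```

Before inference, inhibitors_of_class(triples, "enzyme") returns an empty list. After infer_is_a, inhibitors_of_class(full, "enzyme") returns ["compound_X"], a fact that was never stated explicitly.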
DeepQA
While question answering is not a central feature of Accelerated Discovery,
the two applications share many common components. Both require the
computational digestion of large amounts of unstructured content, which
then must be used in aggregate to form a conclusion with a likelihood
estimate. Both also support their answers with evidence extracted from
text sources.
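A noisy-OR combination is one simple way to turn many independently scored pieces of evidence into a single likelihood estimate. It is chosen here for brevity and is an assumption: it does not describe how DeepQA actually merges evidence. The candidate names and scores below are invented.

```python
from math import prod
from typing import Dict, List, Tuple


def noisy_or(scores: List[float]) -> float:
    """Probability that at least one piece of evidence is right, assuming each
    score is an independent probability in [0, 1]."""
    return 1.0 - prod(1.0 - s for s in scores)


def rank_answers(evidence: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Aggregate per-passage scores for each candidate and sort by confidence."""
    ranked = [(answer, noisy_or(scores)) for answer, scores in evidence.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical evidence scores from supporting passages for each candidate.
evidence = {
    "candidate A": [0.5, 0.4, 0.2],
    "candidate B": [0.7],
}
ranking = rank_answers(evidence)
```

Here three moderate pieces of evidence (1 - 0.5 * 0.6 * 0.8 = 0.76) outrank one stronger piece (0.7). Whether that behavior is desirable depends on the domain.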
Reasoning under Uncertainty
Machine learning techniques allow us to predict the likelihood that some
conclusion or fact is true. Reasoning under uncertainty is how we use this
probabilistic knowledge to form new hypotheses or to invalidate or ignore
some fact that is highly unlikely. Bayesian inference [6] is one example
of an existing framework that we can apply to uncertain causal networks
in order to make credible deductions about probable end states. This
capability is central to telling the difference between a likely outcome
and something that is wildly fanciful.
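In its simplest form, Bayesian inference updates belief in a hypothesis H after observing evidence E with Bayes' rule, P(H|E) = P(E|H)P(H) / P(E). The sketch below applies that rule to a single binary hypothesis. It assumes the observations are conditionally independent, which is a strong simplification; a real causal network would need proper Bayesian-network inference. All probabilities are invented to show how an unlikely hypothesis stays unlikely unless the evidence is strongly diagnostic.

```python
from typing import List, Tuple


def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' rule for a binary hypothesis H after observing evidence E."""
    numerator = p_e_given_h * prior
    p_e = numerator + p_e_given_not_h * (1.0 - prior)  # total probability of E
    return numerator / p_e


def update_all(prior: float, observations: List[Tuple[float, float]]) -> float:
    """Apply Bayes' rule once per observation, assuming the observations are
    conditionally independent given H (a simplifying assumption)."""
    belief = prior
    for p_e_h, p_e_not_h in observations:
        belief = posterior(belief, p_e_h, p_e_not_h)
    return belief


# Hypothetical: a 1% prior that some compound affects some pathway.
weak = update_all(0.01, [(0.6, 0.5)])                 # barely diagnostic evidence
strong = update_all(0.01, [(0.9, 0.05), (0.8, 0.1)])  # two strongly diagnostic findings
```

Weak evidence moves the belief only from 1% to about 1.2%. Two strongly diagnostic findings raise it to 16/27, or roughly 59%.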
ACCELERATED DISCOVERY FROM A SYSTEM PERSPECTIVE
The previous list of enabling technologies available to support Accelerated
Discovery (AD) is necessary to the task, but incomplete. What is needed
is a coherent framework in which these technologies can effectively work
together to achieve the desired outcome.
To support and enable such continuous data transformations and dis-
covery, we must design our discovery solution carefully. In particular,
our discovery solution must adapt and scale to a wide range of chang-
ing dynamics, including data content, domain knowledge, and human
interactions. This is crucial because in all industry domains, domain con-
tent and knowledge are constantly changing and expanding. If we do not
design our discovery system to cope with such changes, the outcome will
be a system that lacks longevity and capability. Most, if not all, of today’s
discovery solutions can only deal with limited volume and diversity of
content and knowledge.
To enable adaptation and scaling, we instituted two system design prin-
ciples: agility and adaptivity. Agility means that a discovery system must
be able to rapidly generate outputs in the face of changes in data content,
knowledge, and human inputs. This is far from a reality in today's discovery
systems. For example, a discovery system may be built for one data input
format. When the data input format changes, significant manual intervention
may be needed, and downstream system components may also need to change
accordingly. Such designs make a discovery process extremely lengthy and
error prone. We will describe our approach to building agility into the
system from the very beginning.
Adaptivity means that a discovery solution must treat “changes in all
forms” as first-class citizens; for example, changes in individual system
components, changes in data content, and changes in knowledge. We envision
that the underlying technology components will evolve and improve over
time, and the same is true of data content and knowledge bases. A discovery
system must have the notion of adaptivity built into it from day one.
To enable agility, we suggest that all major system components be
designed with a “core-abstraction plus configurable customization”
approach. The core abstraction defines all the major services that the sys-
tem components intend to support. The configurable customizations allow
for changes and adaptations of the core abstractions for specific needs in
data-content formats, knowledge repositories, and interactions with other
components. For example, a content collection component may have com-
mon and core services dealing with the processing of typical unstructured
documents. A configurable customization can define the specific fields
and format extensions that are needed for those unstructured sources
without code change in the core abstraction services.
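Here is a minimal sketch of the "core abstraction plus configurable customization" idea. It assumes a hypothetical configuration dictionary. The core collection function never changes, while each source's configuration declares which fields to keep and how to rename them. The source names and field names are illustrative only.

```python
from typing import Dict, List

# Hypothetical per-source customizations: only configuration differs by source.
SOURCE_CONFIGS: Dict[str, dict] = {
    "patents": {"fields": {"pn": "id", "abstract": "text", "assignee": "owner"}},
    "articles": {"fields": {"pmid": "id", "abstract": "text"}},
}


def collect(records: List[dict], source: str) -> List[dict]:
    """Core collection service: the same code path for every source.
    Keeps only the configured fields and renames them as configured."""
    mapping = SOURCE_CONFIGS[source]["fields"]
    cleaned = []
    for record in records:
        doc = {new: record[old] for old, new in mapping.items() if old in record}
        doc["source"] = source
        cleaned.append(doc)
    return cleaned


patents = collect(
    [{"pn": "US123", "abstract": "A kinase inhibitor.", "assignee": "Acme"}], "patents"
)

# Supporting a new source needs only a new configuration entry, no code change.
SOURCE_CONFIGS["reports"] = {"fields": {"doc_no": "id", "body": "text"}}
reports = collect([{"doc_no": "R-7", "body": "Assay results.", "draft": True}], "reports")
```

Fields that are not configured (such as `draft` above) are simply dropped, so a new source cannot silently change the shape of what downstream components receive.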
To enable adaptivity, we defined generalized data input and output for-
mats, called common data models (CDMs), and common interfaces around
all system components. This allows developers to change the component
engine itself without impacting the overall function of the discovery system.
It also allows us to adapt to new changes in data sources and knowledge bases
by simply mapping them to CDMs without changing the rest of the system.
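The following sketch shows how a common data model decouples sources from consumers. It assumes a hypothetical `CdmDocument` shape and two invented input layouts: a CSV export with `id` and `abstract` columns, and a JSON feed of `uid`/`body` objects. Each source gets a small mapper into the CDM, and the downstream component depends only on the CDM.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CdmDocument:
    """Hypothetical common data model: every source is mapped into this shape."""
    doc_id: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def from_csv(raw: str) -> List[CdmDocument]:
    # Mapper for a CSV export with 'id' and 'abstract' columns (assumed layout).
    rows = csv.DictReader(io.StringIO(raw))
    return [CdmDocument(r["id"], r["abstract"], {"format": "csv"}) for r in rows]


def from_json(raw: str) -> List[CdmDocument]:
    # Mapper for a JSON list of {"uid": ..., "body": ...} objects (assumed layout).
    return [CdmDocument(d["uid"], d["body"], {"format": "json"}) for d in json.loads(raw)]


def total_words(docs: List[CdmDocument]) -> int:
    """Downstream component: depends only on the CDM, not on any source format."""
    return sum(len(d.text.split()) for d in docs)


docs = from_csv("id,abstract\nP1,Gene X binds protein Y\n") + from_json(
    '[{"uid": "J1", "body": "Protein Y is phosphorylated"}]'
)
```

To add a third source, you write one more mapper. `total_words`, and anything else that consumes `CdmDocument`, stays untouched.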
Figure 2.3 summarizes the key system components of a discovery
solution. The boxes are the major system components, each of which is
briefly described below.
Content Curator
Content curator is responsible for managing domain content sources. It
includes collecting, cleansing, and making diverse content sources available
for downstream processing, end-user queries, and browsing. Typical functions
include data ingestion, extraction, transformation, and loading into some
content repository. It also includes functions such as indexing and searching.
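The cleanse, load, index, and search functions of a content curator can be illustrated with a tiny inverted index. This is a sketch under simple assumptions: lowercase alphanumeric tokens, whitespace collapsing as the only cleansing step, and boolean AND queries. A production curator would use a real search engine and analyzer. The documents are invented.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; a real system would use a proper analyzer.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each token to the set of document IDs that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return IDs of documents containing every query term (boolean AND)."""
    postings = [index.get(t, set()) for t in tokenize(query)]
    return set.intersection(*postings) if postings else set()


# Hypothetical raw content; the "cleansing" step just collapses whitespace.
raw = {"d1": "Imatinib  inhibits\nBCR-ABL", "d2": "BCR-ABL fusion in CML", "d3": "Aspirin"}
corpus = {doc_id: " ".join(text.split()) for doc_id, text in raw.items()}
index = build_index(corpus)
```

Given this index, `search(index, "bcr abl")` returns both `d1` and `d2`, while `search(index, "imatinib BCR-ABL")` narrows the result to `d1`.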
Domain-pedia
Domain-pedia is responsible for managing domain knowledge. It must
ingest, update, and process domain knowledge from existing knowledge
sources as well as from some of the downstream processing, such as the
semantic entity extraction process. It can also be a resource for runtime
processing and analytics and for end-user searching and browsing, similar
to what one might do on Wikipedia, such as searching for knowledge around
a given protein.
FIGURE 2.3 (See color insert.) Functional model. All key components (catalog, content curator, Domain-pedia, annotators, normalizers, BigInsights framework, query services, analytics services, and user interface) have a common data model extensible data exchange format, standard web service interfaces, and customizable configurations. The diagram also shows registries for content, ontologies/dictionaries, annotators, and normalizers; a BigInsights-based map-reduce framework that filters, annotates, normalizes, transforms, and indexes content into a content index, a specialty index, and a graph store; and core UI services underlying custom user interfaces.
Annotators
Annotators are the engines that pull semantic domain concepts out of
unstructured text information, such as chemicals, genes, and proteins,
and their relationships.
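Production annotators typically combine dictionaries, rules, and machine learning. The sketch below shows only the simplest dictionary-lookup variant, using a tiny hypothetical dictionary. Its purpose is to make concrete what "pulling concepts out of text" produces: typed spans with character offsets. The `Annotation` type and the labels are illustrative assumptions.

```python
import re
from typing import Dict, List, NamedTuple


class Annotation(NamedTuple):
    start: int   # character offset where the mention begins
    end: int     # character offset just past the mention
    text: str    # surface form as it appears in the document
    label: str   # semantic type, e.g. "CHEMICAL" or "GENE"


def annotate(text: str, dictionary: Dict[str, str]) -> List[Annotation]:
    """Find whole-word, case-insensitive matches of dictionary terms."""
    found = []
    for term, label in dictionary.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append(Annotation(m.start(), m.end(), m.group(0), label))
    return sorted(found)  # order by position in the text


# Hypothetical dictionary; real systems load large curated vocabularies.
DICTIONARY = {"aspirin": "CHEMICAL", "PTGS1": "GENE", "cyclooxygenase": "PROTEIN"}
sentence = "Aspirin irreversibly inhibits cyclooxygenase encoded by PTGS1."
annotations = annotate(sentence, DICTIONARY)
```

Extracting relationships (for example, "Aspirin inhibits cyclooxygenase") would be a further step that works on these spans. That step is not shown here.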
Normalizers
Normalizers are the engines that organize the vocabularies around vari-
ous domain concepts into a more consistent form. As the name indicates,
they normalize domain concepts into standardized forms, such as unique
chemical structures and protein names.
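At its simplest, normalization can be sketched as a lookup from many surface forms to one canonical identifier. The synonym table, the identifier format (`CHEM:0001`, `GENE:PTGS1`), and the key-folding rule below are hypothetical. Real normalizers draw on curated resources and, for chemicals, on structure-based identifiers.

```python
from typing import Dict, List, Optional


def normalize_key(name: str) -> str:
    # Case-fold and drop spaces, hyphens, and commas so trivial variants collide.
    return "".join(ch for ch in name.casefold() if ch not in " -,")


class Normalizer:
    def __init__(self, synonyms: Dict[str, List[str]]) -> None:
        # Invert {canonical_id: [names...]} into {folded name: canonical_id}.
        self._lookup = {
            normalize_key(name): cid
            for cid, names in synonyms.items()
            for name in names
        }

    def normalize(self, mention: str) -> Optional[str]:
        """Return the canonical ID for a mention, or None if it is unknown."""
        return self._lookup.get(normalize_key(mention))


# Hypothetical synonym table with made-up canonical identifiers.
normalizer = Normalizer({
    "CHEM:0001": ["aspirin", "acetylsalicylic acid", "2-acetoxybenzoic acid"],
    "GENE:PTGS1": ["PTGS1", "COX-1", "cyclooxygenase 1"],
})
```

With this table, "Acetylsalicylic Acid" and "aspirin" both resolve to `CHEM:0001`, and "cox 1" resolves to `GENE:PTGS1`. Mentions with no entry return `None`, so they can be routed for review instead of being guessed.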
BigInsights Framework
The BigInsights framework is an orchestration engine that manages the
interactions between the components described above in a scalable and
efficient fashion by mapping runtime content, knowledge, annotation, and
normalization processing onto a large-scale, Hadoop-like infrastructure.
Such a framework is like the blood vessels of the human body; without it, we
only have pieces and parts of the system, rather than a live system.
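No BigInsights or Hadoop APIs are shown below. The sketch only illustrates the map-reduce pattern that such a framework scales out across a cluster, running in a single process:

- a map step emits `(entity, 1)` pairs per document;
- a shuffle step groups them by key;
- a reduce step sums them.

The built-in `map` stands in for distributed map workers. The entity dictionary and documents are hypothetical.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, Iterable, List, Tuple

KNOWN_ENTITIES = {"egfr", "kras", "tp53"}  # hypothetical annotator dictionary


def map_doc(text: str) -> List[Tuple[str, int]]:
    """Map step: emit (entity, 1) for every entity mention in one document."""
    return [(tok, 1) for tok in text.lower().split() if tok in KNOWN_ENTITIES]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(values: List[int]) -> int:
    """Reduce step: combine all values emitted for one key."""
    return reduce(lambda a, b: a + b, values, 0)


docs = ["EGFR and KRAS mutations", "TP53 loss with KRAS", "KRAS again"]
mapped = [pair for doc_pairs in map(map_doc, docs) for pair in doc_pairs]
counts = {key: reduce_counts(vals) for key, vals in shuffle(mapped).items()}
```

The map and reduce functions are pure and operate on independent inputs, so a cluster framework can run many copies of them in parallel without changing their code.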
Query Services
Query services provide consistent data access interfaces that let end users
and other system components query the underlying data repositories without
knowing what format each data repository is in.
Analytics Services
Analytics services include runtime algorithms that enable the discovery
and construction of more complex knowledge representations. For example,
they may produce network representations of gene-to-gene relationships
based on the underlying knowledge repository.
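As a sketch of one such analytics service, the code below turns extracted gene-to-gene relationship triples into an undirected network, then answers two simple questions over it: how connected each gene is (degree), and how two genes are linked (a breadth-first shortest path). The gene names and relations are placeholders, not real biology.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Set, Tuple


def build_network(relations: List[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Turn (gene, relation, gene) facts into an undirected adjacency map."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, _rel, b in relations:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shortest_path(graph: Dict[str, Set[str]], src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search; returns one shortest path, or None if unreachable."""
    parents: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk parent links back to the source
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


# Placeholder relationships, as an annotator might have extracted them.
relations = [("GENE_A", "activates", "GENE_B"), ("GENE_B", "binds", "GENE_C"),
             ("GENE_C", "inhibits", "GENE_D"), ("GENE_A", "binds", "GENE_E")]
network = build_network(relations)
degrees = {gene: len(neighbors) for gene, neighbors in network.items()}
```

For example, `shortest_path(network, "GENE_A", "GENE_D")` returns the chain A, B, C, D. That indirect connection is never stated in any single extracted fact.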
User Interface
The user interactions of a discovery system can be diverse. They can range
from basic searches and reporting to more complex visualizations and
workflows. Our user interface component is built on a “platform+applications”
principle. That is, we developed a suite of common user-interface
widgets that can be leveraged by a wide range of use cases. We then design
specific applications that target specific use cases by leveraging
the common widgets as much as possible. This way, our discovery system
can quickly be adapted to support different use cases without applications
having to be rewritten each time.
Catalogue
Finally, the catalogue component manages the system records about all
system components and their configurations and metadata about the con-
tent sources and knowledge sources. It creates and manages a list of sys-
tem and data catalogues. System administrators are the typical users of
such a component.
Clearly, a discovery system is complex and dynamic. Without the design
principles described above, a discovery solution can quickly find itself
stuck in the face of changing content, knowledge, and user-interaction
paradigms.
ACCELERATED DISCOVERY FROM A DATA PERSPECTIVE
The process of Accelerated Discovery can essentially be thought of as a
data transformation process. Having just described the system for trans-
forming that data, let us look at the process again from a data perspective
and see how the data may be changed as we move through the discovery
process.
The discovery process is inherently a continuous transformation from
raw pieces of data and information to enriched pieces of knowledge and
more comprehensive knowledge representations, all the way to brand new
discoveries and hypotheses. A discovery solution/platform can therefore
be viewed as a system that supports and enables such data transformations
and end-user interactions. Figure 2.4 summarizes four major data trans-
formations driven by each of the four discovery process steps. We will now
discuss each of these steps of transformation in detail.
Initial Domain Content and Knowledge Collection
The bottom inner circle of Figure 2.4 marks the beginning of the data
transformation journey. To enable major discoveries, we must ensure that
the system is aware of a vast and diverse set of prior domain knowledge
and domain content. Without this step, the downstream analysis and
discovery will be extremely limited in scope, and wrong conclusions could
be drawn. This is often the case with today’s bioinformatics tools, which
operate on small and narrowly scoped data sets.
We differentiate domain knowledge and domain content deliberately
here since they mean different things. Domain knowledge means prior
knowledge that has been captured digitally, such as manually curated
domain ontologies, taxonomies, dictionaries, and manually curated
structured databases. For example, in drug discovery, such domain
knowledge may include the ChEMBL database [7], OBO ontologies [8], and
other dictionaries.
Domain content means raw digital data content that might capture existing
domain knowledge but has not been curated systematically to allow a
broad set of scientists to gain such knowledge easily. For example, many of
the unstructured information sources, such as patents, published conference
and journal articles, and internal reports, often contain knowledge
known by the individual scientists who wrote the documents. However, if
this knowledge has not been made widely and easily accessible, much of
it is locked away and unusable. Domain content also includes structured
and semistructured data, such as genomic screening information and
experiments. Many such sources of raw data also require significant
cleansing and processing before they can be made available and accessible.
FIGURE 2.4 Data evolution. The discovery platform continuously transforms initial raw data and domain knowledge into brand new discoveries through a series of data transformation steps, each of which can deliver services and value to clients and partners. From the bottom up: (1) data and domain knowledge, produced by collecting and curating domain content and knowledge (content ingestion, curation, and indexing; ontology ingestion and Domain-pedia management; sources such as patents, MEDLINE literature, ChEMBL, OBO, SIDER, dictionaries, and ontologies); (2) enriched data and enhanced domain knowledge, produced by comprehending and extracting semantic knowledge (entity, relationship, and complex-relationship extraction, e.g., chemical, biological, and toxicology annotators and PTM relationships); (3) complex knowledge and visual representations, such as graphs, networks, and runtime-calculated visualizations like scatter plots; (4) emerging patterns and discovery, produced by creating new hypotheses and predictions (“given known, show unknown”).
A discovery solution must be able to collect, cleanse, and make acces-
sible a vast and diverse amount of domain knowledge and content in its
initial data transformation step. The output of this initial step is a content
and knowledge repository that is ready for downstream analysis, index-
ing, data exploration, and browsing.
Content Comprehension and Semantic Knowledge Extraction
The second transformation has to do with the ability to derive an enriched
set of domain knowledge from the initial content and knowledge col-
lection. Specifically, when it comes to unstructured and semistructured
content sources, such a transformation step is critical in capturing and
extracting knowledge buried in text, tables, and figures in a systematic
fashion; for example, extracting chemical structures and their properties,
genes and proteins described in text, and tables of experimental results.
Other ways to comprehend the content also include classification and
taxonomy generation. Such methods help organize content more mean-
ingfully and allow scientists to examine data with different lenses.
Such a step often requires natural language processing, text mining,
and machine learning capabilities. In addition, it also requires systems to
coalesce and cross-map different styles of vocabulary into something that
is more consistent. For example, one chemical compound may have over
100 different names. Unless such chemical compounds are cross-mapped
and normalized in some fashion, there is very little hope that scientists can
gain comprehensive knowledge about them, let alone discover new insights.
A well-designed system component that carries out such forms of data
comprehension will result in an enriched content and knowledge repository.
Such a repository not only unlocks and captures domain knowledge buried in
unstructured data, but also cross-maps different domain vocabularies into a
more consistent form. An analogy for such a knowledge repository is a
“domain-pedia.” That is, it gathers all the pieces of knowledge gained via
machine curation into a repository that is like Wikipedia but more in-depth
and comprehensive for the domain under study.
Complex and High-Level Knowledge Composition and Representation
Building on the repositories created by the previous data transformation
steps, the third data transformation step attempts to generate more holistic
knowledge representations by composing fragmented pieces of knowledge and
data into complex views, such as graphs and networks. Such compositions
represent known knowledge, but the knowledge becomes much more visible and
accessible to scientists than before.
Compared to the previous steps, which focus more on gathering pieces
of fragmented content and knowledge in one place and making them
more easily accessible, this step focuses more on various ways to compose
content and knowledge. Such views and representations allow scientists
to have a better chance of gaining new insights and making discoveries.
Such a transformation is facilitated by a combination of techniques,
including graph- and network-based algorithms, visualizations, statistics,
and machine learning on top of the underlying content and knowledge
repositories. This is a step where many different combinations of views
and representations may be created, and human-machine interactions via
creative visualizations become essential.
New Hypothesis and Discovery Creation
The last data transformation in this process leapfrogs from the forms of
data transformation described above into new hypotheses and discoveries.
The key to this step lies in prediction. Since the previous data
transformations are meant to operate on vast and diverse content and
knowledge, the input dimensions for this discovery and predictive step can
be far larger than what traditional approaches have to deal with. For
example, the feature space of the models can be extremely large. This may
require totally new approaches to modeling and prediction.
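Link prediction is one classic, deliberately simple way to "show unknown from known" in a knowledge network. Each pair of entities that is not yet connected is scored by how much their neighborhoods overlap, here using the Jaccard coefficient. The network below is hypothetical. The method is a baseline chosen for illustration, not the approach the text argues may be needed for very large feature spaces.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def predict_links(graph: Dict[str, Set[str]], top_k: int = 3) -> List[Tuple[str, str, float]]:
    """Rank unconnected node pairs by Jaccard similarity of their neighborhoods."""
    scores = []
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already known; only candidate new links are scored
        shared = graph[u] & graph[v]
        if shared:
            scores.append((u, v, len(shared) / len(graph[u] | graph[v])))
    scores.sort(key=lambda t: (-t[2], t[0], t[1]))  # best first, ties by name
    return scores[:top_k]


# Hypothetical known interactions (undirected, stored symmetrically).
edges = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_B", "GENE_D"),
         ("GENE_C", "GENE_D"), ("GENE_D", "GENE_E")]
graph: Dict[str, Set[str]] = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)
candidates = predict_links(graph)
```

The top candidate, GENE_B and GENE_C, is a hypothesis rather than a finding. The two genes share every known partner but have no recorded interaction. If such a prediction is validated, it becomes new knowledge that feeds back into the process.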
The discovery output of this step, when validated, can become new
knowledge that feeds back once again into the entire data transformation
and discovery process. Clearly, these data transformations are not static.
They are continuous, self-enhancing, and self-correcting. New content
sources and knowledge may be added and incorporated, and obsolete con-
tent and knowledge may be cleansed or archived.
Notice also that end users and businesses can take advantage of each
step of the data transformations and create business value before the final
steps are completed. The initial data collection and curation itself can be
of tremendous value as it will have already brought multiple fragmented
sources of data together in one place and made content searchable and
browsable. One of the most basic uses of a discovery solution is simply to
query data sets that have been brought together into a single index.
The second step of the data transformation fills a huge gap that exists
today across industries. We can now extract tens of millions of chemical
structures from patents and publications in hours or minutes. In the past,
such an endeavor would have taken hundreds of chemists months or even
years of manually reading documents. Scientists can now immediately find
all the chemical compounds that past patents and publications describe.
The third step of the data transformation reveals more comprehensive
views of the scientific domain to scientists. Even without machines
automatically predicting and discovering new insights, such comprehensive
representations will make it much easier for humans to spot new patterns.
The last step makes machine-based discovery visible to the end users.
ACCELERATED DISCOVERY IN THE ORGANIZATION
Many enterprises have begun to realize the need for computationally aided
discovery. However, existing IT infrastructure, data-center frameworks,
and data-processing paradigms may not be easily reworked to support an
end-to-end discovery process. This is because many such infrastructures
are built on historical needs and assumptions. The underlying software
and hardware are often insufficient to scale and adapt to what is needed
by a discovery solution. Because of this, we believe new business models
and business processes will be needed to enable discovery systematically.
In particular, with the rapid growth of public content and knowledge
repositories, cloud-based infrastructure, and scalable cognitive comput-
ing capabilities built on distributed architecture such as Hadoop, it has
become more attractive to structure “managed discovery service” busi-
ness models to enable the rapid adoption of a discovery solution.
A managed discovery service can bundle a packaged cloud-compute
infrastructure, preloaded public content and knowledge repositories
relevant to the domain, a configured discovery middleware software stack,
and support for predefined use cases. It can also be extended to
incorporate additional content and knowledge sources based on customer
needs, and customized to support the use cases that customers desire.
A managed discovery service can be structured in multiple ways to
allow different cost-pricing structures to be implemented. For example,
a pure public content–driven discovery solution can potentially be made
available via a multitenant Software-as-a-Service (SaaS) model to allow
cost sharing. When private content and use cases are incorporated, one
can structure a hybrid model or single-tenant model, which would incur
higher cost but would have higher levels of security and service-level
CHAPTER XIII.
SHADOWY VISITORS.
When the eye gazes steadily at the Pleiades, in the midnight
splendor of the starlit sky, one of the blazing orbs shrinks modestly
from view and only six remain to be admired by the wondering gazer
below: it is the quick, casual glance that catches the brilliant sister
unawares, before she can hide her face.
So, when the pioneers within the block-house looked intently at the
stockade, they saw nothing but the wall of shadow and the outline
of the sharp pickets above; but, as their vision flitted along the front,
they caught the faint suggestions of the figures of men standing
erect and doubtless intently watching the block-house, from which
the rifles of the Kentuckians had flashed but a short time before.
Whenever the moon's light was obscured, nothing but blank
darkness met the eye, the line of stockades themselves vanishing
from sight. Once one of the warriors moved a few steps to the left,
and Jo Stinger and Ned Preston detected it.
"Why not try another shot?" asked the Colonel, when the matter was
referred to.
"It is too much guess-work: nobody can take any sort of aim, when
it is so dark in the block-house."
"I wonder what their purpose can be," muttered the Colonel,
speaking as much to himself as to those near him.
"I knows what it am," said Blossom Brown, who had been drawn to
the spot by the firing and the words he had overheard.
"You do, eh?" remarked the Colonel, looking toward him in the
darkness; "what is it?"
"Dey're comin' to steal de well."
"What will they do with it, after they steal it?"
"Take it off in de woods and hide it, I s'pose."
"They won't have any trouble in preventing us from stealing it,—that
is certain," observed the Colonel, bitterly.
"Why can't we dig the well inside the block-house, as you intended?"
asked Ned; "there are shovels, spades and picks, and I don't
suppose it would take us a great while."
"If we are driven to it, we will make the attempt; but there is no
likelihood that we will have a chance. All our attention will be
required by the Indians."
"You can set Blossom to work if you wish to," said Ned Preston; "he
is good for little except to cut wood and dig. If he worked steadily
for two or three days, he might reach water."
Ned was in earnest with this proposition, and he volunteered to take
his turn with his servant and the others; but the scheme filled
Blossom with dismay.
"I neber dugged a well," he said, with a contemptuous sniff; "if I
should undertook it, de well would cave in on me, and den all you
folks would hab to stop fightin' de Injines and go to diggin' me out
agin."
Colonel Preston did not consider the project feasible just then, and
Blossom Brown was relieved from an anticipation which was
anything but pleasant.
Jo Stinger was attentively watching the stockade where the figures
of the Wyandot warriors were faintly seen. He was greatly mystified
to understand what their object could be in exposing themselves to
such risk, when, so far as he could judge, there was nothing to be
gained by so doing; but none knew better than did the veteran that,
brave as were these red men, they were not the ones to face a
danger without the reasonable certainty of acquiring some
advantage over an enemy.
"I will risk a shot anyway," he thought; "for, though I can't make
much of an aim, there is a chance of doing something. As soon as
the moon comes out, I will see how the varmints will stand a bullet
or two."
So he waited till the clouds rolled by, but, as he feared, the
straining eye could not catch the faintest suggestion of a warrior,
where several were visible only a short time before.
They had vanished as silently as the shadows of the clouds swept
across the clearing.
The action of the Indians in this respect was the cause of all kinds of
conjectures and theories, none of the garrison being able to offer
one that satisfied the others.
Megill believed it was a diversion intended to cover up some design
in another direction. He was sure that, when the Wyandots made a
demonstration, it would come from some other point altogether. He,
therefore, gave his attention mainly to the cabins and the clearing in
front.
Turner suspected they meant to destroy the well by filling it up, so
that it would be useless when the supply of water within the block-
house should become exhausted. Precisely how this filling up was to
be done, and wherein the necessity existed (since the Wyandots
could command the approaches to the water day and night), were
beyond the explanation of the settler.
Jo Stinger, the veteran of the company, scouted these theories, as
he did that of the Colonel that it was a mere reconnoissance, but he
would not venture any guess further than that the mischief was
much deeper than any believed, and that never was there more
necessity of the most unremitting vigilance.
Megill asserted that some scheme was brewing in the cabin from
which the two warriors emerged, when they sought to cut off the
boys in their run to the block-house. He had seen lights moving
about, though the ones who carried the torches took care not to
expose themselves to any shot from the station.
The silence lasted two hours longer without the slightest evidence
that a living person was within a mile of the block-house. During that
period, not a glimmer of a light could be detected in the cabin, there
was not a single burning arrow, nor did so much as a war-whoop or
signal pass the lips of one of the Wyandots.
The keen eyes of Jo Stinger and Ned Preston failed to catch a
glimpse of the shadowy figures at which they discharged their rifles,
and which caused them so much wonderment and speculation.
But the keen scrutiny that seized every favoring moment and
roamed along the lines of stockades, further than the ordinary eye
could follow, discovered a thing or two which were not without their
significance.
On the northern and eastern sides a number of pickets had been
removed, leaving several gaps wide enough to admit the passage of
a person. This required a great deal of hard work, for the pickets
had been driven deep into the earth and were well secured and
braced from the inside.
"They needed men on both sides of the stockade to do that," said
Colonel Preston, "and those whom we saw, climbed over, so as to
give assistance."
"That's the most sensible idee that's been put forward," replied Jo
Stinger, "and I shouldn't be s'prised if you was right; but somehow
or other——"
"By gracious! I smell smoke sure as yo's bo'n!"
Blossom Brown gave several vigorous sniffs before uttering this
alarming exclamation, but the words had no more than passed his
lips, when every man knew he spoke the truth.
There was smoke in the upper part of the block-house, and though it
could not be seen in the darkness, yet it was perceptible to the
sense of smell.
Consternation reigned for a few minutes among the garrison, and
there was hurrying to and fro in the effort to learn the cause of the
burning near them.
The most terrifying cry that can strike the ears of the sailor or
passenger at sea is that of fire, but no such person could hold the
cry in greater dread than did the garrison, shut in the block-house
and surrounded by fierce American Indians.
The first supposition of Colonel Preston was that it came from the
roof, and springing upon a chair, he shoved up the trap-doors, one
after the other, to a dangerously high extent. But whatever might
have happened to the other portions of the structure, the roof was
certainly intact.
The next natural belief was that it was caused by the fire on the
hearth in the lower story, and Colonel Preston and Blossom Brown
made all haste down the ladder. Blossom, indeed, was too hasty, for
he missed one of the rounds and went bumping and tumbling to the
floor, where he set up a terrific cry, to which no attention was paid
amid the general excitement.
"Here it is! Here's the fire!" suddenly shouted Ned Preston, in a voice
which instantly brought the others around him.
Ned had done that wise thing to which we have all been urged many
a time and oft: he had followed his nose to the north-east corner
of the block-house, where the vapor was so dense that he knew the
cause must be very near.
It so happened that this very nook was the least guarded of all.
Looking directly downward through the holes cut in the projecting
floor, his eyes smarted so much from the ascending vapor that he
was forced to rub them vigorously that he might be able to see.
He could detect nothing but smoke for a minute or so, and that, of
course, made itself manifest to the sense of smell and touch rather
than to that of sight; but he soon observed, directly beneath his
feet, the red glow of fire itself. Then it was he uttered the startling
cry, which awoke Mrs. Preston and brought the rest around him.
Despite the care and skill with which the station had been guarded
by the garrison, all of whom possessed a certain experience in
frontier-life, the wily Wyandots had not only crept up to the block-
house itself without discovery, but they had brought sticks, had piled
them against the north-east corner, had set fire to them, and had
skulked away without being suspected by any one of the sentinels.
The fact seemed incredible, and yet there was the most convincing
evidence before or rather under their eyes. Jo Stinger gave
utterance to several emphatic expressions, as he made a dash for
the barrel of water, and he was entirely willing to admit that of all
idiots who had ever pretended to be a sensible man, he was the
chief.
But the danger was averted without difficulty. Two pails of water
were carefully poured through the openings in the floor of the
projecting roof, and every spark of fire was extinguished.
The water added to the density of the vapor. It set all the inmates
coughing and caused considerable annoyance; but it soon passed
away, and, after a time, the air became comparatively pure again.
Megill complimented the cunning of the Wyandots, but Jo insisted
that they had shown no special skill at all: it was the utter stupidity
of himself and friends who had allowed such a thing to be done
under their very noses.
"And, if it hadn't been for that darkey there," said he, with all the
severity he could command, "we wouldn't have found it out till this
old place was burned down, and we was scootin' across the clearin'
with the varmints crackin' away at us."
"De gemman is right," assented Blossom, as he stopped rubbing the
bruises he received from tumbling through the ladder; "you'll find
dat it's allers me dat wokes folks up when de lightnin' am gwine to
strike somewhar 'bout yar."
"We won't deny you proper credit," said Colonel Preston, "though Jo
is a little wild in his statements——"
The unimportant remark of Colonel Preston was bisected by the
sharp report of Jo Stinger's rifle, followed on the instant by a
piercing shriek from some point near the block-house, within the
stockade.
"I peppered him that time!" exclaimed the veteran; "it's all well
enough to crawl into yer winder, gather all the furniture together and
set fire to it, and then creep out agin, but when it comes to stealin'
the flint and tinder out of your pocket to do it with, then I'm going
to get mad."
When the scout had regained something of his usual good nature,
he explained that he had scarcely turned to look out, when he
actually saw two of the Wyandots walking directly toward the heap
of smoking brush, as though they intended to renew the fire. The
sight he considered one of the grossest insults ever offered his
intelligence, and he fired, without waiting till some one could
arrange to shoot the second red man.
With a daring that was scarcely to be wondered at, the warrior who
was unhurt threw his arm about his smitten companion and hurried
to one of the openings in the stockade, through which he made his
way.
This slight check would doubtless cause the red men to be more
guarded in their movements against the garrison.
"It has teached them," said the hunter, with something of his grim
humor, "that accidents may happen, and some of 'em mought get
hurt if they go to looking down the muzzles of our guns."
All noticed a rather curious change in the weather. The sky, which
had been quite clear early in the evening, was becoming overcast,
and the clouds hid the moon most of the time. It remained cold and
chilly, and more than one of the garrison wrapped a blanket around
him, while doing duty at the loopholes.
The cloudiness became so marked, after a brief while, that the view
was much shortened in every direction. Those at the front of the
block-house could not see the edge of the clearing, where the
Licking flowed calmly on its way to the Ohio. Those on the north saw
first the line of stockades dissolve into darkness, and then the well-
curb (consisting of a rickety crank and windlass) grew indistinct until
its outlines faded from sight.
The two cabins to the south loomed up in the gloom as the hulls of
ships are sometimes seen in the night-time at sea, but the blackness
was so profound, it became oppressive. Within the block-house,
where there was no light of any kind burning, it was like that of
ancient Egypt.
Colonel Preston could not avoid a certain nervousness over the
attempt of the Wyandots to fire the building, and, though it failed,
he half suspected it would be repeated.
He descended the ladder and made as careful an examination as
possible, but failed to find anything to add to his alarm and
misgiving. Everything seemed to be secure: the fastenings of the
doors were such that they might be considered almost as firm as the
solid logs themselves.
While he was thus engaged, he heard some one coming down the
ladder. "Who's there?" he asked in an undertone.
"It's Jo—don't be scart."
"I'm not scared; I only wanted to know who it is; what are you
after?"
"I'm going out-doors, right among the varmints."
"What has put that idea in your head?"
"They've been playing their tricks on us long enough, and now I'm
going to show them that Jo Stinger knows a thing or two as well as
them."
Colonel Preston would have sought to dissuade the veteran from the
rash proceeding, had he not known that it was useless to do so.
CHAPTER XIV.
A MISHAP AND A SENTENCE.
Deerfoot the Shawanoe first pinned the rattlesnake to the earth with
the arrow which he threw with his deft left hand, then he flung the
reptile from his path and resumed his delicate and dangerous
attempt to creep past the three Wyandots who were lying against
the bank of the Licking, watching the block-house, now and then
firing a shot at the solid logs, as if to express their wishes respecting
the occupants of the building.
If the task was almost impossible at first, it soon became utterly so,
as the young Shawanoe was compelled to admit. The contour of the
bank was such that, after getting by the log, he would be compelled
to approach the warriors so close that he could touch them with his
outstretched hand. This would have answered at night, when they
were asleep, but he might as well have attempted to lift himself
through the air as to do it under the circumstances we have
described.
Deerfoot never despaired nor gave up so long as he held space in
which to move. He immediately repeated the retrograde motion he
had used when confronted by the venomous serpent, his wish now
being to return to the spot from which he fired the arrow.
The ventures made satisfied him that he had but one chance in a
thousand of escaping capture and death. He could not move to the
right nor left: it would have been certain destruction to show himself
on the clearing, and equally fatal to attempt to use the shallow
Licking behind him.
There was a remote possibility that the arrowy messenger which he
had sent from his bow had not been noticed by any of the besieging
Wyandots, and that, as considerable time had already passed, none
of them would come over to where he was to inquire into the matter.
If they would keep as far away from him as they were when his
friend Ned Preston started on his desperate run for the block-house,
of course he would be safe. He could wait where he was, lying flat
on the ground, through all the long hours of the day, until the
mantle of night should give him the chance for which he sighed.
Ah, but for one hour of darkness! His flight from the point of danger
would be but pastime.
The single chance in a thousand was that which we have named:
the remote possibility that none of the Wyandots would come any
nearer to where he was hugging the river bank.
For a full hour Deerfoot was in suspense, with a fluttering hope that
it might be his fortune to wait until the sun should climb to the
zenith and sink in the west; for, young as was the Shawanoe, he had
learned the great truth that in the affairs of this world no push or
energy will win, where the virtue of patience is lacking. Many a time
a single move, born of impatience, has brought irretrievable disaster,
where success otherwise was certain.
As the Shawanoe lay against the bank, looking across the clearing
toward the block-house, he recalled that message which, instead of
being spoken, as were all that he knew of, was carried on the arrow
he sent through the window. If he but understood how to place
those words on paper or on a dried leaf even, he would send
another missive to Colonel Preston, saying that, inasmuch as he was
shut in from all hope of escape, he would make the effort to run
across the open space, as did his friends before him.
But the thing was impossible: the door of the block-house was
fastened, and if Deerfoot should start, he would reach it, if he
reached it at all, before the Colonel could draw the first bolt. Even if
the Shawanoe youth should succeed in making the point, which was
extremely doubtful, now that the Wyandots were fully awake, the
inevitable few seconds' halt there must prove fatal.
The short conversation which he had overheard, convinced him of
the sentiments of Waughtauk and his warriors toward him, and led
the young Shawanoe to determine on an effort to extricate himself.
It is the very daring of such a scheme which sometimes succeeds,
and he put it in execution without delay.
Instead of crouching to the ground, as he had been doing, he now
rose upright and moved down the bank, in the direction of the three
Wyandots who first turned him back. They were in their old position,
and he had gone only a few steps when one of them turned his
head and saw the youthful warrior approaching. He uttered a
surprised "Hooh!" and the others looked around at the figure, as
they might have done had it been an apparition.
The scheme of Deerfoot was to attempt the part of a friend of the
Wyandots and consequently that of an enemy of the white race. He
acted as if without thought of being anything else, and as though he
never dreamed there was a suspicion of his loyalty.
At a leisurely gait he walked toward the three Indians, holding his
head down somewhat, and glancing sideways through the scattered
bushes at the top of the bank, as though afraid of a shot from the
garrison.
"Have any of my brethren of the Wyandots been harmed by the
dogs of the Yenghese?" asked Deerfoot in the high-flown language
peculiar to his people.
"The eyes of Deerfoot must have been closed not to see Oo-oo-mat-
ah lying on the ground before his eyes."
This was an allusion to the warrior who made the mistake of
stopping Ned Preston when on his way to the block-house.
"Deerfoot saw Oo-oo-mat-ah fall, as falls the brave warrior fighting
his foe; the eyes of Deerfoot were wet with tears, when his brave
Wyandot brother fell."
Strictly speaking, a microscope would not have detected the first
grain of truth in this grandiloquent declaration, which was
accompanied by a gesture as though the audacious young
Shawanoe was on the point of breaking into sobs again.
The apparent sincerity of Deerfoot's grief seemed to disarm the
Wyandots for the moment, which was precisely what the young
Shawanoe was seeking to do.
Having mastered his sorrow, he started down the river bank on the
same slow gait, glancing sideways at the block-house as though he
feared a shot from that point. But the Indians were not to be baffled
in that fashion: their estimate of the daring Deerfoot was the same
as Waughtauk's.
Without any further dissembling, one of the Wyandots, a lithe
sinewy brave, fully six feet in height, bounded in front of the
Shawanoe, and grasping his knife, said with flashing eyes—
"Deerfoot is a dog! he is a traitor; he is a serpent that has two
tongues! he shall die!"
The others stood a few feet behind the couple and watched the
singular encounter.
The Wyandot, with the threatening words in his mouth, leaped
toward Deerfoot, striking a vicious blow with his knife. It was a
thrust which would have ended the career of the youthful brave, had
it reached its mark.
But Deerfoot dodged it easily, and, without attempting to return it,
shot under the infuriated arm and sped down the river bank with all
the wonderful speed at his command.
The slight disturbance had brought the other three Wyandots to the
spot, and it would have been an easy thing to shoot the fugitive as
he fled. But among the new arrivals were those who knew it was the
wish of Waughtauk that Deerfoot should be taken prisoner, that he
might be put to the death all traitors deserved.
Instead of firing their guns therefore, the whole six broke into a run,
each exerting himself to the utmost to overtake the fleet-footed
youth, who was no match for any one of them in a hand-to-hand
conflict, or a trial of strength.
Deerfoot, by his sharp strategy, had thrown the whole party behind
him and had gained two or three yards' start: he felt that, if he could
not hold this against the fleetest of the Wyandots, then he deserved
to die the death of a dog.
The bushes, undergrowth and logs which obstructed his path, were
as troublesome to his pursuers as to himself, and he bounded over
them like a mountain chamois, leaping from crag to crag.
There can be no question that, if this contest had been decided by
the relative swiftness of foot on the part of pursuer and pursued, the
latter would have escaped without difficulty, but, as if the fates were
against the brave Shawanoe, his matchless limbs were no more than
fairly going, when two Wyandot warriors appeared directly in front in
such a position that it was impossible to avoid them.
Deerfoot made a wrenching turn to the right, as if he meant to flank
them, but he stumbled, nearly recovered himself—then fell with
great violence, turning a complete somersault from his own
momentum, and then rose to his feet, as the Indians in front and
rear closed around him.
He uttered a suppressed exclamation of pain, limped a couple of
steps, and then grasped a tree to sustain himself. He seemed to
have sprained his ankle badly and could bear his weight only on one
foot. No more disastrous termination of the flight could have
followed.
The Wyandots gathered about the poor fugitive with many
expressions of pleasure, for the pursuers had just been forced to
believe the young brave was likely to escape them, and it was a
delightful surprise when the two appeared in front and headed him
off.
Besides, a man with a sprained ankle is the last one in the world to
indulge in a foot-race, and they felt secure, therefore, in holding
their prisoner.
"Dog! traitor! serpent with the forked tongue! base son of a brave
chieftain! warrior with the white heart!"
These were a few of the expressions applied to the captive, who
made no answer. In fact, he seemed to be occupied exclusively with
his ankle, for, while they were berating him, he stooped over and
rubbed it with both hands, flinging his long bow aside, as though it
could be of no further use to him.
The epithets were enough to blister the skin of the ordinary
American Indian, and there came a sudden flush to the dusky face
of the youthful brave, when he heard himself called the base son of
a brave chieftain. But he had learned to conquer himself, and he
uttered not a word in response.
One of the Wyandots picked up the bow which the captive had
thrown aside, and examined it with much curiosity. There was no
attempt to disarm him of his knife and tomahawk, for had he not
been disabled by the sprained ankle, he would have been looked
upon as an insignificant prisoner, against whom it was cowardly to
take any precautions. In fact, to remove his weapons that remained
would have been giving dignity to one too contemptible to deserve
the treatment of an ordinary captive.
The aborigines, like all barbarians and many civilized people, are
cruel by nature. The Wyandots, who had secured Deerfoot, refrained
from killing him for no other reason than that it would have been
greater mercy than they were willing to show to one whom they
held in such detestation.
As it was, two of them struck him and repeated the taunting names
uttered when they first laid hands on him. Deerfoot still made no
answer, though his dark eyes flashed with a dangerous light when
he looked in the faces of the couple who inflicted the indignity.
He asked them quietly to help him along, but, with another taunt,
the whole eight refused. The one who had smote him twice and who
held his bow, placed his hand against the shoulder of the youth and
gave him a violent shove. Deerfoot went several paces and then fell
on his knees and hands with a gasp of pain severe enough to make
him faint.
The others laughed, as he painfully labored to his feet. He then
asked that he might have his bow to use as a cane; but even this
was refused. Finding nothing in the way of assistance was to be
obtained, his proud spirit closed his lips, and he limped forward,
scarcely touching the great toe of the injured limb to the ground.
The brief flight and pursuit had led the parties so far down the
Licking that they were out of sight of the block-house, quite a
stretch of forest intervening; but it had also taken them nearer the
headquarters, as they may be called, of Waughtauk, leader of the
Wyandots besieging Fort Bridgman.
This sachem showed, in a lesser way, something of the military
prowess of Pontiac, chief of the Chippewas, King Philip of Pokanoket,
and Tecumseh, who belonged to the same tribe with Deerfoot.
Although his entire force numbered a little more than fifty, yet he
had disposed them with such skill around the block-house that the
most experienced of scouts failed to make his way through the lines.
Waughtauk was well convinced of the treachery of the Shawanoe,
and there was no living man for whom he would have given a
greater amount of wampum.
The eyes of the chieftain sparkled with pleasure when the youthful
warrior came limping painfully toward him, escorted by the
Wyandots, as though they feared that, despite his disabled
condition, he might dart off with the speed of the wind.
Waughtauk rose from the fallen tree on which he had been seated
among his warriors, and advanced a step or two to meet the party
as it approached.
"Dog! base son of the noble chief Allomaug! youth with the red face
and the white heart! serpent with the forked tongue! the Great Spirit
has given it to Waughtauk that he should inflict on you the death
that is fitting all such."
These were fierce words, but the absolute fury of manner which
marked their utterance showed how burning was the hate of the
Wyandot leader and his warriors. They knew that this youth had
been honored and trusted as no one of his years had ever been
honored and trusted by his tribe, and his treachery was therefore all
the deeper, and deserving of the worst punishment that could be
devised.
Deerfoot, standing on one foot, with his hand grasping a sapling at
his side, looked calmly in the face of the infuriated leader, and in his
low, musical voice, said—
"When Deerfoot was sick almost to death, his white brother took the
place of the father and mother who went to the happy hunting
grounds long ago; Deerfoot would have been a dog, had he not
helped his white brother through the forest, when the bear and the
panther and the Wyandot were in his path."
This defence, instead of soothing the chieftain, seemed to arouse all
the ferocity of his nature. His face fairly shone with flame through
his ochre and paint; and striding toward the prisoner, he raised his
hand with such fierceness that the muscles of the arm rose in knots
and the veins stood out in ridges on temple and forehead.
As he threw his fist aloft and was on the point of smiting Deerfoot to
the earth, the latter straightened up with his native dignity, and, still
grasping the sapling and still standing on one foot, looked him in the
eye.
It was as if a great lion-tamer, hearing the stealthy approach of the
wild beast, had suddenly turned and confronted him.
Waughtauk paused at the moment, his fist was in the air directly
over the head of Deerfoot, glowering down upon him with an
expression demoniac in its hate. He breathed hard and fast for a few
seconds and then retreated without striking the impending blow.
But it must not be understood that it was the defiant look of the
captive which checked the chief. It produced no such effect, nor was
it intended to do so: it simply meant on the part of Deerfoot that he
expected indignity and torture and death, and he could bear them as
unflinchingly as Waughtauk himself.
As for the chieftain, he reflected that a little counsel and consultation
were needed to fix upon the best method of putting this tormentor
out of the way. If Waughtauk should allow his own passion to master
him, the anticipated enjoyment would be lost.
While Deerfoot, therefore, retained his grasp on the sapling, that he
might be supported from falling, Waughtauk called about him his
cabinet, as it may be termed, and began the consideration of the
best means of punishing the traitor.
The captive could hear all the discussion, and, it need not be said,
he listened with much more interest than he appeared to feel.
It would be revolting to detail the schemes advocated. If there is any
one direction in which the human mind is marvelous in its ingenuity,
it is in the single one of devising means of making other beings
miserable. Some of the proposals of the Wyandots were worthy of
Nana Sahib, of Bithoor, but they were rejected one after the other,
as falling a little short of the requirements of the leader.
There was one fact which did not escape the watchful eye and ear of
the prisoner. The Wyandot who struck him twice, and who had taken
charge of his bow, as a trophy belonging specially to himself, was
the foremost in proposing the most cruel schemes. The look which
Deerfoot cast upon him said plainly—
"I would give the world for a chance to settle with you before I
suffer death!"
Suddenly a thought seemed to seize Waughtauk like an inspiration.
Rising to his feet, he held up his hand for his warriors to listen:
"Deerfoot is a swift runner; he has overtaken the fleeing horse and
leaped upon his back; he shall be placed in the Long Clearing; he
shall be given a start, and the swiftest Wyandot warriors shall be
placed in line on the edge of the Long Clearing; they shall start
together, and the scalp of Deerfoot shall belong to him who first
overtakes him."
This scheme, after all, was merciful when compared with many that
were proposed; but the staking of a man's life on his fleetness, when
entirely unable to run, is an idea worthy of an American Indian.
CHAPTER XV.
AN UNEXPECTED VISITOR.
Jo Stinger had decided to venture out from the block-house, at a
time when the Wyandots were on every side, and when many of
them were within the stockade and close to the building itself. It was
a perilous act, but the veteran had what he deemed good grounds
for undertaking it.
In the first place, the darkness had deepened to that extent, within
the last few hours, that he believed he could move about without
being suspected: he was confident indeed that he could stay out as
long as he chose and return in safety.
He still felt chagrined over the audacity of the Wyandots, which
came so near success, and longed to turn the tables upon them.
But Jo Stinger had too much sense to leave the garrison and run into
great peril without the prospect of accomplishing some good
thereby. He knew the Wyandots were completing preparations to
burn the block-house. He believed it would be attempted before
morning, and, if not detected by him, would succeed. He had strong
hope that, by venturing outside, he could learn the nature of the
plan against which it would therefore be possible to make some
preparation.
Colonel Preston was not without misgiving when he drew the
ponderous bolts, but he gave no expression to his thoughts. All was
blank darkness, but, when the door was drawn inward, he felt
several cold specks on his hand, from which he knew it was
snowing.
The flakes were very fine and few, but they were likely to increase
before morning, by which time the ground might be covered.
"When shall I look for your return?" asked the Colonel, but, to his
surprise, there was no answer. Jo had moved away, and was gone
without exchanging another word with the commandant.
The latter refastened the door at once. He could not but regard the
action of the most valuable man of his garrison as without excuse:
at the same time he reflected that his own title could not have been
more empty, for no one of the three men accepted his orders when
they conflicted with his personal views.
In the meantime Jo Stinger, finding himself on the outside of the
block-house, was in a situation where every sense needed to be on
the alert, and none knew it better than he.
The door which Colonel Preston opened was the front one, being
that which the scout passed through the previous night, and which
opened on the clearing along the river. He was afraid that, if he
emerged from the other entrance, he would step among the
Wyandots and be recognized before he could take his bearings.
But Jo felt that he had entered on an enterprise in which the
chances were against success, and in which he could accomplish
nothing except by the greatest risk to himself. The listening Colonel
fancied he heard the sound of his stealthy footstep, as the hunter
moved from the door of the block-house. He listened a few minutes
longer, but all was still except the soft sifting of the snow against the
door, like the finest particles of sand and dust filtering through the
tree-tops.
The Colonel passed to the narrow window at the side and looked
out. It had become like the blackness of darkness, and several of the
whirling snow-flakes struck his face.
"The Wyandots are concocting some mischief, and there's no telling
what shape it will take until it comes. I don't believe Jo will do
anything that will help us."
And with a sigh the speaker climbed the ladder again and told his
friends how rash the pioneer had been.
"I wouldn't have allowed him to go," said Ned Preston.
"There's no stopping him when he has made up his mind to do
anything."
"Why didn't you took him by de collar," asked Blossom Brown, "and
slam him down on de floor? Dat's what I'd done, and, if he'd said
anyting, den I'd took him by de heels and banged his head agin de
door till he'd be glad to sot down and behave himself."
"Jo is a skilled frontiersman," said the Colonel, who felt that it was
time he rallied to the defence of the scout; "he has tramped
hundreds of miles with Simon Kenton and Daniel Boone, and, if his
gun hadn't flashed fire one dark night last winter, he would have
ended the career of Simon Girty."
"How was that?"
"Simon Girty and Kenton served together as spies in Dunmore's
expedition in 1774, and up to that time Girty was a good soldier,
who risked much for his country. He was badly used by General
Lewis, and became the greatest scourge we have had on the
frontier. I don't suppose he ever has such an emotion as pity in his
breast, and there is no cruelty that he wouldn't be glad to inflict on
the whites. He and Jo know and hate each other worse than poison.
"Last winter, Jo crept into one of the Shawanoe towns one dark night,
and when only a hundred feet away, aimed straight at Girty, who sat
on a log, smoking his pipe, and talking to several warriors. Jo was so
angered when his gun flashed in the pan, that he threw it upon the
ground and barely saved himself by dashing out of the camp at the
top of his speed. Jo has been in a great many perilous situations,"
added Colonel Preston, "and he can tell of many a thrilling encounter
in the depths of the silent forest and on the banks of the lonely
streams, where no other human eyes saw him and his foe."
"No doubt of all that," replied Ned, who knew that he was speaking
the sentiments of his uncle, "but it seems to me he is running a
great deal more risk than he ought to."
"I agree with you, but we have been greatly favored so far, and we
will continue to hope for the best."
The long spell of quiet which had followed the attempt to fire the
block-house, permitted the children to sleep, and their mother, upon
the urgency of her husband, had lain down beside them and was
sinking into a refreshing slumber.
Megill and Turner kept their places at the loopholes, watching for the
signs of danger with as vigilant interest as though it was the first
hour of the alarm. They were inclined to commend the course of Jo
Stinger, despite the great peril involved.
The Wyandots, beyond question, were perfecting some scheme of
attack, which most likely could be foiled only by previous knowledge
on the part of the garrison. The profound darkness and the skill of
the hunter would enable him to do all that could be done by any
one, under the circumstances.
There came seconds, and sometimes minutes, when no one spoke,
and the silence within the block-house was so profound that the
faint sifting of the snow on the roof was heard. Then an eddy of
wind would whirl some of the sand-like particles through the
loopholes into the eyes and faces of those who were peering out.
Men and boys gathered their blankets closer about their shoulders,
and set their muskets down beside them, where they could be
caught up the instant needed, while they carefully warmed their
benumbed fingers by rubbing and striking the palms together.
All senses were concentrated in the one of listening, for no other
faculty was of avail at such a time. Nerves were strung to the
highest point, because there was not one who did not feel certain
they were on the eve of events which were to decide the fate of the
little company huddled together in Fort Bridgman.
This stillness was at its profoundest depth, the soft rustling of the
snowflakes seemed to have ceased, and not a whisper was on the
lips of one of the garrison, when there suddenly rang out on the
night a shriek like that of some strong man caught in the crush of
death. It was so piercing that it seemed almost to sound from the
center of the room, and certainly must have been very close to the
block-house itself.
"That was the voice of Jo!" said Colonel Preston, in a terrified
undertone, after a minute's silence; "he has met his fate."
"You are mistaken," Megill hastened to say; "I have been with Jo too
often, and I know his voice too well to be deceived."
"It sounded marvelously like his."
"It did not to me, though it may have been so to you."
"If it was not Jo, then it must have been one of the Wyandots."
"That follows, as a matter of course; in spite of all of Jo's care, he
has run against one of their men, or one of them has run against
him. The only way to settle it then was in the hurricane order, and Jo
has done it that promptly that the other has just had time to work in
a first-class yell like that."
"I'm greatly relieved to hear you take such a view," said Colonel
Preston, who, like the rest, was most agreeably disappointed to hear
Megill speak so confidently, his brother-in-law adding his testimony
to the same effect.
"Directly after that shriek," said Turner, "I'm sure there was the
tramping of feet, as if some one was running very fast: it passed
under the stockade and out toward the well."
"I heard the footsteps too," added Ned Preston.
"So did I," chimed in Blossom Brown, feeling it his duty to say
something to help the others along; "but I'm suah dat de footsteps
dat I heerd war on de roof. Some onrespectful Wyamdot hab
crawled up dar, and I bet am lookin' down de chimbley dis minute."
"It seems to me," observed Ned to his uncle, "that Jo will want to
come back pretty soon."
"I think so too," replied his uncle; "I will go down-stairs and wait for
him."
With these words he descended the rounds of the ladder and moved
softly across the lower floor to the door, where he paused, with his
hands on one of the heavy bars which held the structure in place.
While crossing the room he looked toward the fire-place. Among the
ashes he caught the sullen red of a single point of fire, like the
glowering eye of some ogre, watching him in the darkness.
Beside the huge latch, there were three ponderous pieces of timber
which spanned the inner side of the door, the ends dropping into
massive sockets strong enough to hold the puncheon slabs against
prodigious pressure from the outside.
Colonel Preston carefully lifted the upper one out of place and then
did the same with the lowest. Then he placed his hand on the
middle bar and held his ear close to the jamb, so that he might
catch the first signal from the scout, whose return was due every
minute.
The listening ear caught the silken sifting of the particles of snow,
which insinuated themselves into and through the smallest crevices,
and a slight shiver passed through the frame of the pioneer, who
had thrown his blanket off his shoulders so that he might have his
arms free to use the instant it should become necessary.
Colonel Preston had stood thus only a few minutes, when he fancied
he heard some one on the outside. The noise was very slight and
much as if a dog was scratching with his paw. Knowing that wood is
a better conductor of sound than air, he pressed his ear against the
door.
To his astonishment he then heard nothing except the snowflakes,
which sounded like the tapping of multitudinous fairies, as they
romped back and forth and up and down the door.
"That's strange," thought he, after listening a few minutes; "there's
something unusual out there, and I don't know whether it is Jo or
not. I'm afraid the poor fellow has been hurt and is afraid to make
himself known."
The words were yet in his mouth, when he caught a faint tapping
outside, as if made by the bill of a bird.
"That's Jo!" he exclaimed, immediately raising the end of the middle
bar from its socket; "he must be hurt, or he is afraid to signal me,
lest he be recognized."
At the moment the fastenings were removed, and Colonel Preston
was about drawing the door inward, he stayed his hand, prompted
so to do by the faintest suspicion that something was amiss.
"Jo! is that you?" he asked in a whisper.
"Sh! Sh!"
He caught the warning, almost inaudible as it was, and instantly
drew the door inward six or eight inches.
"Quick, Jo! the way is open!"
Even then a vague suspicion that all was not right led Colonel
Preston to step back a single step, and, though he had no weapons,
he clenched his fist and braced himself for an assault which he did
not expect.
The darkness was too complete for him to see anything, while the
faint ember, smouldering in the fire-place, threw no reflection on the
figure of the pioneer, so as to reveal his precise position.
It was a providential instinct that led Colonel Preston to take this
precaution, for as he recoiled some one struck a venomous blow at
him with a knife, under the supposition that he was standing on the
same spot where he stood at the moment the door was opened. Had
he been there, he would have been killed with the suddenness
almost of the lightning stroke.
The pioneer could not see, and he heard nothing except a sudden
expiration of the breath, which accompanied the fierce blow into
vacancy, but he knew like a flash that, instead of Jo, it was a
Wyandot Indian who was in the act of making a rush to open the
way for the other warriors behind him.
The right fist shot forward, with all the power Colonel Preston could
throw into it. He was an athlete and a good boxer. As he struck, he
hurled his body with the fist, so that all the momentum possible
went with it. Fortunately for the pioneer the blow landed on the
forehead of the unprepared warrior, throwing him violently backward
against his comrades, who were in the act of rushing forward to
follow in his wake.
But for them he would have been flung prostrate full a dozen feet
distant.
The instant the blow was delivered, Colonel Preston sprang back,
shoved the door to and caught up the middle bar. At such crises it
seems as if fate throws every obstruction in the way, and his agony
was indescribable, while desperately trying to get the bar in place.
Only a few seconds were occupied in doing so, but those seconds
were frightful ones to him. He was sure the entire war party would
swarm into the block-house, before he could shut them out.
The Indians, who were forced backward by the impetus of the
smitten leader, understood the need of haste. They knew that,
unless they recovered their ground immediately, their golden
opportunity was gone.
Suppressing all outcry, for they had no wish to draw the fire from
the loopholes above, they precipitated themselves against the door,
as though each one was the carved head of a catapult, equal to the
task of bursting through any obstacle in its path.
Thank Heaven! In the very nick of time Colonel Preston got the
middle bar into its socket. This held the door so securely that the
other two were added without trouble, and he then breathed freely.
Drops of cold perspiration stood on his forehead, and he felt so faint
that he groped about for a stool, on which he dropped until he could
recover.
CHAPTER XVI.
OUT-DOORS ON A DARK NIGHT.
In the meantime Jo Stinger, the veteran frontiersman, had not found
the plain sailing which he anticipated.
It will be remembered that he passed out upon the clearing in front
of the block-house, because he feared that, if he entered the yard
inclosed by the stockade, he would find himself among the
Wyandots, who would be quick to detect his identity.
His presence immediately in front of the structure would also draw
attention to himself, and he therefore glided away until he was fully
a hundred feet distant, when he paused close to the western
pickets.
Looking behind him, he could not see the outlines of the building
which he had just left. For the sake of safety Colonel Preston
allowed no light burning within the block-house, which itself was like
a solid bank of darkness.
"It would be easy enough now for me to make my way to Wild
Oaks," reflected Stinger; "for, when the night is like this, three
hundred Indians could not surround the old place close enough to
catch any one crawling through. But it is no use for me to strike out
for the Ohio now, for the boys could not get here soon enough to
affect the result one way or the other. Long before that the varmints
will wind up this bus'ness, either by going away, or by cleaning out
the whole concern."
Jo Stinger unquestionably was right in this conclusion, but he
possessed a strong faith that Colonel Preston and the rest of them in
the block-house would be able to pull through, if they displayed the
vigilance and care which it was easy to display: this faith explains
how it was the frontiersman had ventured upon what was, beyond
all doubt, a most perilous enterprise.
Jo, from some cause or other which he could not explain, suspected
the Wyandots were collecting near the well, and he began working
his way in that direction.
It was unnecessary to scale the stockade, and he therefore moved
along the western side, until he reached the angle, when he turned
to the right and felt his way parallel with the northern line of pickets.
Up to this time he had not caught sight or sound to show that an
Indian was within a mile of him. The fine particles of snow made
themselves manifest only by the icy, needle-like points which
touched his face and hands, as he groped along. He carried his
faithful rifle in his left hand, and his right rested on the haft of his
long hunting-knife at his waist. His head was thrust forward, while
he peered to the right and left, advancing with as much care as if he
were entering a hostile camp on a moonlight night, when the
overturning of a leaf is enough to awaken a score of sleeping red
men.
A moment after passing the corner of the stockade something
touched his elbow. He knew on the instant that it was one of the
Wyandots. In the darkness they had come thus close without either
suspecting the presence of the other.
"Hooh! my brother is like Deerfoot, the dog of a Shawanoe."
This was uttered in the Wyandot tongue, and the scout understood
the words, but he did not dare reply. He could not speak well
enough to deceive the warrior, who evidently supposed he was one
of his own people.
But there was the single exclamation which he could imitate to
perfection, and he did so as he drew his knife.
"Hooh!" he responded, moving on without the slightest halt. The
response seemed satisfactory to the Wyandot, but could Jo have
seen the actions of the Indian immediately after, he would have felt
anything but secure on that point.
The brave stood a minute or so, looking in the direction taken by the
other, and then, as if suspicious that all was not what it seemed, he
followed after the figure which had vanished so quickly.
"I would give a good deal if I but knowed what he meant by
speaking of Deerfoot as he did," said Jo to himself, "but I didn't dare
ask him to give the partic'lars. I make no doubt they've catched the
Shawanoe and scalped him long ago."
Remembering the openings which he had seen in the stockade
before the darkness became so intense, Jo reached out his right
hand and ran it along the pickets, so as not to miss them.
He had gone only a little way, when his touch revealed the spot
where a couple had been removed, and there was room for him to
force his body through.
Jo was of a spare figure, and, with little difficulty, he entered the
space inclosed by the stockade. He now knew his surroundings and
bearings, as well as though it were high noon, and began making his
way with great stealth in the direction of the well standing near the
middle of the yard.
While he was doing this, the Wyandot with whom he had exchanged
salutations was stealing after him: it was the old case of the hunter
going to hunt the tiger, and soon finding the tiger was hunting him.
The task of the Wyandot, however, for the time, was a more delicate
one than was the white man's, for the dusky pursuer had lost sight
of his foe (if indeed it can be said he had ever caught a view of
him), instantly after the brief salutation between them.

Accelerating Discovery Mining Unstructured Information For Hypothesis Generation Spangler

Chapman & Hall/CRC Data Mining and Knowledge Discovery Series

SERIES EDITOR
Vipin Kumar
University of Minnesota
Department of Computer Science and Engineering
Minneapolis, Minnesota, U.S.A.

AIMS AND SCOPE
This series aims to capture new developments and applications in data mining and knowledge discovery, while summarizing the computational tools and techniques useful in data analysis. This series encourages the integration of mathematical, statistical, and computational methods and techniques through the publication of a broad range of textbooks, reference works, and handbooks. The inclusion of concrete examples and applications is highly encouraged. The scope of the series includes, but is not limited to, titles in the areas of data mining and knowledge discovery methods and applications, modeling, algorithms, theory and foundations, data and knowledge visualization, data mining systems and tools, and privacy and security issues.

PUBLISHED TITLES
ACCELERATING DISCOVERY: MINING UNSTRUCTURED INFORMATION FOR HYPOTHESIS GENERATION
Scott Spangler
ADVANCES IN MACHINE LEARNING AND DATA MINING FOR ASTRONOMY
Michael J. Way, Jeffrey D. Scargle, Kamal M. Ali, and Ashok N. Srivastava
BIOLOGICAL DATA MINING
Jake Y. Chen and Stefano Lonardi
COMPUTATIONAL BUSINESS ANALYTICS
Subrata Das
COMPUTATIONAL INTELLIGENT DATA ANALYSIS FOR SUSTAINABLE DEVELOPMENT
Ting Yu, Nitesh V. Chawla, and Simeon Simoff
COMPUTATIONAL METHODS OF FEATURE SELECTION
Huan Liu and Hiroshi Motoda
CONSTRAINED CLUSTERING: ADVANCES IN ALGORITHMS, THEORY, AND APPLICATIONS
Sugato Basu, Ian Davidson, and Kiri L. Wagstaff
CONTRAST DATA MINING: CONCEPTS, ALGORITHMS, AND APPLICATIONS
Guozhu Dong and James Bailey
DATA CLASSIFICATION: ALGORITHMS AND APPLICATIONS
Charu C. Aggarwal
DATA CLUSTERING: ALGORITHMS AND APPLICATIONS
Charu C. Aggarwal and Chandan K. Reddy
DATA CLUSTERING IN C++: AN OBJECT-ORIENTED APPROACH
Guojun Gan
DATA MINING FOR DESIGN AND MARKETING
Yukio Ohsawa and Katsutoshi Yada
DATA MINING WITH R: LEARNING WITH CASE STUDIES
Luís Torgo
FOUNDATIONS OF PREDICTIVE ANALYTICS
James Wu and Stephen Coggeshall
GEOGRAPHIC DATA MINING AND KNOWLEDGE DISCOVERY, SECOND EDITION
Harvey J. Miller and Jiawei Han
HANDBOOK OF EDUCATIONAL DATA MINING
Cristóbal Romero, Sebastian Ventura, Mykola Pechenizkiy, and Ryan S.J.d. Baker
HEALTHCARE DATA ANALYTICS
Chandan K. Reddy and Charu C. Aggarwal
INFORMATION DISCOVERY ON ELECTRONIC HEALTH RECORDS
Vagelis Hristidis
INTELLIGENT TECHNOLOGIES FOR WEB APPLICATIONS
Priti Srinivas Sajja and Rajendra Akerkar
INTRODUCTION TO PRIVACY-PRESERVING DATA PUBLISHING: CONCEPTS AND TECHNIQUES
Benjamin C. M. Fung, Ke Wang, Ada Wai-Chee Fu, and Philip S. Yu
KNOWLEDGE DISCOVERY FOR COUNTERTERRORISM AND LAW ENFORCEMENT
David Skillicorn
KNOWLEDGE DISCOVERY FROM DATA STREAMS
João Gama
MACHINE LEARNING AND KNOWLEDGE DISCOVERY FOR ENGINEERING SYSTEMS HEALTH MANAGEMENT
Ashok N. Srivastava and Jiawei Han
MINING SOFTWARE SPECIFICATIONS: METHODOLOGIES AND APPLICATIONS
David Lo, Siau-Cheng Khoo, Jiawei Han, and Chao Liu
MULTIMEDIA DATA MINING: A SYSTEMATIC INTRODUCTION TO CONCEPTS AND THEORY
Zhongfei Zhang and Ruofei Zhang
MUSIC DATA MINING
Tao Li, Mitsunori Ogihara, and George Tzanetakis
NEXT GENERATION OF DATA MINING
Hillol Kargupta, Jiawei Han, Philip S. Yu, Rajeev Motwani, and Vipin Kumar
RAPIDMINER: DATA MINING USE CASES AND BUSINESS ANALYTICS APPLICATIONS
Markus Hofmann and Ralf Klinkenberg
RELATIONAL DATA CLUSTERING: MODELS, ALGORITHMS, AND APPLICATIONS
Bo Long, Zhongfei Zhang, and Philip S. Yu
SERVICE-ORIENTED DISTRIBUTED KNOWLEDGE DISCOVERY
Domenico Talia and Paolo Trunfio
SPECTRAL FEATURE SELECTION FOR DATA MINING
Zheng Alan Zhao and Huan Liu
STATISTICAL DATA MINING USING SAS APPLICATIONS, SECOND EDITION
George Fernandez
SUPPORT VECTOR MACHINES: OPTIMIZATION BASED THEORY, ALGORITHMS, AND EXTENSIONS
Naiyang Deng, Yingjie Tian, and Chunhua Zhang
TEMPORAL DATA MINING
Theophano Mitsa
TEXT MINING: CLASSIFICATION, CLUSTERING, AND APPLICATIONS
Ashok N. Srivastava and Mehran Sahami
THE TOP TEN ALGORITHMS IN DATA MINING
Xindong Wu and Vipin Kumar
UNDERSTANDING COMPLEX DATASETS: DATA MINING WITH MATRIX DECOMPOSITIONS
David Skillicorn
ACCELERATING DISCOVERY
MINING UNSTRUCTURED INFORMATION FOR HYPOTHESIS GENERATION

Scott Spangler
IBM Research
San Jose, California, USA
The views expressed here are solely those of the author in his private capacity and do not in any way represent the views of the IBM Corporation.

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2016 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Version Date: 20150721

International Standard Book Number-13: 978-1-4822-3914-0 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com
and the CRC Press Web site at http://www.crcpress.com
To Karon, my love
Contents

Preface
Acknowledgments

Chapter 1 ◾ Introduction

Chapter 2 ◾ Why Accelerate Discovery? (Scott Spangler and Ying Chen)
  THE PROBLEM OF SYNTHESIS
  THE PROBLEM OF FORMULATION
  WHAT WOULD DARWIN DO?
  THE POTENTIAL FOR ACCELERATED DISCOVERY: USING COMPUTERS TO MAP THE KNOWLEDGE SPACE
  WHY ACCELERATE DISCOVERY: THE BUSINESS PERSPECTIVE
  COMPUTATIONAL TOOLS THAT ENABLE ACCELERATED DISCOVERY
    Search
    Business Intelligence and Data Warehousing
    Massive Parallelization
    Unstructured Information Mining
    Natural Language Processing
    Machine Learning
    Collaborative Filtering/Matrix Factorization
    Modeling and Simulation
    Service-Oriented Architectures
    Ontological Representation Schemes
    DeepQA
    Reasoning under Uncertainty
  ACCELERATED DISCOVERY FROM A SYSTEM PERSPECTIVE
    Content Curator
    Domain-pedia
    Annotators
    Normalizers
    BigInsights Framework
    Query Services
    Analytics Services
    User Interface
    Catalogue
  ACCELERATED DISCOVERY FROM A DATA PERSPECTIVE
    Initial Domain Content and Knowledge Collection
    Content Comprehension and Semantic Knowledge Extraction
    Complex and High-Level Knowledge Composition and Representation
    New Hypothesis and Discovery Creation
  ACCELERATED DISCOVERY IN THE ORGANIZATION
  CHALLENGE (AND OPPORTUNITY) OF ACCELERATED DISCOVERY
  REFERENCES

Chapter 3 ◾ Form and Function
  THE PROCESS OF ACCELERATED DISCOVERY
  CONCLUSION
  REFERENCE

Chapter 4 ◾ Exploring Content to Find Entities
  SEARCHING FOR RELEVANT CONTENT
  HOW MUCH DATA IS ENOUGH? WHAT IS TOO MUCH?
  HOW COMPUTERS READ DOCUMENTS
  EXTRACTING FEATURES
    Editing the Feature Space
  FEATURE SPACES: DOCUMENTS AS VECTORS
  CLUSTERING
  DOMAIN CONCEPT REFINEMENT
    Category Level
    Document Level
  MODELING APPROACHES
    Classification Approaches
    Centroid
    Decision Tree
    Naïve Bayes
    Numeric Features
    Binary Features
    Rule Based
    Statistical
  DICTIONARIES AND NORMALIZATION
  COHESION AND DISTINCTNESS
    Cohesion
    Distinctness
  SINGLE AND MULTIMEMBERSHIP TAXONOMIES
  SUBCLASSING AREAS OF INTEREST
  GENERATING NEW QUERIES TO FIND ADDITIONAL RELEVANT CONTENT
  VALIDATION
  SUMMARY
  REFERENCES

Chapter 5 ◾ Organization
  DOMAIN-SPECIFIC ONTOLOGIES AND DICTIONARIES
  SIMILARITY TREES
  USING SIMILARITY TREES TO INTERACT WITH DOMAIN EXPERTS
  SCATTER-PLOT VISUALIZATIONS
  USING SCATTER PLOTS TO FIND OVERLAPS BETWEEN NEARBY ENTITIES OF DIFFERENT TYPES
  DISCOVERY THROUGH VISUALIZATION OF TYPE SPACE
  REFERENCES

Chapter 6 ◾ Relationships
  WHAT DO RELATIONSHIPS LOOK LIKE?
  HOW CAN WE DETECT RELATIONSHIPS?
  REGULAR EXPRESSION PATTERNS FOR EXTRACTING RELATIONSHIPS
  NATURAL LANGUAGE PARSING
  COMPLEX RELATIONSHIPS
  EXAMPLE: P53 PHOSPHORYLATION EVENTS
  PUTTING IT ALL TOGETHER
  EXAMPLE: DRUG/TARGET/DISEASE RELATIONSHIP NETWORKS
  CONCLUSION

Chapter 7 ◾ Inference
  CO-OCCURRENCE TABLES
  CO-OCCURRENCE NETWORKS
  RELATIONSHIP SUMMARIZATION GRAPHS
  HOMOGENEOUS RELATIONSHIP NETWORKS
  HETEROGENEOUS RELATIONSHIP NETWORKS
  NETWORK-BASED REASONING APPROACHES
  GRAPH DIFFUSION
  MATRIX FACTORIZATION
  CONCLUSION
  REFERENCES

Chapter 8 ◾ Taxonomies
  TAXONOMY GENERATION METHODS
  SNIPPETS
  TEXT CLUSTERING
  TIME-BASED TAXONOMIES
    Partitions Based on the Calendar
    Partitions Based on Sample Size
    Partitions on Known Events
  KEYWORD TAXONOMIES
    Regular Expression Patterns
  NUMERICAL VALUE TAXONOMIES
    Turning Numbers into X-Tiles
  EMPLOYING TAXONOMIES
    Understanding Categories
    Feature Bar Charts
    Sorting of Examples
    Category/Category Co-Occurrence
    Dictionary/Category Co-Occurrence
  REFERENCES

Chapter 9 ◾ Orthogonal Comparison
  AFFINITY
  COTABLE DIMENSIONS
  COTABLE LAYOUT AND SORTING
  FEATURE-BASED COTABLES
  COTABLE APPLICATIONS
  EXAMPLE: MICROBES AND THEIR PROPERTIES
  ORTHOGONAL FILTERING
  CONCLUSION
  REFERENCE

Chapter 10 ◾ Visualizing the Data Plane
  ENTITY SIMILARITY NETWORKS
  USING COLOR TO SPOT POTENTIAL NEW HYPOTHESES
  VISUALIZATION OF CENTROIDS
  EXAMPLE: THREE MICROBES
  CONCLUSION
  REFERENCE

Chapter 11 ◾ Networks
  PROTEIN NETWORKS
  MULTIPLE SCLEROSIS AND IL7R
  EXAMPLE: NEW DRUGS FOR OBESITY
  CONCLUSION
  REFERENCE

Chapter 12 ◾ Examples and Problems
  PROBLEM CATALOGUE
  EXAMPLE CATALOGUE

Chapter 13 ◾ Problem: Discovery of Novel Properties of Known Entities
  ANTIBIOTICS AND ANTI-INFLAMMATORIES
  SOS PATHWAY FOR ESCHERICHIA COLI
  CONCLUSIONS
  REFERENCES

Chapter 14 ◾ Problem: Finding New Treatments for Orphan Diseases from Existing Drugs
  IC50:IC50
  REFERENCES

Chapter 15 ◾ Example: Target Selection Based on Protein Network Analysis
  TYPE 2 DIABETES PROTEIN ANALYSIS

Chapter 16 ◾ Example: Gene Expression Analysis for Alternative Indications (Scott Spangler, Ignacio Terrizzano, and Jeffrey Kreulen)
  NCBI GEO DATA
  CONCLUSION
  REFERENCES

Chapter 17 ◾ Example: Side Effects

Chapter 18 ◾ Example: Protein Viscosity Analysis Using Medline Abstracts
  DISCOVERY OF ONTOLOGIES
  USING ORTHOGONAL FILTERING TO DISCOVER IMPORTANT RELATIONSHIPS
  REFERENCE

Chapter 19 ◾ Example: Finding Microbes to Clean Up Oil Spills (Scott Spangler, Zarath Summers, and Adam Usadi)
  ENTITIES
  USING COTABLES TO FIND THE RIGHT COMBINATION OF FEATURES
  DISCOVERING NEW SPECIES
  ORGANISM RANKING STRATEGY
  CHARACTERIZING ORGANISMS
    Respiration
    Environment
    Substrate
  CONCLUSION

Chapter 20 ◾ Example: Drug Repurposing
  COMPOUND 1: A PDE5 INHIBITOR
  PPARα/γ AGONIST

Chapter 21 ◾ Example: Adverse Events
  FENOFIBRATE
  PROCESS
  CONCLUSION
  REFERENCES

Chapter 22 ◾ Example: P53 Kinases
  AN ACCELERATED DISCOVERY APPROACH BASED ON ENTITY SIMILARITY
  RETROSPECTIVE STUDY
  EXPERIMENTAL VALIDATION
  CONCLUSION
  REFERENCE

Chapter 23 ◾ Conclusion and Future Work
  ARCHITECTURE
  FUTURE WORK
  ASSIGNING CONFIDENCE AND PROBABILITIES TO ENTITIES, RELATIONSHIPS, AND INFERENCES
  DEALING WITH CONTRADICTORY EVIDENCE
  UNDERSTANDING INTENTIONALITY
  ASSIGNING VALUE TO HYPOTHESES
  TOOLS AND TECHNIQUES FOR AUTOMATING THE DISCOVERY PROCESS
  CROWD SOURCING DOMAIN ONTOLOGY CURATION
  FINAL WORDS
  REFERENCE

Index
  • 21. xvii Preface Afew years ago, having spent more than a decade doing unstructured data mining of one form or another, in domains spanning helpdesk problem tickets, social media, and patents, I thought I fully understood the potential range of problems and likely areas of applicability of this mature technology. Then, something happened that completely changed how I thought about what I was doing and what its potential really was. The change in my outlook began with the Watson Jeopardy challenge. Seeing a computer learn from text to play a game I had thought was far beyond the capabilities of any artificial intelligence opened my eyes to new possibilities. And I was not alone. Soon many customers were coming for- ward with their own unique problems—problems I would have said a few years ago were just too hard to solve with existing techniques. And now, I said, let’s give it a try. This wasn’t simply a straightforward application of the algorithms used to win Jeopardy in a different context. Most of the problems I was being asked to solve weren’t really even question-answering problems at all. But they all had a similar quality in that they forced us to digest all of the information in a given area and find a way to synthesize a new kind of meaning out of it. This time the problem was not to win a game show, but (to put it bluntly) to advance human scientific knowledge. More than once in the early going, before we had any results to show for our efforts, I wondered if I was out of my mind for even trying this. Early on, I remember making more than one presentation to senior executives at my company, describing what I was doing, half expecting they would tell me to cease immediately, because I was attempting something way too hard for current technology and far outside the bounds of a reasonable business opportunity. But fortunately, no one ever said that, and I kept on going.
Somewhere along the way (I can’t say just when), I lost all doubt that I was really onto something very important. This was more than another new application of unstructured data mining. As each new scientific area came forward for analysis, the approach we were using began to solidify into a kind of methodology. And just as this was happening, I attended the 2013 ACM Knowledge Discovery and Data Mining Conference (KDD13). I met with CRC Press in a booth at the conference and told them about my idea. Shortly thereafter, I was signed up to write a book. I knew at the time what would make this book especially challenging: I was still proving and tweaking the methodology even as I was writing out its description. This was not ideal, but neither could it be avoided if I wanted to broaden the application of the method beyond a small team of people working in my own group. And if it could not be broadened, it would never realize its full potential. Data science is a new discipline. It lacks a curriculum, a set of textbooks, a fundamental theory, and a set of guiding principles. This is both regrettable and exciting. It must be rectified if the discipline is to become established. Since I greatly desire that end, I write this book in the hope of furthering it. Many years ago, I stumbled across a book by the statistician John Tukey called Exploratory Data Analysis. It was written 30 years before I read it, and the author was no longer living. Yet it spoke to me as if I were his research collaborator. It showed me how the ideas I had been grappling with in the area of unstructured text had been similarly addressed in the realm of structured data. Reading that book gave me renewed confidence in the direction I was taking and a larger vision of what that direction might one day accomplish. This book is one more step on that journey. 
The journey is essentially my life’s work, and this book is that work’s synthesis thus far. It is far from perfect, as anything real will always be a diminishment of what is merely imagined. But I hope that even in its imperfect state it will communicate some part of what I experience on a daily basis working through accelerated discovery problems as a data scientist. I think it is unquestionably the most rewarding and exciting job in the world. And I dare to hope that 30 years from now, or maybe even sooner, someone will pick up this book, see their own ideas and challenges reflected in its pages, and feel renewed confidence in the direction they are heading.
At the same time, I fear that one or two readers (at least) will buy this book and immediately be disappointed because it is not at all what they expected. It’s not a textbook. It’s not a business book. It’s not a popular science book. It doesn’t fit well in any classification. I can only say this: I wrote it for the person I was a few years back. Read it with an open mind. You might find you get something useful out of it, even though it fails to meet your initial expectations. It took me a long time to reach this level of proficiency in addressing accelerated discovery problems. I’ve tried my best to capture exactly how I go about it, both from a systematic perspective and from a practical point of view. This book provides motivation, strategy, tactics, and a heterogeneous set of comprehensive examples to illustrate all the points I make. If it works as I intend, it will fill an important gap left by the other types of books I have mentioned…the ones you thought this might be. You can still buy those other books as well, but keep this one. Come back to it later, when you have started to put the theories you learned in school into practice on real-world applications. You may find then that the book has more to say to you than you first thought. Today, I go into each new problem domain with complete confidence that I know how to get started. I know the major steps I need to accomplish, and I have a pretty good idea what the final solution will look like (or at least the range of things it might look like when we first deliver a useful prototype). It wasn’t always that way. In the first few customer engagements of this type, I was mostly winging it. It was exciting, no doubt, but I would have really loved to have this book on my desk (or in my e-reader) to look over after each meeting and help me figure out what to do next. 
If you are fortunate enough to do what I do for a living, I think you will (eventually) find this book worthwhile.
Acknowledgments

There were many people who were instrumental in the creation of this methodology and in the process of writing the book that explains it. First, the team at IBM Watson Innovations, who made it all possible: Ying Chen, Meena Nagarajan, Qi He, Linda Kato, Ana Lelescu, Jacques LaBrie, Cartic Ramakrishnan, Sheng Hua Boa, Steven Boyer, Eric Louie, Anshu Jain, Isaac Cheng, Griff Weber, Su Yan, and Roxana Stanoi. Also instrumental in realizing the vision was the team at Baylor College of Medicine, led by Olivier Lichtarge, with Larry Donehower, Angela Dawn Wilkins, Sam Regenbogen, Curtis Pickering, and Ben Bachman. Jeff Kreulen has been a collaborator for many years now and continues to be a big supporter of, and contributor to, the ideas described here. Michael Karasick and Laura Haas have been instrumental in consistently supporting and encouraging this work from a management perspective at IBM. John Richter, Meena Nagarajan, and Peter Haas were early reviewers of my first draft, and I appreciate their input. Ying Chen helped write the chapter Why Accelerate Discovery?, for which I am most grateful. Pat Langley provided some very good advice during the planning phase for the book, from which I profited. Finally, and most importantly, I thank my wife, Karon Barber, who insisted that I finish this project, at the expense of time that I would rather have spent with her. Nothing I’ve accomplished in life would have happened without her steadfast faith and love.
Chapter 1

Introduction

This book is about discovery in science and the importance of heterogeneous data analytics in aiding that discovery. As the volume of scientific data and literature increases exponentially, scientists need ever-more powerful tools to process and synthesize that information in a practical and meaningful way. But in addition, scientists need a methodology that takes all the relevant information in a given problem area (all the available evidence) and processes it in order to propose the set of potential new hypotheses that are most likely to be both true and important. This book describes a method for achieving this goal.

But first, I owe the reader a short introduction and an explanation of why I am the one writing this book. The short answer is a lucky accident (lucky for me anyway; for you it remains to be seen). I stumbled into a career doing the most exciting and rewarding work I can imagine. I do not know it for a fact, but I suspect that I have done more of this kind of work, and for a longer period of time, than anyone else now alive. It is this experience that I now feel compelled to share, and it is that experience that should make the book interesting reading for those who also see the potential of the approach but do not know how to get started applying it.

It all started with a love of mathematics, in particular discrete mathematics and, to be even more specific, combinatorics. Basically, this is the study of how to count things. I was never happier than when I found this course in college. The discovery of a discipline devoted precisely to what one instinctively loves is one of life’s greatest joys. I was equally
disappointed to find there was no such thing as a career in combinatorics outside of academia, at least not at that time. But I wandered into computer science, from there into machine learning, and from there into text mining. Suddenly I became aware that the skill and practice of knowing how to count things had a great deal of practical application after all. Now 30 years have passed since I finished that combinatorics course, and with every passing year the number, variety, importance, and fascination of the problems I work on are still increasing.

Nothing thrills me more than to have a new data set land in my inbox. This is especially so if it is some kind of data I have never seen before, better still if analyzing it requires me to learn about a whole new field of knowledge, and best yet if the result will somehow make a difference in the world. I can honestly say I have had the privilege of working on such problems, not once, not twice, but more times than I can reckon. I do not always succeed, but I do make progress often enough that more and more of these problems seem to find their way to me. At some level I wish I could do them all, but that would be selfish (and not very practical). So I am writing this book instead.

If you love working with unstructured, heterogeneous data the way I do, I believe this book will have a positive impact on your career, and that you will in turn have a positive impact on society. This book is an attempt to document and teach Accelerated Discovery to the next generation of data scientists. Once you have learned these techniques and practiced them in the lab, your mission will be to find a scientist or an engineer struggling with a big data challenge and help them make a better world. I know these scientists and engineers exist, and I know they have these challenges, because I have talked to them and corresponded with them. 
I have wished there were more of me to go around so that I could help with every one of them, because they are all fascinating, incredibly promising, and worthy efforts. But there are only so many hours in a week. I can only pick a few of the most promising to pursue, and every one of these has been a rewarding endeavor. For the rest, and for the problems still to come, I have written this book.

This book is not a data-mining manual. It does not discuss how to build a text-classification engine or the ins and outs of writing an unsupervised clustering implementation. Other books already do this, and I could not surpass them. This book assumes you already know how to process data using the basic bag of tools that are now taught in any good data-mining
or machine-learning course. Where those courses leave off, this book begins. The question this book answers is how to use unstructured mining approaches to solve a really complex problem in a given scientific domain. How do you create a system that can reason in a sophisticated way about a complex problem and come up with solutions that are profound, nonobvious, and original?

From here on, this book is organized in a more or less top-down fashion. The next chapter discusses the importance of the Accelerated Discovery problem space and why the time has come to tackle it with the tools we currently have available. Even if you are already motivated to read the book, do not skip this chapter, because it contains some important material about how flexibly the technology can be applied across a wide swath of problems. What follows immediately thereafter is a set of five chapters that describe the method at a fairly high level. These are the most important chapters of the book, because they should be at the front of your mind each time you face a new analytics challenge in science. First there is a high-level description of our method for tackling these problems, followed by four detailed chapters giving a general approach to arriving at a solution. Together, these five chapters essentially cover our method for accelerating discovery. Not every problem you encounter will use every step of this method in its solution, but the basic approach can be applied in a more or less universal way. The next section brings the level of detail down to specific technologies for implementing the method. These are less universal in character but will, I hope, make the method more concrete. This set of four chapters goes into greater detail about the tools and algorithms I use to realize the approach in practice. 
It is not complete, but I hope it will illustrate the kinds of techniques that can make the abstract process a reality. The rest of the book is made up of sample problems, examples of how this really works in practice. I included ten such examples because it was a nice round number, and I think the examples I have selected provide a good representative sample of this kind of engagement. All of these examples are from real scientists, are based on real data, and are focused on important problems in discovery. The examples all come from the life-sciences area, but that is not meant to be the only area where these techniques apply. In fact, I have applied them in several other sciences, including materials and chemistry. But my best physical science examples
are not publishable due to proprietary concerns, so for this book I have chosen to focus on the science of biology.

That is how the book is organized, but do not feel you have to read it this way. You could just as well start with the examples and work your way back to the earlier chapters when you want to understand the method in more detail. You will quickly notice that not every problem solution employs every step of the methodology anyway. The methodology is a flexible framework on which to assemble the components of your solution, as you need them and where they make sense. And it is meant to be iterative and to evolve as you get deeper into the information complexity of each new domain.

As you read the book, I hope that certain core principles of how to be a good data scientist will naturally become apparent. Here is a brief catalogue of those principles to keep in mind each time you are faced with a new problem.

• The whole is greater than the sum of the parts: As scientists we naturally tend toward reductionism when we think about how to solve a problem. But in data science, it is frequently the case that, by considering all the relevant data at once, we can learn something that we cannot see by looking at each piece of data in isolation. Consider ways to unify everything we know about an individual entity into a complete picture. What you learn is frequently surprising.
• More X is not always better: There is a wishful tendency among those less familiar with the problems of data science to imagine that every problem can be solved with more data, no matter how irrelevant that data happens to be. Or they imagine that, if we have run out of data, then adding more features to the feature space ought to help, or, if that fails, that adding more categories to our taxonomy should help, and so on. The operative assumption is that more is always better.
And certainly, one is supposed to assume that more stuff can at least never hurt: the solution must be in there somewhere. But the problem is that if you mostly add more noise, the signal gets harder to find, not easier. Careful selection of the right data, the right features, and the right categories is always preferable to indiscriminate addition.
• Compare and contrast: Measuring something in isolation does not tell you very much. Only when you compare the value to some other related thing does it begin to have meaning. If I tell you a certain
baseball player hit 50 home runs last season, this will not mean much if you know nothing about the game. But if you know what percentile that puts him in compared to other players, that tells you something, especially if you also take into account plate appearances, the difficulty of the pitchers faced, and the ballparks he played in. The point is that too often in data science, we are tempted to look too narrowly at only one aspect of a domain in order to get the precise number we are looking for. We also need to look more broadly in order to understand the implications of that value: to know whether it means anything of importance.
• Divide and conquer: When you have a lot of data you are trying to make sense of, the best strategy is to divide it into smaller and smaller chunks that you can more easily comprehend. But this only works if you divide up the data in a way that still makes sense when you put it all back together again. For example, one way to divide up data is by letter of the alphabet. But this is unlikely to make any one category much different from any other, so the problem has not become any easier within each subcategory. If I focus on concepts rather than syntax, I stand a much better chance of being enlightened at the end.
• “There’s more than one way to…”: Being a cat lover, I shy away from completing that statement, but the sentiment is no less true for being illustrated in such an unpleasant way. Once we find a solution or approach, our brains seem to naturally switch off. We have to avoid this trap and keep looking for other ways to arrive at the result. If we apply these additional ways and get the same answer, we can be far more confident that the answer is correct. If we apply the additional approaches and get a different answer, that opens up whole new areas for analysis that were closed to us before. Either way, we win.
• Use your whole brain (and its visual cortex): Find a way to make the data draw a picture, and you will see something new and important that was hidden before. Of course, the challenge is to draw the pic- ture that illustrates the key elements of importance across the most visible dimensions. Our brains have evolved over time to take in vast amounts of data through the eyes and convert it effortlessly into a reasonably accurate view of what is going on around us. Find a
way to put that powerful specialized processor to work on your data problem, and you will inevitably be astounded at what you can see.
• Everything is a taxonomy/feature vector/network: At the risk of oversimplifying things a bit, there are really only three basic things you need to know to make sense of data. The first is what the entities are that you care about and how they relate to each other (the taxonomy). The second is how you can describe those entities as features (the feature vector). The third is how you can represent the way those entities interact (the network). Every problem involves some subset or combination of these ideas. It really is that simple.
• Time is not a magazine: The data we begin our investigation with is usually static, meaning it sits in a file that we have downloaded from somewhere, and we make sure that file does not change over time (we may even back it up to be absolutely sure). This often leads us to forget that change is the only constant in the universe. Over time we will find that our file bears less and less relation to the new reality of now. Find a way to account for time and to use the time recorded in data to learn how things evolve.
• All data is local: A corollary to the problem of time is the problem of localization. Most data files we work with are subsets of the larger data universe, so we have to generalize what we learn from them to make it applicable to the real universe. That generalization problem is going to be much harder than you realize. Prejudice toward what we know and ignorance of what we do not know are the bane of all future predictions. Be humble in the face of your own limited awareness.
• Prepare for surprise: If you are not constantly amazed by what you find in data, you are doing something wrong.

Hopefully this brief introduction gives you some sense of the ideas to keep in mind as you begin to master this discipline.
Discovery is always hard and always involves synthesizing different kinds of data and analytics. The crucial step is to make all those moving parts work together constructively to illuminate what lies just beyond the known. The key ingredient is figuring out what to count and how to count it. In the end, everything is just combinatorics all the way down!
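The “taxonomy/feature vector/network” principle, and the advice to figure out what to count, can be made concrete with a small sketch. The Python below is a minimal illustration under stated assumptions. The three toy sentences, the two-class taxonomy, and the function names (`entities_in`, `feature_vector`, `cooccurrence_network`) are hypothetical and were written only for this example. The naive whitespace tokenizer stands in for real entity extraction. This is not the method developed in later chapters. It only places the three structures side by side, each built from plain counting.

```python
from collections import Counter
from itertools import combinations

# Toy sentences written for this sketch (hypothetical; not data from the book).
DOCUMENTS = [
    "p53 binds mdm2 in the nucleus",
    "mdm2 degrades p53",
    "atm phosphorylates p53 after dna damage",
]

# 1. Taxonomy: the entities we care about, grouped into classes (hypothetical).
TAXONOMY = {
    "protein": {"p53", "mdm2", "atm"},
    "action": {"binds", "degrades", "phosphorylates"},
}
VOCABULARY = set().union(*TAXONOMY.values())


def tokenize(doc):
    """Naive lowercase whitespace tokenizer (an assumption made for brevity)."""
    return doc.lower().split()


def entities_in(doc):
    """Return the set of taxonomy entities mentioned in a document."""
    return set(tokenize(doc)) & VOCABULARY


def feature_vector(doc):
    """2. Feature vector: how often each taxonomy entity occurs in the document."""
    return Counter(t for t in tokenize(doc) if t in VOCABULARY)


def cooccurrence_network(docs, category="protein"):
    """3. Network: weighted edges between same-class entities that co-occur in a document."""
    members = TAXONOMY[category]
    edges = Counter()
    for doc in docs:
        found = sorted(entities_in(doc) & members)
        for a, b in combinations(found, 2):
            edges[(a, b)] += 1  # edge weight = number of documents mentioning both
    return edges


if __name__ == "__main__":
    for doc in DOCUMENTS:
        print(dict(feature_vector(doc)))
    for (a, b), weight in cooccurrence_network(DOCUMENTS).most_common():
        print(f"{a} -- {b}: {weight}")
```

On the toy sentences, the protein network has exactly two edges: (mdm2, p53) with weight 2 and (atm, p53) with weight 1. Raw co-occurrence counts like these are only a starting point. In the spirit of “compare and contrast,” they mean little until they are set against some baseline.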
I hope this book helps you to solve complex and important problems of this type. More than that, I encourage you to develop your own methods for accelerating discovery and publish them as I have done mine. This problem is too important for one small group of data scientists in one organization to have all the fun. Come join us.
Chapter 2

Why Accelerate Discovery?

Scott Spangler and Ying Chen

There is a crisis emerging in science due to too much data. On the surface, this sounds like an odd problem for a scientist to have. After all, science is all about data, and the more the better. Scientists crave data; they spend time and resources collecting it. How can there be too much data? Why can’t scientists simply ignore the data they do not need and keep the data they find useful? But therein lies the problem. Which data do they need? What data will end up proving useful? Answering these questions grows more difficult as more data becomes available. And if data grows exponentially, the problem may reach the point where individual scientists cannot make optimal decisions based on their own limited knowledge of what the data contains. I believe we have reached this situation in nearly all sciences today, and we have certainly reached it in some. So by accelerating discovery, I do not simply mean doing science the way we do it today, only faster. I really mean doing science in a profoundly new way, using data in a new way, and generating hypotheses in a new way.

But before getting into all that, I want to present some historical context in order to show why science as it has always been practiced is becoming less and less viable over time. To illustrate what I mean, consider the discovery of evolution by Charles Darwin. This is one of the most studied examples of great science
in history, and in many ways his example provided the template for all scientific practice for the next 150 years. On the surface, the story is remarkably elegant and straightforward. Darwin travels to the Galapagos Islands, where he discovers a number of new and unique species. When he gets back from his trip, he notices a pattern of species changing over time. From this comes the idea of species “evolution,” which he publishes to much acclaim and significant controversy. Of course, what really happened is far more complex, and more illuminating about how science was, and for the most part still is, actually practiced.

First of all, as inhabitants of the twenty-first century, we may forget how difficult world travel was in Darwin’s day. Darwin’s voyage was a survey mission that took him around the world on a sailing ship, the Beagle. He made many stops and had many adventures. Darwin left on his trip in 1831 and returned five years later, in 1836. During that time, he collected many samples from numerous locations and took copious notes on everything he did and saw. After he got back to England, he spent many years collating, organizing, and systematically cataloguing his specimens and notes. In 1839, he published, to much acclaim, a book describing the incidents of this voyage: Journal and Remarks, Voyage of the Beagle. (This is probably not the book you are thinking of; that one came much later.) Darwin then spent the next 20 years doing research and collecting evidence on plants and animals and their tendency to change over time. But though he was convinced the phenomenon was real, he still did not have a mechanism by which this change occurred. Then Darwin happened upon Essay on the Principle of Population (1798) by Thomas Malthus. It introduced the idea that animals produce more offspring than typically survive. This creates a “struggle for existence” among competing offspring.
This led Darwin directly to the idea of “natural selection.” The Origin of Species was published in 1859, 28 years after the Beagle left on its voyage. And of course, it was many decades before Darwin’s theory would be generally accepted.

There are certain key aspects of this story that I want to highlight as particularly relevant to the question “Why Accelerate Discovery?” The first is the 20 years it took Darwin to collect and analyze the data he felt was necessary to develop and validate his theory. The second is the connection that Darwin made between his own work and that of Malthus. I think both of these phenomena are characteristic of the big data issue facing scientists both then and now. And if we think about them carefully in the context of their time and ours, we can see how it
becomes imperative that scientists working today use methods and tools that are far more powerful than those of Darwin and his contemporaries.

THE PROBLEM OF SYNTHESIS

When Darwin returned from his 5-year voyage, he had a formidable collection of notes and specimens to organize and catalogue. This step took him many years (longer, in fact, than it took him to collect the data in the first place), but it was crucial to the discovery process. We often think of scientific discovery as a Eureka moment, a bolt from the blue. But in reality, it is much more frequently the result of painstaking labor carried out to collect, organize, and catalogue all the relevant information. In essence, this is a problem of synthesis. Data hardly ever comes in a form that can be readily processed and interpreted. In nearly every case, the genius lies not in the finding and collecting of the data but in the organization scheme that makes the interpretation possible. In a very real sense, all significant scientific discoveries are about overcoming the data-synthesis problem.

Clearly, data synthesis is hard (because otherwise everyone would do it), but what makes it so? When only the result is observed, the effort required is often hard to see. This is because the most difficult step, the part that requires real genius, is almost invisible. It is hidden within the structure of the catalogue itself. Let us look at the catalogue of specimens Darwin created. In organizing and cataloguing his specimens, Darwin’s task was not just to record the species and physical characteristics of each one. His task was to find the hidden relationships between them, as illustrated in Figure 2.1. Organizing data into networks of entities and relationships is a recurring theme in science. Taxonomies and ontologies are another manifestation of this.
Taxonomies break entities down into classes and subclasses based on some measure of similarity, so that the further down the tree you go, the more alike things are within the same class. Ontologies represent a more general kind of entity network that expresses how entities relate to each other in the world. Creating such networks from raw data is the problem of synthesis. The more data there is, and in particular the more heterogeneous the forms of that data, the more challenging synthesis becomes.

THE PROBLEM OF FORMULATION

Once Darwin had synthesized his data, it became clear to him that species did indeed change over time. But merely observing this phenomenon was not enough. To complete his theory, he needed a mechanism by which
this change takes place. He needed to create a model that could explain how the data points (i.e., the species) connected to each other. Otherwise, all he would have had was a way to organize the data, with no additional insight into what the data meant. Creating this additional insight from synthesis is the problem of formulation. Formulation requires the creation of an equation or algorithm that explains a process, or at least simulates or approximates mathematically how that process behaves in the physical world. From a data-science perspective, formulation requires extracting patterns that may appear across many disparate, heterogeneous data collections. Going beyond synthesis to explanation may require data visualization and sometimes even analogy. It requires pattern matching and being able to draw from a wide array of related data. This is what Darwin was able to do when he drew on the writings of Thomas Malthus to discover the driving mechanism behind species change. Darwin reused an existing formulation, the struggle for existence among competing offspring (i.e., “survival of the fittest”). He applied it to competition among all living things in order to arrive at a formulation of how species evolve.

FIGURE 2.1 Illustrations from The Origin of Species.

The process of formulation begins with observation. The scientist observes how entities change and interact over time. She observes which properties of entities tend to occur together and which tend to be independent. Often, data visualization (charts or graphs, for example) is used to summarize large tables of numbers in a way that the human visual cortex can digest and make sense of. The synthesis of data is one of the key steps in discovery. It often looks obvious in retrospect, but at the beginning of research it is, in most cases, far from obvious.

WHAT WOULD DARWIN DO?

The process of synthesis and formulation used by Darwin and other scientists worked well in the past, but it is increasingly problematic. To put it bluntly, the amount and variety of data that needs to be synthesized, and the complexity of the models that need to be formulated, have begun to exceed the capacity of individual human intelligence. To see this, let us compare the entities and relationships described in The Origin of Species to the ones that today’s biologists need to grapple with. Today’s biologists need to go well beyond the species and physical anatomy of organisms. Today, biology probes life at the molecular level. The number of different proteins that compose the cells in the human organism is over a million. Each of these proteins has different functions in the cell. Individual proteins work in concert with other proteins to create additional functionality.
The complexity and richness of all these interactions and functions is awe inspiring. It is also clearly beyond the capability of a single human mind to grasp. And all of this was entirely unknown to Darwin. To look at this another way, the number of scholarly publications available for Darwin to read in his field might have been on the order of 10,000–100,000 at most. Today, that number would be on the order of fifty million [1]. How can the scientists of today even hope to fathom such complexity and scale of knowledge? Every scientist employs two strategies to one degree or another: specialization and consensus. Each scientist
chooses an area of specialization that is narrow enough to encompass a field wherein they can be familiar with all the important published literature and findings. Of course, this implies that as time goes on and more and more publications appear, specialization must grow more and more intense. This has the obvious drawback of narrowing the scope of each scientist’s knowledge and the application of their research. In addition, each scientist will read only the publications in the most prestigious, high-profile journals. These will represent the best consensus opinion of the most important research in their area. The drawback is that consensus in science is frequently wrong. Also, if the majority of scientists are pursuing the same line of inquiry, the space of possible hypotheses is explored very inefficiently and incompletely.

THE POTENTIAL FOR ACCELERATED DISCOVERY: USING COMPUTERS TO MAP THE KNOWLEDGE SPACE

But all is not lost for the scientists of today, for the very tools that help generate the exponentially increasing amounts of data can also help to synthesize and formulate that data. Thanks to Moore’s Law, scientists have, and will most likely continue to have, exponentially increasing amounts of computational power available to them. What is needed is a way to harness that computational power to carry out better synthesis and formulation: to help the scientist see the space of possibilities and explore that space much more effectively than they can today. What is needed is a methodology that can be repeatedly employed to apply computation to any scientific domain in such a way as to make the knowledge space comprehensible to the scientist’s brain. The purpose of this book is to present one such methodology and to describe exactly how to carry it out, with specific examples from biology and elsewhere.
We have shown this method to be an effective way to synthesize all published literature in a given subject area and to formulate new properties of entities based on everything that we know about those entities from previous results. This leads us to conclude that the methodology is an effective tool for accelerating scientific discovery. Since the methods we use are in no way specific to these examples, we think there is a strong possibility that they may be effective in many other scientific domains as well. Moreover, regardless of whether our particular methodology is optimal or effective for any particular scientific domain, the fact remains that all scientific disciplines that are pursued by ever-increasing numbers
of investigators must ultimately address this fundamental challenge: eventually, the rate of data publication will exceed the individual human capacity to process it in a timely fashion. Only with the aid of computation can the brain hope to keep pace. The challenges we address here and the method we employ to meet those challenges will continue to be relevant and essential for science for the foreseeable future. So clearly the need exists, and will continue to increase, for aiding scientific discovery with computational tools. But some would argue that no such tools exist, or that if they do exist, they are still too rudimentary to create value on a consistent basis. Computers can do computation and information retrieval, but scientific discovery requires creativity and thinking “outside the box,” which is just what computers cannot do. A few years ago, the authors would have largely agreed with this viewpoint, but something has changed in the field of computer science that makes us believe that accelerating scientific discovery is no longer a distant dream but is actually well within current capability. Later in this chapter, we will describe these recent developments and preview some of the implications of these emerging capabilities.
WHY ACCELERATE DISCOVERY: THE BUSINESS PERSPECTIVE
Discovery is central and critical to the whole of humanity and to many of the world’s most significant challenges. Discovery represents an ability to uncover things that were not previously known. It underpins all innovations (Figure 2.2). Looking at what we human beings consume (for example, consumer goods such as food, clothing, household items, and energy), we would quickly realize that we need significant innovations across the board.
We need to discover new ways to generate and store energy, new water filtration methods, and new product formulations for food and other goods so that they are more sustainable for our environment and healthier for human beings. We need these innovations more desperately than ever.
FIGURE 2.2 Example application areas for Accelerated Discovery (major discoveries and innovation are critical to many world challenges and to the success of many companies across industries): smarter planet/consumer goods (water filtration, product innovation); information technology (nanotechnologies, mobile); life sciences (drug discovery, biomedical research); energy storage and generation (batteries, solar, CO2).
Looking at what we make and build (for example, new computer and mobile devices, infrastructures, and machines), again, the need for discovery and innovation is at the center of all these. We need new kinds of nanotechnologies that can scale human capacity to unimaginable limits, new materials with all the desired properties that will lower energy consumption while sustaining their intended functions, and new designs that can take a wide variety of factors into consideration.
Looking at ourselves, human beings, our own wellbeing depends heavily on discovery and innovation in healthcare, life sciences, and a wide range of social services. We need a much better understanding of human biology. We need to discover new drugs and new therapies that can target diseases much more effectively and efficiently.
Yet today, the discovery processes in many industries are slow, manual, expensive, and ad hoc. For example, in drug discovery, it takes on average 10–15 years and hundreds of millions of dollars to develop one drug. The attrition rate of drug development today is over 90%. Similarly, new energy technologies, such as the lithium battery, take decades to discover. New consumer product formulations are mostly developed on a trial-and-error basis. There is a need across all industries for a reliable, repeatable process to make discovery more cost-effective.
COMPUTATIONAL TOOLS THAT ENABLE ACCELERATED DISCOVERY
Accelerated Discovery is not just one capability or algorithm but a combination of many complementary approaches and strategies for synthesizing information and formulating hypotheses.
The following existing technologies represent the most important enablers of better hypothesis generation in science.
Search
The ability to index and retrieve documents based on the words they contain across all relevant content in a given scientific field is a primary enabler of all the technologies that are involved in Accelerated Discovery. Being able to selectively and rapidly acquire all the relevant content concerning a given subject of interest is the first step in synthesizing the meaning of
that content. The easy availability, scalability, and application of search to this problem space have made everything else possible.
Business Intelligence and Data Warehousing
The ability to store, retrieve, and summarize large amounts of structured data (i.e., numeric and categorical data in tables) allows us to deal with all kinds of information in heterogeneous formats. This gives us the critical ability to survey scientific discoveries over time and space or to compare similar treatments on different populations. The ability to aggregate data and then accurately compare different subsets is a technology that we apply over and over again in our approach as we seek to determine the credibility and reliability of each fact and conclusion.
Massive Parallelization
In recent years, Hadoop and MapReduce frameworks [2] have made parallelization approaches much more applicable to real-world computing problems. This gives us the ability to attack hard problems involving large amounts of data in entirely new ways. In short, we can build up a number of simple strategies to mine and annotate data that can, in aggregate, add up to a very sophisticated model of what the data actually means. Massive parallelization also allows us to try out thousands of approaches and combinations in real time before selecting the few candidates that are most likely to succeed based on our models and predictions.
Unstructured Information Mining
Most of the critical information in science is unstructured. In other words, it comes in the form of words, not numbers. Unstructured information mining provides the ability to reliably and accurately convert words into other kinds of structures that computers can more readily deal with. As we will see in this book, this is a key element of the accelerated discovery process.
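To make the idea of converting words into structures concrete, here is a minimal Python sketch. It turns sentences into (subject, verb, object) tuples using a single regular expression. It is a toy stand-in, not the authors’ method. The verb list, the rule that entity names are capitalized tokens, and the names `Relation` and `extract_relations` are all illustrative assumptions. Real unstructured information mining relies on trained annotators and parsers rather than one pattern.

```python
import re
from typing import NamedTuple


class Relation(NamedTuple):
    """A structured fact extracted from free text."""
    subject: str
    verb: str
    obj: str


# Assumption: a handful of relation verbs, with entity names written as
# capitalized tokens (e.g., gene or drug names). Both rules are illustrative.
RELATION_VERBS = ("inhibits", "activates", "phosphorylates", "binds")
PATTERN = re.compile(
    r"\b([A-Z][A-Za-z0-9-]*)\s+(" + "|".join(RELATION_VERBS) + r")\s+([A-Z][A-Za-z0-9-]*)\b"
)


def extract_relations(text: str) -> list[Relation]:
    """Return every subject-verb-object match found in the text, in order."""
    return [Relation(*match.groups()) for match in PATTERN.finditer(text)]


if __name__ == "__main__":
    text = "Aspirin inhibits PTGS2. In the same study, MDM2 binds TP53."
    for relation in extract_relations(text):
        print(relation)
```

Once facts are tuples rather than prose, they can be counted, joined, and compared across millions of documents. That is the step that makes the later network and prediction stages possible.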
Converting words into structure in this way allows us to go beyond retrieving the right document to actually discovering hidden relationships between the elements described by those documents.
Natural Language Processing
The ability to recognize entities, relationships or transitions, and features and properties, and to attribute them appropriately, requires natural language processing. This technology allows us to parse the individual elements of the sentence, identify their part of speech, and determine to what
they refer. It can also allow us to discover the intentionality of the author. These natural language processing abilities enable the precise determination of what is known, what is hypothesized, and what is still to be determined through experimentation. They create the underlying fact-based framework from which our hypotheses can be generated.
Machine Learning
To do Accelerated Discovery in complex domains requires more than just establishing the factual statements in the literature. Not all literature is equally trustworthy, and the trustworthiness may differ depending on the scope and context. Acquiring the level of sophistication and nuance needed to make these determinations will require more than human programming can adequately provide. It will require learning from mistakes and past examples in order to get better and better over time at deciding which information is credible and which is suspect. Machine learning is the technology that provides this type of capability. In fact, it has proven remarkably adept at tackling even heretofore intractable problems where sufficient training data are available [3]. Machine learning will enable our Accelerated Discovery approach to apply sophisticated judgment at each decision point and to improve that judgment over time.
Collaborative Filtering/Matrix Factorization
Collaborative filtering is a technique made famous by Amazon and Netflix [4], where it is used to accurately identify the best movie or book for a given customer based on the purchase history of that customer and other similar customers (customers who buy similar things). Customer purchases are a special kind of entity-entity network. Other kinds of entity-entity networks are similarly amenable to this kind of link prediction.
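Collaborative filtering via matrix factorization can be sketched in plain Python. The assumptions here are ours, not the book’s:

- a small hypothetical drug-by-target matrix, in which 1 marks a known link and 0 an unknown one;
- a rank-2 factorization fitted by full-batch gradient descent with light L2 regularization;
- unknown entries treated as zeros during fitting, a common simplification for implicit-feedback data.

A high reconstructed score on an unknown entry is read as a candidate new link, that is, a hypothesis.

```python
import random

Matrix = list[list[float]]


def factorize(matrix: Matrix, rank: int = 2, steps: int = 3000,
              lr: float = 0.05, reg: float = 0.01, seed: int = 0) -> tuple[Matrix, Matrix]:
    """Fit matrix ~ U V^T by full-batch gradient descent on squared error."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(steps):
        # Residuals of the current reconstruction (unknown entries count as 0).
        err = [[matrix[i][j] - sum(U[i][f] * V[j][f] for f in range(rank))
                for j in range(n_cols)] for i in range(n_rows)]
        # Negative gradients of 0.5*||err||^2 + 0.5*reg*(||U||^2 + ||V||^2).
        dU = [[sum(err[i][j] * V[j][f] for j in range(n_cols)) - reg * U[i][f]
               for f in range(rank)] for i in range(n_rows)]
        dV = [[sum(err[i][j] * U[i][f] for i in range(n_rows)) - reg * V[j][f]
               for f in range(rank)] for j in range(n_cols)]
        for i in range(n_rows):
            for f in range(rank):
                U[i][f] += lr * dU[i][f]
        for j in range(n_cols):
            for f in range(rank):
                V[j][f] += lr * dV[j][f]
    return U, V


def score(U: Matrix, V: Matrix, i: int, j: int) -> float:
    """Reconstructed strength of the (i, j) link."""
    return sum(u * v for u, v in zip(U[i], V[j]))


def candidate_links(matrix: Matrix, U: Matrix, V: Matrix, row: int) -> list[int]:
    """Unknown columns for `row`, most plausible new link first."""
    unknown = [j for j, value in enumerate(matrix[row]) if value == 0]
    return sorted(unknown, key=lambda j: score(U, V, row, j), reverse=True)


if __name__ == "__main__":
    # Hypothetical drug x target links: drugs 0-2 share targets 0-2,
    # drugs 3-4 share targets 3-4; the (drug 2, target 2) link is unknown.
    links = [
        [1, 1, 1, 0, 0],
        [1, 1, 1, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 1, 1],
        [0, 0, 0, 1, 1],
    ]
    U, V = factorize(links)
    print(candidate_links(links, U, V, row=2))
```

Because drug 2 otherwise behaves like drugs 0 and 1, the low-rank reconstruction is expected to rank target 2 first among its unknown links. The rank acts as the bottleneck that forces entities with similar link patterns to share latent factors.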
We can use a generalization of collaborative filtering to predict new links in an existing entity-entity network. Each predicted link can be considered a hypothesis of a new connection between entities (or a new property of an entity) that is not currently known but is very likely based on everything we know about the relevant entities.
Modeling and Simulation
In order to reason accurately about the physical world, we have to be able to simulate its processes in silico and predict what would happen if an experiment were tried or a new property or relationship were actually found to exist. These types of simulations will help reveal potential downstream
problems or contradictions that might occur if we were to hypothesize a physically unrealizable condition or some impossible connection between entities. Moreover, modeling and simulation can help determine the likely impact on the physical system as a whole of any newly discovered property or relationship. This lets us foresee whether such a discovery would likely be uninteresting, or quite valuable because it would imply a favorable outcome or have a wide impact in the field.
Service-Oriented Architectures
Clearly, doing Accelerated Discovery requires a large array of heterogeneous software components providing a wide variety of features and functions across multiple platforms. Service-oriented architectures (SOA) provide a uniform communication protocol that allows the components to communicate across the network using complex and evolving data representations. They also allow new implementations and algorithms to be easily swapped in as they become available. SOAs represent an indispensable tool for enabling the large, sophisticated, and distributed systems needed for accelerated discovery applications to emerge from components that can largely be found on the shelf or in open-source libraries.
Ontological Representation Schemes
In addition to being able to extract entities and relationships from unstructured content, we also need powerful ways to represent those entities and their features and connections in a persistent fashion, and in a way that makes it possible to do reasoning over these objects. Existing ontological representation schemes (e.g., OWL [5]) make it possible to store entities in a way that retains all the pertinent contextual information about those entities while still maintaining a degree of homogeneity. This homogeneity makes it possible to design algorithms that can discover or infer new properties based on all known existing patterns.
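A homogeneous representation makes inference easy. If every fact is stored as a (subject, predicate, object) triple, a few lines of generic code can derive properties nobody stated explicitly. The sketch below is a hypothetical, minimal in-memory store, not OWL or any particular triple-store API. It applies only two RDFS-style rules: subclass transitivity, and class membership inherited from subclasses. The predicate names `instance_of` and `subclass_of` and the example facts are assumptions.

```python
class TripleStore:
    """Minimal in-memory store of (subject, predicate, object) facts."""

    def __init__(self) -> None:
        self.triples: set[tuple[str, str, str]] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def objects(self, subject: str, predicate: str) -> set[str]:
        """All objects o such that (subject, predicate, o) is stored."""
        return {o for s, p, o in self.triples if s == subject and p == predicate}

    def superclasses(self, cls: str) -> set[str]:
        """All ancestors of cls via subclass_of (transitive; safe on cycles)."""
        seen: set[str] = set()
        stack = [cls]
        while stack:
            current = stack.pop()
            for parent in self.objects(current, "subclass_of"):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def types(self, entity: str) -> set[str]:
        """Stated classes of an entity plus every class they inherit from."""
        direct = self.objects(entity, "instance_of")
        inferred = set(direct)
        for cls in direct:
            inferred |= self.superclasses(cls)
        return inferred


if __name__ == "__main__":
    kb = TripleStore()
    kb.add("imatinib", "instance_of", "kinase_inhibitor")
    kb.add("kinase_inhibitor", "subclass_of", "enzyme_inhibitor")
    kb.add("enzyme_inhibitor", "subclass_of", "drug")
    print(sorted(kb.types("imatinib")))
```

The inference code never mentions drugs or kinases. It works for any domain whose facts are stored in the same uniform shape, which is the payoff of homogeneity described in the text.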
The ability to store such ontological representations in a scalable database and/or index provides the capability of growing the stored version of what is known to the level necessary to comprehend an entire scientific domain.
DeepQA
While question answering is not a central feature of Accelerated Discovery, the two applications share many common components. Both require the computational digestion of large amounts of unstructured content, which then must be used in aggregate to form a conclusion with a likelihood
estimate. Both also support their answers with evidence extracted from text sources.
Reasoning under Uncertainty
Machine learning techniques allow us to predict the likelihood that some conclusion or fact is true. Reasoning under uncertainty is how we use this probabilistic knowledge to form new hypotheses or to invalidate or ignore some fact that is highly unlikely. Bayesian inference [6] is one example of an existing framework that we can apply to uncertain causal networks in order to make credible deductions about probable end states. This capability is central to telling the difference between a likely outcome and something that is wildly fanciful.
ACCELERATED DISCOVERY FROM A SYSTEM PERSPECTIVE
The previous list of enabling technologies available to support Accelerated Discovery (AD) is necessary to the task, but incomplete. What is needed is a coherent framework in which these technologies can effectively work together to achieve the desired outcome.
To support and enable such continuous data transformations and discovery, we must design our discovery solution carefully. In particular, our discovery solution must adapt and scale to a wide range of changing dynamics, including data content, domain knowledge, and human interactions. This is crucial because in all industry domains, domain content and knowledge are constantly changing and expanding. If we do not design our discovery system to cope with such changes, the outcome will be a system that lacks longevity and capability. Most, if not all, of today’s discovery solutions can only deal with a limited volume and diversity of content and knowledge.
To enable adaptation and scaling, we instituted two system design principles: agility and adaptivity. Agility means that a discovery system must be able to rapidly generate outputs in the face of changes in data content, knowledge, and human inputs. This is far from a reality in today’s discovery systems.
For example, a discovery system may be built for one kind of data input format. When the data input format changes, significant manual intervention may be needed, and downstream system components may also need to change accordingly. Such designs make a discovery process extremely lengthy and error prone. We will describe our approach to building agility into the system from the very beginning.
Adaptivity means that a discovery solution must treat “changes in all forms” as first-class citizens; for example, changes in individual system components, changes in data content, and changes in knowledge. We envision that various underlying technology components will evolve and improve over time, and the same is true of data content and knowledge bases. A discovery system must have the notion of adaptivity built into it from day one.
To enable agility, we suggest that all major system components be designed with a “core-abstraction plus configurable customization” approach. The core abstraction defines all the major services that the system components intend to support. The configurable customizations allow for changes and adaptations of the core abstractions for specific needs in data-content formats, knowledge repositories, and interactions with other components. For example, a content collection component may have common and core services dealing with the processing of typical unstructured documents. A configurable customization can define the specific fields and format extensions that are needed for those unstructured sources without code changes in the core abstraction services.
To enable adaptivity, we defined generalized data input and output formats, called common data models (CDMs), and common interfaces around all system components. This allows developers to change the component engine itself without impacting the overall function of the discovery system. It also allows us to adapt to new changes in data sources and knowledge bases by simply mapping them to CDMs without changing the rest of the system.
Figure 2.3 summarizes the key system components of a discovery solution. The boxes are the major system components.
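The “core-abstraction plus configurable customization” idea and the role of a common data model can be illustrated with a small sketch. Everything in it is hypothetical:

- `CommonDocument` stands in for a CDM;
- `ContentCollector` stands in for a core abstraction;
- the per-source field maps, for an assumed patent feed and an assumed literature feed, stand in for configurable customizations.

Supporting a new source means writing a new mapping, not new collector code.

```python
from dataclasses import dataclass, field


@dataclass
class CommonDocument:
    """Hypothetical common data model shared by all downstream components."""
    doc_id: str
    title: str
    body: str
    extra: dict = field(default_factory=dict)  # unmapped source fields, kept for later use


class ContentCollector:
    """Core abstraction: converts raw source records into CommonDocument objects."""

    def __init__(self, field_map: dict[str, str]) -> None:
        # Configurable customization: CDM field name -> source-specific key.
        self.field_map = field_map

    def to_cdm(self, record: dict) -> CommonDocument:
        mapped = {cdm: record[src] for cdm, src in self.field_map.items() if src in record}
        used_keys = set(self.field_map.values())
        extra = {key: value for key, value in record.items() if key not in used_keys}
        return CommonDocument(
            doc_id=mapped["doc_id"],
            title=mapped.get("title", ""),
            body=mapped.get("body", ""),
            extra=extra,
        )


# Two customizations of the same core, configured purely by data.
PATENT_CONFIG = {"doc_id": "patent_number", "title": "title", "body": "claims"}
LITERATURE_CONFIG = {"doc_id": "pmid", "title": "article_title", "body": "abstract"}

if __name__ == "__main__":
    patents = ContentCollector(PATENT_CONFIG)
    articles = ContentCollector(LITERATURE_CONFIG)
    print(patents.to_cdm({"patent_number": "P-001", "title": "A compound",
                          "claims": "Claim text", "assignee": "Acme"}))
    print(articles.to_cdm({"pmid": "0001", "article_title": "A study",
                           "abstract": "Abstract text"}))
```

Downstream components only ever see `CommonDocument`, so a change in a source’s format is absorbed by its configuration rather than rippling through the system.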
Each of the major system components is summarized below.
Content Curator
The content curator is responsible for managing domain content sources. This includes collecting, cleansing, and making diverse content sources available for downstream processing, end-user queries, and browsing. Typical functions include data ingestion, extraction, transformation, and loading into a content repository. It also includes functions such as indexing and searching.
Domain-pedia
Domain-pedia is responsible for managing domain knowledge. It must ingest, update, and process domain knowledge from existing knowledge sources, as well as support some of the downstream processing, such as the semantic entity extraction process. It can also be a resource for runtime processing and analytics and for end-user searching and browsing, similar to what one might do on Wikipedia, such as searching for knowledge around a given protein.
FIGURE 2.3 (See color insert.) Functional model. All key components have a common data model, an extensible data exchange format, standard web service interfaces, and customizable configurations. Key components: catalog, content curator, Domain-pedia, annotators, normalizers, BigInsights framework, query services, analytics services, and user interface. The diagram also shows annotator, normalizer, content, and ontology/dictionary registries; a BigInsights-based MapReduce framework with filter, annotate, normalize, and transform-and-index stages; a content index, specialty index, and graph store; and core UI services supporting custom user interfaces.
Annotators
Annotators are the engines that pull semantic domain concepts, such as chemicals, genes, and proteins, and their relationships out of unstructured text.
Normalizers
Normalizers are the engines that organize the vocabularies around various domain concepts into a more consistent form. As the name indicates, they normalize domain concepts into standardized forms, such as unique chemical structures and protein names.
BigInsights Framework
The BigInsights framework is an orchestration engine that manages the interactions between the components described above in a scalable and efficient fashion. It does so by mapping runtime content, knowledge, annotation, and normalization processing onto a large-scale Hadoop-like infrastructure. Such a framework is like the blood vessels of the human body; without it, we only have pieces and parts of the system, rather than a living system.
Query Services
Query services provide consistent data access interfaces for end users and other system components to query underlying data repositories without knowing what format each data repository might be in.
Analytics Services
Analytics services include runtime algorithms to enable the discovery and construction of more complex knowledge representations. For example, they may produce a gene-to-gene network representation based on the underlying knowledge repository.
User Interface
The user interactions of a discovery system can be diverse. They can range from basic searches and reporting to more complex visualizations and
workflows. Our user interface component is built with a “platform+applications” principle. That is, we developed a suite of common user-interface widgets that can be leveraged by a wide range of use cases. We then design specific applications targeted at specific use cases, leveraging the common widgets as much as possible. This way, our discovery system can quickly be adapted to support different use cases without applications having to be rewritten each time.
Catalogue
Finally, the catalogue component manages the system records about all system components and their configurations, as well as metadata about the content sources and knowledge sources. It creates and manages a list of system and data catalogues. System administrators are the typical users of such a component.
Clearly, a discovery system is complex and dynamic. Without the design principles that have been described, a discovery solution can quickly find itself stuck in the face of changing content, knowledge, and user-interaction paradigms.
ACCELERATED DISCOVERY FROM A DATA PERSPECTIVE
The process of Accelerated Discovery can essentially be thought of as a data transformation process. Having just described the system for transforming that data, let us look at the process again from a data perspective and see how the data may be changed as we move through the discovery process.
The discovery process is inherently a continuous transformation from raw pieces of data and information to enriched pieces of knowledge and more comprehensive knowledge representations, all the way to brand new discoveries and hypotheses. A discovery solution/platform can therefore be viewed as a system that supports and enables such data transformations and end-user interactions. Figure 2.4 summarizes four major data transformations driven by each of the four discovery process steps. We will now discuss each of these steps of transformation in detail.
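As a rough orientation before the step-by-step discussion, the four transformations can be pictured as four composable functions, each consuming the previous step’s output. The Python sketch below is a toy under heavy, hypothetical assumptions:

- a four-word `VOCABULARY` stands in for real annotators;
- same-document co-occurrence stands in for relationship extraction;
- counting shared neighbours stands in for a predictive model.

It shows only the shape of the data flow, not the platform’s actual components.

```python
from collections import defaultdict
from itertools import combinations

# Assumption: a tiny hypothetical vocabulary stands in for real annotators.
VOCABULARY = {"tp53", "mdm2", "brca1", "atm"}


def collect(sources: list[list[str]]) -> list[str]:
    """Step 1: bring fragmented sources together into one collection."""
    return [doc for source in sources for doc in source]


def comprehend(docs: list[str]) -> list[set[str]]:
    """Step 2: annotate each document with the vocabulary terms it mentions."""
    return [{word.strip(".,;:").lower() for word in doc.split()} & VOCABULARY for doc in docs]


def compose(annotations: list[set[str]]) -> dict[str, set[str]]:
    """Step 3: build a co-occurrence network (entities mentioned in the same document)."""
    graph: dict[str, set[str]] = defaultdict(set)
    for entities in annotations:
        for a, b in combinations(sorted(entities), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def hypothesize(graph: dict[str, set[str]]) -> list[tuple[tuple[str, str], int]]:
    """Step 4: propose unlinked pairs that share neighbours, most shared first."""
    scored = []
    for a, b in combinations(sorted(graph), 2):
        if b not in graph[a]:
            shared = len(graph[a] & graph[b])
            if shared:
                scored.append(((a, b), shared))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    docs = collect([["TP53 binds MDM2.", "ATM phosphorylates TP53."],
                    ["BRCA1 and MDM2 were studied."]])
    print(hypothesize(compose(comprehend(docs))))
```

In this toy corpus, ATM and MDM2 are never mentioned together, but both co-occur with TP53, so the pair (atm, mdm2) surfaces as a candidate. The pair (brca1, tp53) surfaces the same way through MDM2.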
Initial Domain Content and Knowledge Collection
The bottom inner circle of Figure 2.4 marks the beginning of the data transformation journey. To enable major discoveries, we must ensure that the system is aware of a vast and diverse set of prior domain knowledge and domain content. Without this step, the downstream analysis and
discovery will be extremely limited in scope, and wrong conclusions could be drawn. This is often the case with today’s bioinformatics tools, which operate on small and narrowly scoped data sets.
We deliberately differentiate domain knowledge from domain content here since they mean different things. Domain knowledge means prior knowledge that has been captured digitally, such as manually curated domain ontologies, taxonomies, dictionaries, and structured databases. For example, in drug discovery, such domain knowledge may include the ChEMBL database [7], OBO ontologies [8], and other dictionaries.
Domain content means raw digital data content that might capture existing domain knowledge but has not been curated systematically to allow a broad set of scientists to gain such knowledge easily. For example, many unstructured information sources, such as patents, published conference and journal articles, and internal reports, often contain knowledge known by the individual scientists who wrote the documents. However, if this knowledge has not been made widely and easily accessible, much of it is locked away and unusable. Domain content also includes structured and semistructured data such as genomic screening information and experiments. Many such sources of raw data also require significant cleansing and processing before they can be made available and accessible.
FIGURE 2.4 Data evolution. The Discovery Platform: A Data Perspective. The discovery platform is a system that continuously transforms initial raw data and domain knowledge into brand new discoveries through a series of data transformation steps; each step along the way will enable services and bring value to clients and partners. The four stages are as follows.
(1) Data and domain knowledge: collect and curate domain content and knowledge (content collection and ingestion; content curation and indexing; patents, Medline literature, ChEMBL; domain knowledge/ontology ingestion; Domain-pedia management; OBO, SIDER, dictionaries, ontologies).
(2) Enriched data and enhanced domain knowledge: comprehend and extract semantic knowledge (entity, relationship, and complex relationship extraction, core and customizations, e.g., chemical, biological, and toxicology annotators and PTM relationships).
(3) Complex knowledge and visual representations: compose complex and holistic knowledge representations (graphs and networks; runtime-calculated visualizations such as scatter plots).
(4) Emerging patterns and discovery: create new hypotheses and predictions (given known, show unknown; simple or complex representations).
A discovery solution must be able to collect, cleanse, and make accessible a vast and diverse amount of domain knowledge and content in its initial data transformation step. The output of this initial step is a content and knowledge repository that is ready for downstream analysis, indexing, data exploration, and browsing.
Content Comprehension and Semantic Knowledge Extraction
The second transformation has to do with the ability to derive an enriched set of domain knowledge from the initial content and knowledge collection. This step is critical for unstructured and semistructured content sources, because it systematically captures and extracts knowledge buried in text, tables, and figures. Examples include extracting chemical structures and their properties, genes and proteins described in text, and tables of experimental results. Other ways to comprehend the content include classification and taxonomy generation. Such methods help organize content more meaningfully and allow scientists to examine data through different lenses.
Such a step often requires natural language processing, text mining, and machine learning capabilities. In addition, it requires systems to coalesce and cross-map different styles of vocabulary into something that is more consistent. For example, one chemical compound may have over 100 different names. Unless such chemical compounds are cross-mapped and normalized in some fashion, there is very little hope that scientists can gain comprehensive knowledge about them, let alone discover new insights.
A well-designed system component that carries out such forms of data comprehension will result in an enriched content and knowledge repository.
Such an enriched repository not only unlocks and captures domain knowledge buried in unstructured data, but also cross-maps different domain vocabularies in a more consistent fashion. An analogy for such a knowledge repository is a “domain-pedia.” That is, it gathers all pieces of knowledge obtained via machine curation into a repository that is like Wikipedia but more in-depth and comprehensive for the domain under study.
Complex and High-Level Knowledge Composition and Representation
Building on the repositories created in the previous data transformation steps, the third data transformation step attempts to generate more
holistic knowledge representations by composing fragmented pieces of knowledge and data into more complex views, such as graphs and networks. Such compositions represent known knowledge, but the knowledge becomes much more visible and accessible to scientists than before. The previous steps focus on gathering fragmented content and knowledge in one place and making them more easily accessible. This step, by contrast, focuses on various ways to compose content and knowledge. Such views and representations give scientists a better chance of gaining new insights and making discoveries. Such a transformation is facilitated by a combination of techniques, including graph- and network-based algorithms, visualizations, statistics, and machine learning on top of the underlying content and knowledge repositories. This is a step where many different combinations of views and representations may be created, and human-machine interactions via creative visualizations become essential.
New Hypothesis and Discovery Creation
The last data transformation in this process leapfrogs from the forms of data transformation described above into new hypotheses and discoveries. The key to this step lies in prediction. Since the previous data transformations are meant to operate on vast and diverse content and knowledge, the input dimensions for this discovery and predictive step can be far larger than what traditional approaches have to deal with. For example, the feature space of the models can be extremely large. This may require totally new approaches to modeling and prediction. The discovery output of this step, when validated, can become new knowledge that feeds back once again into the entire data transformation and discovery process.
Clearly, these data transformations are not static. They are continuous, self-enhancing, and self-correcting.
New content sources and knowledge may be added and incorporated, and obsolete content and knowledge may be cleansed or archived.
Notice also that end users and businesses can take advantage of each step of the data transformations and create business value before the final steps are completed. The initial data collection and curation itself can be of tremendous value, as it will have already brought multiple fragmented sources of data together in one place and made content searchable and browsable. One of the most basic uses of a discovery solution is simply to query data sets that have been brought together into a single index.
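Querying several sources through one index is easy to picture with a toy inverted index. In this sketch, the source names (`patents`, `medline`), document IDs, and texts are all hypothetical. Tokenization is naive whitespace splitting with punctuation stripped, and `query` performs a simple AND over terms. Production search engines add stemming, ranking, and much more.

```python
from collections import defaultdict


def build_index(sources: dict[str, dict[str, str]]) -> dict[str, set[tuple[str, str]]]:
    """Map each term to the (source, doc_id) pairs whose text contains it."""
    index: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for source, docs in sources.items():
        for doc_id, text in docs.items():
            for token in text.lower().split():
                index[token.strip(".,;:()")].add((source, doc_id))
    return index


def query(index: dict[str, set[tuple[str, str]]], *terms: str) -> set[tuple[str, str]]:
    """AND query: documents, from any source, that contain every term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


if __name__ == "__main__":
    corpus = {
        "patents": {"P1": "Kinase inhibitor for treating leukemia."},
        "medline": {"M1": "A kinase assay.", "M2": "Leukemia incidence in adults."},
    }
    index = build_index(corpus)
    print(query(index, "kinase", "leukemia"))
```

The user asks one question and gets hits from patents and literature alike. That is the immediate value of the first transformation step, before any annotation or prediction takes place.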
The second step of the data transformation fills a huge gap that exists today across industries. We can now extract tens of millions of chemical structures from patents and publications in hours or minutes. In the past, such an endeavor would have taken hundreds of chemists months or even years of manually reading documents. Scientists can now immediately find all chemical compounds invented in the past.
The third step of the data transformation reveals more comprehensive views of the scientific domain to scientists. Even without machines automatically predicting and discovering new insights, such comprehensive representations will make it much easier for humans to spot new patterns.
The last step makes machine-based discovery visible to the end users.
ACCELERATED DISCOVERY IN THE ORGANIZATION
Many enterprises have begun to realize the need for computationally aided discovery. However, existing IT infrastructure, data-center frameworks, and data-processing paradigms may not be easily reworked to support an end-to-end discovery process. This is because many such infrastructures are built on historical needs and assumptions. The underlying software and hardware are often insufficient to scale and adapt to what a discovery solution needs. Because of this, we believe new business models and business processes will be needed to enable discovery systematically.
In particular, with the rapid growth of public content and knowledge repositories, cloud-based infrastructure, and scalable cognitive computing capabilities built on distributed architectures such as Hadoop, it has become more attractive to structure “managed discovery service” business models to enable the rapid adoption of a discovery solution.
A managed discovery service can package a cloud-compute infrastructure, preloaded relevant public content and knowledge repositories, a configured discovery middleware software stack, and a set of predefined supported use cases. It can also be extended to incorporate additional content and knowledge sources based on customer needs, and customized to support the use cases that customers want. A managed discovery service can be structured in multiple ways to allow different cost and pricing structures. For example, a discovery solution driven purely by public content could be offered through a multitenant Software-as-a-Service (SaaS) model that lets customers share costs. When private content and use cases are incorporated, one can structure a hybrid or single-tenant model, which would incur higher cost but would have higher levels of security and service-level
CHAPTER XIII. SHADOWY VISITORS.
When the eye gazes steadily at the Pleiades, in the midnight splendor of the starlit sky, one of the blazing orbs shrinks modestly from view and only six remain to be admired by the wondering gazer below: it is the quick, casual glance that catches the brilliant sister unawares, before she can hide her face. So, when the pioneers within the block-house looked intently at the stockade, they saw nothing but the wall of shadow and the outline of the sharp pickets above; but, as their vision flitted along the front, they caught the faint suggestions of the figures of men standing erect and doubtless intently watching the block-house, from which the rifles of the Kentuckians had flashed but a short time before. Whenever the moon's light was obscured, nothing but blank darkness met the eye, the line of stockades themselves vanishing from sight.
Once one of the warriors moved a few steps to the left, and Jo Stinger and Ned Preston detected it.
"Why not try another shot?" asked the Colonel, when the matter was referred to.
"It is too much guess-work: nobody can take any sort of aim, when it is so dark in the block-house."
"I wonder what their purpose can be," muttered the Colonel, speaking as much to himself as to those near him.
"I knows what it am," said Blossom Brown, who had been drawn to the spot by the firing and the words he had overheard.
"You do, eh?" remarked the Colonel, looking toward him in the darkness; "what is it?"
"Dey're comin' to steal de well."
"What will they do with it, after they steal it?"
"Take it off in de woods and hide it, I s'pose."
"They won't have any trouble in preventing us from stealing it,—that is certain," observed the Colonel, bitterly.
"Why can't we dig the well inside the block-house, as you intended?" asked Ned; "there are shovels, spades and picks, and I don't suppose it would take us a great while."
"If we are driven to it, we will make the attempt; but there is no likelihood that we will have a chance. All our attention will be required by the Indians."
"You can set Blossom to work if you wish to," said Ned Preston; "he is good for little except to cut wood and dig. If he worked steadily for two or three days, he might reach water."
Ned was in earnest with this proposition, and he volunteered to take his turn with his servant and the others; but the scheme filled Blossom with dismay.
"I neber dugged a well," he said, with a contemptuous sniff; "if I should undertook it, de well would cave in on me, and den all you folks would hab to stop fightin' de Injines and go to diggin' me out agin."
Colonel Preston did not consider the project feasible just then, and Blossom Brown was relieved from an anticipation which was anything but pleasant.
Jo Stinger was attentively watching the stockade where the figures of the Wyandot warriors were faintly seen. He was greatly mystified to understand what their object could be in exposing themselves to such risk, when, so far as he could judge, there was nothing to be gained by so doing; but none knew better than did the veteran that, brave as were these red men, they were not the ones to face a danger without the reasonable certainty of acquiring some advantage over an enemy.
"I will risk a shot anyway," he thought; "for, though I can't make much of an aim, there is a chance of doing something. As soon as the moon comes out, I will see how the varmints will stand a bullet or two."
So he waited till the clouds rolled by, but, as he feared, the straining eye could not catch the faintest suggestion of a warrior, where several were visible only a short time before. They had vanished as silently as the shadows of the clouds swept across the clearing.
The action of the Indians in this respect was the cause of all kinds of conjectures and theories, none of the garrison being able to offer one that satisfied the others. Megill believed it was a diversion intended to cover up some design in another direction. He was sure that, when the Wyandots made a demonstration, it would come from some other point altogether. He, therefore, gave his attention mainly to the cabins and the clearing in front.
Turner suspected they meant to destroy the well by filling it up, so that it would be useless when the supply of water within the block-house should become exhausted. Precisely how this filling up was to be done, and wherein the necessity existed (since the Wyandots could command the approaches to the water day and night), were beyond the explanation of the settler.
Jo Stinger, the veteran of the company, scouted these theories, as he did that of the Colonel that it was a mere reconnoissance, but he would not venture any guess further than that the mischief was much deeper than any believed, and that never was there more necessity of the most unremitting vigilance.
Megill asserted that some scheme was brewing in the cabin from which the two warriors emerged, when they sought to cut off the boys in their run to the block-house. He had seen lights moving about, though the ones who carried the torches took care not to expose themselves to any shot from the station.
The silence lasted two hours longer without the slightest evidence that a living person was within a mile of the block-house. During that period, not a glimmer of a light could be detected in the cabin, there was not a single burning arrow, nor did so much as a war-whoop or signal pass the lips of one of the Wyandots. The keen eyes of Jo Stinger and Ned Preston failed to catch a glimpse of the shadowy figures at which they discharged their rifles, and which caused them so much wonderment and speculation.
But the keen scrutiny that seized every favoring moment and roamed along the lines of stockades, further than the ordinary eye could follow, discovered a thing or two which were not without their significance. On the northern and eastern sides a number of pickets had been removed, leaving several gaps wide enough to admit the passage of a person. This required a great deal of hard work, for the pickets had been driven deep into the earth and were well secured and braced from the inside.
"They needed men on both sides of the stockade to do that," said Colonel Preston, "and those whom we saw, climbed over, so as to give assistance."
"That's the most sensible idee that's been put forward," replied Jo Stinger, "and I shouldn't be s'prised if you was right; but somehow or other——"
"By gracious! I smell smoke sure as yo's bo'n!"
Blossom Brown gave several vigorous sniffs before uttering this alarming exclamation, but the words had no more than passed his lips, when every man knew he spoke the truth. There was smoke in the upper part of the block-house, and though it could not be seen in the darkness, yet it was perceptible to the sense of smell.
Consternation reigned for a few minutes among the garrison, and there was hurrying to and fro in the effort to learn the cause of the burning near them. The most terrifying cry that can strike the ears of the sailor or passenger at sea is that of fire, but no such person could hold the cry in greater dread than did the garrison, shut in the block-house and surrounded by fierce American Indians.
The first supposition of Colonel Preston was that it came from the roof, and springing upon a chair, he shoved up the trap-doors, one after the other, to a dangerously high extent. But whatever might have happened to the other portions of the structure, the roof was certainly intact.
The next natural belief was that it was caused by the fire on the hearth in the lower story, and Colonel Preston and Blossom Brown made all haste down the ladder. Blossom, indeed, was too hasty, for he missed one of the rounds and went bumping and tumbling to the floor, where he set up a terrific cry, to which no attention was paid amid the general excitement.
"Here it is! Here's the fire!" suddenly shouted Ned Preston, in a voice which instantly brought the others around him.
Ned had done that wise thing to which we have all been urged many a time and oft: he had followed his nose to the north-east corner of the block-house, where the vapor was so dense that he knew the cause must be very near. It so happened that this very nook was the least guarded of all. Looking directly downward through the holes cut in the projecting floor, his eyes smarted so much from the ascending vapor that he was forced to rub them vigorously that he might be able to see. He could detect nothing but smoke for a minute or so, and that, of course, made itself manifest to the sense of smell and touch rather than to that of sight; but he soon observed, directly beneath his feet, the red glow of fire itself. Then it was he uttered the startling cry, which awoke Mrs. Preston and brought the rest around him.
Despite the care and skill with which the station had been guarded by the garrison, all of whom possessed a certain experience in frontier-life, the wily Wyandots had not only crept up to the block-house itself without discovery, but they had brought sticks, had piled them against the north-east corner, had set fire to them, and had skulked away without being suspected by any one of the sentinels. The fact seemed incredible, and yet there was the most convincing evidence before or rather under their eyes.
Jo Stinger gave utterance to several emphatic expressions, as he made a dash for the barrel of water, and he was entirely willing to admit that of all idiots who had ever pretended to be a sensible man, he was the chief. But the danger was averted without difficulty. Two pails of water were carefully poured through the openings in the floor of the projecting roof, and every spark of fire was extinguished. The water added to the density of the vapor. It set all the inmates coughing and caused considerable annoyance; but it soon passed away, and, after a time, the air became comparatively pure again.
Megill complimented the cunning of the Wyandots, but Jo insisted that they had shown no special skill at all: it was the utter stupidity of himself and friends who had allowed such a thing to be done under their very noses.
"And, if it hadn't been for that darkey there," said he, with all the severity he could command, "we wouldn't have found it out till this old place was burned down, and we was scootin' across the clearin' with the varmints crackin' away at us."
"De gemman is right," assented Blossom, as he stopped rubbing the bruises he received from tumbling through the ladder; "you'll find dat it's allers me dat wokes folks up when de lightnin' am gwine to strike somewhar 'bout yar."
"We won't deny you proper credit," said Colonel Preston, "though Jo is a little wild in his statements——"
The unimportant remark of Colonel Preston was bisected by the sharp report of Jo Stinger's rifle, followed on the instant by a piercing shriek from some point near the block-house, within the stockade.
"I peppered him that time!" exclaimed the veteran; "it's all well enough to crawl into yer winder, gather all the furniture together and set fire to it, and then creep out agin, but when it comes to stealin' the flint and tinder out of your pocket to do it with, then I'm going to get mad."
When the scout had regained something of his usual good nature, he explained that he had scarcely turned to look out, when he actually saw two of the Wyandots walking directly toward the heap of smoking brush, as though they intended to renew the fire. The sight he considered one of the grossest insults ever offered his intelligence, and he fired, without waiting till some one could arrange to shoot the second red man.
With a daring that was scarcely to be wondered at, the warrior who was unhurt threw his arm about his smitten companion and hurried to one of the openings in the stockade, through which he made his way. This slight check would doubtless cause the red men to be more guarded in their movements against the garrison.
"It has teached them," said the hunter, with something of his grim humor, "that accidents may happen, and some of 'em mought get hurt if they go to looking down the muzzles of our guns."
All noticed a rather curious change in the weather. The sky, which had been quite clear early in the evening, was becoming overcast, and the clouds hid the moon most of the time. It remained cold and chilly, and more than one of the garrison wrapped a blanket around him, while doing duty at the loopholes. The cloudiness became so marked, after a brief while, that the view was much shortened in every direction. Those at the front of the block-house could not see the edge of the clearing, where the Licking flowed calmly on its way to the Ohio. Those on the north saw first the line of stockades dissolve into darkness, and then the well-curb (consisting of a rickety crank and windlass) grew indistinct until its outlines faded from sight. The two cabins to the south loomed up in the gloom as the hulls of ships are sometimes seen in the night-time at sea, but the blackness was so profound, it became oppressive. Within the block-house, where there was no light of any kind burning, it was like that of ancient Egypt.
Colonel Preston could not avoid a certain nervousness over the attempt of the Wyandots to fire the building, and, though it failed, he half suspected it would be repeated. He descended the ladder and made as careful an examination as possible, but failed to find anything to add to his alarm and misgiving. Everything seemed to be secure: the fastenings of the doors were such that they might be considered almost as firm as the solid logs themselves.
While he was thus engaged, he heard some one coming down the ladder.
"Who's there?" he asked in an undertone.
"It's Jo—don't be scart."
"I'm not scared; I only wanted to know who it is; what are you after?"
"I'm going out-doors, right among the varmints."
"What has put that idea in your head?"
"They've been playing their tricks on us long enough, and now I'm going to show them that Jo Stinger knows a thing or two as well as them."
Colonel Preston would have sought to dissuade the veteran from the rash proceeding, had he not known that it was useless to do so.
CHAPTER XIV. A MISHAP AND A SENTENCE.
Deerfoot the Shawanoe first pinned the rattlesnake to the earth with the arrow which he threw with his deft left hand, then he flung the reptile from his path and resumed his delicate and dangerous attempt to creep past the three Wyandots who were lying against the bank of the Licking, watching the block-house, now and then firing a shot at the solid logs, as if to express their wishes respecting the occupants of the building.
If the task was almost impossible at first, it soon became utterly so, as the young Shawanoe was compelled to admit. The contour of the bank was such that, after getting by the log, he would be compelled to approach the warriors so close that he could touch them with his outstretched hand. This would have answered at night, when they were asleep, but he might as well have attempted to lift himself through the air as to do it under the circumstances we have described.
Deerfoot never despaired nor gave up so long as he held space in which to move. He immediately repeated the retrograde motion he had used when confronted by the venomous serpent, his wish now being to return to the spot from which he fired the arrow. The ventures made satisfied him that he had but one chance in a thousand of escaping capture and death. He could not move to the right nor left: it would have been certain destruction to show himself on the clearing, and equally fatal to attempt to use the shallow Licking behind him.
There was a remote possibility that the arrowy messenger which he had sent from his bow had not been noticed by any of the besieging Wyandots, and that, as considerable time had already passed, none of them would come over to where he was to inquire into the matter. If they would keep as far away from him as they were when his friend Ned Preston started on his desperate run for the block-house, of course he would be safe. He could wait where he was, lying flat on the ground, through all the long hours of the day, until the mantle of night should give him the chance for which he sighed. Ah, but for one hour of darkness! His flight from the point of danger would be but pastime.
The single chance in a thousand was that which we have named: the remote possibility that none of the Wyandots would come any nearer to where he was hugging the river bank. For a full hour Deerfoot was in suspense, with a fluttering hope that it might be his fortune to wait until the sun should climb to the zenith and sink in the west; for, young as was the Shawanoe, he had learned the great truth that in the affairs of this world no push or energy will win, where the virtue of patience is lacking. Many a time a single move, born of impatience, has brought irretrievable disaster, where success otherwise was certain.
As the Shawanoe lay against the bank, looking across the clearing toward the block-house, he recalled that message which, instead of being spoken, as were all that he knew of, was carried on the arrow he sent through the window. If he but understood how to place those words on paper or on a dried leaf even, he would send another missive to Colonel Preston, saying that, inasmuch as he was shut in from all hope of escape, he would make the effort to run across the open space, as did his friends before him.
But the thing was impossible: the door of the block-house was fastened, and if Deerfoot should start, he would reach it, if he reached it at all, before the Colonel could draw the first bolt. Even if the Shawanoe youth should succeed in making the point, which was extremely doubtful, now that the Wyandots were fully awake, the inevitable few seconds' halt there must prove fatal.
The short conversation which he had overheard, convinced him of the sentiments of Waughtauk and his warriors toward him, and led the young Shawanoe to determine on an effort to extricate himself. It is the very daring of such a scheme which sometimes succeeds, and he put it in execution without delay.
Instead of crouching to the ground, as he had been doing, he now rose upright and moved down the bank, in the direction of the three Wyandots who first turned him back. They were in their old position, and he had gone only a few steps when one of them turned his head and saw the youthful warrior approaching. He uttered a surprised "Hooh!" and the others looked around at the figure, as they might have done had it been an apparition.
The scheme of Deerfoot was to attempt the part of a friend of the Wyandots and consequently that of an enemy of the white race. He acted as if without thought of being anything else, and as though he never dreamed there was a suspicion of his loyalty. At a leisurely gait he walked toward the three Indians, holding his head down somewhat, and glancing sideways through the scattered bushes at the top of the bank, as though afraid of a shot from the garrison.
"Have any of my brethren of the Wyandots been harmed by the dogs of the Yenghese?" asked Deerfoot in the high-flown language peculiar to his people.
"The eyes of Deerfoot must have been closed not to see Oo-oo-mat-ah lying on the ground before his eyes."
This was an allusion to the warrior who made the mistake of stopping Ned Preston when on his way to the block-house.
"Deerfoot saw Oo-oo-mat-ah fall, as falls the brave warrior fighting his foe; the eyes of Deerfoot were wet with tears, when his brave Wyandot brother fell."
Strictly speaking, a microscope would not have detected the first grain of truth in this grandiloquent declaration, which was accompanied by a gesture as though the audacious young Shawanoe was on the point of breaking into sobs again. The apparent sincerity of Deerfoot's grief seemed to disarm the Wyandots for the moment, which was precisely what the young Shawanoe was seeking to do. Having mastered his sorrow, he started down the river bank on the same slow gait, glancing sideways at the block-house as though he feared a shot from that point.
But the Indians were not to be baffled in that fashion: their estimate of the daring Deerfoot was the same as Waughtauk's. Without any further dissembling, one of the Wyandots, a lithe sinewy brave, fully six feet in height, bounded in front of the Shawanoe, and grasping his knife, said with flashing eyes—
"Deerfoot is a dog! he is a traitor; he is a serpent that has two tongues! he shall die!"
The others stood a few feet behind the couple and watched the singular encounter. The Wyandot, with the threatening words in his mouth, leaped toward Deerfoot, striking a vicious blow with his knife. It was a thrust which would have ended the career of the youthful brave, had it reached its mark. But Deerfoot dodged it easily, and, without attempting to return it, shot under the infuriated arm and sped down the river bank with all the wonderful speed at his command.
The slight disturbance had brought the other three Wyandots to the spot, and it would have been an easy thing to shoot the fugitive as he fled. But among the new arrivals were those who knew it was the wish of Waughtauk that Deerfoot should be taken prisoner, that he might be put to the death all traitors deserved. Instead of firing their guns therefore, the whole six broke into a run, each exerting himself to the utmost to overtake the fleet-footed youth, who was no match for any one of them in a hand-to-hand conflict, or a trial of strength.
Deerfoot, by his sharp strategy, had thrown the whole party behind him and had gained two or three yards' start: he felt that, if he could not hold this against the fleetest of the Wyandots, then he deserved to die the death of a dog. The bushes, undergrowth and logs which obstructed his path, were as troublesome to his pursuers as to himself, and he bounded over them like a mountain chamois, leaping from crag to crag.
There can be no question that, if this contest had been decided by the relative swiftness of foot on the part of pursuer and pursued, the latter would have escaped without difficulty, but, as if the fates were against the brave Shawanoe, his matchless limbs were no more than fairly going, when two Wyandot warriors appeared directly in front in such a position that it was impossible to avoid them. Deerfoot made a wrenching turn to the right, as if he meant to flank them, but he stumbled, nearly recovered himself—then fell with great violence, turning a complete somersault from his own momentum, and then rose to his feet, as the Indians in front and rear closed around him. He uttered a suppressed exclamation of pain, limped a couple of steps, and then grasped a tree to sustain himself. He seemed to have sprained his ankle badly and could bear his weight only on one foot.
No more disastrous termination of the flight could have followed. The Wyandots gathered about the poor fugitive with many expressions of pleasure, for the pursuers had just been forced to believe the young brave was likely to escape them, and it was a delightful surprise when the two appeared in front and headed him off. Besides, a man with a sprained ankle is the last one in the world to indulge in a foot-race, and they felt secure, therefore, in holding their prisoner.
"Dog! traitor! serpent with the forked tongue! base son of a brave chieftain! warrior with the white heart!"
These were a few of the expressions applied to the captive, who made no answer. In fact, he seemed to be occupied exclusively with his ankle, for, while they were berating him, he stooped over and rubbed it with both hands, flinging his long bow aside, as though it could be of no further use to him. The epithets were enough to blister the skin of the ordinary American Indian, and there came a sudden flush to the dusky face of the youthful brave, when he heard himself called the base son of a brave chieftain. But he had learned to conquer himself, and he uttered not a word in response.
One of the Wyandots picked up the bow which the captive had thrown aside, and examined it with much curiosity. There was no attempt to disarm him of his knife and tomahawk, for had he not been disabled by the sprained ankle, he would have been looked upon as an insignificant prisoner, against whom it was cowardly to take any precautions. In fact, to remove his weapons that remained would have been giving dignity to one too contemptible to deserve the treatment of an ordinary captive.
The aborigines, like all barbarians and many civilized people, are cruel by nature. The Wyandots, who had secured Deerfoot, refrained from killing him for no other reason than that it would have been greater mercy than they were willing to show to one whom they held in such detestation. As it was, two of them struck him and repeated the taunting names uttered when they first laid hands on him. Deerfoot still made no answer, though his dark eyes flashed with a dangerous light when he looked in the faces of the couple who inflicted the indignity.
He asked them quietly to help him along, but, with another taunt, the whole eight refused. The one who had smote him twice and who held his bow, placed his hand against the shoulder of the youth and gave him a violent shove. Deerfoot went several paces and then fell on his knees and hands with a gasp of pain severe enough to make him faint. The others laughed, as he painfully labored to his feet. He then asked that he might have his bow to use as a cane; but even this was refused. Finding that nothing in the way of assistance was to be obtained, his proud spirit closed his lips, and he limped forward, scarcely touching the great toe of the injured limb to the ground.
The brief flight and pursuit had led the parties so far down the Licking that they were out of sight of the block-house, quite a stretch of forest intervening; but it had also taken them nearer the headquarters, as they may be called, of Waughtauk, leader of the Wyandots besieging Fort Bridgman. This sachem showed, in a lesser way, something of the military prowess of Pontiac, chief of the Chippewas, King Philip of Pokanoket, and Tecumseh, who belonged to the same tribe with Deerfoot. Although his entire force numbered a little more than fifty, yet he had disposed them with such skill around the block-house that the most experienced of scouts failed to make his way through the lines.
Waughtauk was well convinced of the treachery of the Shawanoe, and there was no living man for whom he would have given a greater amount of wampum. The eyes of the chieftain sparkled with pleasure when the youthful warrior came limping painfully toward him, escorted by the Wyandots, as though they feared that, despite his disabled condition, he might dart off with the speed of the wind. Waughtauk rose from the fallen tree on which he had been seated among his warriors, and advanced a step or two to meet the party as it approached.
"Dog! base son of the noble chief Allomaug! youth with the red face and the white heart! serpent with the forked tongue! the Great Spirit has given it to Waughtauk that he should inflict on you the death that is fitting all such."
These were fierce words, but the absolute fury of manner which marked their utterance showed how burning was the hate of the Wyandot leader and his warriors. They knew that this youth had been honored and trusted as no one of his years had ever been honored and trusted by his tribe, and his treachery was therefore all the deeper, and deserving of the worst punishment that could be devised.
Deerfoot, standing on one foot, with his hand grasping a sapling at his side, looked calmly in the face of the infuriated leader, and in his low, musical voice, said—
"When Deerfoot was sick almost to death, his white brother took the place of the father and mother who went to the happy hunting grounds long ago; Deerfoot would have been a dog, had he not helped his white brother through the forest, when the bear and the panther and the Wyandot were in his path."
This defence, instead of soothing the chieftain, seemed to arouse all the ferocity of his nature. His face fairly shone with flame through his ochre and paint; and striding toward the prisoner, he raised his hand with such fierceness that the muscles of the arm rose in knots and the veins stood out in ridges on temple and forehead. As he threw his fist aloft and was on the point of smiting Deerfoot to the earth, the latter straightened up with his native dignity, and, still grasping the sapling and still standing on one foot, looked him in the eye. It was as if a great lion-tamer, hearing the stealthy approach of the wild beast, had suddenly turned and confronted him.
Waughtauk paused at the moment his fist was in the air directly over the head of Deerfoot, glowering down upon him with an expression demoniac in its hate. He breathed hard and fast for a few seconds and then retreated without striking the impending blow.
But it must not be understood that it was the defiant look of the captive which checked the chief. It produced no such effect, nor was it intended to do so: it simply meant on the part of Deerfoot that he expected indignity and torture and death, and he could bear them as unflinchingly as Waughtauk himself. As for the chieftain, he reflected that a little counsel and consultation were needed to fix upon the best method of putting this tormentor out of the way. If Waughtauk should allow his own passion to master him, the anticipated enjoyment would be lost.
While Deerfoot, therefore, retained his grasp on the sapling, that he might be supported from falling, Waughtauk called about him his cabinet, as it may be termed, and began the consideration of the best means of punishing the traitor. The captive could hear all the discussion, and, it need not be said, he listened with much more interest than he appeared to feel.
It would be revolting to detail the schemes advocated. If there is any one direction in which the human mind is marvelous in its ingenuity, it is in the single one of devising means of making other beings miserable. Some of the proposals of the Wyandots were worthy of Nana Sahib, of Bithoor, but they were rejected one after the other, as falling a little short of the requirements of the leader.
There was one fact which did not escape the watchful eye and ear of the prisoner. The Wyandot who struck him twice, and who had taken charge of his bow, as a trophy belonging specially to himself, was the foremost in proposing the most cruel schemes. The look which Deerfoot cast upon him said plainly—
"I would give the world for a chance to settle with you before I suffer death!"
Suddenly a thought seemed to seize Waughtauk like an inspiration. Rising to his feet, he held up his hand for his warriors to listen:
"Deerfoot is a swift runner; he has overtaken the fleeing horse and leaped upon his back; he shall be placed in the Long Clearing; he shall be given a start, and the swiftest Wyandot warriors shall be placed in line on the edge of the Long Clearing; they shall start together, and the scalp of Deerfoot shall belong to him who first overtakes him."
This scheme, after all, was merciful when compared with many that were proposed; but the staking of a man's life on his fleetness, when entirely unable to run, is an idea worthy of an American Indian.
CHAPTER XV. AN UNEXPECTED VISITOR.
Jo Stinger had decided to venture out from the block-house, at a time when the Wyandots were on every side, and when many of them were within the stockade and close to the building itself. It was a perilous act, but the veteran had what he deemed good grounds for undertaking it. In the first place, the darkness had deepened to that extent, within the last few hours, that he believed he could move about without being suspected: he was confident indeed that he could stay out as long as he chose and return in safety. He still felt chagrined over the audacity of the Wyandots, which came so near success, and longed to turn the tables upon them. But Jo Stinger had too much sense to leave the garrison and run into great peril without the prospect of accomplishing some good thereby. He knew the Wyandots were completing preparations to burn the block-house. He believed it would be attempted before morning, and, if not detected by him, would succeed. He had strong hope that, by venturing outside, he could learn the nature of the plan, so that it would be possible to make some preparation against it. Colonel Preston was not without misgiving when he drew the ponderous bolts, but he gave no expression to his thoughts. All was blank darkness, but, when the door was drawn inward, he felt several cold specks on his hand, from which he knew it was snowing. The flakes were very fine and few, but they were likely to increase before morning, by which time the ground might be covered.
"When shall I look for your return?" asked the Colonel, but, to his surprise, there was no answer. Jo had moved away, and was gone without exchanging another word with the commandant. The latter refastened the door at once. He could not but regard the action of the most valuable man of his garrison as without excuse: at the same time he reflected that his own title could not have been more empty, for no one of the three men accepted his orders when they conflicted with his personal views.
In the meantime Jo Stinger, finding himself on the outside of the block-house, was in a situation where every sense needed to be on the alert, and none knew it better than he. The door which Colonel Preston opened was the front one, being that which the scout had passed through the previous night, and which opened on the clearing along the river. He was afraid that, if he emerged from the other entrance, he would step among the Wyandots and be recognized before he could take his bearings. But Jo felt that he had entered on an enterprise in which the chances were against success, and in which he could accomplish nothing except by the greatest risk to himself. The listening Colonel fancied he heard the sound of his stealthy footstep, as the hunter moved from the door of the block-house. He listened a few minutes longer, but all was still except the soft sifting of the snow against the door, like the finest particles of sand and dust filtering through the tree-tops. The Colonel passed to the narrow window at the side and looked out. It had become like the blackness of darkness, and several of the whirling snow-flakes struck his face.
"The Wyandots are concocting some mischief, and there's no telling what shape it will take until it comes. I don't believe Jo will do anything that will help us."
And with a sigh the speaker climbed the ladder again and told his friends how rash the pioneer had been.
"I wouldn't have allowed him to go," said Ned Preston.
"There's no stopping him when he has made up his mind to do anything."
"Why didn't you took him by de collar," asked Blossom Brown, "and slam him down on de floor? Dat's what I'd done, and, if he'd said anyting, den I'd took him by de heels and banged his head agin de door till he'd be glad to sot down and behave himself."
"Jo is a skilled frontiersman," said the Colonel, who felt that it was time he rallied to the defence of the scout; "he has tramped hundreds of miles with Simon Kenton and Daniel Boone, and, if his gun hadn't flashed fire one dark night last winter, he would have ended the career of Simon Girty."
"How was that?"
"Simon Girty and Kenton served together as spies in Dunmore's expedition in 1774, and up to that time Girty was a good soldier, who risked much for his country. He was badly used by General Lewis, and became the greatest scourge we have had on the frontier. I don't suppose he ever has such an emotion as pity in his breast, and there is no cruelty that he wouldn't be glad to inflict on the whites. He and Jo know and hate each other worse than poison. Last winter, Jo crept into one of the Shawanoe towns one dark night, and, when only a hundred feet away, aimed straight at Girty, who sat on a log, smoking his pipe, and talking to several warriors. Jo was so angered when his gun flashed in the pan, that he threw it upon the ground and barely saved himself by dashing out of the camp at the top of his speed."
"Jo has been in a great many perilous situations," added Colonel Preston, "and he can tell of many a thrilling encounter in the depths of the silent forest and on the banks of the lonely streams, where no other human eyes saw him and his foe."
"No doubt of all that," replied Ned, who knew that he was speaking the sentiments of his uncle, "but it seems to me he is running a great deal more risk than he ought to."
"I agree with you, but we have been greatly favored so far, and we will continue to hope for the best."
The long spell of quiet which had followed the attempt to fire the block-house permitted the children to sleep, and their mother, upon the urgency of her husband, had lain down beside them and was sinking into a refreshing slumber. Megill and Turner kept their places at the loopholes, watching for the signs of danger with as vigilant interest as though it was the first hour of the alarm. They were inclined to commend the course of Jo Stinger, despite the great peril involved. The Wyandots, beyond question, were perfecting some scheme of attack, which most likely could be foiled only by previous knowledge on the part of the garrison. The profound darkness and the skill of the hunter would enable him to do all that could be done by any one, under the circumstances. There came seconds, and sometimes minutes, when no one spoke, and the silence within the block-house was so profound that the faint sifting of the snow on the roof was heard. Then an eddy of wind would whirl some of the sand-like particles through the loopholes into the eyes and faces of those who were peering out. Men and boys gathered their blankets closer about their shoulders, and set their muskets down beside them, where they could be caught up the instant needed, while they carefully warmed their benumbed fingers by rubbing and striking the palms together.
All senses were concentrated in the one of listening, for no other faculty was of avail at such a time. Nerves were strung to the highest point, because there was not one who did not feel certain they were on the eve of events which were to decide the fate of the little company huddled together in Fort Bridgman. The stillness was at its profoundest depth, the soft rustling of the snowflakes seemed to have ceased, and not a whisper was on the lips of one of the garrison, when there suddenly rang out on the night a shriek like that of some strong man caught in the crush of death. It was so piercing that it seemed almost to sound from the center of the room, and certainly must have been very close to the block-house itself.
"That was the voice of Jo!" said Colonel Preston, in a terrified undertone, after a minute's silence; "he has met his fate."
"You are mistaken," Megill hastened to say; "I have been with Jo too often, and I know his voice too well to be deceived."
"It sounded marvelously like his."
"It did not to me, though it may have been so to you."
"If it was not Jo, then it must have been one of the Wyandots."
"That follows, as a matter of course; in spite of all of Jo's care, he has run against one of their men, or one of them has run against him. The only way to settle it then was in the hurricane order, and Jo has done it that promptly that the other has just had time to work in a first-class yell like that."
"I'm greatly relieved to hear you take such a view," said Colonel Preston, who, like the rest, was most agreeably disappointed to hear Megill speak so confidently, his brother-in-law adding his testimony to the same effect.
"Directly after that shriek," said Turner, "I'm sure there was the tramping of feet, as if some one was running very fast: it passed under the stockade and out toward the well."
"I heard the footsteps too," added Ned Preston.
"So did I," chimed in Blossom Brown, feeling it his duty to say something to help the others along; "but I'm suah dat de footsteps dat I heerd war on de roof. Some onrespectful Wyamdot hab crawled up dar, and I bet am lookin' down de chimbley dis minute."
"It seems to me," observed Ned to his uncle, "that Jo will want to come back pretty soon."
"I think so too," replied his uncle; "I will go down-stairs and wait for him."
With these words he descended the rounds of the ladder and moved softly across the lower floor to the door, where he paused, with his hands on one of the heavy bars which held the structure in place. While crossing the room he looked toward the fire-place. Among the ashes he caught the sullen red of a single point of fire, like the glowering eye of some ogre, watching him in the darkness. Besides the huge latch, there were three ponderous pieces of timber which spanned the inner side of the door, the ends dropping into massive sockets strong enough to hold the puncheon slabs against prodigious pressure from the outside. Colonel Preston carefully lifted the upper one out of place and then did the same with the lowest. Then he placed his hand on the middle bar and held his ear close to the jamb, so that he might catch the first signal from the scout, whose return was due every minute. The listening ear caught the silken sifting of the particles of snow, which insinuated themselves into and through the smallest crevices, and a slight shiver passed through the frame of the pioneer, who had thrown his blanket off his shoulders so that he might have his arms free to use the instant it should become necessary.
Colonel Preston had stood thus only a few minutes, when he fancied he heard some one on the outside. The noise was very slight and much as if a dog was scratching with his paw. Knowing that wood is a better conductor of sound than air, he pressed his ear against the door. To his astonishment he then heard nothing except the snowflakes, which sounded like the tapping of multitudinous fairies, as they romped back and forth and up and down the door.
"That's strange," thought he, after listening a few minutes; "there's something unusual out there, and I don't know whether it is Jo or not. I'm afraid the poor fellow has been hurt and is afraid to make himself known."
The words were yet in his mouth, when he caught a faint tapping outside, as if made by the bill of a bird.
"That's Jo!" he exclaimed, immediately raising the end of the middle bar from its socket; "he must be hurt, or he is afraid to signal me, lest he be recognized."
At the moment the fastenings were removed, and Colonel Preston was about drawing the door inward, he stayed his hand, prompted so to do by the faintest suspicion that something was amiss.
"Jo! is that you?" he asked in a whisper.
"Sh! Sh!"
He caught the warning, almost inaudible as it was, and instantly drew the door inward six or eight inches.
"Quick, Jo! the way is open!"
Even then a vague suspicion that all was not right led Colonel Preston to take a single step back, and, though he had no weapons, he clenched his fist and braced himself for an assault which he did not expect.
The darkness was too complete for him to see anything, while the faint ember, smouldering in the fire-place, threw no reflection on the figure of the pioneer, so as to reveal his precise position. It was a providential instinct that led Colonel Preston to take this precaution, for as he recoiled some one struck a venomous blow at him with a knife, under the supposition that he was standing on the same spot where he stood at the moment the door was opened. Had he been there, he would have been killed with the suddenness almost of the lightning stroke. The pioneer could not see, and he heard nothing except a sudden expiration of the breath, which accompanied the fierce blow into vacancy, but he knew like a flash that, instead of Jo, it was a Wyandot Indian who was in the act of making a rush to open the way for the other warriors behind him. The right fist shot forward, with all the power Colonel Preston could throw into it. He was an athlete and a good boxer. As he struck, he hurled his body with the fist, so that all the momentum possible went with it. Fortunately for the pioneer the blow landed on the forehead of the unprepared warrior, throwing him violently backward against his comrades, who were in the act of rushing forward to follow in his wake. But for them he would have been flung prostrate full a dozen feet distant. The instant the blow was delivered, Colonel Preston sprang back, shoved the door to and caught up the middle bar. At such crises it seems as if fate throws every obstruction in the way, and his agony was indescribable, while desperately trying to get the bar in place. Only a few seconds were occupied in doing so, but those seconds were frightful ones to him. He was sure the entire war party would swarm into the block-house, before he could shut them out.
The Indians, who were forced backward by the impetus of the smitten leader, understood the need of haste. They knew that, unless they recovered their ground immediately, their golden opportunity was gone. Suppressing all outcry, for they had no wish to draw the fire from the loopholes above, they precipitated themselves against the door, as though each one was the carved head of a catapult, equal to the task of bursting through any obstacle in its path. Thank Heaven! In the very nick of time Colonel Preston got the middle bar into its socket. This held the door so securely that the other two were added without trouble, and he then breathed freely. Drops of cold perspiration stood on his forehead, and he felt so faint that he groped about for a stool, on which he dropped until he could recover.
CHAPTER XVI. OUT-DOORS ON A DARK NIGHT.
In the meantime Jo Stinger, the veteran frontiersman, had not found the plain sailing which he anticipated. It will be remembered that he passed out upon the clearing in front of the block-house, because he feared that, if he entered the yard inclosed by the stockade, he would find himself among the Wyandots, who would be quick to detect his identity. His presence immediately in front of the structure would also draw attention to himself, and he therefore glided away until he was fully a hundred feet distant, when he paused close to the western pickets. Looking behind him, he could not see the outlines of the building which he had just left. For the sake of safety Colonel Preston allowed no light to burn within the block-house, which itself was like a solid bank of darkness.
"It would be easy enough now for me to make my way to Wild Oaks," reflected Stinger; "for, when the night is like this, three hundred Indians could not surround the old place close enough to catch any one crawling through. But it is no use for me to strike out for the Ohio now, for the boys could not get here soon enough to affect the result one way or the other. Long before that the varmints will wind up this bus'ness, either by going away, or by cleaning out the whole concern."
Jo Stinger unquestionably was right in this conclusion, but he possessed a strong faith that Colonel Preston and the rest of them in the block-house would be able to pull through, if they displayed the vigilance and care which it was easy to display: this faith explains how it was the frontiersman had ventured upon what was, beyond all doubt, a most perilous enterprise. Jo, from some cause or other which he could not explain, suspected the Wyandots were collecting near the well, and he began working his way in that direction. It was unnecessary to scale the stockade, and he therefore moved along the western side, until he reached the angle, when he turned to the right and felt his way parallel with the northern line of pickets. Up to this time he had not caught sight or sound to show that an Indian was within a mile of him. The fine particles of snow made themselves manifest only by the icy, needle-like points which touched his face and hands, as he groped along. He carried his faithful rifle in his left hand, and his right rested on the haft of his long hunting-knife at his waist. His head was thrust forward, while he peered to the right and left, advancing with as much care as if he were entering a hostile camp on a moonlight night, when the overturning of a leaf is enough to awaken a score of sleeping red men. A moment after passing the corner of the stockade something touched his elbow. He knew on the instant that it was one of the Wyandots. In the darkness they had come thus close without either suspecting the presence of the other.
"Hooh! my brother is like Deerfoot, the dog of a Shawanoe."
This was uttered in the Wyandot tongue, and the scout understood the words, but he did not dare reply. He could not speak well enough to deceive the warrior, who evidently supposed he was one of his own people.
But there was the single exclamation which he could imitate to perfection, and he did so as he drew his knife.
"Hooh!" he responded, moving on without the slightest halt.
The response seemed satisfactory to the Wyandot, but could Jo have seen the actions of the Indian immediately after, he would have felt anything but secure on that point. The brave stood a minute or so, looking in the direction taken by the other, and then, as if suspicious that all was not what it seemed, he followed after the figure which had vanished so quickly.
"I would give a good deal if I but knowed what he meant by speaking of Deerfoot as he did," said Jo to himself, "but I didn't dare ask him to give the partic'lars. I make no doubt they've catched the Shawanoe and scalped him long ago."
Remembering the openings which he had seen in the stockade before the darkness became so intense, Jo reached out his right hand and ran it along the pickets, so as not to miss them. He had gone only a little way, when his touch revealed the spot where a couple had been removed, and there was room for him to force his body through. Jo was of a spare figure, and, with little difficulty, he entered the space inclosed by the stockade. He now knew his surroundings and bearings as well as though it were high noon, and began making his way with great stealth in the direction of the well standing near the middle of the yard. While he was doing this, the Wyandot with whom he had exchanged salutations was stealing after him: it was the old case of the hunter going to hunt the tiger, and soon finding the tiger was hunting him. The task of the Wyandot, however, for the time, was a more delicate one than was the white man's, for the dusky pursuer had lost sight of his foe (if indeed it can be said he had ever caught a view of him), instantly after the brief salutation between them.