Topological Data Analysis with Applications
The continued and dramatic rise in the size of data sets has meant that new methods
are required to model and analyze them. This timely account introduces topological
data analysis (TDA), a method for modeling data by geometric objects, namely graphs
and their higher-dimensional versions, simplicial complexes. The authors outline the
necessary background material on topology and data philosophy for newcomers, while
more complex concepts are highlighted for advanced learners. The book covers all
the main TDA techniques, including persistent homology, cohomology, and Mapper.
The final section focuses on the diverse applications of TDA, examining a number
of case studies ranging from monitoring the progression of infectious diseases to the
study of motion capture data.
Mathematicians moving into data science, as well as data scientists or computer
scientists seeking to understand this new area, will appreciate this self-contained
resource which explains the underlying technology and how it can be used.
Gunnar Carlsson is Professor Emeritus at Stanford University. He received his doctoral
degree from Stanford in 1976, and has taught at the University of Chicago, at the
University of California, San Diego, at Princeton University, and, since 1991, at
Stanford University. His work within mathematics has been concentrated in algebraic
topology, and he has spent the last 20 years on the development of topological data
analysis. He is also passionate about the transfer of scientific findings to real-world
applications, leading him to found the topological data analysis-based company
Ayasdi in 2008.
Mikael Vejdemo-Johansson is Assistant Professor in the Department of Mathematics
at City University of New York, College of Staten Island. He received his doctoral
degree from Friedrich-Schiller-Universität Jena in 2008, and has worked in topological
data analysis since his first postdoc with Gunnar Carlsson at Stanford 2008–2011.
He is the chair of the steering committee for the Algebraic Topology: Methods,
Computation, and Science (ATMCS) conference series and runs the community web
resource appliedtopology.org.

Topological Data Analysis
with Applications
GUNNAR CARLSSON
Stanford University, California
MIKAEL VEJDEMO-JOHANSSON
City University of New York, College of Staten Island and the Graduate Center

University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre,
New Delhi – 110025, India
103 Penang Road, #05–06/07, Visioncrest Commercial, Singapore 238467
Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning, and research at the highest international levels of excellence.
www.cambridge.org
Information on this title: www.cambridge.org/9781108838658
DOI: 10.1017/9781108975704
© Gunnar Carlsson and Mikael Vejdemo-Johansson 2022
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2022
Printed in the United Kingdom by TJ Books Limited, Padstow Cornwall
A catalogue record for this publication is available from the British Library.
Library of Congress Cataloging-in-Publication Data
Names: Carlsson, G (Gunnar), 1952– author | Vejdemo-Johansson, Mikael, 1980– author
Title: Topological data analysis with applications / Gunnar Carlsson, Mikael Vejdemo-Johansson.
Description: New York : Cambridge University Press, 2021. |
Includes bibliographical references and index.
Identifiers: LCCN 2021024970 | ISBN 9781108838658 (hardback)
Subjects: LCSH: Topology. | Mathematical analysis. | BISAC: MATHEMATICS / Topology
Classification: LCC QA611 .C29 2021 | DDC 514/.23–dc23
LC record available at https://lccn.loc.gov/2021024970
ISBN 978-1-108-83865-8 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.

Contents
Preface
Part I Background
1 Introduction
1.1 Overview
1.2 Examples of Qualitative Properties in Applications
2 Data
2.1 Data Matrices and Spreadsheets
2.2 Dissimilarity Matrices and Metrics
2.3 Categorical Data and Sequences
2.4 Text
2.5 Graph Data
2.6 Images
2.7 Time Series
2.8 Density Estimation in Point Cloud Data
Part II Theory
3 Topology
3.1 History
3.2 Qualitative and Quantitative Properties
3.3 Chain Complexes and Homology
4 Shape of Data
4.1 Zero-Dimensional Topology: Single-Linkage Clustering
4.2 The Nerve Construction and Soft Clustering
4.3 Complexes for Point Cloud Data
4.4 Persistence
4.5 The Algebra of Persistence Vector Spaces
4.6 Persistence and Localization of Features
4.7 Non-Homotopy-Invariant Shape Recognition
4.8 Zig-Zag Persistence
4.9 Multidimensional Persistence
5 Structures on Spaces of Barcodes
5.1 Metrics on Barcode Spaces
5.2 Coordinatizing Barcode Space and Feature Generation
5.3 Distributions on B∞
Part III Practice
6 Case Studies
6.1 Mumford Natural Image Data
6.2 Databases of Compounds
6.3 Viral Evolution
6.4 Time Series
6.5 Sensor Coverage and Evasion
6.6 Vectorization Methods and Machine Learning
6.7 Caging Grasps
6.8 Structure of the Cosmic Web
6.9 Politics
6.10 Amorphous Solids
6.11 Infectious Diseases
References
Index

Preface
• Cluster analysis seeks to divide data into disjoint groups to create a taxonomy of the
data set. In many situations where cluster analysis is applied, however, one finds
that the natural output is not a partition of the data into disjoint groups but, rather,
a “soft clustering” in which the data is broken into groups that may overlap. This
kind of information is very naturally modeled with a simplicial complex, which is
able to describe the relationships between groups implied by the overlaps, using
an appropriate shape or space. An ordinary cluster decomposition, by a partition,
is in this situation modeled by a zero-dimensional simplicial complex, i.e. a finite
set of points.
• Data science is often confronted with the problem of deciding the appropriate shape
for a data set, so as to be able to model it effectively. The theory of simplicial
complexes is equipped with a method for describing the shape structure of the
output of a model; this is based on an extension of the homology construction from
the algebraic topology of spaces. The extension is called persistent homology and
will be the subject of many of the ideas that we present.
• Feature selection and feature engineering are major tasks in data science. They are
particularly challenging for data sets which are unstructured in the sense that they
are not well represented by a data matrix or a spreadsheet with numerical entries.
For example, a database of large molecules is regarded as unstructured because it
consists of an unordered set of atoms and an unordered set of bonds and, further,
because the spatial coordinates of the atoms can be varied via a rigid motion of
space while the structure of the molecule remains unchanged. This means that
a representation by the coordinates of the atoms is not meaningful. It turns out,
though, that the molecules themselves have a geometry expressed through interatomic
distances, which allows us to apply the homology tools described above
to generate meaningful numerical quantities that can be used for analysis. Images
form another class of data which can be viewed as unstructured and which can be
studied with homological methods.
• Another way in which topological methods can be used for feature engineering is
the notion of topological signal processing (Robinson 2014). The idea here is
that, when given a data matrix, it is also useful to develop a topological model
for the columns of the data matrix (i.e. the features) rather than for the points
or samples of the data (i.e. the rows). In this way, each of the original data
points can be viewed as a function on the set of features and ultimately as a
function on the topological model. Incorporating various methods, including graph
Laplacians, one can impose structure on data points using this approach, and obtain
topologically informed dimensionality reductions.
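The first bullet above describes modeling a soft clustering by a simplicial complex. A minimal sketch of that idea follows, assuming the complex in question is the nerve of the overlapping groups: one vertex per group, and a simplex for every collection of groups that share at least one data point. The function name `nerve`, the `max_dim` cutoff, and the toy groups are illustrative choices, not taken from the text.

```python
from itertools import combinations

def nerve(groups, max_dim=2):
    """Return the simplices of the nerve of a family of (possibly overlapping) groups.

    groups: dict mapping a group label to a set of data points.
    A collection of labels spans a simplex when the corresponding groups share a point.
    """
    labels = sorted(groups)
    simplices = []
    for k in range(1, max_dim + 2):              # k labels span a (k-1)-simplex
        for combo in combinations(labels, k):
            common = set.intersection(*(groups[g] for g in combo))
            if common:
                simplices.append(combo)
    return simplices

# Hypothetical soft clustering: groups A and B overlap, C is disjoint from both.
soft = {"A": {1, 2, 3}, "B": {3, 4}, "C": {5, 6}}
print(nerve(soft))

# An ordinary partition has no overlaps, so only vertices appear.
hard = {"A": {1, 2}, "B": {3, 4}, "C": {5, 6}}
print(nerve(hard))
```

In the partition case the result contains only vertices, matching the remark that an ordinary cluster decomposition is modeled by a zero-dimensional simplicial complex.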
We believe that the use of TDA in data science will motivate interesting and useful
developments within topology. It is therefore useful to see where TDA methods fit within
standard algebraic topology and homotopy theory. Here are some important points about
this fit.
• Persistent homology can be described as the study of diagrams whose shape is defined
by the partially ordered set $\mathbb{R}$. A number of other diagrams have been studied,
including those used in zig-zag persistence and multidimensional persistence. As
the work in TDA broadens and deepens, it is likely that increasingly sophisticated
diagrams will be useful for extracting more detailed information from data sets. It
follows that the construction of invariants for diagrams of various shapes will be
a useful endeavor.
• Because TDA operates by studying samples of discrete sets of points, the dimensions
of spaces that can be analyzed solely by TDA methods are fairly low, for the most
part ≤ 5. A data set which would faithfully represent a space of dimension 10
would be expected to require at least $10^{10}$ points, if one assumes a resolution of 10
points for each dimension. This is already a very large number, and demonstrates
the point that, for example, 50-dimensional homology is not likely to occur in a
useful way in data sets. This suggests that studying more sophisticated unstable
homotopy invariants (such as cup products, Massey products, etc.) would be a
good direction to pursue. For example, the use of cup products is a key part of
Carlsson & Filippenko (2020).
• Within algebraic topology and homotopy theory, a very interesting aspect is the
topology of spaces equipped with a reference map to a base space B; this is
referred to as parametrized topology. All maps are then required to respect the
reference map. The category of spaces over a base contains a much richer set of
invariants than in the absolute case (i.e. ordinary topology, without a reference
map), where B is a single point. This idea comes up in the study of evasion
problems (Carlsson & Filippenko 2020) and can be used to define the idea of
data science over a base, or parametrized topological data analysis (Nelson 2020),
which appears to be a useful framework for an iterative method of data analysis.
The study of unstable invariants in this case is particularly rich, and warrants
further attention.
• One is often interested in studying the invariants of a space X which are not necessarily
topological in nature but which nevertheless can be thought of as qualitative; the
corners or ends of spaces are examples of this kind of situation.
One way to approach such problems is to perform constructions on X so
as to produce an associated space which reflects the property one wants to study,
and then to use topological methods such as homology to perform the analysis.
A powerful example of this philosophy is the work of Simon Donaldson on the
topology of smooth 4-manifolds, where he showed that certain moduli spaces
attached to smooth 4-manifolds allow one to study the topology of the manifold
itself (Donaldson 1984). This kind of approach can be used to investigate various
shape distinction problems that are not directly topological in nature.
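The sampling estimate given above for a ten-dimensional space is simple arithmetic, and a short sketch makes the growth rate visible. It assumes a grid-style count, with the same number of sample points along each dimension; the function name `points_needed` is illustrative.

```python
def points_needed(dimension, resolution=10):
    """Grid-style estimate: `resolution` sample points along each of `dimension` axes."""
    return resolution ** dimension

for d in (1, 2, 5, 10, 50):
    print(d, points_needed(d))
```

Under this assumption the count for dimension 10 is ten billion, and the count for dimension 50 is far beyond any realistic data set.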
The goal of this book is to introduce the ideas of topological data analysis to both
data scientists and topologists. We have omitted much of the technical material about
topology in general, and about homology in particular, with the expectation that the
reader who has studied the book will be able to go further in the subject as needed.
We hope that it will encourage both groups to participate in this exciting intellectual
development.
As a matter of convention, we choose to use the terms injective, surjective, and
bijective instead of one-to-one, onto, and “one-to-one and onto”.
The authors are very grateful for helpful conversations with many people, including R.
Adler, A. Bak, E. Carlsson, J. Carlsson, F. Chazal, J. Curry, V. de Silva, P. Diaconis, H.
Edelsbrunner, R. Ghrist, L. Guibas, J. Harer, S. Holmes, M. Lesnick, A. Levine, P. Lum,
B. Mann, F. Mémoli, K. Mischaikow, D. Morozov, S. Mukherjee, J. Perea, R. Rabadan,
H. Sexton, P. Skraba, G. Singh, R. van de Weijgaert, S. Weinberger, and A. Zomorodian.
We particularly thank A. Blumberg, whose collaboration on early drafts of this book
has helped immensely.
Deep thanks also go to the editorial staff at Cambridge University Press, for their
patience and all their help.

1 Introduction
In the last two or three decades, the need for machine learning and artificial intelligence
has grown dramatically. As the tasks we undertake become ever more ambitious, both in
terms of size and complexity, it is imperative that the available methods keep pace with
these demands. A critical component of any such method is the ability to model very large
and complex data sets. There is a large suite of powerful modeling methods based on
linear algebra and cluster analysis that can often provide solutions for the problems that
arise. Although they are often successful, these methods suffer from some weaknesses.
In the case of algebraic methods, it is the fact that they are not always flexible enough to
model complex data, such as data sets of financial transactions or of surveys. Clustering
methods by definition cannot model continuous phenomena. Additionally, they often
require choosing thresholds for which there are no good theoretical justifications. What
we will discuss in this volume is a modeling methodology called topological data
analysis, or TDA, in which data is instead modeled by geometric objects, namely graphs
and their higher-dimensional versions, simplicial complexes. Topological data analysis
has been under development during the last 20 or so years, and has been applied in many
diverse situations. Its starting point is a set equipped with a metric, typically given as a
dissimilarity measure on the data points, which can be regarded as endowing the data
with a shape. This shape is very informative, in that it describes the overall organization
of the data set and therefore enables interrogation of various kinds to take place; TDA
provides methods for measuring the shape, in a suitable sense. This is useful inasmuch
as it allows one to access information about the overall organization. In addition, TDA
can be used to study data sets of complex description, which might be thought of as
unstructured data, where the data points themselves are sets equipped with a dissimilarity
measure. For example, one might consider data sets of molecules, where each data point
consists of a set of atoms and a set of bonds between those atoms, and use the bonds
to construct a metric on the set of atoms. This idea leads to powerful methods for the
vectorization of complex unstructured data. The methodology uses and is inspired by the
methods of topology, the mathematical study of shape, and we now give a more detailed
description of how it works.
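As an illustration of the molecule example above, here is a minimal sketch of one way to turn bonds into a metric on the set of atoms. It assumes the metric is the number of bonds on a shortest path between two atoms, which is only one possible choice (the text does not fix a particular metric), and it assumes the molecule is connected. The function name `bond_distances` and the toy molecule are made up for the example.

```python
from collections import deque

def bond_distances(atoms, bonds):
    """All-pairs hop-count distances in the bond graph (breadth-first search from every atom)."""
    neighbors = {a: set() for a in atoms}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    dist = {}
    for source in atoms:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nxt in neighbors[current]:
                if nxt not in seen:
                    seen[nxt] = seen[current] + 1
                    queue.append(nxt)
        dist[source] = seen
    return dist

# Toy "molecule": a chain C1 - C2 - O with one hydrogen attached to C1.
atoms = ["C1", "C2", "O", "H"]
bonds = [("C1", "C2"), ("C2", "O"), ("C1", "H")]
d = bond_distances(atoms, bonds)
print(d["H"]["O"])
```

The resulting table of distances depends only on which atoms are bonded, not on any spatial coordinates, so it is unchanged by rigid motions of the molecule.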
Much of mathematics can be characterized as the construction of methods for
organizing infinite sets into understandable representations. Euclidean spaces are
organized using the notions of vector spaces and affine spaces, which allows one to
arrange the (infinite) underlying sets into understandable objects which can readily be
manipulated and which can be used to construct new objects from old in systematic ways.

homology, which attach a list of non-negative integers (called Betti numbers) to
any topological space, and we also discuss the adaptation of homology to a tool
for the study of point clouds. This adaptation is called persistent homology.
• To introduce the mathematics surrounding the collection of persistence barcodes or
persistence diagrams, which are the values taken by persistent homology construc-
tions. Unlike Betti numbers, which are integer valued, persistent homology takes
its values in multisets of intervals on the real line. As such, persistence barcodes
have a mix of continuous and discrete structure. The study of these spaces from
various points of view, so as to be able to make them maximally useful in various
problem domains, is one of the most important research directions within applied
topology.
• To describe various examples of applications of topological methods to various
problem domains.
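To make the contrast between Betti numbers and barcodes concrete, the following sketch stores a barcode as a multiset of intervals and reads off, for a fixed homological degree, how many bars are present at a given parameter value. The half-open convention [birth, death), the use of `float("inf")` for a bar that never ends, and the toy values are all assumptions made for the illustration.

```python
from collections import Counter

# A barcode as a multiset of intervals (birth, death), with multiplicities.
barcode = Counter({(0.0, float("inf")): 1, (0.0, 0.4): 2, (0.3, 0.9): 1})

def betti_at(barcode, t):
    """Number of bars alive at parameter t, using half-open intervals [birth, death)."""
    return sum(mult for (birth, death), mult in barcode.items() if birth <= t < death)

for t in (0.1, 0.35, 0.5, 1.0):
    print(t, betti_at(barcode, t))
```

Each individual count is an integer, as a Betti number is, while the endpoints of the bars vary continuously; this is the mix of discrete and continuous structure mentioned above.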
1.1 Overview
The purpose of this book is to develop topological techniques for the study of the
qualitative properties of geometric objects, particularly those objects which arise in real-
world situations such as sets of experimental data, scanned images of various geometric
objects, and arrays of points arising in engineering applications. The mathematical
formalism called algebraic topology, and more specifically homology theory, turns out
to be a useful tool in making precise various informal, intuitive, geometric notions such
as holes, tunnels, voids, connected components, and cycles. This precision has been
quite useful in mathematics proper, in situations where we are given geometric objects
in closed form and where calculations are carried out by hand. In recent years, there has
been a movement toward improving the formalism so that it becomes capable of dealing
with geometric objects from real-world situations. This has meant that the formalism
must be able to deal with geometric objects given via incomplete information (i.e. as a
finite but large sample, perhaps with noise, from a geometric object) and that automatic
techniques for computing the homology are needed. We refer to this extension of standard
topological techniques as computational topology, and it is the subject of this volume.
We will assume that the reader is familiar with basic algebra, groups, and vector
spaces.
In this introductory chapter, we will sketch all the main ideas of computational
topology, without going into technical detail. The remaining chapters will then include
a precise technical development of the ideas as well as some applications of the theory
to actual situations.
1.2 Examples of Qualitative Properties in Applications
1.2.1 Diabetes Data and Clustering
Diabetes is a metabolic disorder which is characterized by elevated blood glucose levels.
Its symptoms include excessive thirst and frequent urination. In order to understand the
disease more precisely, it is important to understand the possible configurations of values
that various metabolic variables can exhibit. The kind of understanding we hope for is
geometric in nature. An analysis of this type was carried out in the 1970s by Reaven &
Miller (1979).
In this study, a collection of five parameters (four metabolic quantities and the relative
weight) were measured for each patient. Each patient then corresponds to a single data
point in five-dimensional space. In Reaven & Miller (1979), the projection pursuit
method was used to produce a three-dimensional projection of the data set, which looks
like the situation on the left in Figure 1.1.
Figure 1.1 Diabetes patient distribution. See the main text for a description of the figure. From
Reaven & Miller (1979), reproduced with permission of Springer–Nature, © 1979.
Each patient was classified as being normal, chemical diabetic, or overt diabetic. This
is a classification which physicians devised using their observation of the patients. It was
observed that the normal patients occupied the central rounded object, and the chemical
diabetics and overt diabetics corresponded to the two “ears” in the picture. Another very
interesting method for visualizing the set was introduced in Diaconis & Friedman (1980).
These visualizations suggest that the two forms of diabetes are actually fundamentally
different ailments. In fact, physicians have understood that diabetes occurs in two forms,
“Type I” and “Type II”. Type I patients often have juvenile onset, and the disease may be
independent of the patient’s lifestyle choices. Type II diabetes more often occurs later
in life and appears to depend on lifestyle choices. The chemical diabetics are likely to
be thought of as Type II diabetics who eventually may arrive at the overt diabetic stage,
while the overt diabetics might arrive at overt status directly.
In this case, the qualitative property of the figure that is relevant is that it has the two
distinct ears, coming out of a central core. Although human beings can recognize this fact
in this projection, it is important to formalize mathematically what this means, so that
one can hope to automate the recognition of this qualitative property. For example, there
may be data sets for which no two- or three-dimensional projection gives a full picture of
the nature of the set. In this case, the mathematical version of this statement would be as
follows. We suppose that the three categories of patients (normal, chemical, and overt)
correspond to three different regions A, B, and C in five-dimensional Euclidean space.
What this experiment suggests is that if we consider the union X = A ∪ B ∪ C (which
corresponds to all patients) and then remove A, the region corresponding to the normal
patients, the region we are left with breaks up into two distinct connected pieces, which do
not overlap and in fact are substantially removed from each other. Clustering techniques
from statistics were used in Symons (1981) to find methods to differentiate between
these two components. The qualitative question above about the nature of the disease
can now be stated as asking how many connected components are present in the space of
all patients having some form of diabetes. Finding the number of connected components
of a geometric object is a topological question.
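For a finite sample, one simple way to approximate the number of connected components is to join any two points that lie within a chosen distance of each other and count the pieces of the resulting graph. The sketch below does this with a union-find structure. The threshold `eps` is an assumption that has to be supplied by the user, and the coordinates are made up; this is an illustration of the question, not the method used in the studies cited above.

```python
from math import dist

def count_components(points, eps):
    """Count connected components of the graph joining points at distance <= eps."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})

# Two well-separated toy groups in the plane.
cloud = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.2), (5.0, 5.0), (5.4, 5.1)]
print(count_components(cloud, eps=1.0))
print(count_components(cloud, eps=0.1))
print(count_components(cloud, eps=10.0))
```

The answer depends on the threshold: a very small value separates every point, and a very large one merges everything.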
1.2.2 Periodic Motion
Imagine that we are tracking a moving object in space, and that the information is given
in terms of a three-dimensional coordinate system, so that we are given coordinates
(x(t), y(t), z(t)). If we want to know whether the object is moving periodically, say,
because it is orbiting around a planet, we can simply check whether the values of the
coordinates repeat after some fixed period of time. Suppose, however, that we are not
given the time values corresponding to the points but just a set of positions, and want
to determine whether the object is undergoing periodic motion. We would thus like to
know whether the set of positions forms a closed loop in space. If the object is orbiting
around a single planet or star, and we therefore know by Kepler’s laws that the geometric
shape of the orbit must be that of an ellipse, we can determine that the object is orbiting
by simply curve-fitting an ellipse to the data set of positions. Suppose, however, that
the object is being acted on gravitationally by several other objects, so that the path
is not a familiar kind of closed curve. We would then still want to know whether the
space of positions is a closed loop, but perhaps not one for which we have a familiar
set of coordinatizations. The qualitative property in which we are interested is whether
the space is a closed loop, and we would like to develop techniques which allow us to
determine this without necessarily asking for a particular coordinatization of the curve.
In other words, we are asking whether our space is a closed loop of some kind, not
exactly what type of loop it is.
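The easy case described at the start of this subsection, where time values are available, can be sketched directly: with positions sampled at equal time steps, periodicity amounts to each position matching the one a fixed number of samples later. The function name `repeats_after`, the tolerance, and the circular test orbit are assumptions for the illustration. The harder case, where only the set of positions is known, is exactly what this check cannot handle.

```python
from math import cos, sin, pi, dist

def repeats_after(samples, shift, tol=1e-9):
    """True if every position equals the one `shift` samples later, within `tol`."""
    if not 0 < shift < len(samples):
        raise ValueError("shift must be between 1 and len(samples) - 1")
    return all(dist(samples[i], samples[i + shift]) <= tol
               for i in range(len(samples) - shift))

# Hypothetical orbit sampled at equal time steps: 8 samples per revolution, 3 revolutions.
steps = 8
orbit = [(cos(2 * pi * k / steps), sin(2 * pi * k / steps), 0.0)
         for k in range(3 * steps)]
print(repeats_after(orbit, shift=steps))
print(repeats_after(orbit, shift=steps - 1))
```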
A more difficult situation is where we are not actually given the values of the position
of the object but, rather, a family of images taken from a digital camera. In this case,
the set of these images actually lies in a very high-dimensional space, namely the space
of all p-vectors, where p is the number of pixels. Each pixel of each image is given a
value, the gray-scale intensity at that pixel, and so each image corresponds to a vector,
with a coordinate for each pixel. If we take many images sequentially, we will obtain a
family of points in the p-dimensional space, which lies along a subset which should be
identified topologically with the set of positions of the object, i.e. a circle. So, although
this set is not identified with a circle through any simple set of equations in p variables,
the qualitative information that it is a circle is contained in this data. This is an example
of an exotic coordinatization of a space (namely the circle) and shows that, in order to
analyze this kind of data, it would be very useful to have tools which can tell whether
a space is a closed loop, without its having to be any particular loop. In other words,
coordinate-free tools are very useful.
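The passage above treats each image as a vector with one coordinate per pixel. A minimal sketch of that encoding follows, using a made-up 3 x 3 sensor on which a single bright pixel moves around the border; the helper names `flatten` and `frame` are illustrative.

```python
def flatten(image):
    """Turn a 2D gray-scale image (a list of rows) into a vector with one coordinate per pixel."""
    return [value for row in image for value in row]

def frame(position, height=3, width=3):
    """Hypothetical camera frame: one bright pixel at `position`, all others dark."""
    return [[1.0 if (r, c) == position else 0.0 for c in range(width)]
            for r in range(height)]

# An object moving once around the border of the sensor, one frame per position.
border = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
vectors = [flatten(frame(p)) for p in border]
print(len(vectors), len(vectors[0]))
```

Here p = 9, and the eight frames give eight points in nine-dimensional space. Nothing in the coordinates of an individual vector says "circle"; that information is carried only by how the points are arranged relative to one another.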

We will see later that cues involving “corners”, “curved arcs”, “vertices”, and “edges”
are not directly topological. We will develop methods for recognizing these cues
topologically on new spaces that we have constructed from the old ones using tangential
information.

2 Data
In this chapter, we will discuss the properties and methods associated with various
important data types, in order to indicate the wide range of applications for all methods
of data science. Since the purpose of this volume is to leverage geometric structures
which are applicable to data, we will point out the relevant geometric structures in each
case. We will also discuss the conventional methods that are applicable in each situation.
In particular, we will see that in many of these situations, it is important to use structures
on the set of features attached to data sets, and we will point them out. Indeed,
geometric structures are often relevant for the features as well as for the data points
themselves. We are certainly not claiming to be exhaustive in what we present; rather,
we are attempting to give a reasonable sample of what is possible.
2.1 Data Matrices and Spreadsheets
Perhaps the most common representation of data sets is as tidy data: a spreadsheet with
real numerical entries. Each data point corresponds to a row in the spreadsheet, and each
column corresponds to a feature in the data set. In more mathematical terms, the data
set is represented as a matrix of real numbers, where the number of rows is the number
of data points and the number of columns is the number of features in the spreadsheet.
This interpretation suggests that linear algebra should be useful in the analysis of data
sets, and this is indeed the case.
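As a small illustration of this convention, the sketch below stores a data set as a list of rows, one per data point, and shows how to read off the two dimensions of the matrix and a single feature column. The feature names and values are invented for the example.

```python
# Hypothetical data set: 4 data points (rows) measured on 3 features (columns).
feature_names = ["glucose", "insulin", "weight"]
data = [
    [5.1, 12.0, 70.0],
    [6.3, 15.5, 82.0],
    [4.8, 10.2, 64.5],
    [7.9, 20.1, 91.0],
]

n_points = len(data)         # number of rows
n_features = len(data[0])    # number of columns

def column(matrix, j):
    """Values of feature j across all data points."""
    return [row[j] for row in matrix]

def transpose(matrix):
    """Swap the roles of data points and features."""
    return [list(col) for col in zip(*matrix)]

print(n_points, n_features)
print(column(data, 0))
print(len(transpose(data)), len(transpose(data)[0]))
```

Transposing the matrix turns features into rows, which is the point of view taken when a geometric structure is placed on the set of features rather than on the data points.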
An important method for the study of data sets given in matrix form is principal
component analysis (PCA); see Hastie et al. (2009) for a comprehensive description. It
is a method for modeling data by finite (but possibly very large) subsets of inner product
spaces. Of course, if we are given a data matrix, the data is exactly identified with a finite
subset of an inner product space, namely $V = \mathbb{R}^N$, where $N$ is the number of columns
of the matrix. The inner product on V is the standard inner product, in which the unit
basis vectors form an orthonormal basis. If N is small, PCA permits some useful kinds
of analysis. If N = 1, 2, or 3, then one can actually visualize the data set as a scatter
plot. Even if N is greater than 3, but is of relatively small size, PCA makes many kinds
of calculations possible and can give insight. This is the case in the text analysis of
corpora of documents, where the data sets might have nearly a million features (one for
every word in a dictionary), but where one can often reduce the model to a few hundred

$X$ into two classes, $X_{+1}$ and $X_{-1}$, and whose goal is to produce a simple mathematical
procedure (called a linear classifier) which predicts the subset to which a new data point
should belong. The formula is produced by considering the decomposition of $V$ using a
hyperplane, where one class lies on one side of the hyperplane and the other on the other
side. To sketch the idea, we suppose first that the sets $X_{+1}$ and $X_{-1}$ are in fact linearly
separable, i.e. that there is a hyperplane such that $X_{+1}$ and $X_{-1}$ are entirely on opposite
sides of it, with no point of $X$ actually lying on it. We define a set $\mathcal{H}$ to be the collection
of all pairs of parallel hyperplanes $(H_{+1}, H_{-1})$ such that $X_{+1}$ lies on the opposite side
of $H_{+1}$ from $H_{-1}$, and such that $X_{-1}$ lies on the opposite side of $H_{-1}$ from $H_{+1}$. Each
such pair of hyperplanes determines a classifier, as follows. We consider the plane $H_0$
which is “halfway between” $H_{+1}$ and $H_{-1}$, and describe it by an equation of the form
$\varphi(\vec{v}) = \vec{w} \cdot \vec{v} - b = 0$. The parallel planes $H_{+1}$ and $H_{-1}$ are then given by equations
$\vec{w} \cdot \vec{v} - b_+ = 0$ and $\vec{w} \cdot \vec{v} - b_- = 0$ and, after possibly multiplying the equation by $-1$,
we may assume that $b_+ > b > b_-$ and that $b_+ - b = b - b_-$. The classification is now
achieved by asserting that a vector $\vec{v}$ belongs to (a) $X_{+1}$ if $\vec{w} \cdot \vec{v} - b > 0$ and (b) $X_{-1}$
if $\vec{w} \cdot \vec{v} - b < 0$. Of course, this works for many pairs of hyperplanes. However,
for each pair of parallel hyperplanes $(H_{+1}, H_{-1}) \in \mathcal{H}$, there is a well-defined distance
between the hyperplanes, and the choice that is made for the classifier is to select the pair
$(H_{+1}, H_{-1}) \in \mathcal{H}$ which maximizes this distance (it is unique), and then to use the classifier
attached to this pair. This would arguably be the best way to choose a linear classifier
for new data points for which a predicted class is desired.
Not all $X = X_{+1} \cup X_{-1}$ are linearly separable, in which case the above construction
is not applicable. However, it gives the motivation for a different optimization problem
which operates in non-separable situations. The idea is to define a loss function attached
to a hyperplane and a set of points divided into two classes within a finite-dimensional
inner product space. We note first that we will be optimizing over pairs $(\vec{w}, b)$, where
$\vec{w} \in V$ and $b$ is a real number. This data represents the hyperplane $\vec{w} \cdot \vec{v} - b = 0$. For
each data point $x_i \in X$, we assign $y_i \in \{\pm 1\}$ by declaring that $y_i = +1$ if $x_i \in X_{+1}$ and
$y_i = -1$ if $x_i \in X_{-1}$. To this configuration we now associate a loss function
\[ L(i, \vec{w}, b) = \max\bigl(0,\; 1 - y_i (\vec{w} \cdot x_i - b)\bigr) \]
with each data point $x_i$. We note that $L = 0$ if either $y_i = +1$ and $\vec{w} \cdot x_i - b \geq 1$ or
$y_i = -1$ and $\vec{w} \cdot x_i - b \leq -1$. This means that the loss function is zero for points $x_i \in X_{+1}$
which are correctly classified by the classifier $\vec{w} \cdot x_i - b \geq 1$, and similarly for
points in $X_{-1}$. Points which are not classified by either of these classifiers lie between
the hyperplanes given by $\vec{w} \cdot \vec{v} - b = 1$ and $\vec{w} \cdot \vec{v} - b = -1$ and are assigned a positive
loss value depending on how close they are to one or the other hyperplane. The hard
loss function for the entire configuration is given by the sum
\[ \sum_{i=1}^{N} L(i, \vec{w}, b), \]
which one can attempt to minimize. If the subsets $X_{+1}$ and $X_{-1}$ are linearly separable
then the value 0 can be achieved, as one can easily see from the linearly separable
analysis above. One could now decide simply to use this loss function even in the
linearly inseparable case, but it turns out to be useful to introduce a parameter $\lambda$ and
consider instead the modified loss function
\[ \frac{1}{N} \left( \sum_{i=1}^{N} L(i, \vec{w}, b) \right) + \lambda \|\vec{w}\|. \]
The aim here is that there should be a penalty for $\|\vec{w}\|$ being too large. The reasoning is
as follows. A hyperplane can be represented by many equations of the form $\vec{w} \cdot \vec{v} - b = 0$,
since we may multiply the equation by any non-zero real number $C$ and obtain an equally
valid equation. What does change with $C$ are the hyperplanes $\vec{w} \cdot \vec{v} - b = \pm 1$, which
become closer to each other as $C \to \infty$. In the infinite limit, this means that the “band”
between them shrinks until the two hyperplanes become one. Thus the hard loss function
is essentially just an evaluation of the fraction of points that are misclassified. This is
in general too rigid, since one wants the loss function not to change too much with
small changes in the positions of the data points. The parameter $\lambda$ allows one to tune the
classifier, to arrive at a model which is more robust to such small changes.
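To make the loss functions concrete, here is a minimal Python sketch that evaluates the per-point loss and the modified loss with parameter λ on a toy configuration. The function names, the toy points, and the value of λ are illustrative assumptions, and the sketch only evaluates the loss; it does not carry out the minimization.

```python
import math

def hinge_loss(w, b, x, y):
    """Per-point loss max(0, 1 - y * (w . x - b)) for a label y in {+1, -1}."""
    margin = sum(wj * xj for wj, xj in zip(w, x)) - b
    return max(0.0, 1.0 - y * margin)

def regularized_loss(w, b, points, labels, lam):
    """Average per-point loss plus the penalty lam * ||w||."""
    hard = sum(hinge_loss(w, b, x, y) for x, y in zip(points, labels))
    norm = math.sqrt(sum(wj * wj for wj in w))
    return hard / len(points) + lam * norm

# Toy, linearly separable configuration (hypothetical data).
points = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, -1.0)]
labels = [+1, +1, -1, -1]

# Every point lies outside the band, so only the penalty term contributes.
loss_wide = regularized_loss((1.0, 0.0), 0.0, points, labels, lam=0.1)

# Rescaling w widens the band: the penalty drops but points now fall inside it.
loss_scaled = regularized_loss((0.25, 0.0), 0.0, points, labels, lam=0.1)
```

Comparing the two values shows the trade-off that λ controls: a smaller norm of w is rewarded by the penalty term but leaves more points inside the band, where they incur positive loss.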
This method can be extended to classification problems with more than two classes. In
addition, one might hope to extend it to non-linear situations. This is sometimes done by
embedding the problem nonlinearly into a much higher-dimensional inner product space
and performing the linear classification there. See Vapnik (1998) for a more detailed
discussion.
Another method for addressing classification problems with linear algebra is logistic
regression. In the simplest case, this method requires a collection of data points which
consist of a number of continuous variables, called the independent variables, and one
{0, 1}-valued outcome variable. The aim is to design a procedure that estimates the
probability that the outcome variable equals 1, given the values of the independent
variables. It is assumed that the probability has the form $\sigma\left(\sum_i c_i x_i + b\right)$, where the $x_i$
are the independent variables, the $c_i$ and $b$ are parameters to be estimated, and $\sigma$ is the
logistic function, given for a real number $t$ by
\[ \sigma(t) = \frac{1}{1 + e^{-t}}. \]
In order to fit a model, we must choose a measure of the model fit that we can optimize.
The standard choice is the maximum likelihood function. We suppose that the data points
are $\{\vec{x}_1, \ldots, \vec{x}_N\}$, and that the outcome for the point $\vec{x}_i$ is $y_i$ such that $y_i \in \{0, 1\}$. Given
the model determined by the coefficient vector $\vec{c}$ and the number $b$, the likelihood that
the outcome variable equals 1 at the data point $\vec{x}_i$ is given by $h_{\vec{c},b}(\vec{x}_i) = \sigma(\vec{c} \cdot \vec{x}_i + b)$
and the likelihood that it equals 0 is $1 - h_{\vec{c},b}(\vec{x}_i)$. The likelihood of the values being
correct for all values of $i$ is therefore
\[ \prod_{i \mid y_i = 0} \bigl(1 - h_{\vec{c},b}(\vec{x}_i)\bigr) \prod_{i \mid y_i = 1} h_{\vec{c},b}(\vec{x}_i), \]
or, equivalently,
\[ \prod_{i} \bigl(1 - h_{\vec{c},b}(\vec{x}_i)\bigr)^{(1 - y_i)} \, h_{\vec{c},b}(\vec{x}_i)^{y_i}. \]
This function is now maximized using gradient methods (equivalently, its negative
logarithm is minimized by gradient descent). Logistic regression
can also be extended to classification problems with more than two classes. A good
discussion can be found in Hastie et al. (2009).
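As an illustration of the fitting procedure, the following Python sketch maximizes the logarithm of the likelihood by plain gradient ascent. The step size, the number of steps, the function names, and the toy data are assumptions made for the example; practical implementations typically use more refined optimizers.

```python
import math

def sigmoid(t):
    """The logistic function 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def predict(c, b, x):
    """Model probability that the outcome equals 1 at the point x."""
    return sigmoid(sum(cj * xj for cj, xj in zip(c, x)) + b)

def log_likelihood(c, b, xs, ys):
    """Logarithm of the product likelihood over all data points."""
    return sum(
        y * math.log(predict(c, b, x)) + (1 - y) * math.log(1.0 - predict(c, b, x))
        for x, y in zip(xs, ys)
    )

def fit(xs, ys, rate=0.05, steps=2000):
    """Gradient ascent on the log-likelihood, starting from zero parameters."""
    c, b = [0.0] * len(xs[0]), 0.0
    for _ in range(steps):
        # The gradient with respect to the linear score is (y - h) at each point.
        errors = [y - predict(c, b, x) for x, y in zip(xs, ys)]
        c = [cj + rate * sum(e * x[j] for e, x in zip(errors, xs))
             for j, cj in enumerate(c)]
        b += rate * sum(errors)
    return c, b

# Toy data with overlapping classes (hypothetical), so the maximum is finite.
xs = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
ys = [0, 0, 1, 0, 1, 1]
c_hat, b_hat = fit(xs, ys)
```

Working with the logarithm turns the product into a sum, which is why the update only needs the residuals y_i minus the predicted probabilities.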
2.2 Dissimilarity Matrices and Metrics
There is a straightforward extension of the notion of distance in $\mathbb{R}^2$ and in $\mathbb{R}^3$ to $\mathbb{R}^n$, via
the formula
\[ d(\vec{v}, \vec{w}) = \sqrt{(\vec{v} - \vec{w}) \cdot (\vec{v} - \vec{w})}. \]
Therefore, when we have a data set of points $S = \{\vec{v}_1, \ldots, \vec{v}_N\} \subseteq \mathbb{R}^n$, we create a
symmetric matrix of distances, $D(S)$, given by
\[
\begin{pmatrix}
d(\vec{v}_1, \vec{v}_1) & \cdots & d(\vec{v}_1, \vec{v}_N) \\
\vdots & \ddots & \vdots \\
d(\vec{v}_N, \vec{v}_1) & \cdots & d(\vec{v}_N, \vec{v}_N)
\end{pmatrix}.
\]
This matrix can be thought of as a dissimilarity matrix, because large distances can
be thought of as indicating dissimilarity and small distances as indicating similarity.
Definition 2.1 A dissimilarity matrix is a non-negative symmetric matrix which has
zeros along the diagonal. It is also useful to think of it as a structure on a finite set $X$,
defining a function $D \colon X \times X \to [0, +\infty)$ which (a) vanishes on the diagonal and (b)
has the symmetry property $D(x, x') = D(x', x)$. A dissimilarity space is a pair $(X, D)$,
where $X$ is a set and $D$ is a non-negative real-valued function on $X \times X$ satisfying
conditions (a) and (b) above. A dissimilarity matrix is of metric type if additionally (a)
all its off-diagonal entries are non-zero and (b) it satisfies the triangle inequality
\[ D_{ik} \leq D_{ij} + D_{jk}. \]
The corresponding function on the set $X \times X$, where $X$ is the set of columns of the
matrix $D$, is then called a metric, and the pair consisting of $X$ and the function $D$ is
called a metric space.
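The conditions in Definition 2.1 can be checked mechanically. The Python sketch below builds the Euclidean distance matrix of a small point set and tests the dissimilarity and metric-type conditions; the function names and the numerical tolerance are assumptions of the example.

```python
import math

def distance_matrix(points):
    """Symmetric matrix of Euclidean distances between the given points."""
    return [[math.dist(p, q) for q in points] for p in points]

def is_dissimilarity(D, tol=1e-12):
    """Non-negative, symmetric, and zero along the diagonal."""
    n = len(D)
    return all(
        D[i][i] == 0 and D[i][j] >= 0 and abs(D[i][j] - D[j][i]) <= tol
        for i in range(n) for j in range(n)
    )

def is_metric_type(D, tol=1e-12):
    """Dissimilarity matrix with non-zero off-diagonal entries and the triangle inequality."""
    n = len(D)
    if not is_dissimilarity(D, tol):
        return False
    positive = all(D[i][j] > 0 for i in range(n) for j in range(n) if i != j)
    triangle = all(
        D[i][k] <= D[i][j] + D[j][k] + tol
        for i in range(n) for j in range(n) for k in range(n)
    )
    return positive and triangle

euclidean = distance_matrix([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)])
# A dissimilarity matrix that violates the triangle inequality (5 > 1 + 1).
not_metric = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]
```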
There will be situations where the data we are working with are not necessarily
embedded in Euclidean space, but where we can nevertheless construct a dissimilarity
matrix. Here are some examples of when this occurs.
1. There are a number of abstract distance functions that can be defined on various kinds
of categorical data, such as Hamming distances, which do not obviously arise from
subsets of Euclidean space.
2. If we have points that we think of as sampled from a Riemannian manifold M
embedded in Euclidean space, their dissimilarities may be better modeled by geodesic
distances in $M$ rather than the Euclidean distances. For data embedded in $\mathbb{R}^n$,
there are versions of the geodesic distance which are closely related to the graph
distances between data points, where points sufficiently close are connected to give
a graph structure. This distinction is the key to the utility of the ISOMAP algorithm
(Tenenbaum et al. 2000).
3. There are situations where human beings have constructed intuitive but quantitative
notions of similarity, and it is desirable to work with them geometrically. See for
example Sneath & Sokal (1973) for numerous biological examples.
It is therefore useful to create methods and algorithms which operate directly on
dissimilarity matrices and which do not require a vector representation.
The first such algorithm that we will discuss is multidimensional scaling (MDS) (Borg
& Groenen 1997). Given an $n \times n$ dissimilarity matrix, interpreted as a dissimilarity
measure on a set of points $S = \{x_1, \ldots, x_n\}$, MDS produces an embedding of $S$ in
Euclidean space $E^d$, for some $d$, such that the loss function
\[ \sum_{i<j} \bigl( \|\tilde{x}_i - \tilde{x}_j\| - d(x_i, x_j) \bigr)^2 \]
is minimized, where $\tilde{x}_i$ denotes the vector in $E^d$ corresponding to $x_i \in S$. In other words,
the dissimilarity matrix is approximated as well as possible by the distances between
the embedded points in Euclidean space. It is possible to choose other loss functions,
depending on the situation, but this one has the attractive property that its minimizer
is predictable when the dissimilarity matrix comes from Euclidean space. This is
demonstrated by the following analysis. Let $D$ be a
matrix of Euclidean distances, $d_{ij} = \|\tilde{x}_i - \tilde{x}_j\|$. We form the associated matrix $K$:
\[ K = -\frac{1}{2} H D^{(2)} H, \]
where $D^{(2)}$ denotes the matrix of squared distances $d_{ij}^2$, $H = I - \frac{1}{n} \mathbf{1}\mathbf{1}^T$,
and $\mathbf{1}$ denotes the vector of all ones. When $D$ is a Euclidean
distance matrix, $K$ is positive semi-definite and so standard minimization arguments
imply that the solution is given by $\Lambda^{1/2} Z^T$, where $\Lambda$ is diagonal with the top $k$ eigenvalues
of $K$ along the diagonal and $Z$ contains the corresponding top $k$ eigenvectors of $K$ (here $k$
plays the role of the embedding dimension $d$). This
also demonstrates that the result is the same as that obtained by using PCA on the
Euclidean data. Multidimensional scaling can be used for dimensionality reduction
by choosing the dimension of the Euclidean space into which to embed. However, a
significant advantage of MDS is the fact that the dissimilarity matrix used as input need
not come from a distance function but can simply arise from any symmetric dissimilarity
matrix (Borg & Groenen 1997). It is apparent that MDS is an unsupervised method for
data analysis, and the discussion above shows that it generalizes PCA.
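The eigenvector construction can be illustrated in the simplest case of a one-dimensional embedding. The Python sketch below follows the usual classical MDS convention of double-centering the squared distances with a factor of −1/2, and it replaces a full eigendecomposition by power iteration for the top eigenpair; both the restriction to one dimension and the use of power iteration are simplifying assumptions of the example, as are the function names.

```python
import math

def double_center(sq):
    """Return K = -1/2 * H * sq * H for a symmetric matrix sq of squared distances."""
    n = len(sq)
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # sq is symmetric, so its column means coincide with its row means.
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]

def top_eigenpair(K, iterations=500):
    """Power iteration for the largest eigenvalue of a positive semi-definite K."""
    n = len(K)
    z = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        y = [sum(K[i][j] * z[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            return 0.0, z
        z = [v / norm for v in y]
    eigenvalue = sum(z[i] * K[i][j] * z[j] for i in range(n) for j in range(n))
    return eigenvalue, z

def mds_1d(D):
    """One-dimensional embedding: sqrt(top eigenvalue) times the top eigenvector."""
    sq = [[d * d for d in row] for row in D]
    eigenvalue, z = top_eigenpair(double_center(sq))
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [scale * v for v in z]

# Distances between three points on a line (hypothetical data).
positions = [0.0, 1.0, 3.0]
D = [[abs(a - b) for b in positions] for a in positions]
coords = mds_1d(D)
```

Because the input distances come from points on a line, the recovered coordinates reproduce them exactly, up to translation and reflection.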
Another class of methods for unsupervised data analysis is cluster analysis. In this
case the output of the method is a partition of the data set rather than an embedding in a
Euclidean space. There is a very rich family of methods that perform this task, and we
will not attempt to be comprehensive here but, rather, will give a few simple examples.
For a more complete treatment, see Everitt et al. (2011).
By a clustering algorithm, we mean an algorithm which takes as its input a dissimilarity
space (X, D), and from it defines a partition of the set X. The blocks of the partition are
called clusters. Here are some examples.
Example 2.2 Single-linkage clustering with scale parameter $R$ is defined on a dissimilarity
space $(X, D)$ by declaring that two points $x$ and $x'$ are in the same cluster if and only

point, we have a clustering of $X$. Each cluster is a subset of $X$ and, for each pair $(\xi, \xi')$
of clusters, we can evaluate
\[ L(\xi, \xi') = \min_{x \in \xi,\; x' \in \xi'} D(x, x'). \]
We can now merge all pairs of clusters where the minimum positive value of $L$ is
achieved, and this process can be iterated to yield a family of clusterings of increasing
coarseness. By keeping track of the values $L$ computed at each stage, one can recover
single-linkage hierarchical clustering in this way. This point of view permits a useful
generalization, on using other choices for the function $L$. For example, one can replace $L$
by $L_{\mathrm{ave}}$, defined by setting $L_{\mathrm{ave}}(\xi, \xi')$ equal to the average of the set of values $\{D(x, x')\}$
over pairs $x \in \xi$ and $x' \in \xi'$, where $\xi \neq \xi'$. One can also replace $L$ by the function
$L_{\max}$, defined by setting $L_{\max}(\xi, \xi')$ equal to the maximum of the set of values $\{D(x, x')\}$
over pairs $x \in \xi$ and $x' \in \xi'$, for any pairs of distinct clusters $(\xi, \xi')$. Each choice of $L$
gives a different hierarchical clustering scheme, with $L$, $L_{\mathrm{ave}}$, and $L_{\max}$ corresponding
to hierarchical schemes referred to as single-linkage, average-linkage, and complete-linkage
hierarchical clustering respectively. Average linkage and complete linkage are
often useful since they address problems arising with single-linkage clustering, one
being the chaining problem. This refers to the fact that single linkage often produces
long sequences of clusters which consist of very small clusters adjoining one large one.
See Everitt et al. (2011) for a discussion of this problem.
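The merging procedure and the three choices of linkage function can be sketched in a few lines of Python. For simplicity this sketch merges a single closest pair of clusters at each stage rather than all pairs attaining the minimum, and it records only the linkage value at each merge; the function names and the example points are assumptions.

```python
from itertools import combinations

def linkage(D, a, b, mode):
    """Evaluate the single, average, or complete linkage value between clusters a and b."""
    values = [D[x][y] for x in a for y in b]
    if mode == "single":
        return min(values)
    if mode == "average":
        return sum(values) / len(values)
    if mode == "complete":
        return max(values)
    raise ValueError(f"unknown linkage mode: {mode}")

def agglomerate(D, mode="single"):
    """Repeatedly merge the closest pair of clusters; return the merge heights."""
    clusters = [frozenset([i]) for i in range(len(D))]
    heights = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: linkage(D, pair[0], pair[1], mode))
        heights.append(linkage(D, a, b, mode))
        # Replace the two merged clusters by their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return heights

# Four points on a line (hypothetical data).
positions = [0.0, 1.0, 3.0, 7.0]
D = [[abs(p - q) for q in positions] for p in positions]
single = agglomerate(D, "single")
average = agglomerate(D, "average")
complete = agglomerate(D, "complete")
```

On this example the three schemes merge the same clusters in the same order but at different heights, which is the information a dendrogram would display.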
Cluster analysis is a highly developed area in data science, and we refer the reader to
Everitt et al. (2011) for a thorough treatment.
2.3 Categorical Data and Sequences
Many data sets are not in numerical matrix format, and also they may not be naturally
equipped with dissimilarity measures. Finding methods for putting them into matrix or
dissimilarity format is of great value. Consider the following simple example. Suppose
that we are given a data set whose entries are “shopping baskets” for a store. We are
interested in understanding the set of these baskets in some way. Each entry in the data
set is a list of products carried by the store, perhaps with multiplicities for purchases
of multiple units, but the order in the entries is not meaningful for us because the
order in which the items are rung up is not of significance. The data could be encoded
as a list of product identifiers. These identifiers might be numeric or alphanumeric, but a
numerical quantity carries no meaning beyond the purpose of identification. As it stands,
the methods in Sections 2.1 and 2.2 are of no use, because we do not have a numerical
matrix representation, either as vectors or as a dissimilarity measure. Moreover, the data
is given to us in a form which introduces extraneous information, namely the ordering of
the items in the basket. We introduce one-hot encoding as a method for overcoming both
of these problems. Let $S$ be a finite set, and let $X$ be a data set whose elements are subsets
of $S$. We construct the vector space $\mathbb{R}^{\#(S)}$, with its standard basis $\{e_s\}_{s \in S}$, and assign to
each subset $S' \subseteq S$ the vector $v(S') = \sum_s c_s e_s$, where $c_s = 1$ if $s \in S'$ and $c_s = 0$ if
$s \notin S'$. We have thus assigned a vector to each subset of $S$ and consequently we have
arrived at a data matrix, to which the matrix methods given above can be applied. The
collection of subsets of S can now be viewed as a dissimilarity space, in which distances
between subsets of S are determined by their symmetric differences. We note that the
construction extends to multisets of elements, where elements can occur with multiplicities
greater than one (see Hein 2003, for a discussion of these), in a straightforward way.
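A short Python sketch of one-hot encoding for subsets follows. It also checks the relation to symmetric differences: the squared Euclidean distance between two encoded subsets equals the number of elements lying in exactly one of them. The product names and function names are invented for the example.

```python
def one_hot(subset, universe):
    """0/1 vector of `subset`, with one coordinate per element of the ordered list `universe`."""
    return [1 if s in subset else 0 for s in universe]

def squared_distance(u, v):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

universe = ["apple", "bread", "milk", "tea"]   # the finite set S, in a fixed order
basket_a = {"apple", "milk"}
basket_b = {"milk", "tea", "bread"}

vec_a = one_hot(basket_a, universe)
vec_b = one_hot(basket_b, universe)

# Items in exactly one basket: the symmetric difference.
difference = basket_a ^ basket_b
```

Since the encoding takes a set as input, the order in which items were rung up plays no role in the resulting vector.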
Let us consider another situation, where a company is studying its sales representatives
(we will call them reps), and the reps have collected data consisting of pairs (rep,
productid), where the two variables take values in the discrete sets of reps and company
products, respectively. Each data point represents one transaction generated by the rep.
The company would like to have information about the performance of its reps. The
given list is not informative about reps directly, since each data point corresponds to
only a single transaction. We would instead like to create a data set where each data point
corresponds only to a rep, and carries information about the distribution of products the
rep has sold. A method for doing this is via pivot tables.
For each rep, we have a multiset of products. Let us say the ith rep is associated with
a set Si of products. The pivot table transform associated with a variable salesrep begins
with a data set of transactions which are of the form
(rep, p),
where p is a product. Note that a particular value rep can occur in many data points,
one for each transaction. The pivot table transform now associates with the data set of
transactions a data set where the entries are of the form
(rep, {p1, p2, . . ., pn})
and where each value rep of salesrep can occur only once and the second coordinate,
within braces, is a multiset of products rather than a single product. The second
coordinate is the multiset of all products that have been sold by the sales representative
rep. To produce a matrix representation, we can apply one-hot encoding and obtain a
data set consisting of elements $(\mathit{rep}, v_{\mathit{rep}})$. It corresponds to an $m \times n$ matrix, where $m$ is
the number of sales reps and n is the number of all possible products. This transform is
extremely useful, and can also be used in situations where some variables are actually
numerical instead of belonging to a discrete set. Suppose, for example, that our data set
of transactions instead consisted of pairs (rep, x), where x is the price of the item in a
transaction. Then we could form the pivot table transform associated with the variable
salesrep, and obtain a pair
(rep, {x1, . . ., xn}),
where {x1, . . ., xn} is the set of all prices for all transactions involving the sales
representative rep. Since we might be interested in the total value of the transactions
involving rep, we might here apply a different method to obtain a vector, in this case
assigning to the set {x1, . . . xn} the sum x1 +· · ·+ xn. If we were interested in the average
value of the transactions for a given rep, we would instead associate the average value
of the xi. One could also consider the maximum or minimum values of the xi.

The pivot table transform is in general defined in two steps. We suppose that we are
given a data set X consisting of coordinates xs, where s ∈ S for some finite set S. The
quantities s may correspond to variables of different types, namely one might consist
of elements of a discrete set, another might consist of real numbers, another of positive
integers, yet another of subsets of a fixed discrete set, etc. Suppose that $S$ is the disjoint
union of two sets $S_0$ and $S_1$. An $S_i$-vector is a tuple $\{x_s\}_{s \in S_i}$, where $x_s$ has the type of the
variable $s$. For each data point $x \in X$, we have two projections $\pi_0(x)$ and $\pi_1(x)$, where
$\pi_i(x)$ is an $S_i$-vector for $i = 0, 1$. The first step in the pivot table transform associates
with $X$ a set $P(X)$ consisting of pairs $(\xi, \mathcal{S})$, where $\xi$ is an $S_0$-vector and $\mathcal{S}$ is a multiset
of $S_1$-vectors. The set $P(X)$ has exactly one element for each unique value taken by
$\pi_0(x)$ as $x$ ranges over $X$, and, for a fixed $\xi$, the multiset $\mathcal{S}$ consists of all the $S_1$-vectors
that occur as $\pi_1(x)$ for data points $x \in X$ for which $\pi_0(x) = \xi$. The second step is the
vectorization step, which applies a method for assigning vectors in a vector space to
sets of S1-vectors. The use of one-hot encoding above, as well as the sum, average, max
and min methods for sets of real numbers, are all such methods but one can imagine
many others.
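The two steps of the pivot table transform can be sketched in Python as follows, using the sales-representative setting. The rep names, products, prices, and function names are hypothetical; the vectorization shown is one-hot encoding with multiplicities, followed by the sum as an alternative for numerical values.

```python
from collections import Counter, defaultdict

def pivot(records):
    """Step 1: group the second coordinates by the first, keeping multiplicities."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return dict(groups)

def vectorize_counts(groups, products):
    """Step 2 (one choice): count vector over the ordered product list, one row per key."""
    return {key: [Counter(values)[p] for p in products]
            for key, values in groups.items()}

# Transactions of the form (rep, product).
transactions = [("ann", "tea"), ("bob", "milk"), ("ann", "milk"), ("ann", "tea")]
groups = pivot(transactions)
matrix = vectorize_counts(groups, ["milk", "tea"])

# Transactions of the form (rep, price), vectorized by the total value per rep.
prices = [("ann", 3.0), ("bob", 1.5), ("ann", 2.0)]
totals = {rep: sum(values) for rep, values in pivot(prices).items()}
```

Swapping `sum` for an average, maximum, or minimum gives the other vectorizations mentioned for numerical values.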
In the examples above, we have examined situations where we have collections and are
not interested in the ordering of the elements in the collection. There are many situations
where one is interested in the ordering. A notable example is in genomics, where one
considers sequences of elements belonging to an alphabet, such as the 20-element
alphabet of amino acids. In this case, there is a simple vectorization step. Suppose that
we are considering sequences of length $N$ in an alphabet $A$ of cardinality $a$. Then we may
use one-hot encoding to assign a vector $h(\alpha)$ of length $a$ to each member $\alpha$ of the alphabet
$A$. We may then assign to the sequence $(a_1, \ldots, a_N)$ the vector $(h(a_1), \ldots, h(a_N))$, which
is a vector of length $aN$. This is a simple vectorization method, but in this case it is often
more useful to construct dissimilarity measures directly. For two sequences $\sigma$ and $\sigma'$ of
length $N$ in the alphabet $A$, we define the Hamming distance between $\sigma$ and $\sigma'$ to be
the number of elements $i \in \{1, \ldots, N\}$ such that $\sigma_i \neq \sigma'_i$. This is clearly a dissimilarity
measure, and it is in fact of metric type. This dissimilarity measure, and variants of it,
are very useful in many situations, particularly in genomics. See Gusfield (1997) for a
comprehensive treatment.
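Both constructions for sequences are easy to state in code. The Python sketch below implements the concatenated one-hot vectorization and the Hamming distance; the four-letter alphabet and the example sequences are assumptions chosen for brevity.

```python
def hamming(s, t):
    """Number of positions at which two sequences of equal length differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(s, t) if a != b)

def one_hot_sequence(seq, alphabet):
    """Concatenate the one-hot vectors of the letters of `seq`."""
    return [1 if ch == letter else 0 for ch in seq for letter in alphabet]

alphabet = "ACGT"
s, t = "GATTACA", "GACTATA"
distance = hamming(s, t)

# The two constructions are related: each mismatch changes two coordinates.
vec_s = one_hot_sequence(s, alphabet)
vec_t = one_hot_sequence(t, alphabet)
squared = sum((a - b) ** 2 for a, b in zip(vec_s, vec_t))
```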
2.4 Text
A very interesting data type arises in natural language processing, namely that of a corpus
of documents. Each document is a sequence of words, and they may be of varying length.
Such data is very useful, for example, in analyzing sentiment or detecting trends in a collection
of newspaper articles. The general setup is that a corpus is a collection of documents and
each document is a sequence of words, perhaps with some punctuation symbols included.
Since the documents could be regarded as sets of words (losing the information about
their order in the sequence), it is possible to use one-hot encoding directly, where a basis
element in a vector space would be associated with each word. What this effectively
does is to assign to each document the collection of word counts within the document
for every word within a dictionary. This is certainly a possible vectorization for the data
set, but it is not satisfactory because very common words, such as the word “the”, will
dominate the occurrences of other meaningful words in the document. A solution to this
problem is via tf–idf (term-frequency–inverse-document-frequency).
Let D be a collection of documents d, referred to as the corpus. In tf-idf methods, we
assign to each word–document pair (w, d) the number of occurrences of the word w in
the document d, and denote it by tf(w, d), the term frequency of w in d. In order to deal
with the problem of frequently occurring words, we will want to weight this count, and
the idea is to do that on the basis of the number of documents in D that include w. The
most standard choice is to multiply this count by the inverse document frequency, which
is defined to be
\[ \mathrm{idf}(w, D) = \log \frac{N}{|\{d \in D \text{ such that } w \in d\}|}, \]
where $N$ is the number of documents in the corpus $D$.
Assigning to each document $d$ the vector $\{\mathrm{tf}(w, d)\,\mathrm{idf}(w, D)\}_w$ gives a data matrix
with $N$ rows and $W$ columns, where
$W$ is the number of words in the dictionary. Note that, with this vectorization, any
word which occurs in every document will have an identically zero value, so that every
column in the data matrix corresponding to such words will be identically zero and can
therefore be ignored. The tf–idf approach has worked well in many situations; principal
component analysis has been applied extensively to these data matrices with interesting
results. Nevertheless, because the tf-idf approach treats each document as just a multiset
of words, it does not take full advantage of all the structure that is present. One could, for
example, begin to take advantage of the additional structure of the ordering by dealing
with k-grams, i.e. sequences of k consecutive words.
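A minimal Python sketch of the tf–idf vectorization follows. The tiny corpus and the function name are assumptions; library implementations often add smoothing or normalization, which this sketch deliberately omits so that it matches the plain definition.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return (vocabulary, matrix) for a corpus given as a list of word lists."""
    n_docs = len(corpus)
    vocab = sorted({w for doc in corpus for w in doc})
    # Number of documents containing each word.
    doc_freq = Counter(w for doc in corpus for w in set(doc))
    idf = {w: math.log(n_docs / doc_freq[w]) for w in vocab}
    rows = []
    for doc in corpus:
        tf = Counter(doc)  # raw number of occurrences of each word
        rows.append([tf[w] * idf[w] for w in vocab])
    return vocab, rows

corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat", "sat"],
    ["the", "bird"],
]
vocab, matrix = tf_idf(corpus)
the_column = [row[vocab.index("the")] for row in matrix]
```

The column for "the" is identically zero because the word occurs in every document, so it can be dropped from the data matrix.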
Another direction in natural language processing is the use of word embeddings.
A word embedding consists of an inner product space $V$ together with an assignment of a
vector $v_w \in V$ to each word $w$ in the
dictionary. A canonical choice given above would be one-hot encoding, where the
vector space $V$ is $\mathbb{R}^W$ and where the vector $v_w$ would be the standard basis vector $e_w$ in
the vector space $\mathbb{R}^W$. The point of word embeddings is to embed the words in a vector
space of much smaller dimension, so that the distances between the associated vectors
in some way reflect correlations between words. It is possible to use idf reweighting in
conjunction with any such word embedding. One standard method is to apply PCA and
choose the top few hundred coordinates. Other popular examples include word2vec and
GloVe. Useful references for natural language processing are Manning & Schütze (1999)
and Eisenstein (2018).
2.5 Graph Data
There is a large class of data where either (a) the data is equipped with a graph structure
or (b) the data set consists of elements which are graphs. The notion of a graph has
several variants. Graphs may be directed or undirected, and they may carry weights or
labels attached to edges and/or vertices. We present a number of different examples to
show what is possible.

1. The internet is an example of a data set W of type (a) above, where the graph is
directed and whose vertices are web pages and edges are hyperlinks. The web pages
are considered as a data set, and the presence of a graph structure on W allows for
feature generation. For example, page rank and the hubs and authorities construction
(see Easley & Kleinberg 2011) create numerical features on the vertex set. One can
also use simple graph-theoretic properties to create additional numerical features,
such as total degree, inbound degree, and outbound degree. One can also create
graph-valued features by constructing a neighborhood of radius $k$, or other graph-valued
constructions of a local nature. Using this idea, one can view the collection
W as a data set of type (b) from above. It is also possible to assign weights to edges
and vertices on the basis of the traffic at a webpage or across a particular hyperlink.
2. Any molecule can be regarded as an undirected graph whose vertices are atoms and
whose edges are bonds. One can also attach weights or labels to the vertices using
atomic weight, atomic number, or element label. Databases of molecules are therefore
an example of type (b) above. Of course, simple measures, such as the number of
atoms or various aggregate quantities concerning the bonds, are available but they
do not reflect the full content of the data. In addition, there are standard collections
of features called SMILES (see Weininger 1988). We will discuss how to apply
topological methods to generate new useful features systematically in Section 6.2.
3. Crystals are regularly arranged collections of atoms or molecules. The theory of
crystals is a highly developed area within physics, chemistry, and mathematics. The
high degree of regularity makes a sophisticated theory, involving complex symmetry
groups, possible. However, many materials of interest are amorphous, in that the
atoms are not arranged with precise regularity but nevertheless exhibit geometric
structure, which has been studied in various ways. One can encode information
about an amorphous solid via a graph structure where the vertices correspond to the
constituent atoms or molecules and where the edges are assigned on the basis of
chemical or physical properties. Often one can actually recognize bonds between the
atoms and assign edges to them. In other situations, one assigns edges on the basis
of distance thresholds. In Section 6.10, we will see how such methods can be used
to generate features attached to various amorphous solid classes and to demonstrate
that they carry chemical and physical information.
4. In genomics, the concept of a phylogenetic tree is a very important one. It describes
the pattern of evolution within various classes of organisms and is certainly of a great
deal of value in the study of viruses. The notion of moduli spaces of trees is a useful
one, and it is studied in detail in Rabadan & Blumberg (2019).
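As a small illustration of the feature generation described in the first example, the Python sketch below computes the inbound, outbound, and total degree of each vertex of a directed graph given as a list of edges. The page names and the function name are hypothetical.

```python
from collections import defaultdict

def degree_features(edges):
    """Map each vertex to (inbound degree, outbound degree, total degree)."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for source, target in edges:
        out_deg[source] += 1
        in_deg[target] += 1
    vertices = sorted(set(in_deg) | set(out_deg))
    return {v: (in_deg[v], out_deg[v], in_deg[v] + out_deg[v]) for v in vertices}

# Hyperlinks between four hypothetical pages, as (source, target) pairs.
links = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]
features = degree_features(links)
```

Each vertex thereby receives a numerical feature vector, so the vertex set can be handed to the matrix methods of Section 2.1.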
2.6 Images
Images present another very interesting data type. Here the data typically consists of
rectangular arrays of pixels, with each pixel assigned either a gray-scale value or a
collection of color intensities using one of a number of possible ways of encoding color.
This means that image data sets can be expressed as matrices with MN columns, where
the image consists of an M × N rectangular array of pixels. What we note, though, is that
the matrix is equipped with a kind of geometric structure, whether as a grid graph where
every node is connected to its four nearest neighbors or with a distance function as a
subset of the plane. What this means is that not only do we have the matrix entries, but
we have knowledge about which nodes are close to a given node. Moreover, the regular
grid structure supports translations in the image. Methods for working with images use
the grid structure in a number of ways, as follows.
1. It is often useful to smooth images by replacing a pixel value at p by a sum of other
pixel values, weighted by the inverse of their distance from p.
2. Convolutional neural networks (CNNs) (see Aggarwal 2018) constitute a computational
method for computing with image data sets. They use the grid structure to
devise a relatively sparse neural network architecture that performs strikingly well
for a number of image analysis tasks, particularly for classification.
3. CNNs also use the translation property to ensure that the object recognition methods
that they construct have the property that an object (say a cat) in the lower left-hand
corner of an image is recognized in exactly the same way as a cat in the upper
right-hand corner.
4. The grid structure also allows for feature generation, one example being the so-called
Gabor filters (see Feichtinger & Strohmer 1998).
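The smoothing operation in the first item can be sketched as follows. Several details are assumptions of this example rather than part of the description: the sum runs over a square window of a fixed radius, the pixel itself is excluded (its distance is zero), and the weights are normalized so that constant images are left unchanged.

```python
import math

def smooth(image, radius=1):
    """Replace each pixel by an inverse-distance-weighted average of nearby pixels."""
    rows, cols = len(image), len(image[0])
    result = [[float(image[r][c]) for c in range(cols)] for r in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = weight_sum = 0.0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    if (dr, dc) == (0, 0) or not inside:
                        continue
                    weight = 1.0 / math.hypot(dr, dc)  # inverse grid distance
                    total += weight * image[rr][cc]
                    weight_sum += weight
            if weight_sum > 0.0:
                result[r][c] = total / weight_sum
    return result

# A single bright pixel in the middle of a dark 3 x 3 image.
image = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
smoothed = smooth(image)
```

The grid structure enters through the offsets (dr, dc): the code needs to know which pixels are near a given one, not just the list of pixel values.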
2.7 Time Series
Time series are data sets that consist of sequences of observations of some kind, typically
made at regular intervals. The most common version involves real-valued observations,
but vector-valued time series are also of great interest. One can also think of time series
with observations in other data types, such as regularly occurring text documents. Just
as images are data where the features are arranged geometrically in a rectangular grid,
so the variables in a time series are arranged in a one-dimensional linear array.
There is a large body of literature concerning the modeling of time series (see for
example Kirchgässner & Wolters 2007 or Kantz & Schreiber 2004). A commonly used
method for stationary time series, where the means and variances of the variables do
not change over time, is the autoregressive moving average (ARMA) model, which we
now describe. The assumption is that we generate a sequence of observations {Xi}i, say
for i ≥ 0, and that Xi is the observation obtained at a time ti. It is also assumed that the
ti are regularly spaced in time. The model further assumes that the observations at times
ti depend linearly on a fixed number (say p) of the preceding observations and also that
there are models for noise in the system. Formally, one writes
\[ X_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i X_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}. \]
Here the variables $\varepsilon_t$ are typically assumed to be independent identically distributed
(i.i.d.) random variables sampled from a normal distribution with zero mean. Models
other than normal ones are possible, but it is required that the variables should still be
i.i.d. Because of the presence of the random variables $\varepsilon_t$, the above model is not amenable
to treatment as a standard linear regression model but, rather, describes a distribution for each variable
$X_t$. The fitting of the model is then carried out using a method for fitting distributions,
a common choice being the maximum likelihood method (see Hastie et al. 2009). There
are simple extensions to vector-valued time series. Another extension is the so-called
autoregressive integrated moving average (ARIMA) model, which does not require that
the time series be stationary.
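To see what the ARMA recursion generates, one can simulate it. The Python sketch below draws normal noise and applies the recursion directly; treating the terms before the start of the series as zero, as well as the parameter values and the function name, are assumptions of the example. It simulates the model only and does not fit it.

```python
import random

def simulate_arma(c, phi, theta, n, sigma=1.0, seed=0):
    """Generate n observations of an ARMA(p, q) process with p = len(phi), q = len(theta)."""
    rng = random.Random(seed)
    xs, eps = [], []
    for t in range(n):
        e = rng.gauss(0.0, sigma)  # i.i.d. normal noise with zero mean
        # Autoregressive part: linear in the p preceding observations.
        ar = sum(phi[i] * xs[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        # Moving-average part: linear in the q preceding noise terms.
        ma = sum(theta[i] * eps[t - 1 - i] for i in range(len(theta)) if t - 1 - i >= 0)
        xs.append(c + e + ar + ma)
        eps.append(e)
    return xs

# With the noise switched off, the recursion is deterministic.
noiseless = simulate_arma(1.0, [0.5], [], 4, sigma=0.0)

# A noisy ARMA(1, 1) sample path (hypothetical parameters).
sample = simulate_arma(0.0, [0.6], [0.3], 200, sigma=1.0, seed=42)
```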
There are of course situations where such linear models are not appropriate, but
one nevertheless wants to predict the value of an observation using the preceding k
observations for a fixed positive integer k. There is a construction that is available for
time series, called delay embedding or Takens embedding.
To introduce this, we first consider how to construct dissimilarity spaces out of
sequences from a dissimilarity space. Let (X, D) be a dissimilarity space. We can form
a dissimilarity space structure on the space of $k$-tuples $X^{(k)} = (X^k, D_k)$, where $D_k$ can
be chosen in a number of different ways. The most popular ones are in direct analogy
with $L^p$-metrics on sequences:
\[ D_k\bigl((x_1, \ldots, x_k), (y_1, \ldots, y_k)\bigr) = \left( \sum_{i=1}^{k} D(x_i, y_i)^p \right)^{1/p}. \]
Choosing $p = 1$ gives a particularly simple-to-use dissimilarity (just sum up the
pairwise dissimilarities of the two sequences), while $p = 2$ is a very popular choice
since in combination with the Euclidean metric it reduces to the Euclidean metric on the
concatenation of the observations.
Now, for a single time series $t = \{x_i\}_{i=1}^{N}$ we can define a delay embedding with delay
$\epsilon$ and dimension $k$ as the new time series
$$T_{k,\epsilon}\, t = \{(x_{i-(k-1)\epsilon}, \ldots, x_{i-\epsilon}, x_i)\}_{i=1+(k-1)\epsilon}^{N} \subseteq X^{(k)}.$$
If $T \subseteq X^N$ is a collection of time series samples, where each $t = (x_1, \ldots, x_N) \in T$
is a single time series taken over a family of times $(t_1, \ldots, t_N)$, we may generate a new
collection of time series by applying delay embedding separately on each time series in
the collection, producing a delay-embedded collection $T_{k,\epsilon} T = \{T_{k,\epsilon}\, t \mid t \in T\}$.
We will meet the Takens embedding again in Section 6.4, where the point cloud
obtained by forgetting the time-ordering structure on $T_{k,\epsilon} T$ can be analyzed using
persistent cohomology to generate information about quasiperiodicity in the time series itself.
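The delay embedding described above can be sketched in a few lines of Python. The function names `delay_embedding` and `embed_collection` and the parameter name `delay` are illustrative choices rather than notation from the text, and the code uses zero-based indices, so the first window ends at index `(k - 1) * delay`.

```python
def delay_embedding(series, k, delay=1):
    """Return the list of windows
    (x[i - (k-1)*delay], ..., x[i - delay], x[i])
    for every index i at which the whole window fits in the series."""
    if k < 1 or delay < 1:
        raise ValueError("k and delay must be positive integers")
    start = (k - 1) * delay
    return [
        tuple(series[i - j * delay] for j in range(k - 1, -1, -1))
        for i in range(start, len(series))
    ]

def embed_collection(collection, k, delay=1):
    """Apply the delay embedding separately to each series in a collection."""
    return [delay_embedding(series, k, delay) for series in collection]

windows = delay_embedding([1, 2, 3, 4, 5], k=3, delay=1)
sparse_windows = delay_embedding([1, 2, 3, 4, 5], k=2, delay=2)
```

Forgetting the order of the returned windows gives the point cloud in the space of k-tuples.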
2.8 Density Estimation in Point Cloud Data
The theory of density estimation is a highly developed area of statistics. We will not
attempt to survey this area but will instead refer the reader to Scott (2015) or Duvroye

Figure 2.3 Points as in Figure 2.2. Left, codensity δ5. Right, codensity δ200. The 25% densest
values are marked in red, indicating the sets P(5, 25) and P(200, 25) respectively.
For a subset $P_0 \subset P$, with a slight abuse of notation we shall write $P_0(k, T)$ for the
$\frac{T}{100}|P_0|$ points of $P_0$ on which the codensity (as calculated on all of $P$) takes on the
lowest values.
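The selection of the densest points can be sketched as follows. The definition of codensity is not reproduced in this passage, so the sketch assumes the common convention that the codensity of a point is the distance to its k-th nearest neighbour in P; the function names, the brute-force neighbour search, and the rounding down of the count are likewise assumptions of this example.

```python
import math

def codensity(points, k):
    """Assumed definition: for each point, the distance to its k-th
    nearest neighbour among all the other points (brute force)."""
    values = []
    for i, x in enumerate(points):
        dists = sorted(math.dist(x, y)
                       for j, y in enumerate(points) if j != i)
        values.append(dists[k - 1])
    return values

def densest_subset(points, subset_indices, k, T):
    """Indices of the T percent of the chosen subset with the lowest
    codensity, where the codensity is computed on all of `points`."""
    delta = codensity(points, k)
    count = int(len(subset_indices) * T / 100)  # rounded down (assumption)
    ranked = sorted(subset_indices, key=lambda i: delta[i])
    return ranked[:count]

cloud = [(0.0,), (1.0,), (2.0,), (10.0,)]
dense_half = densest_subset(cloud, [0, 1, 2, 3], k=1, T=50)
```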

3 Topology
3.1 History
Euler (1741) is usually cited as the first paper in topology. In this paper, Euler studies
the so-called “Bridges of Königsberg” problem.
The question that was asked about the bridges was whether it is possible to traverse
all the bridges exactly once and return to one’s starting point. Euler answered this by
recognizing that it is a question about paths in an associated network; see below. In
fact, the question involves only certain properties of the paths which are independent
of the rates at which the paths are traversed. His result concerned the properties of an
infinite class of paths, i.e. a certain type of pattern in the network. Euler also derived
a polyhedral formula relating the number of vertices, edges, and faces in polyhedra
(Euler 1752a, 1752b). The subject developed in a sporadic fashion over the following
century and a half; developments included the work of Vandermonde on knot theory
(Vandermonde 1771), the proof of the Gauss–Bonnet theorem (never published by Gauss,
but with a special case proved in Bonnet 1848), the first book in the subject, by Listing
(1848), and the work of Riemann identifying the notion of a manifold (Riemann 1851).
The paper of Poincaré (1895) was a seminal work in which the notions of homology and
the fundamental group were introduced, with motivation from celestial mechanics. The
subject then developed at a greatly accelerated pace throughout the twentieth century.
The first paper in persistent homology was Robins (1999), and the subject of applying
topological methodologies to finite metric spaces has been developing rapidly since that
time.

3.2 Qualitative and Quantitative Properties
3.2.1 Topological Properties
The notion of a topological property is an old one in mathematics. We first discuss what
is meant by topological properties, and why they are important in data science.
In many problems in science, one can impose different coordinate systems on a single
physical problem. For example, one can apply rigid motions (rotations and translations)
to data in Euclidean space. This kind of coordinate change is, for example, ubiquitous
in physics. It has the useful property that it preserves distances between points. Often,
however, one needs to consider coordinate changes that are not rigid motions but are more
complicated, in the sense that they distort the geometric properties (such as distance) of
the data obtained from observation or experimentation.
Example 3.1 Temperature can be measured using many different scales, including the
Celsius, Fahrenheit, and Kelvin scales. The transition from degrees Celsius to degrees
Fahrenheit is given by the formula
$$F = \tfrac{9}{5} C + 32.$$
This transformation does not preserve distances; it dilates them by a constant factor. The
transformation law from degrees Celsius to kelvins is given by
K = C + 273.15.
This transformation does preserve distances, i.e. the size of a degree.
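The two conversions can be compared numerically. The short sketch below uses two arbitrary sample temperatures; the helper names are illustrative only.

```python
def celsius_to_fahrenheit(c):
    return 9.0 / 5.0 * c + 32.0

def celsius_to_kelvin(c):
    return c + 273.15

def distance(a, b):
    """Distance between two readings on the same scale."""
    return abs(a - b)

c1, c2 = 10.0, 35.0  # two arbitrary sample temperatures in degrees Celsius
d_celsius = distance(c1, c2)
d_fahrenheit = distance(celsius_to_fahrenheit(c1), celsius_to_fahrenheit(c2))
d_kelvin = distance(celsius_to_kelvin(c1), celsius_to_kelvin(c2))

# The Fahrenheit distance is 9/5 times the Celsius distance (a dilation),
# while the Kelvin distance agrees with it up to floating-point rounding.
ratio_fahrenheit = d_fahrenheit / d_celsius
ratio_kelvin = d_kelvin / d_celsius
```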
Example 3.2 A change of coordinates which is frequently used to make apparent
some properties of data is the so-called log–log coordinate system, in which every point
(x, y) is replaced by (log(x), log(y)). This transformation carries curves given by power
laws to straight lines of varying slopes. It does not preserve distances.
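To see why power laws become straight lines, note that y = a x^b gives log(y) = log(a) + b log(x), a line of slope b in the new coordinates. The sketch below checks this numerically for one arbitrarily chosen power law; the function names and the sample values of a and b are assumptions for the example.

```python
import math

def to_loglog(points):
    """Replace every point (x, y) by (log(x), log(y)); needs x, y > 0."""
    return [(math.log(x), math.log(y)) for x, y in points]

def consecutive_slopes(points):
    """Slopes of the segments joining consecutive points."""
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

a, b = 3.0, 2.5  # arbitrary coefficient and exponent of the power law
power_law = [(x, a * x ** b) for x in (1.0, 2.0, 4.0, 8.0)]

# On the original curve the slopes grow; in log-log coordinates
# every segment has slope b, up to rounding.
slopes_original = consecutive_slopes(power_law)
slopes_loglog = consecutive_slopes(to_loglog(power_law))
```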
Example 3.3 Polar coordinates provide a convenient way to study a number of
problems. The transition law from polar coordinates to rectangular coordinates does
not preserve distances.
Example 3.4 Coordinate changes which apply multiplication by non-orthogonal
matrices will distort the geometric properties of sets in the plane or in space. For
example, dilation (multiplication by a positive multiple of the identity matrix) carries
a circle centered at the origin to another circle centered at the origin with a different
radius. Multiplication by a general diagonal matrix (perhaps with distinct eigenvalues)
carries circles centered at the origin to ellipses centered at the origin.
Example 3.5 It is often said that topology is the subject in which a coffee cup is
regarded as being the same as a doughnut. This means that there is a coordinate change
such that a set which is a coffee cup in one coordinate system is a doughnut in the new
coordinates. See the illustration below.

that arbitrary continuous transformations can distort distances in a very flexible way; it
is not possible to define topological properties based on transformations that preserve
the distance function. One wants, in a sense, to preserve “infinitesimal” distances, i.e.
points which are infinitely close together should remain infinitely close together. Of
course, there is no notion of infinitesimal closeness for pairs of points. However, in a
metric space one can speak of a point being infinitesimally close to a set. For (X, d) a
metric space, x ∈ X, and U ⊆ X, we say that x is infinitesimally close to U if, for every
$\epsilon > 0$, there is a point u ∈ U such that $d(x, u) \le \epsilon$. The collection of all points which
are infinitesimally close to a set U is called the closure of U and is denoted by $\overline{U}$. Note
that if a point is infinitesimally close to a set U then any continuous transformation will
preserve that property, even if it does not preserve distances.
Example 3.11 The closure of the open interval (0, 1) is the closed interval [0, 1].
Example 3.12 The closure of the set Rn − {0} in Rn is the entire set Rn.
Example 3.13 The closure of the set of all points of the form $n/2^k$, where $0 \le n \le 2^k$
and k ≥ 0, is the entire closed interval [0, 1].
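The reason is that every point of [0, 1] lies within 2^-(k+1) of some fraction with denominator 2^k, so it is infinitesimally close to the set of all such fractions. The following sketch illustrates this for one sample point; the function name `nearest_dyadic` and the choice of sample point are assumptions for the example.

```python
def nearest_dyadic(x, k):
    """Return (n, n / 2**k), where n is an integer closest to x * 2**k."""
    n = round(x * 2 ** k)
    return n, n / 2 ** k

x = 1.0 / 3.0  # a sample point of [0, 1] that is not itself of this form
errors = [abs(x - nearest_dyadic(x, k)[1]) for k in range(1, 21)]

# The k-th error is at most 2**-(k+1), so the errors shrink to zero.
bounds_hold = all(errors[k - 1] <= 2.0 ** -(k + 1) for k in range(1, 21))
```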
Closure can be viewed as an operator on the subsets of Rn. It turns out to be idempotent
in the sense that $\overline{\overline{U}} = \overline{U}$. One can develop the notion of a topological space using only an
abstract closure operator on the family of subsets, with certain properties. However, it is
more conventional to develop the notion using the ideas of closed and open sets. A closed
set V in $\mathbb{R}^n$ is a subset for which $\overline{V} = V$. A subset U of $\mathbb{R}^n$ is open if its complement in $\mathbb{R}^n$
is closed, i.e. if $U = \mathbb{R}^n \setminus V$ for some closed set V. Another characterization is that a set is
open if and only if it is an arbitrary union of open balls $B_r(x) = \{y \in \mathbb{R}^n \mid d(x, y) < r\}$.
The open sets in the metric space $X = \mathbb{R}^n$ satisfy the following three properties.
• ∅ and X are both open sets in X.
• Arbitrary unions of open sets are open.
• Finite intersections of open sets are open.
The identification of these properties leads us to the notion of a topological space.
Definition 3.14 A topological space is a pair (X, U), where X is a set and U is a
family of subsets (called the open sets) of X satisfying the following three properties.
1. ∅ ∈ U and X ∈ U.
2. For any family $\{U_\alpha\}_{\alpha \in A}$ of open sets, where A is any parameter set, $\bigcup_{\alpha \in A} U_\alpha$ is also
in U.
3. For any family $\{U_\alpha\}_{\alpha \in A}$ of open sets, where A is any finite parameter set, $\bigcap_{\alpha \in A} U_\alpha$
is also in U.
The family U is called a topology on X. A subset C ⊆ X is closed if $X \setminus C$ is open.
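For a finite set X the three properties can be checked mechanically. The sketch below is one way to do so in Python; the function name `is_topology` is an illustrative choice. Because a family of subsets of a finite set is itself finite, closure under pairwise unions and pairwise intersections is enough, since arbitrary unions and finite intersections then follow by induction.

```python
from itertools import combinations

def is_topology(X, opens):
    """Check the three axioms for a family of subsets of a finite set X."""
    X = frozenset(X)
    family = {frozenset(U) for U in opens}
    if any(not U <= X for U in family):
        return False  # every open set must be a subset of X
    if frozenset() not in family or X not in family:
        return False  # property 1: the empty set and X are open
    # Properties 2 and 3, reduced to pairs because the family is finite.
    return all((A | B) in family and (A & B) in family
               for A, B in combinations(family, 2))

X = {1, 2, 3}
nested = [set(), {1}, {1, 2}, {1, 2, 3}]  # closed under unions and intersections
broken = [set(), {1}, {2}, {1, 2, 3}]     # the union {1, 2} is missing
```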
Here are some examples of open and closed sets in Rn.
Example 3.15 Open (respectively closed) intervals are open (respectively closed) sets
in R.

Example 3.16 Open balls (respectively closed balls) are open (respectively closed)
sets in Rn.
Example 3.17 Finite sets of points in Rn are always closed. Complements of finite
sets are open.
Example 3.18 For a continuous function f : Rn → R, the set of points {v ∈ Rn |
f (v) > 0} is always an open set. Similarly for <. The set of points {v ∈ Rn | f (v) ≤ 0}
is always a closed set. Again, similarly for ≥. The terminology is consistent with the
notions of open and closed conditions used in analysis.
Example 3.19 For any family { f1, f2, . . ., fk } of continuous functions from Rn to R,
the set {v ∈ Rn | fi (v) = 0 for all i} is closed. This means that algebraic varieties (sets
of points defined by algebraic equations) are closed sets in Rn.
As we progress, we will give numerous examples of topological spaces. Here we
introduce the most elementary example.
Definition 3.20 For any subset X ⊆ Rn, we define a topology on X by declaring that
U ⊆ X is open if and only if U = V ∩ X for an open subset V ⊆ Rn. This defines familiar
spaces such as spheres and tori as topological spaces.
We are now able to be precise about what is meant by a topological property.
Definition 3.21 A property of subsets of Rn is said to be topological if and only if
it depends only on the topology defined above. Of course, topological properties can be
defined on other topological spaces as well.
The definition of a topological space is motivated by the case of Rn, and so clearly
applies to it. The power of the definition lies in the fact that there are a number of
systems far removed from the case of Rn but which nevertheless satisfy the axioms for
a topological space. It turns out that this enables the use of geometric and topological
intuition in cases where it was not anticipated.
Example 3.22 For any set X, we can let the family of all subsets of X with finite
complement, together with the empty set, be the open sets. This does give a topological space.
Example 3.23 Let p be a prime number. A topology U on Z, the set of integers, is
given by defining U to be the collection of sets U such that for any n ∈ U there is a
positive integer r such that the set $\{n + k p^r \mid k \in \mathbb{Z}\}$ is contained in U. This is called the
p-adic topology, and it is of use in number theory.
3.2.2 Continuous Maps and Homeomorphisms
In the study of multivariable calculus, we identify the notion of continuous maps from
Rm to Rn using the ε–δ definition of continuity. We can characterize continuity directly
using the topology on Rn as follows.

Proposition 3.24 A map f : Rm → Rn is continuous if and only if $f^{-1}(U)$ is open
for all open sets U ⊆ Rn.
This leads us to the notion of a continuous map between topological spaces X and Y.
Definition 3.25 Let X and Y be topological spaces and f : X → Y a map of sets.
Then f is said to be continuous if and only if $f^{-1}(V)$ is open for all open sets V ⊆ Y.
Remark 3.26 One can formulate the notion of a closure operator in an arbitrary
topological space and then interpret this definition as requiring that if x ∈ X is in the
closure of a subset U ⊆ X then f (x) is in the closure of f (U).
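On finite topological spaces the open-set definition of continuity can be tested directly by computing preimages. In the sketch below a map is represented as a Python dictionary and a topology as a list of sets; these representations, the function names, and the two small example spaces are assumptions made for the illustration.

```python
def preimage(f, domain, V):
    """The set of points of the domain that f sends into V."""
    return frozenset(x for x in domain if f[x] in V)

def is_continuous(f, X, opens_X, opens_Y):
    """f is continuous if the preimage of every open set of Y is open in X."""
    open_sets = {frozenset(U) for U in opens_X}
    return all(preimage(f, X, frozenset(V)) in open_sets for V in opens_Y)

X = {1, 2, 3}
opens_X = [set(), {1}, {1, 2}, {1, 2, 3}]
Y = {"a", "b"}
opens_Y = [set(), {"a"}, {"a", "b"}]

f = {1: "a", 2: "b", 3: "b"}  # preimage of {"a"} is {1}, which is open
g = {1: "b", 2: "b", 3: "a"}  # preimage of {"a"} is {3}, which is not open
```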
Using this notion of maps, we will now identify the key notion that will allow us to
speak precisely about topological properties.
Definition 3.27 Let f : X → Y be a continuous map of topological spaces. If there
is a continuous function g: Y → X such that these functions are inverses of each other,
i.e. f (g(y)) = y and g( f (x)) = x, then f is a homeomorphism, and we say that X and
Y are homeomorphic.
Remark 3.28 This notion is the exact analogue of the notion of the isomorphisms
of groups or the bijections of sets. It means that the map f can be inverted and that
the inverse is continuous. It also gives precise meaning to the idea that Y can simply
be regarded as a reparametrization of X. Another point of view is that it makes precise
the notion of Y being obtained from X by stretching or deforming, without tearing
or “crushing”. If there is a homeomorphism from X to Y, we say that X and Y are
homeomorphic.
Here are examples of pairs of homeomorphic spaces.
Example 3.29 Let X ⊆ R2 denote the unit circle, and let Y denote the ellipse given
by the equation
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1.$$
Both sets are equipped with the subspace topology, defined in Definition 3.20. Then the
map (x, y) → (ax, by) is a homeomorphism from the circle to the ellipse, so the circle
and the ellipse are homeomorphic.
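A numerical spot check of this example is sketched below: points of the unit circle are pushed forward by the map, tested against the ellipse equation, and pulled back by the inverse map (x, y) → (x/a, y/b). The function names and the sample values of a and b are assumptions for the illustration; a check at finitely many points is of course not a proof.

```python
import math

def circle_to_ellipse(point, a, b):
    x, y = point
    return (a * x, b * y)

def ellipse_to_circle(point, a, b):
    x, y = point
    return (x / a, y / b)

def ellipse_residual(point, a, b):
    """Value of x^2/a^2 + y^2/b^2 - 1, which is zero on the ellipse."""
    x, y = point
    return (x / a) ** 2 + (y / b) ** 2 - 1.0

a, b = 3.0, 0.5  # arbitrary positive semi-axes
angles = [2.0 * math.pi * j / 12 for j in range(12)]
circle = [(math.cos(t), math.sin(t)) for t in angles]
ellipse = [circle_to_ellipse(p, a, b) for p in circle]
round_trip = [ellipse_to_circle(q, a, b) for q in ellipse]
```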
Example 3.30 The parabola given by $y = x^2$ and the real line are homeomorphic, the
homeomorphism being given by the map (x, y) → x.
Example 3.31 Let X ⊆ R2 denote the set $X = \{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 > 1\}$, and let
Y ⊆ R2 denote the subset $\mathbb{R}^2 - \{\vec{0}\}$. Polar coordinates are a good choice for describing a
map g: X → Y, which we define by
$$g(r, \theta) = (r - 1, \theta).$$
The inverse map is given by
$$g^{-1}(r, \theta) = (r + 1, \theta).$$

Example 3.32 Let X be the torus in three-dimensional space defined by the equation
$$\left(c - \sqrt{x^2 + y^2}\right)^2 + z^2 = a^2,$$
where $0 < a < c$. Let $Y \subseteq \mathbb{R}^4$ be the subspace defined by the equations
$$\begin{cases} x^2 + y^2 = 1, \\ z^2 + w^2 = 1. \end{cases}$$
We define a map f : X → Y via the formula
$$f(x, y, z) = \left( \frac{x}{\|(x, y)\|_2}, \frac{y}{\|(x, y)\|_2}, \frac{c - \|(x, y)\|_2}{a}, \frac{z}{a} \right),$$
where $\|(x, y)\|_2$ denotes $\sqrt{x^2 + y^2}$. It is easy to verify that this map has its image in
Y, and it is continuous since it is defined by simple formulas. It turns out that f is a
homeomorphism.
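The claim that f takes its values in Y can be spot-checked numerically. The sketch below samples the torus using the standard angular parametrization, which is an assumption of this example and not taken from the text, applies f, and evaluates both defining equations of Y. The function names and the sample values of a and c are likewise illustrative.

```python
import math

def torus_point(c, a, u, v):
    """A point of the torus (c - sqrt(x^2 + y^2))^2 + z^2 = a^2,
    given by the usual angle parametrization (assumed here)."""
    r = c + a * math.cos(v)
    return (r * math.cos(u), r * math.sin(u), a * math.sin(v))

def f(point, c, a):
    """The map from the torus in R^3 to the subset Y of R^4."""
    x, y, z = point
    r = math.hypot(x, y)  # the norm of (x, y)
    return (x / r, y / r, (c - r) / a, z / a)

def residuals_in_Y(q):
    """Deviations from the two equations x^2 + y^2 = 1 and z^2 + w^2 = 1."""
    x, y, z, w = q
    return (x * x + y * y - 1.0, z * z + w * w - 1.0)

c, a = 2.0, 0.5  # any values with 0 < a < c
angles = [2.0 * math.pi * j / 8 for j in range(8)]
samples = [torus_point(c, a, u, v) for u in angles for v in angles]
images = [f(p, c, a) for p in samples]
worst = max(abs(e) for q in images for e in residuals_in_Y(q))
```

The value `worst` stays at the level of floating-point rounding, which is consistent with the image of f lying in Y.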
One way to prove that the map f in the above example is a homeomorphism is to
exhibit a formula for the inverse. This is somewhat involved even in this case and can
become much more so in more complicated examples. There is a simple statement that
allows one to bypass that process.
Proposition 3.33 Let f : X → Y be a continuous map, where X and Y are subsets
of Rm and Rn respectively. We say a subset of Rn is bounded if it is contained in a ball
$B_R(0) = \{v \mid \|v\| < R\}$ for some $R > 0$. Suppose that X and Y are closed and bounded.
If the map f is bijective as a map of sets, then f is a homeomorphism.
In Example 3.32 above, it is easy to check that both sets are closed and bounded and
that f , regarded as a map of sets, is bijective. We are therefore able to conclude that it is
a homeomorphism.
Remark 3.34 Note that the two descriptions of a topological space have different
strengths. The first description sits in three-dimensional space and can therefore be
readily visualized. The second is much simpler, in that the equations that define it are
much simpler. Depending on the kind of analysis one is doing, different coordinate
systems may be useful.
We should also ask how we can show that two spaces are not homeomorphic. Let
X = [0, 1] ⊆ R and let Y ⊆ R2 denote the closed unit ball {(x, y) ∈ R2 | x2 + y2 ≤ 1}.
These two spaces appear to be very different and we therefore suspect that they are
not homeomorphic, but in order to be convinced of this we must try to find some
topological property that applies to one but not to the other. Reasoning intuitively, we
observe that by removing a single point from X, such as the point $\tfrac{1}{2}$, we can form a
space which is disconnected in the sense that it breaks into two disjoint open pieces,
namely $[0, \tfrac{1}{2})$ and $(\tfrac{1}{2}, 1]$, while removing any single point from Y will always leave it in
a single connected piece. We have not formally defined connectedness yet (we will do so
in Section 3.2.8), but this informal argument suggests how we can develop topological
properties that enable discrimination between spaces, even when we allow ourselves to
modify the spaces by arbitrary coordinate changes.


towards Northern Europe is obtained from some of the rock
paintings and carvings of Sweden. Among the canoes depicted are
some with distinct Mediterranean characteristics. One at Tegneby in
Bohuslän bears a striking resemblance to a boat seen by Sir Henry
Stanley on Lake Victoria Nyanza. It seems undoubted that the
designs are of common origin, although separated not only by
centuries but by barriers of mountain, desert, and sea extending
many hundreds of miles. From the Maglemosian boat the Viking ship
was ultimately developed; the unprogressive Victoria Nyanza
boatbuilders continued through the Ages repeating the design
adopted by their remote ancestors. In both vessels the keel projects
forward, and the figure-head is that of a goat or ram. The northern
vessel has the characteristic inward curving stern of ancient Egyptian
ships. As the rock on which it was carved is situated in a metal-
yielding area, the probability is that this type of vessel is a relic of
the visits paid by searchers for metals in ancient times, who
established colonies of dark miners among the fair Northerners and
introduced the elements of southern culture.
The ancient boats found in Scotland are of a variety of types. One of
those at Glasgow lay, when discovered, nearly vertical, with prow
uppermost as if it had foundered; it had been built of several pieces
of oak, though without ribs. Another had the remains of an
outrigger attached to it: beside another, which had been partly
hollowed by fire, lay two planks that appear to have been wash-
boards like those on a Sussex dug-out. A Clyde clinker-built boat,
eighteen feet long, had a keel and a base of oak to which ribs had
been attached. An interesting find at Kinaven in Aberdeenshire,
several miles distant from the Ythan, a famous pearling river, was a
dug-out eleven feet long, and about four feet broad. It lay
embedded at the head of a small ravine in five feet of peat which
appears to have been the bed of an ancient lake. Near it were the
stumps of big oaks, apparently of the Upper Forestian period.
Among the longest of the ancient boats that have been discovered
are one forty-two feet long, with an animal head on the prow, from
Loch Arthur, near Dumfries, one thirty-five long from near the River
Arun in Sussex, one sixty-three feet long excavated near the Rother
in Kent, one forty-eight feet six inches long, found at Brigg,
Lincolnshire, with wooden patches where she had sprung a leak, and
signs of the caulking of cracks and small holes with moss.
These vessels do not all belong to the same period. The date of the
Brigg boat is, judging from the geological strata, between 1100 and
700 B.C. It would appear that some of the Clyde vessels found at
twenty-five feet above the present sea-level are even older. Beside
one Clyde boat was found an axe of polished green-stone similar to
the axes used by Polynesians and others in shaping dug-outs. This
axe may, however, have been a religious object. To the low bases of
some vessels were fixed ribs on which skins were stretched. These
boats were eminently suitable for rough seas, being more buoyant
than dug-outs. According to Himilco the inhabitants of the
Œstrymnides, the islands rich in tin and lead, had most sea-worthy
skiffs. "These people do not make pine keels, nor," he says, "do they
know how to fashion them; nor do they make fir barks, but, with
wonderful skill, fashion skiffs with sewn skins. In these hide-bound
vessels, they skim across the ocean." Apparently they were as daring
mariners as the Oregon Islanders of whom Washington Irving has
written:
It is surprising to see with what fearless unconcern these
savages venture in their light barks upon the roughest
and most tempestuous seas. They seem to ride upon the
wave like sea-fowl. Should a surge throw the canoe upon
its side, and endanger its overturn, those to the
windward lean over the upper gunwale, thrust their
paddles deep into the wave, and by this action not merely
regain an equilibrium, but give their bark a vigorous
impulse forward.
The ancient mariners whose rude vessels have been excavated
around our coasts were the forerunners of the Celtic sea-traders,
who, as the Gaelic evidence shows, had names not only for the
North Sea and the English Channel but also for the Mediterranean
Sea. They cultivated what is known as the "sea sense", and
developed shipbuilding and the art of navigation in accordance with
local needs. When Julius Cæsar came into conflict with the Veneti of
Brittany he tells that their vessels were greatly superior to those of
the Romans. "The bodies of the ships," he says, "were built entirely
of oak, stout enough to withstand any shock or violence.... Instead
of cables for their anchors they used iron chains.... The encounter of
our fleet with these ships was of such a nature that our fleet
excelled in speed alone, and the plying of oars; for neither could our
ships injure theirs with their rams, so great was their strength, nor
was a weapon easily cast up to them owing to their height.... About
220 of their ships ... sailed forth from the harbour." In this great
allied fleet were vessels from our own country.[55]
It must not be imagined that the "sea sense" was cultivated because
man took pleasure in risking the perils of the deep. It was stern
necessity that at the beginning compelled him to venture on long
voyages. After England was cut off from France the peoples who had
adopted the Neolithic industry must have either found it absolutely
necessary to seek refuge in Britain, or were attracted towards it by
reports of prospectors who found it to be suitable for residence and
trade.

CHAPTER VIII
Neolithic Trade and Industries
Attractions of Ancient Britain—Romans search for Gold,
Silver, Pearls, &c.—The Lure of Precious Stones and
Metals—Distribution of Ancient British Population—
Neolithic Settlements in Flint-yielding Areas—Trade in Flint
—Settlements on Lias Formation—Implements from Basic
Rocks—Trade in Body-painting Materials—Search for
Pearls—Gold in Britain and Ireland—Agriculture—The
Story of Barley—Neolithic Settlers in Ireland—Scottish
Neolithic Traders—Neolithic Peoples not Wanderers—
Trained Neolithic Craftsmen.
The drift of peoples into Britain which began in Aurignacian times
continued until the Roman period. There were definite reasons for
early intrusions as there were for the Roman invasion. "Britain
contains, to reward the conqueror," Tacitus wrote,[56] "mines of gold
and silver and other metals. The sea produces pearls." According to
Suetonius, who at the end of the first century of our era wrote the
Lives of the Cæsars, Julius Cæsar invaded Britain with the desire to
enrich himself with the pearls found on different parts of the coast.
On his return to Rome he presented a corselet of British pearls to
the goddess Venus. He was in need of money to further his political
ambitions. He found what he required elsewhere, however. After the
death of Queen Cleopatra sufficient gold and silver flowed to Rome
from Egypt to reduce the loan rate of interest from 12 to 4 per cent.
Spain likewise contributed its share to enrich the great predatory
state of Rome.[57]

Long ages before the Roman period the early peoples entered Britain
in search of pearls, precious stones, and precious metals because
these had a religious value. The Celts of Gaul offered great
quantities of gold to their deities, depositing the precious metals in
their temples and in their sacred lakes. Poseidonius of Apamea tells
that after conquering Gaul the Romans put up these sacred lakes to
public sale, and many of the purchasers found quantities of solid
silver in them. He also says that gold was similarly placed in these
lakes.[58]
Apparently the Celts believed, as did the Aryo-Indians, that
gold was a form of the gods and fire, light, and immortality, and
that it was a life giver.[59]
Personal ornaments continued to have a
religious value until Christian times.
FLINT LANCE-HEADS FROM IRELAND (British Museum)

Photo Oxford University Press
CHIPPED AND POLISHED ARTIFACTS FROM SOUTHERN ENGLAND
(British Museum)
As we have seen when dealing with the Red Man of Paviland, the
earliest ornaments were shells, teeth of wild animals, coloured
stones, ivory, &c. Shells were carried great distances. Then arose the
habit of producing substitutes which were regarded as of as great
potency as the originals. The ancient Egyptians made use of gold to
manufacture imitation shells, and before they worked copper they
wore charms of malachite, which is an ore of copper. They probably
used copper first for magical purposes just as they used gold. Pearls
found in shells were regarded as depositories of supernatural
influence, and so were coral and amber (see Chapter XIII). Like the
Aryo-Indians, the Egyptians, Phœnicians, Greeks, and others
connected precious metals, stones, pearls, &c., with their deities,
and believed that these contained the influence of their deities, and
were therefore lucky. These and similar beliefs are of great
antiquity in Europe and Asia and North Africa. It would be rash to
assume that they were not known to the ancient mariners who
reached our shores in vessels of Mediterranean type.

The colonists who were attracted to Britain at various periods settled
in those districts most suitable for their modes of life. It was
necessary that they should obtain an adequate supply of the
materials from which their implements and weapons were
manufactured. The distribution of the population must have been
determined by the resources of the various districts.
At the present day the population of Britain is most dense in those
areas in which coal and iron are found and where commerce is
concentrated. In ancient times, before metals were used, it must
have been densest in those areas where flint was found—that is, on
the upper chalk formations. If worked flints are discovered in areas
which do not have deposits of flint, the only conclusion that can be
drawn is that the flint was obtained by means of trade, just as
Mediterranean shells were in Aurignacian and Magdalenian times
obtained by hunters who settled in Central Europe. In Devon and
Cornwall, for instance, large numbers of flint implements have been
found, yet in these counties suitable flint was exceedingly scarce in
ancient times, except in East Devon, where, however, the surface
flint is of inferior character. In Wilts and Dorset, however, the finest
quality of flint was found, and it was no doubt from these areas that
the early settlers in Cornwall and Devon received their chief supplies
of the raw material, if not of the manufactured articles.
In England, as on the Continent, the most abundant finds of the
earliest flint implements have been made in those areas where the
early hunters and fishermen could obtain their raw materials. River
drift implements are discovered in largest numbers on the chalk
formations of south-eastern England between the Wash and the
estuary of the Thames.
The Neolithic peoples, who made less use of horn and bone than did
the Azilians and Maglemosians, had many village settlements on the
upper chalk in Dorset and Wiltshire, and especially at Avebury where
there were veritable flint factories, and near the famous flint mines
at Grimes Graves in the vicinity of Weeting in Norfolk and at Cissbury
Camp not far from Worthing in Sussex. Implements were likewise
made of basic rocks, including quartzite, ironstone, green-stone,
hornblende schist, granite, mica-schist, &c.; while ornaments were
made of jet, a hydrocarbon compound allied to cannel coal, which
takes on a fine polish, Kimeridge shale and ivory. Withal, like the
Aurignacians and Magdalenians, the Neolithic-industry people used
body paint, which was made with pigments of ochre, hæmatite, an
ore of iron, and ruddle, an earthy variety of iron ore.
In those districts, where the raw materials for stone implements,
ornaments, and body paint were found, traces survive of the
activities of the Neolithic peoples. Their graves of long-barrow type
are found not only in the chalk areas but on the margins of the lias
formations. Hæmatite is found in large quantities in West
Cumberland and north Lancashire and in south-western England,
while the chief source of jet is Whitby in Yorkshire, where it occurs in
large quantities in beds of the Upper Lias shale.

Mr. W. J. Perry, of Manchester University, who has devoted special
attention to the study of the distribution of megalithic monuments,
has been drawing attention to the interesting association of these
monuments with geological formations.[60]
In the Avebury district
stone circles, dolmens, chambered barrows, long barrows, and
Neolithic settlements are numerous; another group of megalithic
monuments occurs in Oxford on the margin of the lias formation,
and at the south-end of the great iron field extending as far as the
Clevelands. According to the memoir of the geological survey, there
are traces of ancient surface iron-workings in the Middle Lias
formation of Oxfordshire, where red and brown hæmatite were
found. Mr. Perry notes that there are megalithic monuments in the
vicinity of all these surface workings, as at Fawler, Adderbury, Hook
Norton, Woodstock, Steeple Aston, and Hanbury. Apparently the
Neolithic peoples were attracted to the lias formation because it
contains hæmatite, ochre, shale, &c. There are significant megaliths
in the Whitby region where the jet is so plentiful. Amber was
obtained from the east coast of England and from the Baltic.
The Neolithic peoples appear to have searched for pearls, which are
found in a number of English, Welsh, Scottish, and Irish rivers, and
in the vicinity of most, if not all, of these megaliths occur. Gold was
the first metal worked by man, and it appears to have attracted
some of the early peoples who settled in Britain. The ancient
seafarers who found their way northward may have included
searchers for gold and silver. The latter metal was at one time found
in great abundance in Spain, while gold was at one time fairly
plentiful in south-western England, in North Wales, in various parts
of Scotland and especially in Lanarkshire, and in north-eastern,
eastern, and western Ireland. That there was a drift of civilized
peoples into Britain and Ireland during the period of the Neolithic
industry is made evident by the fact that the agricultural mode of life
was introduced. Barley does not grow wild in Europe. The nearest
area in which it grew wild and was earliest cultivated was the delta
area of Egypt, the region from which the earliest vessels set out to
explore the shores of the Mediterranean. It may be that the barley
seeds were carried to Britain not by the overland routes alone to
Channel ports, but also by the seafarers whose boats, like the
Glasgow one with the cork plug, coasted round by Spain and
Brittany, and crossed the Channel to south-western England and
thence went northward to Scotland. As Irish flints and ground axe-
heads occur chiefly in Ulster, it may be that the drift of early
Neolithic settlers into County Antrim, in which gold was also found,
was from south-western Scotland. The Neolithic settlement at
Whitepark Bay, five miles from the Giant's Causeway, was embedded
at a considerable depth, showing that there has been a sinking of
the land in this area since the Neolithic industry was introduced.
Neolithic remains are widely distributed over Scotland, but these
have not received the intensive study devoted to similar relics in
England. Mr. Ludovic Mann, the Glasgow archæologist, has, however,
compiled interesting data regarding one of the local industries that
bring out the resource and activities of early man. On the island of
Arran is a workable variety of the natural volcanic glass, called pitch-
stone, that of other parts of Scotland and of Ireland being too much
cracked into small pieces to be of use. It was used by the Neolithic
settlers in Arran for manufacturing arrowheads, and as it was
imported into Bute, Ayrshire, and Wigtownshire, a trade in this
material must have existed. "If," writes Mr. Mann, "the stone was
not locally worked up into implements in Bute, it was so manipulated
on the mainland, where workshops of the Neolithic period and the
immediately succeeding overlap period yielded long fine flakes,
testifying to greater expertness in manufacturing there than is
shown by the remains in the domestic sites yet awaiting adequate
exploration in Arran. The explanation may be that the Wigtownshire
flint knappers, accustomed to handle an abundance of flint, were
more proficient than in most other places, and that the pitch-stone
was brought to them as experts, because the material required even
more skilful handling than flint."[61]
In like manner obsidian, as has
been noted, was imported into Crete from the island of Melos by
seafarers, long before the introduction of metal working.[62]
It will be seen that the Neolithic peoples were no mere wandering
hunters, as some have represented them to have been, but they had
their social organization, their industries, and their system of trading
by land and sea. They settled not only in those areas where they
could procure a regular food supply, but those also in which they
obtained the raw materials for implements, weapons, and the
colouring material which they used for religious purposes. They
made pottery for grave offerings and domestic use, and wooden
implements regarding which, however, little is known. Withal, they
had their spinners and weavers. The conditions prevailing in
Neolithic settlements must have been similar to those of later times.
There must have been systems of laws to make trade and peaceful
social intercourse possible, and no doubt these had, as elsewhere, a
religious basis. Burial customs indicate a uniformity of beliefs over
wide areas. The skill displayed in working stone was so great that it
cannot now be emulated. Ripple-flaking has long been a lost art.
Craftsmen must have undergone a prolonged period of training
which was intelligently controlled under settled conditions of life. It
is possible that the so-called Neolithic folk were chiefly foreigners
who exploited the riches of the country. The evidence in this
connection will be found in the next chapter.

CHAPTER IX
Metal Workers and Megalithic Monuments
Broad-heads of Bronze Age—The Irish Evidence—
Bronze Introduced by Traders—How Metals were Traced
—A Metal Working Tribe—Damnonii in England, Scotland,
and Ireland—Miners as Slaves—The Lot of Women
Workers—Megalithic Monuments in English Metal-yielding
Areas—Stone Circles in Barren Localities—Early Colonies
of Easterners in Spain—Egyptian and Babylonian Relics
associated with British Jet and Baltic Amber—A New Flint
Industry of Eastern Origin—British Bronze identical with
Continental—Ancient Furnaces of Common Origin
—Stones of Worship adorned with Metals—The Maggot
God of Stone Circles—Ancient Egyptian Beads at
Stonehenge—Earliest Authentic Date in British History—
The Aim of Conquests.
It used to be thought that the introduction of metal working into
Britain was the result of an invasion of alien peoples, who partly
exterminated and partly enslaved the long-headed Neolithic
inhabitants. This view was based on the evidence afforded by a new
type of grave known as the Round Barrow. In graves of this class
have been found Bronze Age relics, a distinctive kind of pottery, and
skulls of broad-heads. The invasion of broad-heads undoubtedly
took place, and their burial customs suggest that their religious
beliefs were not identical with those of the long-heads. But it
remains to be proved that they were the actual introducers of the
bronze industry. They do not appear to have reached Ireland, where
bronze relics are associated with a long-headed people of
comparatively low stature.
The early Irish bronze forms were obviously obtained from Spain,
while early English bronze forms resemble those of France and Italy.
Cutting implements were the first to be introduced. This fact does
not suggest that a conquest took place. The implements may have
been obtained by traders. Britain apparently had in those ancient
times its trading colonies, and was visited by active and enterprising
seafarers.
[Illustration: Long-head (Dolichocephalic) Skull; Broad-head (Brachycephalic) Skull. Both these specimens were found in Round Barrows in the East Riding of Yorkshire.]
The discovery of metals in Britain and Ireland was, no doubt, first
made by prospectors who had obtained experience in working them
elsewhere. They may have simply come to exploit the country. How
these men conducted their investigations is indicated by the report
found in a British Museum manuscript, dating from about 1603, in
which the prospector gives his reason for believing that gold was to
be found on Crawford Moor in Lanarkshire. He tells that he saw
among the rocks what Scottish miners call "mothers" and English
miners "leaders" or "metalline fumes". It was believed that the
fumes arose from veins of metal and coloured the rocks as smoke
passing upward through a tunnel blackens it, and leaves traces on
the outside. He professed to be able to distinguish between the
colours left by fumes of iron, lead, tin, copper, or silver. On
Crawford Moor he found "sparr, keel, and brimstone" between rocks,
and regarded this discovery as a sure indication that gold was in
situ. The "mothers" or "leaders" were more pronounced than any he
had ever seen in Cornwall, Somersetshire, about Keswick, or "any
other mineral parts wheresoever I have travelled".[63]
Gold was found in this area of Lanarkshire in considerable
quantities, and was no
doubt worked in ancient times. Of special interest in this connection
is the fact that it was part of the territory occupied by
Damnonians,[64] who appear to have been a metal-working people. Besides
occupying the richest metal-yielding area in Scotland, the
Damnonians were located in Devon and Cornwall, and in the east-
midland and western parts of Ireland, in which gold, copper, and tin-
stone were found as in south-western England. The Welsh Dyfneint
(Devon) is supposed by some to be connected with a form of this
tribal name. Another form in a Yarrow inscription is Dumnogeni. In
Ireland Inber Domnann is the old name of Malahide Bay north of
Dublin. Domnu, the genitive of which is Domnann, was the name of
an ancient goddess. In the Irish manuscripts these people are
referred to as Fir-domnann,[65] and associated with the Fir-bolg (the
"men with sacks"). A sack-carrying people are represented in Spanish
rock paintings that date from the Azilian till early Bronze Age
times. In an Irish manuscript which praises the fair and tall people,
the Fir-bolg and Fir-domnann are included among the black-eyed
and black-haired people, the descendants of slaves and churls, and
the promoters of discord among the people.
The reference to slaves is of special interest because the lot of the
working miners was in ancient days an extremely arduous one. In
one of his collected records which describes the method of the
greatest antiquity Diodorus Siculus (first century b.c.) tells how
gold-miners, with lights bound on their foreheads, drove galleries
into the rocks, the fragments of which were carried out by frail old
men and boys. These were broken small by men in the prime of life.
The pounded stone was then ground in handmills by women: three
women to a mill and "to each of those who bear this lot, death is
better than life". Afterwards the milled quartz was spread out on an
inclined table. Men threw water on it, worked it through their fingers,
and dabbed it with sponges until the lighter matter was removed
and the gold was left behind. The precious metal was placed in a
clay crucible, which was kept heated for five days and five nights. It
may be that the Scandinavian references to the nine maidens who
turn the handle of the world mill which grinds out metal and soil,
and the Celtic references to the nine maidens who are associated
with the Celtic cauldron, survive from beliefs that reflected the habits
and methods of the ancient metal workers.
It is difficult now to trace the various areas in which gold was
anciently found in our islands. But this is not to be wondered at. In
Egypt there were once rich goldfields, especially in the Eastern
Desert, where about 100 square miles were so thoroughly worked in
ancient times that only the merest traces of gold remain.[66] Gold,
as has been stated, was formerly found in south-western England,
North Wales, and, as historical records, archæological data, and
place names indicate, in various parts of Scotland and Ireland.
During the period of the Great Thaw a great deal of alluvial gold
must have been distributed throughout the country. Silver was found in
various parts. In Sutherland it is mixed with gold as it is elsewhere
with lead. Copper was worked in a number of districts where the
veins cannot in modern times be economically worked, and tin was
found in Ireland and Scotland as well as in south-western England,
where mining operations do not seem to have been begun, as
Principal Sir John Rhys has shown,[67]
until after the supplies of
surface tin were exhausted. Of special interest in connection with
this problem is the association of megalithic monuments with ancient
mine workings. An interesting fact to be borne in mind in connection
with these relics of the activities and beliefs of the early peoples is
that they represent a distinct culture of complex character. Mr. T. Eric
Peet[68] shows that the megalithic buildings occupy a very
remarkable position along a vast seaboard which includes the
Mediterranean coast of Africa and the Atlantic coast of Europe. In
other words, they lie entirely along a natural sea route. He gives
forcible reasons for arriving at the conclusion that it is impossible to
consider megalithic building as a mere phase through which many
nations passed, and it must therefore have been a system
originating with one race, and spreading far and wide, owing either
to trade influence or migration. He adds:

Great movements of races by sea were not by any
means unusual in primitive days. In fact, the sea has
always been less of an obstacle to early man than the
land with its deserts, mountains, and unfordable rivers.
There is nothing inherently impossible or even improbable
in the suggestion that a great immigration brought the
megalithic monuments from Sweden to India or vice
versa. History is full of instances of such migrations.
But there must have been a definite reason for these race
movements. It cannot be that in all cases they were forced merely
by natural causes, such as changes of climate, invasions of the sea,
and the drying up of once fertile districts, or by the propelling
influences of stronger races in every country from the British Isles to
Japan—that is, in all countries in which megalithic monuments of
similar type are found. The fact that the megalithic monuments are
distributed along a vast seaboard suggests that they were the
work of people who had acquired a culture of common origin, and
were attracted to different countries for the same reason. What that
attraction was is indicated by studying the elements of the
megalithic culture. In a lecture delivered before the British
Association in Manchester in 1915, Mr. W. J. Perry threw much light
on the problem by showing that the carriers of the culture practised
weaving linen, and in some cases the use of Tyrian purple, pearls,
precious stones, metals, and conch-shell trumpets, as well as curious
beliefs and superstitions attached to the latter, while they adopted
certain definite metallurgical methods, as well as mining. Mr. Perry's
paper was subsequently published by the Manchester Literary and
Philosophical Society. It shows that in Western Europe the megalithic
monuments are distributed in those areas in which ancient pre-
Roman and pre-Greek mine workings and metal washings have been
traced. "The same correspondence," he writes, "seems to hold in the
case of England and Wales. In the latter country the counties where
megalithic structures abound are precisely those where mineral
deposits and ancient mine-workings occur. In England the grouping
in Cumberland, Westmorland, Northumberland, Durham, and
Derbyshire is precisely that of old mines; in Cornwall the megalithic
structures are mainly grouped west of Falmouth, precisely in that
district where mining has always been most active."
Pearls, amber, coral, jet, &c., were searched for as well as metals.
The megalithic monuments near pearling rivers, in the vicinity of
Whitby, the main source of jet, and in Denmark and the Baltic area
where amber was found were, in all likelihood, erected by people
who had come under the spell of the same ancient culture.
When, therefore, we come to deal with groups of monuments in
areas which were unsuitable for agriculture and unable to sustain
large populations, a reasonable conclusion to draw is that precious
metals, precious stones, or pearls were once found near them. The
pearling beds may have been destroyed or greatly reduced in value,[69]
or the metals may have been worked out, leaving but slight if any
indication that they were ever in situ. Reference has been made to
the traces left by ancient miners in Egypt where no gold is now
found. In our own day rich gold fields in Australia and North America
have been exhausted. It would be unreasonable for us to suppose
that the same thing did not happen in our country, even although
but slight traces of the precious metal can now be obtained in areas
which were thoroughly explored by ancient miners.
When early man reached Scotland in search of suitable districts in
which to settle, he was not likely to be attracted by the barren or
semi-barren areas in which nature grudged soil for cultivation, where
pasture lands were poor and the coasts were lashed by great billows
for the greater part of the year, and the tempests of winter and
spring were particularly severe. Yet in such places as Carloway,
fronting the Atlantic on the west coast of Lewis, and at Stennis in
Orkney, across the dangerous Pentland Firth, are found the most
imposing stone circles north of Stonehenge and Avebury. Traces of
tin have been found in Lewis, and Orkney has yielded traces of lead,
including silver-lead, copper and zinc, and has flint in glacial drift.
Traces of tin have likewise been found on the mainlands of Ross-
shire and Argyllshire, in various islands of the Hebrides and in
Stirlingshire. The great Stonehenge circle is like the Callernish and
Stennis circles situated in a semi-barren area, but it is an area where
surface tin and gold were anciently obtained. One cannot help
concluding that the early people, who populated the wastes of
ancient Britain and erected megalithic monuments, were attracted
by something more tangible than the charms of solitude and wild
scenery. They searched for and found the things they required. If
they found gold, it must be recognized that there was a
psychological motive for the search for this precious metal. They
valued gold, or whatever other metal they worked in bleak and
isolated places, because they had learned to value it elsewhere.
Who were the people that first searched for, found, and used metals
in Western Europe? Some have assumed that the natives themselves
did so as a matter of course. Such a theory is, however, difficult to
maintain. Gold is a useless metal for all practical purposes. It is too
soft for implements. Besides, it cannot be found or worked except by
those who have acquired a great deal of knowledge and skill. The
men who first washed it from the soil in Britain must have
obtained the necessary knowledge and skill in a country where it
was more plentiful and much easier to work, and where—and this
point is a most important one—the magical and religious beliefs
connected with gold have a very definite history. Copper, tin, and
silver were even more difficult to find and work in Britain. The
ancient people who reached Britain and first worked metals or
collected ores were not the people who were accustomed to use
implements of bone, horn, and flint, and had been attracted to its
shores merely because fish, fowl, deer, and cows, were numerous.
The searchers for metals must have come from centres of Eastern
civilization, or from colonies of highly skilled peoples that had been
established in Western Europe. They did not necessarily come to
settle permanently in Britain, but rather to exploit its natural riches.
This conclusion is no mere hypothesis. Siret,[70]
the Belgian
archæologist, has discovered in southern Spain and Portugal traces
of numerous settlements of Easterners who searched for minerals,
&c., long before the introduction of bronze working in Western
Europe. They came during the archæological Stone Age; they even
introduced some of the flint implements classed as Neolithic by the
archæologists of a past generation.
These Eastern colonists do not appear to have been an organized
people. Siret considers that they were merely groups of people from
Asia—probably the Syrian coast—who were in contact with Egypt.
During the Empire period of Egypt, the Egyptian sphere of influence
extended to the borders of Asia Minor. At an earlier period
Babylonian influence permeated the Syrian coast and part of Asia
Minor. The religious beliefs of seafarers from Syria were likely
therefore to bear traces of the Egyptian and Babylonian religious
systems. Evidence that this was the case has been forthcoming in
Spain.
These Eastern colonists not only operated in Spain and Portugal, but
established contact with Northern Europe. They exported what they
had searched for and found to their Eastern markets. No doubt, they
employed native labour, but they do not appear to have instructed
the natives how to make use of the ores they themselves valued so
highly. In time they were expelled from Spain and Portugal by the
people or mixed peoples who introduced the working of bronze and
made use of bronze weapons. These bronze carriers and workers
came from Central Europe, where colonies of peoples skilled in the
arts of mining and metal working had been established. In the
Central European colonies Ægean and Danubian influences have
been detected.

[Illustration: The Ring of Stennis, Orkney (see page 94). Photo: Valentine.]
Among the archæological finds, which prove that the Easterners
settled in Iberia before bronze working was introduced among the
natives, are idol-like objects made of hippopotamus ivory from
Egypt, a shell (Dentalium elephantinum) from the Red Sea, objects
made from ostrich eggs which must have been carried to Spain from
Africa, alabaster perfume flasks, cups of marble and alabaster of
Egyptian character which had been shaped with copper implements,
Oriental painted vases with decorations in red, black, blue, and
green,[71]
mural paintings on layers of plaster, feminine statuettes in
alabaster which Siret considers to be of Babylonian type, for they
differ from Ægean and Egyptian statuettes, a cult object (found in
graves) resembling the Egyptian ded amulet, &c. The Iberian burial
places of these Eastern colonists have arched cupolas and entrance
corridors of Egyptian-Mycenæan character.
Of special interest are the beautifully worked flints associated with
these Eastern remains in Spain and Portugal. Siret draws attention to
the fact that no trace has been found of flint factories. This
particular flint industry was an entirely new one. It was not a
development of earlier flint-working in Iberia. Apparently the new
industry, which suddenly appears in full perfection, was introduced
by the Eastern colonists. It afterwards spread over the whole
maritime west, including Scandinavia where the metal implements of
more advanced countries were imitated in flint. This important fact
emphasizes the need for caution in making use of such a term as
"Neolithic Age". Siret's view in this connection is that the Easterners,
who established trading colonies in Spain and elsewhere, prevented
the local use of metals which they had come to search for and
export. It was part of their policy to keep the natives in ignorance of
the uses to which metals could be put.
Evidence has been forthcoming that the operations of the Eastern
colonies in Spain and Portugal were extended towards the maritime
north. Associated with the Oriental relics already referred to, Siret
has discovered amber from the Baltic, jet from Britain (apparently
from Whitby in Yorkshire) and the green-stone called callais usually
found in beds of tin. The Eastern seafarers must have visited
Northern Europe to exploit its virgin riches. A green-stone axe was
found, as has been stated, near the boat with the cork plug, which
lay embedded in Clyde silt at Glasgow. Artifacts of callais have been
discovered in Brittany, in the south of France, in Portugal, and in
south-eastern Spain. In the latter area, as Siret has proved, the
Easterners worked silver-bearing lead and copper.
The colonists appear to have likewise searched for and found gold. A
diadem of gold was discovered in a necropolis in the south of Spain,
where some eminent ancient had been interred. This find is,
however, an exception. Precious metals do not as a rule appear in
the graves of the period under consideration.
As has been suggested, the Easterners who exploited the wealth of
ancient Iberia kept the natives in ignorance. "This ignorance," Siret
says, "was the guarantee of the prosperity of the commerce carried
on by the strangers.... The first action of the East on the West was
the exploitation for its exclusive and personal profit of the virgin
riches of the latter." These early Westerners had no idea of the use
and value of the metals lying on the surface of their native land,
while the Orientals valued them, were in need of them, and were
anxious to obtain them. As Siret puts it:
The West was a cow to be milked, a sheep to be fleeced,
a field to be cultivated, a mine to be exploited.
In the traditions preserved by classical writers, there are references
to the skill and cunning of the Phœnicians in commerce, and in the
exploitation of colonies founded among the ignorant Iberians. They
did not inform rival traders where they found metals. "Formerly," as
Strabo says, "the Phœnicians monopolized the trade from Gades
(Cadiz) with the islanders (of the Cassiterides); and they kept the
route a close secret." A vague ancient tradition is preserved by Pliny,
who tells that tin was first fetched from Cassiteris (the tin island) by
Midacritus.[72]
We owe it to the secretive Phœnicians that the
problem of the Cassiterides still remains a difficult one to solve.
To keep the native people ignorant the Easterners, Siret believes,
forbade the use of metals in their own colonies. A direct result of
this policy was the great development which took place in the
manufacture of the beautiful flint implements already referred to.
These the natives imitated, never dreaming that they were imitating
some forms that had been developed by a people who used copper
in their own country. When, therefore, we pick up beautiful Neolithic
flints, we cannot be too sure that the skill displayed belongs entirely
to the Stone Age, or that the flints evolved from earlier native
forms in those areas in which they are found.
The Easterners do not appear to have extracted the metals from
their ores either in Iberia or in Northern Europe. Tin-stone and
silver-bearing lead were used for ballast for their ships, and they
made anchors of lead. Gold washed from river beds could be easily
packed in small bulk. A people who lived by hunting and fishing were
not likely to be greatly interested in the laborious process of
gold-washing. Nor were they likely to attach to gold a magical and
religious value as did the ancient Egyptians and Sumerians.
So far as can be gathered from the Iberian evidence, the period of
exploitation by the colonists from the East was a somewhat
prolonged one. How many centuries it covered we can only guess. It
is of interest to find, in this connection, however, that something was
known in Mesopotamia before 2000 b.c. regarding the natural riches
of Western Europe. Tablets have recently been found on the site of
Asshur, the ancient capital of Assyria, which was originally a
Sumerian settlement. These make reference to the Empire of Sargon
of Akkad (c. 2600 b.c.), which, according to tradition, extended from
the Persian Gulf to the Syrian coast. Sargon was a great conqueror.
"He poured out his glory over the world," declares a tablet found a
good many years ago. It was believed, too, that Sargon embarked
on the Mediterranean and occupied Cyprus. The fresh evidence from
the site of Asshur is to the effect that he conquered Kaptara (?Crete)
and the Tin Land beyond the Upper Sea (the Mediterranean). The
explanation may be that he obtained control of the markets to which
the Easterners carried from Spain and the coasts of Northern Europe
the ores, pearls, &c., they had searched for and found. It may be,
therefore, that Britain was visited by Easterners even before
Sargon's time, and that the Glasgow boat with the plug of cork was
manned by dark Orientals who were prospecting the Scottish coast
before the last land movement had ceased—that is, some time after
3000 b.c.

[Illustration: Megaliths. Upper: Kit's Coty House, Kent. Lower: Trethevy Stone, Cornwall.]
When the Easterners were expelled from Spain by a people from
Central Europe who used weapons of bronze, some of them appear
to have found refuge in Gaul. Siret is of opinion that others withdrew
from Brittany, where subsidences were taking place along the coast,
leaving their megalithic monuments below high-water mark, and
even under several feet of water as at Morbraz. He thinks that the
settlements of Easterners in Brittany were invaded at one and the
same time by the enemy and the ocean. Other refugees from the
colonies may have settled in Etruria, and founded the Etruscan
civilization. Etruscan menhirs resemble those of the south of France,
while the Etruscan crozier or wand, used in the art of augury,
resembles the croziers of the megaliths, &c., of France, Spain, and
Portugal. There are references in Scottish Gaelic stories to magic
wands possessed by wise women, and by the mothers of
Cyclopean one-eyed giants. Ammianus Marcellinus, quoting
Timagenes,[73]
attributes to the Druids the statement that part of the
inhabitants of Gaul were indigenous, but that some had come from
the farthest shores and districts across the Rhine, having been
expelled from their own lands by frequent wars and the
encroachments of the ocean.
The bronze-using peoples who established overland trade routes in
Europe, displacing in some localities the colonies of Easterners and
isolating others, must have instructed the natives of Western Europe
how to mine and use metals. Bronze appears to have been
introduced into Britain by traders. That the ancient Britons did not
begin quite spontaneously to work copper and tin and manufacture
bronze is quite evident, because the earliest specimens of British
bronze which have been found are made of ninety per cent of
copper and ten per cent of tin as on the Continent. "Now, since a
knowledge of the compound," wrote Dr. Robert Munro, "implies a
previous acquaintance with its component elements, it follows that
progress in metallurgy had already reached the stage of knowing the
best combination of these metals for the manufacture of cutting
tools before bronze was practically known in Britain."[74]
The furnaces used were not invented in Britain. Professor Gowland
has shown that in Europe and Asia the system of working mines and
melting metals was identical in ancient times. Summarizing Professor
Gowland's articles in Archæologia and the Journal of the Royal
Anthropological Institute, Mr. W. J. Perry writes in this connection:[75]
The furnaces employed were similar; the crucibles were of the
same material, and generally of the same form; the process of
smelting, first on the surface and then in the crucibles was found
everywhere, even persisting down to present times in the absence of
any fresh cultural influence. The study of the technique of mining
and smelting has served to consolidate the floating mass of facts
which we have accumulated, and to add support for the contention
that one cultural influence is responsible for the earliest mining and
smelting and washing of metals and the getting of precious stones
and metals. The cause of the distribution of the megalithic culture
was the search for certain forms of material wealth.
That certain of the megalithic monuments were intimately connected
with the people who attached a religious value to metals is brought
out very forcibly in the references to pagan customs and beliefs in
early Christian Gaelic literature. There are statements in the Lives of
St. Patrick regarding a pagan god called Cenn Cruach and Crom
Cruach whose stone statue was adorned with gold and silver, and
surrounded by twelve other statues with bronze ornaments. The
statue is called the "king idol of Erin", and it is stated that "the
twelve idols were made of stone, but he ('Crom Cruach') was of
gold". To this god of a stone circle were offered up the firstlings of
every issue and the chief scions of every clan. Another idol was
called Crom Dubh (Black Crom), and his name is still connected,
O'Curry has written, with the first Sunday of August in Munster and
Connaught. An Ulster idol was called Crom Chonnaill, which was
either a living animal or a tree, or was believed to have been such,
O'Curry says. De Jubainville translates Cenn Cruach as "Bloody
Head" and Crom Cruach as "Bloody Curb" or "Bloody Crescent".
O'Curry, on the other hand, translates Crom Cruach as "Bloody
Maggot" and Crom Dubh as "Black Maggot". In Gaelic legends
maggots or worms are referred to as forms of supernatural
beings. The maggot which appeared on the flesh of a slain animal
was apparently regarded as a new form assumed by the
indestructible soul, just as in the Egyptian story of Bata the germ of
life passes from his bull form in a drop of blood from which two trees
spring up, and then in a chip from one of the trees from which the
man is restored in his original form.[76]
A similar belief, which is
widespread, is that bees have their origin as maggots placed in
trees. One form of the story was taken over by the early Christians,
which tells that Jesus was travelling with Peter and Paul and asked
hospitality from an old woman. The woman refused it and struck
Paul on the head. When the wound putrefied, maggots were
produced. Jesus took the maggots from the wound and placed them
in the hollow of a tree. When next they passed that way, Jesus
directed Paul to look in the tree hollow where, to his surprise, he
found bees and honey sprung from his own head.[77] The custom of
placing crape on hives and telling the bees when a death takes
place, which still survives in the south of England and in the north of
Scotland, appears to be connected with the ancient belief that the
maggot, bee, and tree were connected with the sacred animal and
the sacred stone in which was the spirit of a deity. Sacred trees and
sacred stones were intimately connected. Tacitus tells us that when the
Romans invaded Mona (Anglesea), they destroyed the sacred groves
in which the Druids and black-robed priestesses covered the altars
with the blood of captives.[78] There are a number of dolmens on this
island and traces of ancient mine-workings, indicating that it had
been occupied by the early seafarers who colonized Britain and
Ireland and worked metals. A connection between the tree cult of
the Druids and the cult of the builders of megaliths is thus suggested
by Tacitus, as well as by the Irish evidence regarding the Ulster idol
Crom Chonnaill, referred to above (see also Chapter XII).
Who were the people that followed the earliest Easterners and
visited our shores to search like them for metals and erect megalithic
monuments? It is impossible to answer that question with certainty.
After the introduction of bronze working there were, as has been
indicated, intrusions of aliens. These included the introducers of the
short-barrow method of burial and the later introducers of burial by
cremation. It does not follow that all intrusions were those of
conquerors. Traders and artisans may have come with their families
in large numbers and mingled with the earlier peoples. Some
intruders appear to have come by overland routes from southern
and central France and from Central Europe and the Danube valley,
while others came across the sea from Spain. That a regular over-
seas trade-route was in existence is indicated by the references
made by classical writers to the Cassiterides (Tin Islands). Strabo
tells that the natives bartered tin and hides with merchants for
pottery, salt, and articles of bronze. The Phœnicians, as has been
noted, monopolized the trade from Gades (Cadiz) with the islanders
and kept the route a close secret. It was probably along this sea-
route that Egyptian blue beads reached Britain. Professor Sayce has
identified a number of these in Devizes Museum, and writes:
They are met with plentifully in the Early Bronze Age
tumuli of Wiltshire in association with amber beads and
barrel-shaped beads of jet or lignite. Three of them come
from Stonehenge itself. Similar beads of ivory have been
found in a Bronze Age cist near Warminster: if the
material is really ivory it must have been derived from the
East. The cylindrical faience beads, it may be added, have
been discovered in Dorsetshire as well as in Wiltshire.
Professor Sayce emphasizes that these blue beads belong to one
particular period in Egyptian history, the latter part of the Eighteenth
Dynasty and the earlier part of the Nineteenth Dynasty.... The period
to which they belong may be dated 1450-1250 B.C., and as we must
allow some time for their passage across the trade routes to
Wiltshire an approximate date for their presence in the British
barrows will be 1300 B.C.
