Data Management at Scale
Modern Data Architecture with Data Mesh and Data
Fabric
SECOND EDITION
Piethein Strengholt
Data Management at Scale
by Piethein Strengholt
Copyright © 2023 Piethein Strengholt. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North,
Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales
promotional use. Online editions are also available for most titles
(http://oreilly.com). For more information, contact our
corporate/institutional sales department: 800-998-9938 or
corporate@oreilly.com.
Acquisitions Editor: Michelle Smith
Development Editor: Shira Evans
Production Editor: Katherine Tozer
Copyeditor: Rachel Head
Proofreader: Piper Editorial Consulting, LLC
Indexer: nSight, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea
April 2023: Second Edition
Revision History for the Second Edition
2023-04-10: First Release
See https://oreilly.com/catalog/errata.csp?isbn=9781098138868 for
release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc.
Data Management at Scale, the cover image, and related trade dress
are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the author, and do not
represent the publisher’s views. While the publisher and the author
have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and
the author disclaim all responsibility for errors or omissions,
including without limitation responsibility for damages resulting from
the use of or reliance on this work. Use of the information and
instructions contained in this work is at your own risk. If any code
samples or other technology this work contains or describes is
subject to open source licenses or the intellectual property rights of
others, it is your responsibility to ensure that your use thereof
complies with such licenses and/or rights.
This work is part of a collaboration between O’Reilly and Microsoft.
See our statement of editorial independence.
978-1-098-15207-9
[LSI]
Foreword
Whenever we talk about software, we inevitably end up talking
about data—how much there is, where it all lives, what it means,
where it came from or needs to go, and what happens when it
changes. These questions have stuck with us over the years, while
the technology we use to manage our data has changed rapidly.
Today’s databases provide instantaneous access to vast online
datasets; analytics systems answer complex, probing questions;
event-streaming platforms not only connect different applications but
also provide storage, query processing, and built-in data
management tools.
As these technologies have evolved, so have the expectations of our
users. A user is often connected to many different backend systems,
located in different parts of a company, as they switch from mobile
to desktop to call center, change location, or move from one
application to another. All the while, they expect a seamless and
real-time experience. I think the implications of this are far greater
than many may realize. The challenge involves a large estate of
software, data, and people that must appear—at least to our users—
to be a single joined-up unit.
Managing company-wide systems like this has always been a dark
art, something I got a feeling for when I helped build the
infrastructure that backs LinkedIn. All of LinkedIn’s data is generated
continuously, 24 hours a day, by processes that never stop. But
when I first arrived at the company, the infrastructure for harnessing
that data was often limited to big, slow, batch data dumps at the
end of the day and simplistic lookups, jerry-rigged together with
homegrown data feeds. The concept of “end-of-the-day batch
processing” seemed to me to be some legacy of a bygone era of
punch cards and mainframes. Indeed, for a global business, the day
doesn’t end.
As LinkedIn grew, it too became a sprawling software estate, and it
was clear to me that there was no off-the-shelf solution for this kind
of problem. Furthermore, having built the NoSQL databases that
powered LinkedIn’s website, I knew that there was an emerging
renaissance of distributed systems techniques, which meant
solutions could be built that weren’t possible before. This led to
Apache Kafka, which combined scalable messaging, storage, and
processing over the profile updates, page visits, payments, and
other event streams that sat at the core of LinkedIn.
While Kafka streamlined LinkedIn’s dataflows, it also affected the
way applications were built. Like many Silicon Valley firms at the turn
of the last decade, we had been experimenting with microservices,
and it took several iterations to come up with something that was
both functional and stable. This problem was as much about data
and people as it was about software: a complex, interconnected
system that had to evolve as the company grew. Handling a problem
this big required a new kind of technology, but it also needed a new
skill set to go with it.
Of course, there was no manual for navigating this problem back
then. We worked it out as we went along, but this book may well
have been the missing manual we needed. In it, Piethein provides a
comprehensive strategy for managing data not simply in a solitary
database or application but across the many databases, applications,
microservices, storage layers, and all other types of software that
make up today’s technology landscapes.
He also takes an opinionated view, with an architecture to match,
grounded in a well-thought-out set of principles. These help to
bound the decision space with logical guardrails, inside of which a
host of practical solutions should fit. I think this approach will be
very valuable to architects and engineers as they map their own
problem domain to the trade-offs described in this book. Indeed,
Piethein takes you on a journey that goes beyond data and
applications into the rich fabric of interactions that bind entire
companies together.
Jay Kreps
Cofounder and CEO at Confluent
Preface
Data management is an emerging and disruptive subject.
Datafication is everywhere. This transformation is happening all
around us: in smartphones, TV devices, ereaders, industrial
machines, self-driving cars, robots, and so on. It’s changing our lives
at an accelerating speed.
As the amount of data generated skyrockets, so does its complexity.
Disruptive trends like cloudification, API and ecosystem connectivity,
microservices, open data, software as a service (SaaS), and new
software delivery models have a tremendous effect on data
management. In parallel, we see an enormous number of new
applications transforming our businesses. All these trends are
fragmenting the data landscape. As a result, we are seeing more
point-to-point interfaces, endless discussions about data quality and
ownership, and plenty of ethical and legal dilemmas regarding
privacy, safety, and security. Agility, long-term stability, and clear
data governance compete with the need to develop new business
cases swiftly. We sorely need a clear vision for the future of data
management.
This book’s perspective on data management is informed by my
personal experience driving the data architecture agenda for a large
enterprise as chief data architect. Executing that role showed me
clearly the impact a good data strategy can have on a large
organization. After leaving that company, I started working as the
chief data officer for Microsoft Netherlands. In this exciting new
position, I’ve worked with over 50 large customers discussing and
attempting to come up with a perfect data solution. Here are some
of the common threads I’ve identified across all enterprises:
• An overarching data strategy is often missing or not connected
to the business objectives. Discussions about data management
mostly pivot to technology trends and engineering discussions.
What is needed is business engagement: a good strategy and a
well-thought-out data management and analysis plan that
includes tangible value in the form of business use cases. To
make my point: the focus must be put on usage and turning
data into business value.
• Enterprises have difficulties in interpreting new concepts like the
data mesh and data fabric, because pragmatic guidance and
experiences from the field are missing. In addition to that, the
data mesh fully embraces a decentralized approach, which is a
transformational change not only for the data architecture and
technology, but even more so for organization and processes.
This means the transformation cannot be led by IT alone; it’s a
business transformation as well.
• Enterprises find it difficult to comprehend the latest technology
trends. They’re unable to interpret nuances or make pragmatic
choices.
• Enterprises struggle to get started: large ambitions often end
with limited action; the execution plan and architecture remain
too high-level, too conceptual; top-down commitment from
leadership is missing.
These experiences and my observations across a range of
enterprises inspired me to write this second edition of Data
Management at Scale. You may wonder why this edition is worth
reading if you already know the first one. Let’s take a closer look.
Why I Wrote This Book and Why Now
The first edition was founded on the experience I gained while
working at ABN AMRO as chief data architect.[1] In that role, my team
and I practiced the approach of federation: shifting activities and
responsibilities in response to the need for a faster pace of change.
We used governance for balancing the imperatives of centralization
and decentralization. This shift was supported by a central data team
that started to develop platforms for empowering business units to
meet their goals. With platforms, we introduced self-service and
aligned analysts to domains, supporting them in implementing their
use cases. We experimented with domain-driven design and
eventually switched to business architecture for managing the
architectural landscape as a whole. I used all these experiences as
input for writing the first edition.
The term data mesh as a description of a sociotechnical approach to
using data at large was coined at around the time the manuscript for
the first edition was being finalized. When Zhamak Dehghani’s article
describing the concept appeared on Martin Fowler’s website, it
revealed concrete names for concepts we’d already been using at
ABN AMRO for many years. These names became industry terms,
and the concept quickly began to resonate with large organizations
as a solution to the friction enterprises encounter when scaling up.
So, why write a second edition? To start with, it was the data mesh
concept. I love the ideas of bringing data management and software
architecture closer together and businesses taking ownership of their
data, but I firmly believe that, with all the fuss, a more nuanced
view is needed.
In my previous role as an enterprise architect, we had hundreds of
application teams, thousands of services, and many large legacy
applications to manage. In such situations, you approach complexity
differently. With the data mesh architecture, artist, song, and playlist
are often used as data domain examples. This approach of
decomposing data into fine-grained domains might work well when
designing microservices, but it isn’t well suited to (re)structuring
large data landscapes. A different viewpoint is needed for scale.
Next, a more nuanced and pragmatic view of data products is
needed. There are good reasons why data must be managed
holistically and end-to-end. Enterprises have reusability and
consistency concerns. They’re forced by regulation to conform to the
same dimensions for group reporting, accounting, financial
reporting, and auditing and risk management. I know this might
sound controversial, but I can’t recommend managing a data product
as a container: something that packages data, metadata,
code, and infrastructure all together in an architecture as tiny as a
microservice. This doesn’t reflect how today’s big data platforms
work. Finally, the data mesh story isn’t complete: it focuses only on
data that is used for analytical purposes, not operational purposes; it
omits master data management;[2] the consumer side must be
complemented with an intelligent data fabric; and it doesn’t provide
much data modeling guidance for building data products.
Another incentive for publishing a second edition was concern
about the book’s practicality. The first version was perceived by
various readers as too abstract. Some critical reviewers even left
comments questioning my hands-on experience. In this second
edition I’ve worked hard to address these concerns, providing many
real-world examples and concrete solution diagrams. From time to
time, I also refer to blog posts that I’ve written about how to
implement designs. One final note on this: there are a large number
of very complex topics to cover, which are also highly context-
sensitive. It would be impossible to provide examples of everything
in a single volume, so I’ve had to use some discretion.
I’m excited to share my thoughts on best practices and observations
from the field, and I hope this book inspires you. Reflecting on my
time working at ABN AMRO, there are lots of good lessons to be
taken from other enterprises. I’ve seen a lot of good approaches.
There’s no right or wrong when building good data architecture; it’s
all about making the right trade-offs and discovering what works
best for your situation.
If you’ve already read the first edition, you should find this one
significantly different and much improved. Structurally it’s more or
less the same, but every chapter has been revised and enhanced. All
the diagrams have also been revised, new content has been added,
and it’s much more practical. Within each chapter you’ll find many
tips, starting points, and references to helpful articles.
Who Is This Book For?
This book is intended for large enterprises, though smaller
organizations may find much of value in it. It’s geared toward:
Executives and architects
Chief data officers, chief technology officers, chief architects,
enterprise architects, and lead data architects
Analytics teams
Data scientists, data engineers, data analysts, and heads of
analytics
Development teams
Data engineers, data scientists, business intelligence engineers,
data modelers and designers, and other data professionals
Compliance and governance teams
Chief information security officers, data protection officers,
information security analysts, regulatory compliance heads, data
stewards, and business analysts
How to Read or Use This Book
It’s important to say up front that this book touches upon a lot of
complex topics that are often interrelated or intertwined with other
subjects. So we’ll be hopping between different technologies,
business methods, frameworks, and architecture patterns. From time
to time I bring in my own operational experience when
implementing different architectures, so we’ll be working at different
levels of abstraction. To describe the journey through the book, I’ll
use the analogy of a helicopter ride.
We’ll start with a zoomed-out view, looking at data management,
data strategy, and data architecture at an abstract and higher level.
From this helicopter view, we’ll start to zoom in and first explore
what data domains and landing zones are. We’ll then fly to the
source system side of our landscape, in which applications are
managed and data is created, and circle until we have covered most
of the areas of data management. Then we’ll fly over to the
consumer side of the landscape and start learning about the
dynamics there. After that, we’ll bring everything we’ve covered
together by putting things into practice.
To help you navigate through the book, the following table gives a
high-level overview of which subjects will be intensively discussed in
each chapter.
Table P-1. Key topics in each chapter
Ch. 1 Ch. 2 Ch. 3 Ch. 4
Data management x
Data strategy x x x
Data architecture x x
Data integration x
Data modeling x
Data governance
Data security
Data quality x
Metadata management
MDM
Business intelligence
Advanced analytics
Enterprise architecture
Chapter 1 introduces the topic of data management. It gives a
contextual view of what data management is, how it’s changing, and
how it affects our digital transformation. It provides an assessment
of the state of the field in recent years and guidance for working out
a data strategy. In Chapter 2, we’ll jump into the details of
managing data at large, exploring domain-driven design and
business architecture as methodologies for managing a large data
landscape using data domains. Next, Chapter 3 focuses on
topologies and data landing zones as a way of structuring your data
architecture and aligning with your data domains.
The following chapters discuss the specifics of distributing data.
Chapter 4 focuses on data products, Command Query Responsibility
Segregation (CQRS), and guiding principles, and presents an
example solution design. Chapter 5 discusses API management, and
Chapter 6 covers event and notification management. Chapter 7
brings it all together for a comprehensive overview, complemented
with architecture guidance and experience.
Next, we delve deeper into more advanced aspects of data
management. Chapter 8 examines how to approach data
governance and security in ways that are practical and sustainable
for the long term, even in rapidly changing times. Chapter 9 is a
deep dive into the use, significance, and democratizing potential of
metadata. Chapter 10 offers guidance on using master data
management (MDM) to keep data consistent over distributed, wide-
ranging assets, while Chapter 11 addresses turning data into value.
Chapter 12 concludes the book with an example of making it real
and a vision for the future of data management and enterprise
architecture.
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file
extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to
program elements such as variable or function names, databases,
data types, environment variables, statements, and keywords.
TIP
This element signifies a tip or suggestion.
NOTE
This element signifies a general note.
WARNING
This element indicates a warning or caution.
O’Reilly Online Learning
NOTE
For more than 40 years, O’Reilly Media has provided technology and
business training, knowledge, and insight to help companies succeed.
Our unique network of experts and innovators share their knowledge
and expertise through books, articles, and our online learning
platform. O’Reilly’s online learning platform gives you on-demand
access to live training courses, in-depth learning paths, interactive
coding environments, and a vast collection of text and video from
O’Reilly and 200+ other publishers. For more information, visit
http://oreilly.com.
How to Contact Us
Please address comments and questions concerning this book to the
publisher:
O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples,
and any additional information. You can access this page at
https://oreil.ly/data-mgmt-at-scale-2e.
Email bookquestions@oreilly.com to comment or ask technical
questions about this book.
For more information about our books, courses, conferences, and
news, see our website at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly.
Follow us on Twitter: http://twitter.com/oreillymedia.
Watch us on YouTube: http://youtube.com/oreillymedia.
Acknowledgments
I would like to acknowledge Jessica Strengholt-Geitenbeek for
allowing me to write this book. She has supported me throughout
this journey, taking care of the kids and creating room to allow me
to work on this, and she’s the love of my life.
I also would like to thank ABN AMRO, and especially Santhosh Pillai
for his trust and for guiding me throughout my career at the
company. Many of the initial ideas for this project originated in his
mind. Without the countless discussions he and I had, this book
wouldn’t exist. Next, I would like to thank Microsoft for providing the
support I needed to write this second edition. In addition, many
others provided support and feedback on the book: thanks to Tim
Ward (CEO at CluedIn), Batuhan Tuter, Nasim Merhshid, Rob Worrall,
Frank Leisten, and all the others who contributed in various ways.
Thanks also to the book’s technical reviewers, John Mallinder and
Ole Olesen-Bagneux. Your valuable insights and feedback helped
validate the technical content and make this a better book.
Finally, I would like to thank all the fantastic crew members from
O’Reilly for their support and trust. Shira, thank you for taking care
of me. I enjoyed our conversations, and I’m grateful for your
constructive feedback. Katie, thank you for your continuous support
and transparency. To my fantastic copyeditor Rachel Head, thank
you for your hard work to review and edit all content. You really
have done an outstanding job by debugging the content and
connecting my sentences.
1 The statements and opinions expressed in this book don’t necessarily
reflect the positions of ABN AMRO or Microsoft.
2 The terminology “master/slave” is clearly offensive, and many organizations
have switched to alternatives like “source/replica” or “primary/subordinate.”
We strive to be as inclusive as possible, but will use “master data
management” in this book because the industry hasn’t yet adopted an
alternative.
Chapter 1. The Journey to
Becoming Data-Driven
The pre-COVID-19 world was already fast and highly data-driven,
but the pace of change has accelerated rapidly. Fierce competition, a
digital-first era, ever-increasing customer expectations, and rising
regulatory scrutiny require organizations to transform themselves
into modern data-driven enterprises. This transformation will
inevitably result in future organizations being more digital than those
of today, and having a different view of data. Tomorrow’s
organizations will breathe data and embrace a philosophy that places
it at the heart of their business. They will manage data as a product,
make strategic decisions based on data analysis, and have a culture
that acts on data.
Data-driven isn’t just a buzzword.[1] Being data-driven provides an
organization with a significant competitive advantage over other
organizations. It can be proactive, and it can predict what will
happen before it does. By using data correctly, organizations can
quickly react to changes. Using data leads to greater confidence
because decisions are based on facts, not intuition. With data, new
industry trends and business opportunities can be spotted sooner.
Customer retention and satisfaction are improved as well, because
data tells organizations what customers think, how they behave, and
what they want. With data, organizations can be more flexible, agile,
and cost effective, because data provides insights into measured
results, employee loyalty, dependencies, applications, and processes.
So, the imperative for organizations to transform themselves into
data-driven enterprises is clear.
Before we jump into the transformation itself, we’ll explore the
present-day challenges that require us to reevaluate how data must
be managed. We’ll establish a common definition of data
management, encompassing all the disciplines and activities related
to managing data as an asset. After that, we’ll zoom in on several
key technology developments and industry trends, and consider their
impact on data management. We’ll look at some best practices from
the last decade of data management, providing insights into why
previous-generation architectures are hard to scale. Finally, we’ll
consider what a next-generation data architecture might look like,
and I’ll present a set of action points that you will need to address
while developing your data strategy.
Recent Technology Developments and
Industry Trends
Transforming an organization to become data-driven isn’t easy. It’s a
long-term process that requires patience and fortitude. With more
data available, traditional architectures can no longer be scaled up
because of their size, complexity, monolithic designs, and centralized
operating models. Enterprises need a new data strategy and cloud-
based architecture. A paradigm shift and change of culture are
needed, too, because the centralized data and IT operating models
that work today will no longer work when applying federated data
ownership and self-serve consumption models. This requires
organizations to redefine how people, processes, and technology are
aligned with data.
Recent technology developments and industry trends force us to
reevaluate how data must be managed. We need to shift away from
funneling all data into a single silo toward an approach that enables
domains, teams, and users to distribute, consume, and use data
themselves easily and securely. Platforms, processes, and patterns
should simplify the work for others. We need interfaces that are
simple, well documented, fast, and easy to consume. We need an
architecture that works at scale.
Although there are many positives about evolving into a truly data-
driven organization, it’s important to be aware of several technology
developments and industry trends that are impacting data
landscapes. In this chapter, I’ll discuss each of these and its
influence on data management. Firstly, analytics is fragmenting the
data landscape because of use case diversity. Secondly, new
software development methodologies are making data harder to
manage. Thirdly, cloud computing and faster networks are
fragmenting data landscapes. In addition, there are privacy, security,
and regulatory concerns to be aware of, and the rapid growth of
data and intensive data consumption are making operational
systems suffer. Lastly, data monetization requires an ecosystem-to-
ecosystem architecture. The impact these trends have on data
management is tremendous, and they are forcing the whole industry
to rethink how data management must be conducted in the future.
Fortunately, new approaches to data management have emerged
over the past few years, including the ideas of a data mesh and data
fabric:
• The data mesh is an exciting new methodology for managing
data at large. The concept foresees an architecture in which
data is highly distributed and a future in which scalability is
achieved by federating responsibilities. It puts an emphasis on
the human factor and addressing the challenges of managing
the increasing complexity of data architectures.
• The data fabric is an approach that addresses today’s data
management and scalability challenges by adding intelligence
and simplifying data access using self-service. In contrast to the
data mesh, it focuses more on the technology layer. It’s an
architectural vision using unified metadata with an end-to-end
integrated layer (fabric) for easily accessing, integrating,
provisioning, and using data.
These emerging approaches to data management are
complementary and often overlap. Despite popular belief among
data practitioners and what commercial vendors say, they shouldn’t
be seen as rigid or standalone techniques. In fact, I expect these
approaches to coexist and complement one another and any existing
investments in operational data stores, data warehouses, and data
lakes.
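To make the complementary nature of the two approaches a little more tangible, here is a minimal Python sketch. It is an illustration only, not an implementation taken from this book or from any product: the class names `DataProduct` and `MetadataCatalog`, the domain names, and the tags are all hypothetical. The sketch assumes that ownership of data stays with the domains (the data mesh idea) while a shared metadata layer makes that data discoverable (the data fabric idea).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical: a dataset owned and published by one domain (mesh idea)."""
    name: str
    owner_domain: str
    tags: frozenset = field(default_factory=frozenset)


class MetadataCatalog:
    """Hypothetical: a shared metadata layer across domains (fabric idea)."""

    def __init__(self):
        self._products = {}

    def register(self, product):
        # Ownership stays with the domain; only the metadata is shared.
        if product.name in self._products:
            raise ValueError(f"{product.name} is already registered")
        self._products[product.name] = product

    def search(self, tag):
        # Consumers discover data by metadata, not by knowing where it lives.
        return sorted(p.name for p in self._products.values() if tag in p.tags)

    def owner_of(self, name):
        return self._products[name].owner_domain


catalog = MetadataCatalog()
catalog.register(
    DataProduct("customer_profiles", "customer_care", frozenset({"customer", "pii"}))
)
catalog.register(
    DataProduct("payment_events", "payments", frozenset({"payments", "customer"}))
)

print(catalog.search("customer"))          # names of products tagged "customer"
print(catalog.owner_of("payment_events"))  # the domain accountable for the data
```

The design choice worth noticing is that the catalog never stores or moves the data itself. It only records who owns what, which is one way the federated responsibilities of a data mesh and the unified metadata of a data fabric might sit side by side.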
In the transition to becoming data-driven, organizations need to
make trade-offs to balance the imperatives of centralization and
decentralization. Some prefer a high degree of autonomy for their
business teams, while others prioritize quality and control. Some
organizations have a relatively simple structure, while others are
brutally large and complex. Creating the perfect governance
structure and data architecture isn’t easy, so while developing your
strategy, I encourage you to view these approaches to data
management as frameworks. There’s no right or wrong. With the
data mesh approach, for example, you might like some of the best
practices and principles, but not others; you don’t necessarily have
to apply all of them.
In this book, I’ll share my view on data management—one that is
based on observations from the field while working closely with
many large enterprises, and that helps you to make the right
decisions by learning from others. We’ll go beyond the concepts of
data mesh and data fabric, because I strongly believe that a data
strategy should be inclusive of both the operational and analytical
planes, and that the decomposition method for both data domains
and data products must be altered to fit the scale of large
enterprises. To help you in your journey, I’ll share my observations
on the strategies and architectures different enterprises have
designed, why, and what trade-offs they have made.
Before we jump into details, we need to agree on what data
management is, and why it’s important. Next, we need to determine
how to define boundaries and shape our landscape based on various
trade-offs. Finally, we’ll examine how current enterprise data
architectures can be designed and organized for today and
tomorrow.
Let me lay my cards out on the table: decentralization is not a
desired state, but the inevitable future of data. I strongly believe
that scalability is forcing data management to become more
decentralized. Managing data at scale requires you to
federate key responsibilities, set strong standards, and properly align
central and local resources and activities. This change affects
multiple areas: people, processes, and technology. It forces you to
decompose your architecture, dividing and grouping responsibilities.
The shift from centralization to decentralization also contradicts the
established best practice from the past decade: building large data
silos in which all data is collected and integrated before being
served. Although data warehouse and lake architectures are
excellent approaches for utilizing data, these centralized models are
not suited to a decentralized distributed data architecture.
Now that we’ve set the scene, I ask that you take a deep breath and
put your biases aside. Many of us might be deeply invested in
centralized data architectures; this has been a best practice for many
years. I acknowledge that the need for data harmonization and for
bringing large amounts of data into a particular context remains, and
that doing so brings value to organizations, but something we must
consider is the scale at which we want to apply this discipline. In a
highly distributed ecosystem with hundreds or even thousands of
applications, is the best way of managing data to apply centralization
on all dimensions? Is it best to integrate and harmonize all data?
Data Management
The term data management refers to the set of processes and
procedures used to manage data. The Data Management
Association’s Data Management Body of Knowledge (DAMA-DMBOK)
has a more extensive explanation of data management, which it
defines as “the development, execution, and supervision of plans,
policies, programs, and practices that deliver, control, protect, and
enhance the value of data and information assets throughout their
life cycles.”²
The DAMA-DMBOK identifies 11 functional areas of data
management, with data governance at the heart, as shown in
Figure 1-1. It’s crucial to embed all of these deeply into your
organization. Otherwise, you’ll lack insight and become ineffective,
and your data will get out of control. Becoming data-driven—getting
as much value as possible out of your data—will become a
challenge. Analytics, for example, is worth nothing if you have low-
quality data.
Figure 1-1. The 11 functional areas of data management
The activities and disciplines of data management are wide ranging
and cover multiple areas, some closely related to software
architecture.³ In this book, I’ll focus on the aspects of data
management that are most relevant for managing a modern data
architecture at scale. Let’s take a closer look at the 11 areas
identified in this figure and where they’re covered in the book:
Data governance, shown at the heart of Figure 1-1, involves all
activities around implementing and enforcing authority and
control over the management of data, including all
corresponding assets. This area is described in detail in
Chapter 8.
Data architecture involves the definition of the master plan for
your data, including the blueprints,⁴ reference architectures,
future state vision, and dependencies. Managing these helps
organizations make decisions. The entire book revolves around
data architecture generally, but the discipline and its activities
will be covered fully in Chapters 2 and 3.
Data modeling and design is about structuring and representing
data within a specific context and specific systems. Discovering,
designing, and analyzing data requirements are all part of this
discipline. We’ll discuss these topics in Chapters 4, 7, and 11.
Data storage and operations refers to the management of the
database design, correct implementation, and support in order
to maximize the value of the data. Database management also
includes database operations management. We’ll address this in
Chapter 11.
Data security includes all disciplines and activities that provide
secure authentication, authorization, and access to the data.
These activities include prevention, auditing, and escalation-
mitigating actions. This area is described in more detail in
Chapter 8.
Data integration and interoperability includes all the disciplines
and activities related to moving, collecting, consolidating,
combining, and transforming data in order to move it efficiently
from one context into another. Data interoperability refers to the
capability to communicate, invoke functions, or transfer data
among various applications in a way that requires little or no
knowledge of the application characteristics. Data integration,
on the other hand, is about consolidating data from different
(multiple) sources into a unified view. This process, which I
consider most important, is often supported by extra tools, such
as replication and ETL (extract, transform, and load) tools. It’s
described extensively in Chapters 4, 5, and 6.
Document and content management is the process of managing
data stored in unstructured (media) and data formats. Some
aspects of this will be discussed in Chapters 5 and 6.
Reference and master data management is about managing
critical data to make sure the data is accessible, accurate,
secure, transparent, and trustworthy. This area is described in
more detail in Chapter 10.⁵
Data warehousing and business intelligence management
includes all the activities that provide business insights and
support decision making. This area, including advanced
analytics, is described in more depth in Chapter 11.
Metadata management involves managing all data that classifies
and describes the data. Metadata can be used to make the data
understandable, ready for integration, and secure. It can also be
used to ensure the quality of data. This area is described in
more detail in Chapter 9.
Data quality management includes all activities related to
managing the quality of data to ensure the data can be used.
Some aspects of this area are described in Chapters 2 and 3.
The part of the DAMA-DMBOK that needs more work, which inspired
me to write the first edition of this book, is the section on data
integration and interoperability. I believe this section is lacking
depth: the relationship to application integration and software
architecture is not clear. It doesn’t discuss decentralized
architectures, and it lacks modern guidance on the interoperability of
data, such as observability best practices and modern data pipeline
management. In addition, the link to metadata management is
weak. Metadata needs integration and interoperability, too, because
it is scattered across many tools, applications, platforms, and
environments in diverse shapes and forms. The interoperability of
metadata—the ability of two or more systems or components to
exchange descriptive data about data—gets insufficient treatment:
building and managing a large-scale architecture is very much about
metadata integration. Interoperability and metadata also aren’t well
connected to the area of data architecture. If metadata is utilized in
the right way, you can see what data passes by, how it can be
integrated, distributed, and secured, and how it connects to
applications, business capabilities, and so on. There’s limited
guidance in the DAMA-DMBOK about managing your data as a whole
by utilizing and connecting metadata.
Another concern I have is the view DAMA and many organizations
have on achieving end-to-end semantic consistency. As of today,
attempts to unify semantics to provide enterprise-wide consistency
are still taking place. This is called a single version of the truth.
However, applications are always unique, and so is data. Designing
applications involves a lot of implicit thinking. The (domain) context
of the business problem influences the design of the application and
finds its way into the data. We pass through this context when we
move from conceptual design into logical application design and
physical application design.⁶ It’s essential to understand this because
it frames any future architecture. When data is moved across
applications, a data transformation step is always necessary. There’s
no escape from this data transformation dilemma! In the following
chapters, I’ll return to this idea.
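
To make the transformation step concrete, here is a minimal sketch of what moving a record from one application context into another can look like. It assumes two hypothetical applications, a CRM and a billing system, that model the same customer differently. The class names, field names, and the small country lookup are all illustrative assumptions, not taken from any real system.

```python
from dataclasses import dataclass


@dataclass
class CrmCustomer:
    # Hypothetical shape of a customer in the CRM application's context.
    customer_id: str
    full_name: str
    country: str  # free-text country name


@dataclass
class BillingAccount:
    # Hypothetical shape of the same customer in the billing context.
    account_ref: str
    first_name: str
    last_name: str
    country_code: str  # two-letter country code


# Illustrative lookup only; a real mapping would be much larger.
COUNTRY_CODES = {"Netherlands": "NL", "United States": "US"}


def crm_to_billing(customer: CrmCustomer) -> BillingAccount:
    """Translate a record from the CRM context into the billing context."""
    # Split on the first space: everything after it is treated as the last name.
    first, _, last = customer.full_name.partition(" ")
    return BillingAccount(
        account_ref=f"ACC-{customer.customer_id}",
        first_name=first,
        last_name=last,
        country_code=COUNTRY_CODES[customer.country],
    )
```

Even in this toy case the mapping encodes decisions (how to split a name, how to code a country) that belong to the receiving context, which is why the step cannot simply be skipped.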
Another view I see in many organizations is that data management
should be central and must be connected to the strategic goals of
the enterprise. Some organizations still believe that operational costs
can be reduced by centralizing all data and management activities.
There’s also a deep assumption that a centralized platform can take
away the pain of data integration for its users and consumers.
Companies have invested heavily in their enterprise data platforms,
which include data warehouses, data lakes, and service buses. The
activities of master data management are strongly connected to
these platforms because consolidating allows us to simultaneously
improve the accuracy of our most critical data.
A centralized platform—and the centralized model that comes with it
—will be subject to failure because it won’t be able to keep up with
the developments and trends that underpin decentralization, such as
analytics, cloud computing, new software development
methodologies, real-time decision making, and data monetization.
While they may be aware of these trends, many companies fail to
comprehend the impact they have on data management. Let’s
examine the most important trends and determine the magnitude of
that impact.
Analytics Is Fragmenting the Data Landscape
The most trailblazing trend is advanced analytics, which exploits data
to make companies more responsive, competitive, and innovative.
Why does advanced analytics disrupt the existing data landscape?
With more data available, the number of options and opportunities
skyrockets. Advanced analytics is about making what-if analyses,
projecting future trends and outcomes or events, detecting hidden
relations and behaviors, and automating decision making. Because
of the recognized value and strategic benefits of advanced analytics,
many methodologies, frameworks, and tools have been developed to
use it in divergent ways. We’ve only scratched the surface of what
artificial intelligence (AI), machine learning (ML), and natural
language processing (NLP) will be capable of in the future.
NOTE
OpenAI’s new ChatGPT is a mind-blowing example of what AI is capable
of. The models behind OpenAI, which include the Generative Pre-trained
Transformer (GPT) series, can work on complex tasks such as analyzing
math problems, writing code snippets, producing book essays, creating
recipes using a list of ingredients, and much more.
These analytical trends force data to be distributed across many
analytical applications because every individual use case requires
different data. Unique business problems require unique thinking,
unique data, and optimized technology to provide the best solution.
Take, for example, a marketing business unit whose goal is to
identify new sales opportunities for older and younger customers.
Targeting these two audiences requires different features—
measurable properties for analyzing—in datasets. For example,
prospects that are younger will be segmented and clustered
differently than prospects that are older. Asking the marketing
department to use a single dataset for both target audiences
requires many compromises, and you’ll probably end up with a
feature store that doesn’t add any value to either use case.⁷ The
optimal solution for generating the most value is to give each use
case its own unique set of features optimized for its learning
algorithm. The increasing popularity of advanced analytics and
resulting use case diversity issues lead to two problems: data
proliferation and data intensiveness.
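
The marketing example can be sketched in a few lines. The following is an illustration only: the `Prospect` fields, the two feature functions, and the age threshold of 40 are assumptions made up for this sketch, not features the marketing unit in the example actually uses.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    # Hypothetical raw attributes shared by both use cases.
    age: int
    monthly_app_sessions: int
    years_as_customer: int
    household_size: int


def features_younger(p: Prospect) -> dict:
    # Assumed feature set for the model targeting younger prospects.
    return {"age": p.age, "sessions_per_week": p.monthly_app_sessions / 4}


def features_older(p: Prospect) -> dict:
    # Assumed feature set for the model targeting older prospects.
    return {
        "age": p.age,
        "years_as_customer": p.years_as_customer,
        "household_size": p.household_size,
    }


def build_features(p: Prospect, age_threshold: int = 40) -> dict:
    """Pick the feature set that matches the prospect's target audience."""
    if p.age < age_threshold:
        return features_younger(p)
    return features_older(p)
```

Each use case derives its own representation from the same raw data, which is exactly what multiplies the number of datasets in circulation.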
With data proliferation, data is distributed and scattered across a
myriad of locations, applications, and databases. This is because
consuming domains need to process the data to fit it into their
unique solutions. This data distribution introduces other problems.
For one, when data is repeatedly copied and scattered throughout
the organization, it becomes more difficult to find its origin and
judge its quality. This requires you to develop a single logical view of
the same data that is managed in different locations. Additionally,
extensive data distribution makes controlling the data much more
difficult because data can be spread even further as soon as it leaves
any given application. This requires you to develop a framework for
efficiently reusing data, while applying governance and staying
compliant with external regulations.
The proliferation of analytical techniques is also accelerating the
growth of data intensiveness: the read-versus-write ratio is changing
significantly. Analytical models that are constantly retrained, for
example, constantly read large volumes of data. This impacts
application and database designs because we need to optimize for
data readability. It might also mean that we need to duplicate data
to relieve systems from the pressure of constantly serving it, or to
preprocess the data because of the large number of diverse use case
variations and their associated read patterns. Additionally, we might
need to provide different representations of the same data for many
different consumers. Facilitating a high variety of read patterns while
duplicating data and staying in control isn’t easy. A solution for this
problem will be provided in Chapter 4.
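
As a rough illustration of what "different representations of the same data" can mean, the sketch below keeps one write-side list of orders and two precomputed read views that are updated on every write. The `OrderStore` class and its fields are hypothetical, and this is only one possible way to picture the idea, not necessarily the approach described in Chapter 4.

```python
from collections import defaultdict


class OrderStore:
    """Toy store with one write model and two read-optimized views."""

    def __init__(self) -> None:
        self.orders = []  # write side: the source of truth
        self.total_by_customer = defaultdict(float)  # read view for customer reports
        self.total_by_day = defaultdict(float)  # read view for daily reports

    def add_order(self, customer: str, day: str, amount: float) -> None:
        # Every write updates the source of truth and both duplicated views,
        # so readers never have to scan the full order list.
        self.orders.append((customer, day, amount))
        self.total_by_customer[customer] += amount
        self.total_by_day[day] += amount
```

The trade-off is visible even here: reads become cheap lookups, but every additional view is another copy that must be kept consistent with the original.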
The Speed of Software Delivery Is Changing
In today’s world, software-based services are at the core of most
businesses, which means that new features and functionality must
be delivered quickly. In response to the demands for greater agility,
new ideologies have emerged at companies like Amazon, Netflix,
Meta, Google, and Uber. These companies have advanced their
software development practices based on two beliefs.
The first belief is that software development (Dev) and information
technology operations (Ops) must be combined to shorten the
systems development life cycle and provide continuous delivery with
high software quality. This methodology, called DevOps, requires a
new culture that embraces more autonomy, open communication,
trust, transparency, and cross-discipline teamwork.
The second belief is about the size at which applications must be
developed. Flexibility and speed of development are expected to
increase when applications are transformed into smaller decomposed
services. This development approach incorporates several
buzzwords: microservices, containers, Kubernetes, domain-driven
design, serverless computing, etc. I won’t go into detail on all of
these concepts yet, but it’s important to recognize that this evolution
in software development involves increased complexity and a greater
demand to better control data.
The transformation of monolithic applications into distributed
applications—for example, microservices—creates many difficulties
for data management. When breaking up applications into smaller
pieces, the data is spread across different smaller components.
Development teams must also transition their (single) unique data
stores, where they fully understand the data model and have all the
data objects together, to a design where data objects are spread all
over the place. This introduces several challenges, including
increased network communication, data read replicas that need to
be synchronized, difficulties when combining many datasets, data
consistency problems, referential integrity issues, and so on.
The recent shift in software development trends requires an
architecture that allows more fine-grained applications to distribute
their data. It also requires a new DataOps culture and a different
design philosophy with more emphasis on data interoperability, the
capture of immutable events, reproducibility, and loose coupling.
We’ll discuss this in more detail in Chapter 2.
The Cloud’s Impact on Data Management Is Immeasurable
Networks are becoming faster, and bandwidth increases year after
year. Large cloud vendors have proven that it’s possible to move
terabytes of data in the cloud in minutes, which allows for an
interesting approach: instead of bringing the computational power to
the data—which has been the common best practice because of
network limitations—we can turn it around and bring the data to the
computational power by distributing it. The network is no longer the
bottleneck, so we can move data quickly between environments to
allow applications to consume and use it. This model becomes
especially interesting as software as a service (SaaS) and machine
learning as a service (MLaaS) markets become more popular.
Instead of doing all the complex stuff in-house, we can use networks
to provide large quantities of data to other parties.
This distribution pattern of copying (duplicating) data and bringing it
to the computational power in a different facility, such as a cloud
data center, will fragment the data landscape even more, making a
clear data management strategy more important than ever. It
requires you to provide guidelines, because fragmentation of data
can negatively impact performance due to data access lag. It also
requires you to organize and model your data differently, because
cloud service providers architected separate compute and storage to
make them independently scalable.
Privacy and Security Concerns Are a Top Priority
Enterprises need to rethink data security in response to the
proliferation of data sources and growing volume of data. Data is
inarguably key for organizations looking to optimize, innovate, or
differentiate themselves, but there is also a darker side with
unfriendly undertones that include data thefts, discrimination, and
political harm undermining democratic values. Let’s take a look at a
few examples to get an idea of the impact of bad data privacy and
security.
UpGuard maintains a long list of the biggest data breaches to date,
many of which have capitalized on the same mistakes. The
Cambridge Analytica breach and 500 million hacked accounts at
Marriott are impressive examples of damaging events. Governments
are increasingly getting involved in security and privacy because all
aspects of our personal and professional lives are now connected to
the internet. The COVID-19 pandemic, which forced so many of us
to work and socialize from home, accelerated this trend. Enterprises
cannot afford to ignore the threats of intellectual property
infringements and data privacy scandals.
The trends of massive data, more powerful advanced analytics, and
faster distribution of data have triggered a debate around the
dangers of data, raising ethical questions and discussions. Let me
share an example from my own country. In the Netherlands, the
Dutch Tax Administration practiced unlawful and discriminatory
activities. They tracked dual nationals in their systems and used
racial and ethnic classifications to train models for entitlements to
childcare benefits. The result: thousands of families were incorrectly
classified as criminals and had their benefits stopped. They were
ordered to repay what they already had received, sometimes
because of technical transgressions such as failing to correctly sign a
form. Some people were forced to sell their homes and possessions
after they were denied access to debt restructuring.
This is just one example of improper use of data. As organizations
inevitably make mistakes and cross ethical lines, I expect
governments to sharpen regulation by demanding more security,
control, and insight. We’ve only scratched the surface of true data
privacy and ethical problems. Regulations, such as the new
European laws on data governance and artificial intelligence, will
force large companies to be transparent about what data is collected
and purchased, what data is combined, how data is used within
analytical models, and what data is distributed (sold). Big companies
need to start thinking about transparency and privacy-first
approaches and how to deal with large regulatory issues now, if they
haven’t already.
Regulation is a complex subject. Imagine a situation in which several
cloud regions and different SaaS services are used and data is
scattered. Satisfying regulations, such as the GDPR, CCPA, BCBS
239, and the new Trans-Atlantic Data Privacy Framework, is difficult
because companies are required to have insight and control over all
personal data, regardless of where it is stored. Data governance and
correctly handling personal data are at the top of the agenda for many
large companies.⁸
These stronger regulatory requirements and data ethics concerns
will result in further restrictions, additional processes, and enhanced
control. Insights about where data originated, how models are
trained, and how data is distributed are crucial. Stronger internal
governance is required, but this trend of increased control runs
contrary to the methodologies for fast software development, which
involve less documentation and fewer internal controls. It requires a
different, more defensive viewpoint on how data management is
handled, with more integrated processes and better tools. Several of
these concerns will be addressed in Chapter 8.
Operational and Analytical Systems Need to Be Integrated
The need to react faster to business events introduces new
challenges. Traditionally, there has been a great divide between
transactional (operational) applications and analytical applications
because transactional systems are generally not sufficient for
delivering large amounts of data or constantly pushing out data. The
accepted best practice has been to split the data strategy into two
parts: operational transactional processing and analytical data
warehousing and big data processing.
However, this divide is subject to disruption. Operational analytics,
which focuses on predicting and improving the existing operational
processes, is expected to work closely with both the transactional
and analytical systems. The analytical results need to be integrated
back into the operational system’s core so that insights become
relevant in the operational context. I could make the same argument
for real-time events: when events carry state, the same events can
be used for operational decision making and data distribution.
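
The point about events that carry state can be sketched as follows. This is a minimal, hypothetical example: the `AddressChanged` event, the two handlers, and the in-memory stores are assumptions for illustration and stand in for a real operational database and a real analytical store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressChanged:
    # The event carries the new state itself, not just a notification.
    customer_id: str
    new_city: str
    occurred_at: str  # ISO 8601 timestamp


class EventBus:
    """Toy dispatcher that hands each event to both kinds of consumer."""

    def __init__(self) -> None:
        self.current_city = {}  # operational view: latest state per customer
        self.history = []  # analytical view: every event, in order

    def _apply_operational(self, event: AddressChanged) -> None:
        self.current_city[event.customer_id] = event.new_city

    def _apply_analytical(self, event: AddressChanged) -> None:
        self.history.append(event)

    def publish(self, event: AddressChanged) -> None:
        # The same immutable event feeds both consumers.
        self._apply_operational(event)
        self._apply_analytical(event)
```

Because the event contains the state, neither consumer needs to call back into the source system, and the analytical side keeps the full history while the operational side keeps only the latest value.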
This trend requires a different integration architecture, one that
better manages both the operational and analytical systems at the
same time. It also requires data integration to work at different
velocities, as these tend to be different for operational systems and
analytical systems. In this book, we’ll explore the options for
preserving historical data in the original operational context while
simultaneously making it available to both operational and analytical
systems.
Organizations Operate in Collaborative
Ecosystems
Many people think that all business activities take place within the
single logical boundary in which the enterprise operates. The reality
is different, because many organizations work closely with other
organizations. Companies are increasingly integrating their core
business capabilities with third-party services. This collaboration
aspect influences the design of your architecture because you need
to be able to quickly distribute data, incorporate open data,⁹ make
APIs publicly available, and so on.
These changes mean that data is more often distributed between
environments, and thus is more decentralized. When data is shared
"By the way," he said, "I must guard you against saying too much
about me or your relation with me. I have a great dislike to have
myself or my affairs talked about."
"I will remember, sir."
"You need not mention that I have desired you to bear a different
name from your own."
"I will not mention it, sir, if you object."
"With me it is a matter of sentiment," said Mr. Grafton in a low voice.
"I had a dear son named Philip. He died, and left me alone in the
world. You resemble him. It is pleasant to me to call some one by
his name, yet I cannot bear to excite the curiosity of a cold,
unsympathizing world, and be forced to make to them an
explanation which will harrow up my feelings and recall to me my
bitter loss."
"I quite understand you, Mr. Grafton," said Ben, with quiet sympathy.
"Though I would prefer to be called by my own name, I am glad if I
can help make up to you for your loss."
"Enough, my boy! I felt that I had judged you aright. Now go where
you please. Only try to be back at the hotel at one o'clock."
As Ben walked away Richard Grafton said to himself, in a tone of
self-congratulation:
"I might have sought far and wide without finding a boy that would
suit my purpose as well as this one. Codicil, as shrewd as he thinks
himself, was quite taken in. I confess I looked forward to the
interview with dread. Had I allowed the boy to be closely questioned
all would have come out, and I would have lost the handsome
income which I receive as his guardian. While the real Philip Grafton
sleeps in his foreign grave, his substitute will answer my purpose,
and insure me ease and comfort. But it won't do to remain in New
York. There are too many chances of discovery. I must put the sea
between me and the lynx-eyed sharpness of old Codicil."
Mr. Grafton's urgent business engagement was at the Park Bank,
where he got his check cashed. He next proceeded to the office of
the Cunard Steamship Company, and engaged passage for the next
Saturday for Richard Grafton and Master Philip Grafton.
CHAPTER XI.
The Home of Poverty.
The time has come to introduce some new characters, who will play
a part in my story.
Five minutes' walk from Bleecker street, in a tall, shabby tenement
house, divided, as the custom is, into suites of three rooms, or
rather two, one being a common room, and the other being
subdivided into two small, narrow chambers, lived Rose and Adeline
Beaufort, respectively nineteen and seventeen years of age, and
their young brother Harry, a boy of thirteen.
It is five o'clock in the afternoon when we look in upon them.
"Rose," said her sister, "you look very tired. Can't you leave off for
an hour and rest?"
Rose was bending over a vest which she was making. Her drooping
figure and the lines on her face bespoke fatigue, yet her fingers
swiftly plied the needle, and she seemed anxiously intent upon her
task.
She shook her head in answer to her sister's words.
"No, Addie," she said; "it won't do for me to stop. You know how
little I earn at the most. I can't make more than one vest in a day,
and I get but thirty-five cents apiece."
"I know it, Rose," replied Adeline, with a sigh; "it is a great deal of
work to do for that paltry sum. If I were able to help you we might
get along better, even at such wages. I feel that I am very useless,
and a burden on you and Harry."
"You mustn't think anything of the kind, Addie," said Rose, quickly,
looking affectionately at her sister. "You know you are not strong
enough to work."
"And so you have to work the harder, Rose."
"Never mind, Addie; I am strong, and I enjoy working for you."
"But still I am so useless."
"You cheer us up, and we can work all the better."
"I earn nothing. I wonder if I shall always be so weak and useless?"
"No. Don't you remember the doctor said you would in all probability
outgrow your weakness and be as strong as I am? All that is needed
is patience."
"Ah, it is not so easy to be always patient—when I think, too, of how
differently we should have been situated if grandfather had treated
us justly."
A shadow came over the face of Rose.
"Yes; I don't like to think of that. Why should he have left all his
property to our cousin Philip and none to us?"
"But if Philip should die it would all be ours, so Mr. Codicil says."
"I don't want anything to happen to the poor boy."
"Nor I, Rose. But don't you think he might do something for us?"
"So he would, very probably, if he were left to himself; but you know
he is under the guardianship of that uncle of his, Richard Grafton,
who is said to be intensely selfish and wholly unprincipled. He means
to live as handsomely as he can at Philip's expense."
"Did grandfather appoint him guardian?"
"I believe so. Richard Grafton is very artful, and he led grandfather
to believe him fitted to be an excellent guardian for the boy."
"I suppose he is in Europe?"
"No; I heard from Mr. Codicil, yesterday, that he was in New York."
"Is Philip with him?"
"Yes. He was to take the boy to Mr. Codicil's office to-day. There was
a report some time since—I did not mention it to you for fear of
exciting you—that Philip was dead. Mr. Codicil wrote to Mr. Grafton
to make inquiry. In answer, he has come to New York, bringing Philip
with him. While the boy lives, he receives an annual income of six
thousand dollars for the boy's expenses, and to compensate him for
his guardianship. You see, therefore, that Philip's death would make
a great difference to him."
"And to us," sighed Adeline.
"Addie," said Rose, gravely, "don't allow yourself to wish for the
death of our young cousin. It would be wicked."
"I know it, Rose; but when I consider how hard you work, and how
confined Harry is as a cash-boy, I am strongly tempted."
"Then put away the temptation, and trust to a good Providence to
take good care of us. God will not fail us."
"I wish I had your faith, Rose," said her younger sister.
"So you would, Addie, if you had my strength," said Rose, in an
affectionate tone. "It is harder for you to be idle than for me to
work."
"You are right there, Rose. I only wish I could work. Do you know
where Philip and his guardian are staying?"
"Yes; Mr. Codicil told me they were staying at the Metropolitan
Hotel."
"Did you ever see Philip?"
"Not since he was a little boy. I would not know him."
"Do you suppose he knows anything about us?"
"Probably Mr. Grafton never mentions us. Yet he must know that he
has cousins living, but he may not know how hard we have to
struggle for a livelihood."
"I wish we could get a chance to speak to him. He might feel
disposed to help us."
"Probably his power is not great. He is only sixteen, and I presume
has little command of money."
"How do you think it would do for Harry to carry him a letter, asking
him to call upon us?"
"His guardian would intercept it."
"It might be delivered to him privately."
"There is something in what you say," returned Rose, thoughtfully.
"He is our cousin, and we are his only living relatives. It would only
be proper for him to call upon us."
"The sooner we communicate with him the better, then," said
Adeline, whose temperament was quick and impulsive. "Suppose I
write a letter and get Harry to carry it to the hotel when he comes
home."
"As you please, Addie. I would write it, but I want to finish this vest
to-night."
"I will write it. I want to be of some little use."
She rose, and with languid step drew near the table. Procuring
writing materials, she penned a brief note, which she handed to
Rose, when completed, with the inquiry, "How will that do?"
Rose cast her eyes rapidly over the brief note, which read as follows:
"Dear Cousin Philip:—No doubt you are aware that you have
three cousins in this city—my sister Rose, my brother Harry,
who will hand you this note, and myself. We have not seen you
for many years. Will it be too much to ask you to call on us? We
are in humble quarters, but shall be glad to welcome you to our
poor home.
"Your cousin,
"Adeline Beaufort."
In a line below, the address was given.
"That will do very nicely, Addie," said Rose. "I am glad you did not
hint at our need of assistance."
"If he comes to see us, he can see that for himself. I hope
something may come of it," continued the younger sister.
"Don't count too much on it, or your disappointment will be the
more keen."
"Harry can carry it around after supper."
"Philip may be at supper."
"Then he can wait. I wish he would come home."
As if in answer to her wish the door was hastily opened, and a
bright, ruddy-faced boy entered.
"Welcome back, Harry," said Rose, with a smile. "How have you
passed the day?"
"Running round as usual, Rose. It's no joke to be a cash-boy."
"I wish I could run round, Harry," sighed Addie.
"So do I. That would be jolly. How are you feeling to-day, Addie?"
"About the same. Are you very tired?"
"Oh, no; only about the same as usual."
"Because I would like to have you do an errand for me."
"Of course I will," said Harry, cheerfully. "What is it?"
"I want you to take this note to the Metropolitan Hotel."
"Who do you know there?" asked Harry, in surprise.
An explanation was given.
"I want you to be very particular to give the note to Philip without
his guardian's knowledge. Can you manage it?"
"I'll try. I'll go the first thing after supper."
CHAPTER XII.
A Surprising Announcement.
Harry Beaufort entered the Metropolitan Hotel with the confidence of
a city boy who knew that hotels are places of general resort, and
that his entrance would not attract attention. He walked slowly
through to the rear, looking about him guardedly to see if he could
discover anybody who answered to his idea of Philip Grafton. Had he
seen Ben, he would doubtless have supposed that he was the cousin
of whom he was in search; but Ben had come in about five o'clock
and had gone out again with his friend, the reporter, who had called
for him.
Thus Harry looked in vain, and was disposed to think that he would
have to leave the hotel with his errand unaccomplished. This he
didn't like to do. He concluded, therefore, to go up to the desk and
inquire of the clerk.
"Is there a boy staying here named Philip Grafton?" asked Harry.
"Yes, my boy. Do you want to see him?" returned the clerk.
"Yes, sir, if you please."
"He went out half an hour since," said a bell-boy, who chanced to be
near.
"You can leave any message," said the clerk.
"I have a note for him," said Harry, in a doubtful tone.
"I will give it to him when he comes in."
Harry hesitated. He had been told to put the note into Philip's own
hand. But there was no knowing when Philip would come in.
"I guess it'll do to leave it," he thought. "Please give it into his own
hands," he said; and the clerk carelessly assented.
Harry left the hotel, and five minutes later Richard Grafton, or Major
Richard Grafton, as he called himself, entered and walked up to the
clerk's desk.
"Any letters or cards for me?" he asked.
"There's a note for your nephew," said the clerk, producing the one
just left.
"Ha!" said the major, pricking up his ears suspiciously. "Very well, I
will take it and give it to him."
Of course the clerk presumed that this was all right, and passed it
over.
Major Grafton took the note carelessly and sauntered into the
reading-room, where he deliberately opened it.
"I must see who is writing to Philip," he said to himself. "It may be
necessary to suppress the note."
As he read the note, the contents of which are already familiar to
the reader, his brow darkened with anger and anxiety.
"It is fortunate that this came into my hands," he reflected. "It
would have puzzled the boy, and had he gone to see these people
the murder would have been out and probably my plans would have
ended in disaster. There is something about the boy that leads me to
doubt whether he would second my plans if he suspected what they
were. I must devise some means for throwing these people off the
scent and keeping the boy in the dark. What shall I do?"
After a little reflection, Major Grafton decided to remove at once to a
different hotel. He resolved to do it that very night, lest there should
be another attempt made to communicate with his young secretary.
He must wait, however, till Ben returned.
Half an hour later Ben entered, and found the major walking
impatiently up and down the office.
"I thought you would never come back," he said, impatiently.
"I am sorry if I inconvenienced you, sir," Ben said. "I didn't know you
wished me back early."
"Come up stairs with me and pack. We are going to leave the hotel."
"Where are we going?" asked Ben in surprise.
"You will know very soon," answered the major.
Major Grafton notified the clerk that he wished a hack in fifteen
minutes, as he was about to leave the hotel.
"Very well, major. Are you going to leave the city?"
"Not at once. I may spend a few days at the house of a friend,"
answered Grafton, evasively.
"Shall we forward any letters?"
"No; I will call here for them."
In fifteen minutes a porter called at the door of Major Grafton's
room and took down the two trunks. A hack was in waiting.
"Where to, sir?" asked the driver.
"You may drive to the Windsor Hotel," was the answer.
The Windsor Hotel, on Fifth avenue, is over two miles farther up
town than the Metropolitan. Leaning back in his comfortable seat,
Ben enjoyed the ride, and was pleased with the quiet, aristocratic
appearance of the Windsor. A good suite of rooms was secured, and
he found himself even more luxuriously accommodated than at the
Metropolitan.
"I wonder why we have changed our hotel," he thought.
As if aware what was passing through his mind, Major Grafton said:
"This hotel is much more conveniently located for my business than
the other."
"It seems a very nice hotel," said Ben.
"There is none better in New York."
"I wonder what his business is," passed through Ben's mind, but he
was afraid of offending by the inquiry.
Another thing puzzled him. He was ostensibly Major Grafton's private
secretary, and as such was paid a liberal salary, but thus far he had
not been called upon to render any service. There was nothing in
this to complain of, to be sure. If Major Grafton chose to pay him for
doing nothing, that was his lookout. Meanwhile he would be able to
save up at least half of his salary, and transmit it to his mother.
When they were fairly installed in their new home Major Grafton
said:
"I have a call to make, and shall be absent till late. I suppose you
can take care of yourself?"
"Oh, yes, sir. If there is anything you wish me to do——"
"Not this evening. I have not got my affairs settled yet. That is all
the better for you, as you can spend your time as you choose."
About an hour later, as Ben was in the billiard-room, looking with
interest at a game, his cousin, Clarence Plantagenet, and Percy Van
Dyke entered.
"How are you?" said Clarence, graciously. "Percy, this is my cousin,
Ben Baker."
"Glad to see you, I'm sure," said Percy.
"Won't you join us in a little game?"
"No, thank you," answered Ben. "I don't play billiards."
"Then you ought to learn."
"I thought you said you were staying at the Metropolitan," said
Plantagenet.
"So I was, but we have moved to the Windsor."
"Have you a good room?"
"Tip-top!"
"Does that mean on the top floor?" asked Percy, laughing.
"Not exactly. We are on the third floor."
"Come, Percy, here's a table. Let us have a game."
They began to play, and Ben sat down in a comfortable arm-chair
and looked on. Though neither of the boys was an expert, they
played a fair game, and Ben was interested in watching it.
"It's wonderful how he's improved," thought Clarence. "When I saw
him in pa's office I thought he was awkward and gawky; now he
looks just like one of us. He's had great luck in falling in with this
Major Grafton. Really, I think we can afford to recognize him as a
relation."
When the boys had played a couple of games, they prepared to go.
"By the way, Ben," said Clarence, "the governor told me to invite you
to dinner on Sunday. Have you any other engagement?"
"Not that I know of. I will come if I can."
"That's right. Ta-ta, old fellow."
"He treats me a good deal better than he did when we first met,"
thought Ben. "There's a great deal of virtue in good clothes, I
expect."
Ben was asleep before Major Grafton came home.
In the morning, when he awoke, he found that the major was
already dressing.
"By the way, Philip," said his employer, quietly, "we sail for Europe
this afternoon at three."
"Sail for Europe!" ejaculated Ben, overwhelmed with surprise.
"Yes. See that your trunk is packed by eleven."
CHAPTER XIII.
A Farewell Call.
Ben was startled by Major Grafton's abrupt proposal. To go to
Europe would be delightful, he admitted to himself, but to start at a
few hours' notice was naturally exciting. What would his mother and
sister say?
"I suppose there isn't time for me to go home and see my mother
before sailing?" he ventured to say, interrogatively.
"As we are to sail at three o'clock this afternoon, you can judge for
yourself about that," said the major, coolly. "Don't you want to go?"
"Oh, yes, sir. There is nothing I should like better. I should like to
have said good-by to my mother, but——"
"Unfortunately, you can't. I am glad you take so sensible a view of
the matter. I will depend on you to be ready."
"How long shall we probably be gone?" asked Ben.
"I can tell you better some weeks hence, Philip. By the way," he
added, after a moment's thought, "if any letters should come here
addressed to you, don't open them till I come back."
Ben looked at the major in surprise. Why should he not open any
letters that came for him? He was not likely, he thought, to receive
any except from Sunderland.
"I will explain," continued the major. "There are some people in the
city that are continually writing begging letters to me. They use
every method to annoy me, and might go so far as to write to you
and ask your intercession."
"I understand," said Ben, unsuspiciously.
"I thought you would," returned the major, evidently relieved. "Of
course if you get any letter from home you will open that."
"Thank you, sir."
After breakfast Major Grafton left the hotel without saying where he
was going, and Ben addressed himself first to packing his trunk, and
then going down to the reading-room. There he sat down and wrote
a letter to his mother, which ran thus:
"Dear Mother:—I can imagine how much you will be surprised
when I tell you that when this letter reaches you I shall be on
my way to Europe. Major Grafton, my employer, only told me an
hour since, and we sail this afternoon at three. I should be glad
to come home and bid you and my little sister good-by, but
there is no time. I know you will miss me, but it is a splendid
chance for me to go, and I shall be receiving a liberal salary, out
of which I can send you money from time to time. I know I shall
enjoy myself, for I have always had a longing to go to Europe,
though I did not dream that I should have the chance so soon. I
will write to you as soon as we get on the other side.
"Your loving son, Ben.
"P. S.—We sail on the Parthia."
It may be readily understood that this letter made a great sensation
in Sunderland. Mrs. Baker hardly knew whether to be glad or sorry.
It was hard to part from Ben for an uncertain period. On the other
hand, all her friends congratulated her on Ben's great success in
securing so good a position and salary. It was certainly a remarkable
stroke of good fortune.
Ben was about to write another letter to Clarence, explaining why he
could not accept the invitation for dinner on Sunday, but a glance at
the clock showed him that he would have a chance to go to his
uncle's store, and that seemed, on the whole, more polite.
He jumped on board a Broadway car at Twenty-third street, and half
an hour later got out in front of his uncle's large business
establishment. He entered with quite a different feeling from that
attending his first visit, when, in his country attire, poor and without
prospects, he came to make an appeal to his rich uncle.
Handsome clothes are apt to secure outward respect, and one of the
salesmen came forward, obsequiously, and asked:
"What can I show you, young gentleman?"
"Nothing, thank you," answered Ben, politely. "Is my uncle in?"
"Your uncle?"
"Mr. Walton."
"Oh, yes; you will find him in his office."
"Thank you."
Nicholas Walton looked up as Ben entered his presence, and did not
immediately recognize the handsomely-dressed boy who stood
before him. He concluded that it was one of Clarence's high-toned
acquaintances.
"Did you wish to see Clarence?" he asked affably. "I am sorry to say
that he has not been in this morning."
"I should like to see him, Uncle Nicholas; but I also wished to see
you."
"Oh, it's Ben!" said Mr. Walton, in a slightly changed tone.
"Yes, uncle; I met my cousin at the Windsor last evening."
"He told me so. You are staying there, he says."
"For a very short time. My cousin was kind enough to invite me to
dinner on Sunday."
"Yes; we shall be glad to have you dine with us."
"I am sorry I cannot come. I am to sail for Europe this afternoon."
"You sail for Europe!" repeated his uncle, in amazement.
"Yes, uncle. I knew nothing of it till this morning."
"It is indeed surprising. To what part do you go?"
"I believe we sail for Liverpool in the Parthia. More than that I know
nothing."
"You are certainly strangely fortunate," said the merchant, musingly.
"Does this Major Grafton appear to be wealthy?"
"I judge that he is."
"Does he pay you well?"
"He gives me fifty dollars per month."
"And what do you do?"
"I am his private secretary, but thus far I have not been called upon
to do much. I suppose I shall have more to do when I get to
Europe."
"He seems to be eccentric as well as rich. Perhaps he will want to
adopt you. I advise you to try to please him."
"I shall certainly do that, though I don't think he will adopt me."
"Clarence will be sorry not to have seen you. He has taken a trip to
Long Branch this morning with Percy Van Dyke."
"I saw Percy last evening."
"This country nephew of mine gets into fashionable society
remarkably quick," thought the merchant. "There must be something
in the boy, or he would not make his way so readily."
"We are all going to Long Branch next week," said Mr. Walton, aloud.
"We are to stay at the West End. If you had remained here you
could have tried to persuade Major Grafton to spend part of the
season at the Branch."
"I shall be satisfied with Europe," said Ben, smiling.
"You have reason to be satisfied. Clarence will envy you when he
hears that you are going."
"It didn't look as if he were likely to envy me for anything when I
met him here the other day," thought Ben.
"Please remember me to my cousin," said Ben, and shaking his
uncle's extended hand he left the store.
He was passing through the store when he felt a touch on his
shoulder.
Turning, he recognized the tall lady he had met just after his last
visit.
"Are you not the boy who told me I had a ticket on my shawl?" she
inquired.
"Yes, madam," replied Ben, smiling.
"I recognize your face, but otherwise you look very different."
"You mean I am better dressed."
"Yes; I thought you a country boy when I met you."
"So I am, but I am trying to be mistaken for a city boy."
"I am relieved to meet you, for some one told me you had got into
some trouble with the unmannerly boys who were following me."
"I am much obliged to you for your solicitude in my behalf," said
Ben, not caring to acknowledge the fact of the arrest.
"I had hoped to be of service to you, but I see you don't appear to
need it. I am here buying a suit of clothes for a poor boy in whom I
am interested. Let me give you my card, and if you ever need a
friend, come and see me."
The card bore the name of "Jane Wilmot, 300 Madison avenue."
Ben thanked Miss Wilmot and left his uncle's store.
CHAPTER XIV.
What Ben's Friends Thought.
"Did you see Philip?" asked Adeline, eagerly, when her young
brother returned from his visit to the Metropolitan Hotel.
"No," answered Harry. "He was out."
"And you brought back the note, then?" said his sister, disappointed.
"No; the clerk said he would give it to him; so I left it with him."
Adeline looked anxious.
"I am afraid his guardian will get hold of it," she said, turning to
Rose.
"Even if he does, there is nothing in it that you need regret writing."
"It would never reach Philip."
"Probably you are right. In that case we must make another effort
when there seems a good chance."
It was decided that Harry should call the next day, at his dinner
hour, and ascertain whether the note had been delivered. He did so,
but only to learn that the note had been given to Major Grafton, and
that both he and Philip had left the hotel.
"Do you know where they went?" asked Harry, eagerly.

Data Management at Scale, Second Edition Piethein Strengholt

  • 5. Data Management at Scale Modern Data Architecture with Data Mesh and Data Fabric SECOND EDITION Piethein Strengholt
  • 6. Data Management at Scale by Piethein Strengholt Copyright © 2023 Piethein Strengholt. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com. Acquisitions Editor: Michelle Smith Development Editor: Shira Evans Production Editor: Katherine Tozer Copyeditor: Rachel Head Proofreader: Piper Editorial Consulting, LLC Indexer: nSight, Inc. Interior Designer: David Futato Cover Designer: Karen Montgomery Illustrator: Kate Dullea April 2023: Second Edition
  • 7. Revision History for the Second Edition 2023-04-10: First Release See https://oreilly.com/catalog/errata.csp?isbn=9781098138868 for release details. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Data Management at Scale, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. The views expressed in this work are those of the author, and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights. This work is part of a collaboration between O’Reilly and Microsoft. See our statement of editorial independence. 978-1-098-15207-9 [LSI]
  • 8. Foreword Whenever we talk about software, we inevitably end up talking about data—how much there is, where it all lives, what it means, where it came from or needs to go, and what happens when it changes. These questions have stuck with us over the years, while the technology we use to manage our data has changed rapidly. Today’s databases provide instantaneous access to vast online datasets; analytics systems answer complex, probing questions; event-streaming platforms not only connect different applications but also provide storage, query processing, and built-in data management tools. As these technologies have evolved, so have the expectations of our users. A user is often connected to many different backend systems, located in different parts of a company, as they switch from mobile to desktop to call center, change location, or move from one application to another. All the while, they expect a seamless and real-time experience. I think the implications of this are far greater than many may realize. The challenge involves a large estate of software, data, and people that must appear—at least to our users— to be a single joined-up unit. Managing company-wide systems like this has always been a dark art, something I got a feeling for when I helped build the infrastructure that backs LinkedIn. All of LinkedIn’s data is generated continuously, 24 hours a day, by processes that never stop. But when I first arrived at the company, the infrastructure for harnessing that data was often limited to big, slow, batch data dumps at the end of the day and simplistic lookups, jerry-rigged together with homegrown data feeds. The concept of “end-of-the-day batch processing” seemed to me to be some legacy of a bygone era of punch cards and mainframes. Indeed, for a global business, the day doesn’t end. As LinkedIn grew, it too became a sprawling software estate, and it was clear to me that there was no off-the-shelf solution for this kind
  • 9. of problem. Furthermore, having built the NoSQL databases that powered LinkedIn’s website, I knew that there was an emerging renaissance of distributed systems techniques, which meant solutions could be built that weren’t possible before. This led to Apache Kafka, which combined scalable messaging, storage, and processing over the profile updates, page visits, payments, and other event streams that sat at the core of LinkedIn. While Kafka streamlined LinkedIn’s dataflows, it also affected the way applications were built. Like many Silicon Valley firms at the turn of the last decade, we had been experimenting with microservices, and it took several iterations to come up with something that was both functional and stable. This problem was as much about data and people as it was about software: a complex, interconnected system that had to evolve as the company grew. Handling a problem this big required a new kind of technology, but it also needed a new skill set to go with it. Of course, there was no manual for navigating this problem back then. We worked it out as we went along, but this book may well have been the missing manual we needed. In it, Piethein provides a comprehensive strategy for managing data not simply in a solitary database or application but across the many databases, applications, microservices, storage layers, and all other types of software that make up today’s technology landscapes. He also takes an opinionated view, with an architecture to match, grounded in a well-thought-out set of principles. These help to bound the decision space with logical guardrails, inside of which a host of practical solutions should fit. I think this approach will be very valuable to architects and engineers as they map their own problem domain to the trade-offs described in this book. Indeed, Piethein takes you on a journey that goes beyond data and applications into the rich fabric of interactions that bind entire companies together.
  • 10. Jay Kreps Cofounder and CEO at Confluent
  • 11. Preface Data management is an emerging and disruptive subject. Datafication is everywhere. This transformation is happening all around us: in smartphones, TV devices, ereaders, industrial machines, self-driving cars, robots, and so on. It’s changing our lives at an accelerating speed. As the amount of data generated skyrockets, so does its complexity. Disruptive trends like cloudification, API and ecosystem connectivity, microservices, open data, software as a service (SaaS), and new software delivery models have a tremendous effect on data management. In parallel, we see an enormous number of new applications transforming our businesses. All these trends are fragmenting the data landscape. As a result, we are seeing more point-to-point interfaces, endless discussions about data quality and ownership, and plenty of ethical and legal dilemmas regarding privacy, safety, and security. Agility, long-term stability, and clear data governance compete with the need to develop new business cases swiftly. We sorely need a clear vision for the future of data management. This book’s perspective on data management is informed by my personal experience driving the data architecture agenda for a large enterprise as chief data architect. Executing that role showed me clearly the impact a good data strategy can have on a large organization. After leaving that company, I started working as the chief data officer for Microsoft Netherlands. In this exciting new position, I’ve worked with over 50 large customers discussing and attempting to come up with a perfect data solution. Here are some of the common threads I’ve identified across all enterprises:
  • 12.
    - An overarching data strategy is often missing or not connected to the business objectives. Discussions about data management mostly pivot to technology trends and engineering discussions. What is needed is business engagement: a good strategy and well-thought-out data management and analysis plan that includes tangible value in the form of business use cases. To make my point: the focus must be put on usage and turning data into business value.
    - Enterprises have difficulties in interpreting new concepts like the data mesh and data fabric, because pragmatic guidance and experiences from the field are missing. In addition to that, the data mesh fully embraces a decentralized approach, which is a transformational change not only for the data architecture and technology, but even more so for organization and processes. This means the transformation cannot only be led by IT; it’s a business transformation as well.
    - Enterprises find it difficult to comprehend the latest technology trends. They’re unable to interpret nuances or make pragmatic choices.
    - Enterprises struggle to get started: large ambitions often end with limited action; the execution plan and architecture remain too high-level, too conceptual; top-down commitment from leadership is missing.
    These experiences and my observations across a range of enterprises inspired me to write this second edition of Data Management at Scale. You may wonder why this book is worth reading over the first edition; let’s take a closer look.
  • 13. Why I Wrote This Book and Why Now The first edition was founded on the experience I gained while working at ABN AMRO as chief data architect.1 In that role, my team and I practiced the approach of federation: shifting activities and responsibilities in response to the need for a faster pace of change. We used governance for balancing the imperatives of centralization and decentralization. This shift was supported by a central data team that started to develop platforms for empowering business units to meet their goals. With platforms, we introduced self-service and aligned analysts to domains, supporting them in implementing their use cases. We experimented with domain-driven design and eventually switched to business architecture for managing the architectural landscape as a whole. I used all these experiences as input for writing the first edition. The term data mesh as a description of a sociotechnical approach to using data at large was coined at around the time the manuscript for the first edition was being finalized. When Zhamak Dehghani’s article describing the concept appeared on Martin Fowler’s website, it revealed concrete names for concepts we’d already been using at ABN AMRO for many years. These names became industry terms, and the concept quickly began to resonate with large organizations as a solution to the friction enterprises encounter when scaling up. So, why write a second edition? To start with, it was the data mesh concept. I love the ideas of bringing data management and software architecture closer together and businesses taking ownership of their data, but I firmly believe that, with all the fuss, a more nuanced view is needed. In my previous role as an enterprise architect, we had hundreds of application teams, thousands of services, and many large legacy applications to manage. In such situations, you approach complexity differently. With the data mesh architecture, artist, song, and playlist
  • 14. are often used as data domain examples. This approach of decomposing data into fine-grained domains might work well when designing microservices, but it isn’t well suited to (re)structuring large data landscapes. A different viewpoint is needed for scale.
    Next, a more nuanced and pragmatic view of data products is needed. There are good reasons why data must be managed holistically and end-to-end. Enterprises have reusability and consistency concerns. They’re forced by regulation to conform to the same dimensions for group reporting, accounting, financial reporting, and auditing and risk management. I know this might sound controversial, but a data product cannot be advocated to be managed as a container: something that packages data, metadata, code, and infrastructure all together in an architecture as tiny as a microservice. This doesn’t reflect how today’s big data platforms work.
    Finally, the data mesh story isn’t complete: it focuses only on data that is used for analytical purposes, not operational purposes; it omits master data management;2 the consumer side must be complemented with an intelligent data fabric; and it doesn’t provide much data modeling guidance for building data products.
    Another incentive for publishing a second edition was concerns about the book’s practicality. The first version was perceived by various readers as too abstract. Some critical reviewers even left comments questioning my hands-on experience. In this second edition I’ve worked hard to address these concerns, providing many real-world examples and concrete solution diagrams. From time to time, I also refer to blog posts that I’ve written about how to implement designs. One final note on this: there are a large number of very complex topics to cover, which are also highly context-sensitive. It would be impossible to provide examples of everything in a single volume, so I’ve had to use some discretion.
I’m excited to share my thoughts on best practices and observations from the field, and I hope this book inspires you. Reflecting on my time working at ABN AMRO, there are lots of good lessons to be
  • 15. taken from other enterprises. I’ve seen a lot of good approaches. There’s no right or wrong when building good data architecture; it’s all about making the right trade-offs and discovering what works best for your situation. If you’ve already read the first edition, you should find this one significantly different and much improved. Structurally it’s more or less the same, but every chapter has been revised and enhanced. All the diagrams have also been revised, new content has been added, and it’s much more practical. Within each chapter you’ll find many tips, starting points, and references to helpful articles.
Who Is This Book For?

This book is intended for large enterprises, though smaller organizations may find much of value in it. It’s geared toward:

Executives and architects
    Chief data officers, chief technology officers, chief architects, enterprise architects, and lead data architects

Analytics teams
    Data scientists, data engineers, data analysts, and heads of analytics

Development teams
    Data engineers, data scientists, business intelligence engineers, data modelers and designers, and other data professionals

Compliance and governance teams
    Chief information security officers, data protection officers, information security analysts, regulatory compliance heads, data stewards, and business analysts

How to Read or Use This Book

It’s important to say up front that this book touches upon a lot of complex topics that are often interrelated or intertwined with other subjects. So we’ll be hopping between different technologies, business methods, frameworks, and architecture patterns. From time to time I bring in my own operational experience when implementing different architectures, so we’ll be working at different levels of abstraction. To describe the journey through the book, I’ll use the analogy of a helicopter ride.

We’ll start with a zoomed-out view, looking at data management, data strategy, and data architecture at an abstract and higher level. From this helicopter view, we’ll start to zoom in and first explore what data domains and landing zones are. We’ll then fly to the source system side of our landscape, in which applications are managed and data is created, and circle until we have covered most of the areas of data management. Then we’ll fly over to the consumer side of the landscape and start learning about the dynamics there. After that, we’ll bring everything we’ve covered together by putting things into practice.

To help you navigate through the book, the following table gives a high-level overview of which subjects will be intensively discussed in each chapter.
Table P-1. Key topics in each chapter (columns: Ch. 1, Ch. 2, Ch. 3, Ch. 4)

Data management: x
Data strategy: x x x
Data architecture: x x
Data integration: x
Data modeling: x
Data governance:
Data security:
Data quality: x
Metadata management:
MDM:
Business intelligence:
Advanced analytics:
Enterprise architecture:

Chapter 1 introduces the topic of data management. It gives a contextual view of what data management is, how it’s changing, and how it affects our digital transformation. It provides an assessment of the state of the field in recent years and guidance for working out a data strategy. In Chapter 2, we’ll jump into the details of managing data at large, exploring domain-driven design and business architecture as methodologies for managing a large data landscape using data domains. Next, Chapter 3 focuses on topologies and data landing zones as a way of structuring your data architecture and aligning with your data domains.

The following chapters discuss the specifics of distributing data. Chapter 4 focuses on data products, Command Query Responsibility Segregation (CQRS), and guiding principles, and presents an example solution design. Chapter 5 discusses API management, and Chapter 6 covers event and notification management. Chapter 7 brings it all together for a comprehensive overview, complemented with architecture guidance and experience.

Next, we delve deeper into more advanced aspects of data management. Chapter 8 examines how to approach data governance and security in ways that are practical and sustainable for the long term, even in rapidly changing times. Chapter 9 is a deep dive into the use, significance, and democratizing potential of metadata. Chapter 10 offers guidance on using master data management (MDM) to keep data consistent over distributed, wide-ranging assets, while Chapter 11 addresses turning data into value. Chapter 12 concludes the book with an example of making it real and a vision for the future of data management and enterprise architecture.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

TIP
This element signifies a tip or suggestion.

NOTE
This element signifies a general note.

WARNING
This element indicates a warning or caution.
O’Reilly Online Learning

NOTE
For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit http://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/data-mgmt-at-scale-2e.

Email bookquestions@oreilly.com to comment or ask technical questions about this book.

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly.
Follow us on Twitter: http://twitter.com/oreillymedia.
Watch us on YouTube: http://youtube.com/oreillymedia.

Acknowledgments

I would like to acknowledge Jessica Strengholt-Geitenbeek for allowing me to write this book. She has supported me throughout this journey, taking care of the kids and creating room to allow me to work on this, and she’s the love of my life.

I also would like to thank ABN AMRO, and especially Santhosh Pillai for his trust and for guiding me throughout my career at the company. Many of the initial ideas for this project originated in his mind. Without the countless discussions he and I had, this book wouldn’t exist. Next, I would like to thank Microsoft for providing the support I needed to write this second edition.

In addition, many others provided support and feedback on the book: thanks to Tim Ward (CEO at CluedIn), Batuhan Tuter, Nasim Merhshid, Rob Worrall, Frank Leisten, and all the others who contributed in various ways. Thanks also to the book’s technical reviewers, John Mallinder and Ole Olesen-Bagneux. Your valuable insights and feedback helped validate the technical content and make this a better book.

Finally, I would like to thank all the fantastic crew members from O’Reilly for their support and trust. Shira, thank you for taking care of me. I enjoyed our conversations, and I’m grateful for your constructive feedback. Katie, thank you for your continuous support and transparency. To my fantastic copyeditor Rachel Head, thank you for your hard work to review and edit all content. You really have done an outstanding job by debugging the content and connecting my sentences.

1 The statements and opinions expressed in this book don’t necessarily reflect the positions of ABN AMRO or Microsoft.

2 The terminology “master/slave” is clearly offensive, and many organizations have switched to alternatives like “source/replica” or “primary/subordinate.” We strive to be as inclusive as possible, but will use “master data management” in this book because the industry hasn’t yet adopted an alternative.
Chapter 1. The Journey to Becoming Data-Driven

The pre-COVID-19 world was already fast and highly data-driven, but the pace of change has accelerated rapidly. Fierce competition, a digital-first era, ever-increasing customer expectations, and rising regulatory scrutiny require organizations to transform themselves into modern data-driven enterprises. This transformation will inevitably result in future organizations being more digital than those of today, and having a different view of data. Tomorrow’s organizations will breathe data and embrace a philosophy that places it at the heart of their business. They will manage data as a product, make strategic decisions based on data analysis, and have a culture that acts on data.

Data-driven isn’t just a buzzword.1 Being data-driven provides an organization with a significant competitive advantage over other organizations. It can be proactive, and it can predict what will happen before it does. By using data correctly, organizations can quickly react to changes. Using data leads to greater confidence because decisions are based on facts, not intuition. With data, new industry trends and business opportunities can be spotted sooner. Customer retention and satisfaction are improved as well, because data tells organizations what customers think, how they behave, and what they want. With data, organizations can be more flexible, agile, and cost effective, because data provides insights into measured results, employee loyalty, dependencies, applications, and processes. So, the imperative for organizations to transform themselves into data-driven enterprises is definitively there.

Before we jump into the transformation itself, we’ll explore the present-day challenges that require us to reevaluate how data must be managed. We’ll establish a common definition of data management, encompassing all the disciplines and activities related to managing data as an asset. After that, we’ll zoom in on several key technology developments and industry trends, and consider their impact on data management. We’ll look at some best practices from the last decade of data management, providing insights into why previous-generation architectures are hard to scale. Finally, we’ll consider what a next-generation data architecture might look like, and I’ll present a set of action points that you will need to address while developing your data strategy.

Recent Technology Developments and Industry Trends

Transforming an organization to become data-driven isn’t easy. It’s a long-term process that requires patience and fortitude. With more data available, traditional architectures can no longer be scaled up because of their size, complexity, monolithic designs, and centralistic operating models. Enterprises need a new data strategy and cloud-based architecture. A paradigm shift and change of culture are needed, too, because the centralized data and IT operating models that work today will no longer work when applying federated data ownership and self-serve consumption models. This requires organizations to redefine how people, processes, and technology are aligned with data.

Recent technology developments and industry trends force us to reevaluate how data must be managed. We need to shift away from funneling all data into a single silo toward an approach that enables domains, teams, and users to distribute, consume, and use data themselves easily and securely. Platforms, processes, and patterns should simplify the work for others. We need interfaces that are simple, well documented, fast, and easy to consume. We need an architecture that works at scale.
Although there are many positives about evolving into a truly data-driven organization, it’s important to be aware of several technology developments and industry trends that are impacting data landscapes. In this chapter, I’ll discuss each of these and its influence on data management. Firstly, analytics is fragmenting the data landscape because of use case diversity. Secondly, new software development methodologies are making data harder to manage. Thirdly, cloud computing and faster networks are fragmenting data landscapes. In addition, there are privacy, security, and regulatory concerns to be aware of, and the rapid growth of data and intensive data consumption are making operational systems suffer. Lastly, data monetization requires an ecosystem-to-ecosystem architecture. The impact these trends have on data management is tremendous, and they are forcing the whole industry to rethink how data management must be conducted in the future.

Fortunately, new approaches to data management have emerged over the past few years, including the ideas of a data mesh and data fabric:

- The data mesh is an exciting new methodology for managing data at large. The concept foresees an architecture in which data is highly distributed and a future in which scalability is achieved by federating responsibilities. It puts an emphasis on the human factor and addressing the challenges of managing the increasing complexity of data architectures.

- The data fabric is an approach that addresses today’s data management and scalability challenges by adding intelligence and simplifying data access using self-service. In contrast to the data mesh, it focuses more on the technology layer. It’s an architectural vision using unified metadata with an end-to-end integrated layer (fabric) for easily accessing, integrating, provisioning, and using data.

These emerging approaches to data management are complementary and often overlap. Despite popular belief among data practitioners and what commercial vendors say, they shouldn’t be seen as rigid or standalone techniques. In fact, I expect these approaches to coexist and complement one another and any existing investments in operational data stores, data warehouses, and data lakes.

In the transition to becoming data-driven, organizations need to make trade-offs to balance the imperatives of centralization and decentralization. Some prefer a high degree of autonomy for their business teams, while others prioritize quality and control. Some organizations have a relatively simple structure, while others are brutally large and complex. Creating the perfect governance structure and data architecture isn’t easy, so while developing your strategy, I encourage you to view these approaches to data management as frameworks. There’s no right or wrong. With the data mesh approach, for example, you might like some of the best practices and principles, but not others; you don’t necessarily have to apply all of them.

In this book, I’ll share my view on data management, one that is based on observations from the field while working closely with many large enterprises, and that helps you to make the right decisions by learning from others. We’ll go beyond the concepts of data mesh and data fabric, because I strongly believe that a data strategy should be inclusive of both the operational and analytical planes, and that the decomposition method for both data domains and data products must be altered to fit the scale of large enterprises. To help you in your journey, I’ll share my observations on the strategies and architectures different enterprises have designed, why, and what trade-offs they have made.

Before we jump into details, we need to agree on what data management is, and why it’s important.
Next, we need to determine how to define boundaries and shape our landscape based on various trade-offs. Finally, we’ll examine how current enterprise data architectures can be designed and organized for today and tomorrow.

Let me lay my cards out on the table: decentralization is not a desired state, but the inevitable future of data. I therefore have a strong belief that scalability is forcing data management to become more decentrally organized. Managing data at scale requires you to federate key responsibilities, set strong standards, and properly align central and local resources and activities. This change affects multiple areas: people, processes, and technology. It forces you to decompose your architecture, dividing and grouping responsibilities. The shift from centralization to decentralization also contradicts the established best practice from the past decade: building large data silos in which all data is collected and integrated before being served. Although data warehouse and lake architectures are excellent approaches for utilizing data, these centralized models are not suited to a decentralized distributed data architecture.

Now that we’ve set the scene, I ask that you take a deep breath and put your biases aside. Many of us might be deeply invested in centralized data architectures; this has been a best practice for many years. I acknowledge that the need for data harmonization and for bringing large amounts of data into a particular context remains, and that doing so brings value to organizations, but something we must consider is the scale at which we want to apply this discipline. In a highly distributed ecosystem with hundreds or even thousands of applications, is the best way of managing data to apply centralization on all dimensions? Is it best to integrate and harmonize all data?

Data Management

The term data management refers to the set of processes and procedures used to manage data. The Data Management Association’s Data Management Body of Knowledge (DAMA-DMBOK) has a more extensive explanation of data management, which it defines as “the development, execution, and supervision of plans, policies, programs, and practices that deliver, control, protect, and enhance the value of data and information assets throughout their life cycles.”2 The DAMA-DMBOK identifies 11 functional areas of data management, with data governance at the heart, as shown in Figure 1-1. It’s crucial to embed all of these deeply into your organization. Otherwise, you’ll lack insight and become ineffective, and your data will get out of control. Becoming data-driven (getting as much value as possible out of your data) will become a challenge. Analytics, for example, is worth nothing if you have low-quality data.
Figure 1-1. The 11 functional areas of data management

The activities and disciplines of data management are wide ranging and cover multiple areas, some closely related to software architecture.3 In this book, I’ll focus on the aspects of data management that are most relevant for managing a modern data architecture at scale. Let’s take a closer look at the 11 areas identified in this figure and where they’re covered in the book:
- Data governance, shown at the heart of Figure 1-1, involves all activities around implementing and enforcing authority and control over the management of data, including all corresponding assets. This area is described in detail in Chapter 8.

- Data architecture involves the definition of the master plan for your data, including the blueprints,4 reference architectures, future state vision, and dependencies. Managing these helps organizations make decisions. The entire book revolves around data architecture generally, but the discipline and its activities will be covered fully in Chapters 2 and 3.

- Data modeling and design is about structuring and representing data within a specific context and specific systems. Discovering, designing, and analyzing data requirements are all part of this discipline. We’ll discuss these topics in Chapters 4, 7, and 11.

- Data storage and operations refers to the management of the database design, correct implementation, and support in order to maximize the value of the data. Database management also includes database operations management. We’ll address this in Chapter 11.

- Data security includes all disciplines and activities that provide secure authentication, authorization, and access to the data. These activities include prevention, auditing, and escalation-mitigating actions. This area is described in more detail in Chapter 8.

- Data integration and interoperability includes all the disciplines and activities related to moving, collecting, consolidating, combining, and transforming data in order to move it efficiently from one context into another. Data interoperability refers to the capability to communicate, invoke functions, or transfer data among various applications in a way that requires little or no knowledge of the application characteristics. Data integration, on the other hand, is about consolidating data from different (multiple) sources into a unified view. This process, which I consider most important, is often supported by extra tools, such as replication and ETL (extract, transform, and load) tools. It’s described extensively in Chapters 4, 5, and 6.

- Document and content management is the process of managing data stored in unstructured (media) and data formats. Some aspects of this will be discussed in Chapters 5 and 6.

- Reference and master data management is about managing critical data to make sure the data is accessible, accurate, secure, transparent, and trustworthy. This area is described in more detail in Chapter 10.5

- Data warehousing and business intelligence management includes all the activities that provide business insights and support decision making. This area, including advanced analytics, is described in more depth in Chapter 11.

- Metadata management involves managing all data that classifies and describes the data. Metadata can be used to make the data understandable, ready for integration, and secure. It can also be used to ensure the quality of data. This area is described in more detail in Chapter 9.

- Data quality management includes all activities related to managing the quality of data to ensure the data can be used. Some aspects of this area are described in Chapters 2 and 3.

The part of the DAMA-DMBOK that needs more work, which inspired me to write the first edition of this book, is the section on data integration and interoperability. I believe this section is lacking depth: the relationship to application integration and software architecture is not clear. It doesn’t discuss decentralized architectures, and it lacks modern guidance on the interoperability of data, such as observability best practices and modern data pipeline management. In addition, the link to metadata management is weak. Metadata needs integration and interoperability, too, because it is scattered across many tools, applications, platforms, and environments in diverse shapes and forms. The interoperability of metadata (the ability of two or more systems or components to exchange descriptive data about data) gets insufficient treatment: building and managing a large-scale architecture is very much about metadata integration. Interoperability and metadata also aren’t well connected to the area of data architecture. If metadata is utilized in the right way, you can see what data passes by, how it can be integrated, distributed, and secured, and how it connects to applications, business capabilities, and so on. There’s limited guidance in the DAMA-DMBOK about managing your data as a whole by utilizing and connecting metadata.

Another concern I have is the view DAMA and many organizations have on achieving end-to-end semantic consistency. As of today, attempts to unify semantics to provide enterprise-wide consistency are still taking place. This is called a single version of the truth. However, applications are always unique, and so is data. Designing applications involves a lot of implicit thinking. The (domain) context of the business problem influences the design of the application and finds its way into the data. We pass through this context when we move from conceptual design into logical application design and physical application design.6 It’s essential to understand this because it frames any future architecture. When data is moved across applications, a data transformation step is always necessary. There’s no escape from this data transformation dilemma! In the following chapters, I’ll return to this idea.
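To make the transformation dilemma concrete, here is a minimal Python sketch, not taken from the book. It assumes two hypothetical applications, a CRM system and a billing system, that describe the same person in their own context; the class names, field names, and the `CRM-` prefix are all illustrative assumptions. The point is only that moving a record from one context to the other requires an explicit mapping step.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CrmCustomer:
    """A customer as a hypothetical CRM application models it."""
    customer_id: str
    full_name: str
    birth_date: date


@dataclass(frozen=True)
class BillingAccountHolder:
    """The same person as a hypothetical billing application models it."""
    account_ref: str
    first_name: str
    last_name: str
    is_adult: bool


def crm_to_billing(customer: CrmCustomer, today: date) -> BillingAccountHolder:
    """Translate a CRM record into the billing context."""
    # The CRM stores one name field; billing expects two (assumed split on the first space).
    first_name, _, last_name = customer.full_name.partition(" ")
    # Billing does not keep the birth date, only whether the holder is 18 or older.
    had_birthday = (today.month, today.day) >= (
        customer.birth_date.month,
        customer.birth_date.day,
    )
    age = today.year - customer.birth_date.year - (0 if had_birthday else 1)
    return BillingAccountHolder(
        account_ref=f"CRM-{customer.customer_id}",  # assumed key convention
        first_name=first_name,
        last_name=last_name,
        is_adult=age >= 18,
    )
```

Even in this small case the mapping encodes context-specific decisions (how to split a name, what counts as an adult), which is why the step cannot simply be skipped when data leaves the application that created it.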
Another view I see in many organizations is that data management should be central and must be connected to the strategic goals of the enterprise. Some organizations still believe that operational costs can be reduced by centralizing all data and management activities.
There’s also a deep assumption that a centralized platform can take away the pain of data integration for its users and consumers. Companies have invested heavily in their enterprise data platforms, which include data warehouses, data lakes, and service buses. The activities of master data management are strongly connected to these platforms because consolidating allows us to simultaneously improve the accuracy of our most critical data.

A centralized platform, and the centralized model that comes with it, will be subject to failure because it won’t be able to keep up with the developments and trends that underpin decentralization, such as analytics, cloud computing, new software development methodologies, real-time decision making, and data monetization. While they may be aware of these trends, many companies fail to comprehend the impact they have on data management. Let’s examine the most important trends and determine the magnitude of that impact.

Analytics Is Fragmenting the Data Landscape

The most trailblazing trend is advanced analytics, which exploits data to make companies more responsive, competitive, and innovative. Why does advanced analytics disrupt the existing data landscape? With more data available, the number of options and opportunities skyrockets. Advanced analytics is about making what-if analyses, projecting future trends and outcomes or events, detecting hidden relations and behaviors, and automating decision making. Because of the recognized value and strategic benefits of advanced analytics, many methodologies, frameworks, and tools have been developed to use it in divergent ways. We’ve only scratched the surface of what artificial intelligence (AI), machine learning (ML), and natural language processing (NLP) will be capable of in the future.
  • 35. NOTE OpenAI’s new ChatGPT is a mind-blowing example of what AI is capable of. The models behind OpenAI, which include the Generative Pre-trained Transformer (GPT) series, can work on complex tasks such as analyzing math problems, writing code snippets, producing book essays, creating recipes using a list of ingredients, and much more. These analytical trends force data to be distributed across many analytical applications because every individual use case requires different data. Unique business problems require unique thinking, unique data, and optimized technology to provide the best solution. Take, for example, a marketing business unit whose goal is to identify new sales opportunities for older and younger customers. Targeting these two audiences requires different features— measurable properties for analyzing—in datasets. For example, prospects that are younger will be segmented and clustered differently than prospects that are older. Asking the marketing department to use a single dataset for both target audiences requires many compromises, and you’ll probably end up with a feature store that doesn’t add any value to each use case.7 The optimal solution for generating the most value is to give either use case its own unique set of features optimized for each learning algorithm. The increasing popularity of advanced analytics and resulting use case diversity issues lead to two problems: data proliferation and data intensiveness. With data proliferation, data is distributed and scattered across a myriad of locations, applications, and databases. This is because consuming domains need to process the data to fit it into their unique solutions. This data distribution introduces other problems. For one, when data is repeatedly copied and scattered throughout the organization, it becomes more difficult to find its origin and judge its quality. This requires you to develop a single logical view of
  • 36. the same data that is managed in different locations. Additionally, extensive data distribution makes controlling the data much more difficult because data can be spread even further as soon as it leaves any given application. This requires you to develop a framework for efficiently reusing data, while applying governance and staying compliant with external regulations. The proliferation of analytical techniques is also accelerating the growth of data intensiveness: the read-versus-write ratio is changing significantly. Analytical models that are constantly retrained, for example, constantly read large volumes of data. This impacts application and database designs because we need to optimize for data readability. It might also mean that we need to duplicate data to relieve systems from the pressure of constantly serving it, or to preprocess the data because of the large number of diverse use case variations and their associated read patterns. Additionally, we might need to provide different representations of the same data for many different consumers. Facilitating a high variety of read patterns while duplicating data and staying in control isn’t easy. A solution for this problem will be provided in Chapter 4. The Speed of Software Delivery Is Changing In today’s world, software-based services are at the core of most businesses, which means that new features and functionality must be delivered quickly. In response to the demands for greater agility, new ideologies have emerged at companies like Amazon, Netflix, Meta, Google, and Uber. These companies have advanced their software development practices based on two beliefs. The first belief is that software development (Dev) and information technology operations (Ops) must be combined to shorten the systems development life cycle and provide continuous delivery with high software quality. This methodology, called DevOps, requires a
  • 37. new culture that embraces more autonomy, open communication, trust, transparency, and cross-discipline teamwork. The second belief is about the size at which applications must be developed. Flexibility and speed of development are expected to increase when applications are transformed into smaller decomposed services. This development approach incorporates several buzzwords: microservices, containers, Kubernetes, domain-driven design, serverless computing, etc. I won’t go into detail on all of these concepts yet, but it’s important to recognize that this evolution in software development involves increased complexity and a greater demand to better control data. The transformation of monolithic applications into distributed applications—for example, microservices—creates many difficulties for data management. When breaking up applications into smaller pieces, the data is spread across different smaller components. Development teams must also transition their (single) unique data stores, where they fully understand the data model and have all the data objects together, to a design where data objects are spread all over the place. This introduces several challenges, including increased network communication, data read replicas that need to be synchronized, difficulties when combining many datasets, data consistency problems, referential integrity issues, and so on. The recent shift in software development trends requires an architecture that allows more fine-grained applications to distribute their data. It also requires a new DataOps culture and a different design philosophy with more emphasis on data interoperability, the capture of immutable events, reproducibility, and loose coupling. We’ll discuss this in more detail in Chapter 2.
  • 38. The Cloud’s Impact on Data Management Is Immeasurable

Networks are becoming faster, and bandwidth increases year after year. Large cloud vendors have proven that it’s possible to move terabytes of data in the cloud in minutes, which allows for an interesting approach: instead of bringing the computational power to the data—which has been the common best practice because of network limitations—we can turn it around and bring the data to the computational power by distributing it. The network is no longer the bottleneck, so we can move data quickly between environments to allow applications to consume and use it. This model becomes especially interesting as software as a service (SaaS) and machine learning as a service (MLaaS) markets become more popular. Instead of doing all the complex stuff in-house, we can use networks to provide large quantities of data to other parties. This distribution pattern of copying (duplicating) data and bringing it to the computational power in a different facility, such as a cloud data center, will fragment the data landscape even more, making a clear data management strategy more important than ever. It requires you to provide guidelines, because fragmentation of data can negatively impact performance due to data access lag. It also requires you to organize and model your data differently, because cloud service providers architected separate compute and storage to make them independently scalable.

Privacy and Security Concerns Are a Top Priority

Enterprises need to rethink data security in response to the proliferation of data sources and growing volume of data. Data is inarguably key for organizations looking to optimize, innovate, or differentiate themselves, but there is also a darker side with unfriendly undertones that include data theft, discrimination, and political harm undermining democratic values. Let’s take a look at a
  • 39. few examples to get an idea of the impact of bad data privacy and security. UpGuard maintains a long list of the biggest data breaches to date, many of which have capitalized on the same mistakes. The Cambridge Analytica breach and 500 million hacked accounts at Marriott are impressive examples of damaging events. Governments are increasingly getting involved in security and privacy because all aspects of our personal and professional lives are now connected to the internet. The COVID-19 pandemic, which forced so many of us to work and socialize from home, accelerated this trend. Enterprises cannot afford to ignore the threats of intellectual property infringements and data privacy scandals. The trends of massive data, more powerful advanced analytics, and faster distribution of data have triggered a debate around the dangers of data, raising ethical questions and discussions. Let me share an example from my own country. In the Netherlands, the Dutch Tax Administration practiced unlawful and discriminatory activities. They tracked dual nationals in their systems and used racial and ethnic classifications to train models for entitlements to childcare benefits. The result: thousands of families were incorrectly classified as criminals and had their benefits stopped. They were ordered to repay what they already had received, sometimes because of technical transgressions such as failing to correctly sign a form. Some people were forced to sell their homes and possessions after they were denied access to debt restructuring. This is just one example of improper use of data. As organizations inevitably make mistakes and cross ethical lines, I expect governments to sharpen regulation by demanding more security, control, and insight. We’ve only scratched the surface of true data privacy and ethical problems. 
Regulations, such as the new European laws on data governance and artificial intelligence, will force large companies to be transparent about what data is collected and purchased, what data is combined, how data is used within
  • 40. analytical models, and what data is distributed (sold). Big companies need to start thinking about transparency and privacy-first approaches and how to deal with large regulatory issues now, if they haven’t already. Regulation is a complex subject. Imagine a situation in which several cloud regions and different SaaS services are used and data is scattered. Satisfying regulations, such as the GDPR, CCPA, BCBS 239, and the new Trans-Atlantic Data Privacy Framework, is difficult because companies are required to have insight and control over all personal data, regardless of where it is stored. Data governance and correctly handling personal data are at the top of the agenda for many large companies. These stronger regulatory requirements and data ethics concerns will result in further restrictions, additional processes, and enhanced control. Insights about where data originated, how models are trained, and how data is distributed are crucial. Stronger internal governance is required, but this trend of increased control runs contrary to the methodologies for fast software development, which involve less documentation and fewer internal controls. It requires a different, more defensive viewpoint on how data management is handled, with more integrated processes and better tools. Several of these concerns will be addressed in Chapter 8.

Operational and Analytical Systems Need to Be Integrated

The need to react faster to business events introduces new challenges. Traditionally, there has been a great divide between transactional (operational) applications and analytical applications because transactional systems are generally not sufficient for delivering large amounts of data or constantly pushing out data. The accepted best practice has been to split the data strategy into two
  • 41. parts: operational transactional processing and analytical data warehousing and big data processing. However, this divide is subject to disruption. Operational analytics, which focuses on predicting and improving the existing operational processes, is expected to work closely with both the transactional and analytical systems. The analytical results need to be integrated back into the operational system’s core so that insights become relevant in the operational context. I could make the same argument for real-time events: when events carry state, the same events can be used for operational decision making and data distribution. This trend requires a different integration architecture, one that better manages both the operational and analytical systems at the same time. It also requires data integration to work at different velocities, as these tend to be different for operational systems and analytical systems. In this book, we’ll explore the options for preserving historical data in the original operational context while simultaneously making it available to both operational and analytical systems.

Organizations Operate in Collaborative Ecosystems

Many people think that all business activities take place within the single logical boundary in which the enterprise operates. The reality is different, because many organizations work closely with other organizations. Companies are increasingly integrating their core business capabilities with third-party services. This collaboration aspect influences the design of your architecture because you need to be able to quickly distribute data, incorporate open data, make APIs publicly available, and so on. These changes mean that data is more often distributed between environments, and thus is more decentralized. When data is shared
  • 43. "By the way," he said, "I must guard you against saying too much about me or your relation with me. I have a great dislike to have myself or my affairs talked about." "I will remember, sir." "You need not mention that I have desired you to bear a different name from your own." "I will not mention it, sir, if you object." "With me it is a matter of sentiment," said Mr. Grafton in a low voice. "I had a dear son named Philip. He died, and left me alone in the world. You resemble him. It is pleasant to me to call some one by his name, yet I cannot bear to excite the curiosity of a cold, unsympathizing world, and be forced to make to them an explanation which will harrow up my feelings and recall to me my bitter loss." "I quite understand you, Mr. Grafton," said Ben, with quiet sympathy. "Though I would prefer to be called by my own name, I am glad if I can help make up to you for your loss." "Enough, my boy! I felt that I had judged you aright. Now go where you please. Only try to be back at the hotel at one o'clock." As Ben walked away Richard Grafton said to himself, in a tone of self-congratulation: "I might have sought far and wide without finding a boy that would suit my purpose as well as this one. Codicil, as shrewd as he thinks himself, was quite taken in. I confess I looked forward to the interview with dread. Had I allowed the boy to be closely questioned all would have come out, and I would have lost the handsome income which I receive as his guardian. While the real Philip Grafton sleeps in his foreign grave, his substitute will answer my purpose, and insure me ease and comfort. But it won't do to remain in New York. There are too many chances of discovery. I must put the sea between me and the lynx-eyed sharpness of old Codicil."
  • 44. Mr. Grafton's urgent business engagement was at the Park Bank, where he got his check cashed. He next proceeded to the office of the Cunard Steamship Company, and engaged passage for the next Saturday for Richard Grafton and Master Philip Grafton.
  • 46. CHAPTER XI. The Home of Poverty.

The time has come to introduce some new characters, who will play a part in my story. Five minutes' walk from Bleecker street, in a tall, shabby tenement house, divided, as the custom is, into suites of three rooms, or rather two, one being a common room, and the other being subdivided into two small, narrow chambers, lived Rose and Adeline Beaufort, respectively nineteen and seventeen years of age, and their young brother Harry, a boy of thirteen. It is five o'clock in the afternoon when we look in upon them. "Rose," said her sister, "you look very tired. Can't you leave off for an hour and rest?" Rose was bending over a vest which she was making. Her drooping figure and the lines on her face bespoke fatigue, yet her fingers swiftly plied the needle, and she seemed anxiously intent upon her task. She shook her head in answer to her sister's words. "No, Addie," she said; "it won't do for me to stop. You know how little I earn at the most. I can't make more than one vest in a day, and I get but thirty-five cents apiece."
  • 47. "I know it, Rose," replied Adeline, with a sigh; "it is a great deal of work to do for that paltry sum. If I were able to help you we might get along better, even at such wages. I feel that I am very useless, and a burden on you and Harry." "You mustn't think anything of the kind, Addie," said Rose, quickly, looking affectionately at her sister. "You know you are not strong enough to work." "And so you have to work the harder, Rose." "Never mind, Addie; I am strong, and I enjoy working for you." "But still I am so useless." "You cheer us up, and we can work all the better." "I earn nothing. I wonder if I shall always be so weak and useless?" "No. Don't you remember the doctor said you would in all probability outgrow your weakness and be as strong as I am? All that is needed is patience." "Ah, it is not so easy to be always patient—when I think, too, of how differently we should have been situated if grandfather had treated us justly." A shadow came over the face of Rose. "Yes; I don't like to think of that. Why should he have left all his property to our cousin Philip and none to us?" "But if Philip should die it would all be ours, so Mr. Codicil says." "I don't want anything to happen to the poor boy." "Nor I, Rose. But don't you think he might do something for us?" "So he would, very probably, if he were left to himself; but you know he is under the guardianship of that uncle of his, Richard Grafton, who is said to be intensely selfish and wholly unprincipled. He means to live as handsomely as he can at Philip's expense." "Did grandfather appoint him guardian?"
  • 48. "I believe so. Richard Grafton is very artful, and he led grandfather to believe him fitted to be an excellent guardian for the boy." "I suppose he is in Europe?" "No; I heard from Mr. Codicil, yesterday, that he was in New York." "Is Philip with him?" "Yes. He was to take the boy to Mr. Codicil's office to-day. There was a report some time since—I did not mention it to you for fear of exciting you—that Philip was dead. Mr. Codicil wrote to Mr. Grafton to make inquiry. In answer, he has come to New York, bringing Philip with him. While the boy lives, he receives an annual income of six thousand dollars for the boy's expenses, and to compensate him for his guardianship. You see, therefore, that Philip's death would make a great difference to him." "And to us," sighed Adeline. "Addie," said Rose, gravely, "don't allow yourself to wish for the death of our young cousin. It would be wicked." "I know it, Rose; but when I consider how hard you work, and how confined Harry is as a cash-boy, I am strongly tempted." "Then put away the temptation, and trust to a good Providence to take good care of us. God will not fail us." "I wish I had your faith, Rose," said her younger sister. "So you would, Addie, if you had my strength," said Rose, in an affectionate tone. "It is harder for you to be idle than for me to work." "You are right there, Rose. I only wish I could work. Do you know where Philip and his guardian are staying?" "Yes; Mr. Codicil told me they were staying at the Metropolitan Hotel." "Did you ever see Philip?" "Not since he was a little boy. I would not know him."
  • 49. "Do you suppose he knows anything about us?" "Probably Mr. Grafton never mentions us. Yet he must know that he has cousins living, but he may not know how hard we have to struggle for a livelihood." "I wish we could get a chance to speak to him. He might feel disposed to help us." "Probably his power is not great. He is only sixteen, and I presume has little command of money." "How do you think it would do for Harry to carry him a letter, asking him to call upon us?" "His guardian would intercept it." "It might be delivered to him privately." "There is something in what you say," returned Rose, thoughtfully. "He is our cousin, and we are his only living relatives. It would only be proper for him to call upon us." "The sooner we communicate with him the better, then," said Adeline, whose temperament was quick and impulsive. "Suppose I write a letter and get Harry to carry it to the hotel when he comes home." "As you please, Addie. I would write it, but I want to finish this vest to-night." "I will write it. I want to be of some little use." She rose, and with languid step drew near the table. Procuring writing materials, she penned a brief note, which she handed to Rose, when completed, with the inquiry, "How will that do?" Rose cast her eyes rapidly over the brief note, which read as follows: "Dear Cousin Philip:—No doubt you are aware that you have three cousins in this city—my sister Rose, my brother Harry, who will hand you this note, and myself. We have not seen you for many years. Will it be too much to ask you to call on us? We
  • 50. are in humble quarters, but shall be glad to welcome you to our poor home. "Your cousin, "Adeline Beaufort." In a line below, the address was given. "That will do very nicely, Addie," said Rose. "I am glad you did not hint at our need of assistance." "If he comes to see us, he can see that for himself. I hope something may come of it," continued the younger sister. "Don't count too much on it, or your disappointment will be the more keen." "Harry can carry it around after supper." "Philip may be at supper." "Then he can wait. I wish he would come home." As if in answer to her wish the door was hastily opened, and a bright, ruddy-faced boy entered. "Welcome back, Harry," said Rose, with a smile. "How have you passed the day?" "Running round as usual, Rose. It's no joke to be a cash-boy." "I wish I could run round, Harry," sighed Addie. "So do I. That would be jolly. How are you feeling to-day, Addie?" "About the same. Are you very tired?" "Oh, no; only about the same as usual." "Because I would like to have you do an errand for me." "Of course I will," said Harry, cheerfully. "What is it?" "I want you to take this note to the Metropolitan Hotel."
  • 51. "Who do you know there?" asked Harry, in surprise. An explanation was given. "I want you to be very particular to give the note to Philip without his guardian's knowledge. Can you manage it?" "I'll try. I'll go the first thing after supper."
  • 53. CHAPTER XII. A Surprising Announcement.

Harry Beaufort entered the Metropolitan Hotel with the confidence of a city boy who knew that hotels are places of general resort, and that his entrance would not attract attention. He walked slowly through to the rear, looking about him guardedly to see if he could discover anybody who answered to his idea of Philip Grafton. Had he seen Ben, he would doubtless have supposed that he was the cousin of whom he was in search; but Ben had come in about five o'clock and had gone out again with his friend, the reporter, who had called for him. Thus Harry looked in vain, and was disposed to think that he would have to leave the hotel with his errand unaccomplished. This he didn't like to do. He concluded, therefore, to go up to the desk and inquire of the clerk. "Is there a boy staying here named Philip Grafton?" asked Harry. "Yes, my boy. Do you want to see him?" returned the clerk. "Yes, sir, if you please." "He went out half an hour since," said a bell-boy, who chanced to be near. "You can leave any message," said the clerk. "I have a note for him," said Harry, in a doubtful tone.
  • 54. "I will give it to him when he comes in." Harry hesitated. He had been told to put the note into Philip's own hand. But there was no knowing when Philip would come in. "I guess it'll do to leave it," he thought. "Please give it into his own hands," he said; and the clerk carelessly assented. Harry left the hotel, and five minutes later Richard Grafton, or Major Richard Grafton, as he called himself, entered and walked up to the clerk's desk. "Any letters or cards for me?" he asked. "There's a note for your nephew," said the clerk, producing the one just left. "Ha!" said the major, pricking up his ears suspiciously. "Very well, I will take it and give it to him." Of course the clerk presumed that this was all right, and passed it over. Major Grafton took the note carelessly and sauntered into the reading-room, where he deliberately opened it. "I must see who is writing to Philip," he said to himself. "It may be necessary to suppress the note." As he read the note, the contents of which are already familiar to the reader, his brow darkened with anger and anxiety. "It is fortunate that this came into my hands," he reflected. "It would have puzzled the boy, and had he gone to see these people the murder would have been out and probably my plans would have ended in disaster. There is something about the boy that leads me to doubt whether he would second my plans if he suspected what they were. I must devise some means for throwing these people off the scent and keeping the boy in the dark. What shall I do?" After a little reflection, Major Grafton decided to remove at once to a different hotel. He resolved to do it that very night, lest there should
  • 55. be another attempt made to communicate with his young secretary. He must wait, however, till Ben returned. Half an hour later Ben entered, and found the major walking impatiently up and down the office. "I thought you would never come back," he said, impatiently. "I am sorry if I inconvenienced you, sir," Ben said. "I didn't know you wished me back early." "Come up stairs with me and pack. We are going to leave the hotel." "Where are we going?" asked Ben in surprise. "You will know very soon," answered the major. Major Grafton notified the clerk that he wished a hack in fifteen minutes, as he was about to leave the hotel. "Very well, major. Are you going to leave the city?" "Not at once. I may spend a few days at the house of a friend," answered Grafton, evasively. "Shall we forward any letters?" "No; I will call here for them." In fifteen minutes a porter called at the door of Major Grafton's room and took down the two trunks. A hack was in waiting. "Where to, sir?" asked the driver. "You may drive to the Windsor Hotel," was the answer. The Windsor Hotel, on Fifth avenue, is over two miles farther up town than the Metropolitan. Leaning back in his comfortable seat, Ben enjoyed the ride, and was pleased with the quiet, aristocratic appearance of the Windsor. A good suite of rooms was secured, and he found himself even more luxuriously accommodated than at the Metropolitan. "I wonder why we have changed our hotel," he thought.
  • 56. As if aware what was passing through his mind, Major Grafton said: "This hotel is much more conveniently located for my business than the other." "It seems a very nice hotel," said Ben. "There is none better in New York." "I wonder what his business is," passed through Ben's mind, but he was afraid of offending by the inquiry. Another thing puzzled him. He was ostensibly Major Grafton's private secretary, and as such was paid a liberal salary, but thus far he had not been called upon to render any service. There was nothing in this to complain of, to be sure. If Major Grafton chose to pay him for doing nothing, that was his lookout. Meanwhile he would be able to save up at least half of his salary, and transmit it to his mother. When they were fairly installed in their new home Major Grafton said: "I have a call to make, and shall be absent till late. I suppose you can take care of yourself?" "Oh, yes, sir. If there is anything you wish me to do——" "Not this evening. I have not got my affairs settled yet. That is all the better for you, as you can spend your time as you choose." About an hour later, as Ben was in the billiard-room, looking with interest at a game, his cousin, Clarence Plantagenet, and Percy Van Dyke entered. "How are you?" said Clarence, graciously. "Percy, this is my cousin, Ben Baker." "Glad to see you, I'm sure," said Percy. "Won't you join us in a little game?" "No, thank you," answered Ben. "I don't play billiards." "Then you ought to learn."
  • 57. "I thought you said you were staying at the Metropolitan," said Plantagenet. "So I was, but we have moved to the Windsor." "Have you a good room?" "Tip-top!" "Does that mean on the top floor?" asked Percy, laughing. "Not exactly. We are on the third floor." "Come, Percy, here's a table. Let us have a game." They began to play, and Ben sat down in a comfortable arm-chair and looked on. Though neither of the boys was an expert, they played a fair game, and Ben was interested in watching it. "It's wonderful how he's improved," thought Clarence. "When I saw him in pa's office I thought he was awkward and gawky; now he looks just like one of us. He's had great luck in falling in with this Major Grafton. Really, I think we can afford to recognize him as a relation." When the boys had played a couple of games, they prepared to go. "By the way, Ben," said Clarence, "the governor told me to invite you to dinner on Sunday. Have you any other engagement?" "Not that I know of. I will come if I can." "That's right. Ta-ta, old fellow." "He treats me a good deal better than he did when we first met," thought Ben. "There's a great deal of virtue in good clothes, I expect." Ben was asleep before Major Grafton came home. In the morning, when he awoke, he found that the major was already dressing. "By the way, Philip," said his employer, quietly, "we sail for Europe this afternoon at three."
  • 58. "Sail for Europe!" ejaculated Ben, overwhelmed with surprise. "Yes. See that your trunk is packed by eleven."
  • 60. CHAPTER XIII. A Farewell Call.

Ben was startled by Major Grafton's abrupt proposal. To go to Europe would be delightful, he admitted to himself, but to start at a few hours' notice was naturally exciting. What would his mother and sister say? "I suppose there isn't time for me to go home and see my mother before sailing?" he ventured to say, interrogatively. "As we are to sail at three o'clock this afternoon, you can judge for yourself about that," said the major, coolly. "Don't you want to go?" "Oh, yes, sir. There is nothing I should like better. I should like to have said good-by to my mother, but——" "Unfortunately, you can't. I am glad you take so sensible a view of the matter. I will depend on you to be ready." "How long shall we probably be gone?" asked Ben. "I can tell you better some weeks hence, Philip. By the way," he added, after a moment's thought, "if any letters should come here addressed to you, don't open them till I come back." Ben looked at the major in surprise. Why should he not open any letters that came for him? He was not likely, he thought, to receive any except from Sunderland.
  • 61. "I will explain," continued the major. "There are some people in the city that are continually writing begging letters to me. They use every method to annoy me, and might go so far as to write to you and ask your intercession." "I understand," said Ben, unsuspiciously. "I thought you would," returned the major, evidently relieved. "Of course if you get any letter from home you will open that." "Thank you, sir." After breakfast Major Grafton left the hotel without saying where he was going, and Ben addressed himself first to packing his trunk, and then going down to the reading-room. There he sat down and wrote a letter to his mother, which ran thus: "Dear Mother:—I can imagine how much you will be surprised when I tell you that when this letter reaches you I shall be on my way to Europe. Major Grafton, my employer, only told me an hour since, and we sail this afternoon at three. I should be glad to come home and bid you and my little sister good-by, but there is no time. I know you will miss me, but it is a splendid chance for me to go, and I shall be receiving a liberal salary, out of which I can send you money from time to time. I know I shall enjoy myself, for I have always had a longing to go to Europe, though I did not dream that I should have the chance so soon. I will write to you as soon as we get on the other side. "Your loving son, Ben. "P. S.—We sail on the Parthia." It may be readily understood that this letter made a great sensation in Sunderland. Mrs. Baker hardly knew whether to be glad or sorry. It was hard to part from Ben for an uncertain period. On the other hand, all her friends congratulated her on Ben's great success in securing so good a position and salary. It was certainly a remarkable stroke of good fortune.
  • 62. Ben was about to write another letter to Clarence, explaining why he could not accept the invitation for dinner on Sunday, but a glance at the clock showed him that he would have a chance to go to his uncle's store, and that seemed, on the whole, more polite. He jumped on board a Broadway car at Twenty-third street, and half an hour later got out in front of his uncle's large business establishment. He entered with quite a different feeling from that attending his first visit, when, in his country attire, poor and without prospects, he came to make an appeal to his rich uncle. Handsome clothes are apt to secure outward respect, and one of the salesmen came forward, obsequiously, and asked: "What can I show you, young gentleman?" "Nothing, thank you," answered Ben, politely. "Is my uncle in?" "Your uncle?" "Mr. Walton." "Oh, yes; you will find him in his office." "Thank you." Nicholas Walton looked up as Ben entered his presence, and did not immediately recognize the handsomely-dressed boy who stood before him. He concluded that it was one of Clarence's high-toned acquaintances. "Did you wish to see Clarence?" he asked affably. "I am sorry to say that he has not been in this morning." "I should like to see him, Uncle Nicholas; but I also wished to see you." "Oh, it's Ben!" said Mr. Walton, in a slightly changed tone. "Yes, uncle; I met my cousin at the Windsor last evening." "He told me so. You are staying there, he says."
  • 63. "For a very short time. My cousin was kind enough to invite me to dinner on Sunday." "Yes; we shall be glad to have you dine with us." "I am sorry I cannot come. I am to sail for Europe this afternoon." "You sail for Europe!" repeated his uncle, in amazement. "Yes, uncle. I knew nothing of it till this morning." "It is indeed surprising. To what part do you go?" "I believe we sail for Liverpool in the Parthia. More than that I know nothing." "You are certainly strangely fortunate," said the merchant, musingly. "Does this Major Grafton appear to be wealthy?" "I judge that he is." "Does he pay you well?" "He gives me fifty dollars per month." "And what do you do?" "I am his private secretary, but thus far I have not been called upon to do much. I suppose I shall have more to do when I get to Europe." "He seems to be eccentric as well as rich. Perhaps he will want to adopt you. I advise you to try to please him." "I shall certainly do that, though I don't think he will adopt me." "Clarence will be sorry not to have seen you. He has taken a trip to Long Branch this morning with Percy Van Dyke." "I saw Percy last evening." "This country nephew of mine gets into fashionable society remarkably quick," thought the merchant. "There must be something in the boy, or he would not make his way so readily."
  • 64. "We are all going to Long Branch next week," said Mr. Walton, aloud. "We are to stay at the West End. If you had remained here you could have tried to persuade Major Grafton to spend part of the season at the Branch." "I shall be satisfied with Europe," said Ben, smiling. "You have reason to be satisfied. Clarence will envy you when he hears that you are going." "It didn't look as if he were likely to envy me for anything when I met him here the other day," thought Ben. "Please remember me to my cousin," said Ben, and shaking his uncle's extended hand he left the store. He was passing through the store when he felt a touch on his shoulder. Turning, he recognized the tall lady he had met just after his last visit. "Are you not the boy who told me I had a ticket on my shawl?" she inquired. "Yes, madam," replied Ben, smiling. "I recognize your face, but otherwise you look very different." "You mean I am better dressed." "Yes; I thought you a country boy when I met you." "So I am, but I am trying to be mistaken for a city boy." "I am relieved to meet you, for some one told me you had got into some trouble with the unmannerly boys who were following me." "I am much obliged to you for your solicitude in my behalf," said Ben, not caring to acknowledge the fact of the arrest. "I had hoped to be of service to you, but I see you don't appear to need it. I am here buying a suit of clothes for a poor boy in whom I
  • 65. am interested. Let me give you my card, and if you ever need a friend, come and see me." The card bore the name of "Jane Wilmot, 300 Madison avenue." Ben thanked Miss Wilmot and left his uncle's store.
  • 67. CHAPTER XIV. What Ben's Friends Thought.

"Did you see Philip?" asked Adeline, eagerly, when her young brother returned from his visit to the Metropolitan Hotel. "No," answered Harry. "He was out." "And you brought back the note, then?" said his sister, disappointed. "No; the clerk said he would give it to him; so I left it with him." Adeline looked anxious. "I am afraid his guardian will get hold of it," she said, turning to Rose. "Even if he does, there is nothing in it that you need regret writing." "It would never reach Philip." "Probably you are right. In that case we must make another effort when there seems a good chance." It was decided that Harry should call the next day, at his dinner hour, and ascertain whether the note had been delivered. He did so, but only to learn that the note had been given to Major Grafton, and that both he and Philip had left the hotel. "Do you know where they went?" asked Harry, eagerly.