Building an Event-Driven Data
Mesh
Patterns for Designing & Building Event-Driven
Architectures
With Early Release ebooks, you get books in their earliest form—the
author’s raw and unedited content as they write—so you can take
advantage of these technologies long before the official release of these
titles.
Adam Bellemare
Learning Events for Distributed Systems
by Adam Bellemare
Copyright © 2022 Adam Bellemare. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North,
Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales
promotional use. Online editions are also available for most titles
(http://oreilly.com). For more information, contact our
corporate/institutional sales department: 800-998-9938 or
corporate@oreilly.com.
Acquisitions Editor: Nicole Tache
Development Editor: Melissa Duffield
Production Editor: Jonathon Owen
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea
March 2023: First Edition
Revision History for the Early Release
2022-07-25: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781098127602 for release
details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc.
Learning Events for Distributed Systems, the cover image, and related trade
dress are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the author and do not
represent the publisher’s views. While the publisher and the author have
used good faith efforts to ensure that the information and instructions
contained in this work are accurate, the publisher and the author disclaim all
responsibility for errors or omissions, including without limitation
responsibility for damages resulting from the use of or reliance on this
work. Use of the information and instructions contained in this work is at
your own risk. If any code samples or other technology this work contains
or describes is subject to open source licenses or the intellectual property
rights of others, it is your responsibility to ensure that your use thereof
complies with such licenses and/or rights.
978-1-098-12754-1
Chapter 1. Introducing Event
Streams for Data
Communication
A NOTE FOR EARLY RELEASE READERS
With Early Release ebooks, you get books in their earliest form—the
author’s raw and unedited content as they write—so you can take
advantage of these technologies long before the official release of these
titles.
This will be the 1st chapter of the final book.
If you have comments about how we might improve the content and/or
examples in this book, or if you notice missing material within this
chapter, please reach out to the editor at mpotter@oreilly.com.
The way that businesses relate to their data is changing rapidly. Gone are
the days when all of a business’ data would fit neatly into a single relational
database. The big data revolution, started more than two decades ago, has
since evolved, and it is no longer sufficient to store your massive datasets in
a big data lake for batch analysis. Speed and inter-connectivity have
emerged as the next major competitive business requirements, again
transforming the way that businesses create, store, access, and share their
important data.
The modern business faces three main problems relating to data. First, big
data systems, underpinning a company’s business analytics engine, have
exploded in size and complexity. There have been many attempts to address
and reduce this complexity, but they each fall short of the mark. Secondly,
business operations for large companies have long since passed the point of
being served by a single monolithic deployment. Multi-service deployments
are the norm, including microservice and service oriented architectures. The
boundaries of these modular systems are seldom easy to define, especially
when many separate operational and analytical systems rely on read-only
access to the same data sets. There is an opposing tension here: on one
hand, co-locating business functions in a single application provides
consistent access to all data produced and stored in that system. On the
other, these business functions may have absolutely no relation to one
another aside from needing common read-only access to important business
data.
Both the analytical and operational domains suffer from a common third
problem: the inability to access high-quality, well-documented, self-updating,
and reliable data to apply to their own business use-cases. The
sheer volume of data that an organization deals with increases substantially
year-over-year, fueling a need for better ways to sort, store, and use it. This
pressure deals the final blow to the ideal of keeping everything in a single
database, and forces developers to split up monolithic applications into
separate deployments with their own databases. Meanwhile, the big data
teams struggle to keep up with the fragmentation and refactoring of these
operational systems, as they remain solely responsible for obtaining their
own data. Data has historically been treated as a second class citizen, as a
form of exhaust or byproduct emitted by the business applications. This
application-first thinking remains the major source of problems in today’s
computing environments.
Important business data needs to be readily and reliably available as
building block primitives for your applications, regardless of the runtime,
environment, or code base of your application. We can accomplish this by
treating data as a first-class citizen, complete with dedicated ownership,
minimum quality guarantees, service-level agreements, and scalable
mechanisms for clean and reliable access. Event streams are the ideal
mechanism for serving this data, providing a simple yet powerful way of
reliably communicating important business data across an organization,
enabling each consumer to access and use the data primitives they need.
In this chapter we’ll take a look at the forces that have shaped the
operational and analytical tools and systems that we commonly use today
and the problems that go along with them. The massive inefficiencies of
contemporary data architectures provide us with rich learnings that we will
apply to our event-driven solutions, and will set the stage for us going into
the next chapter when we talk about Data Mesh as a whole.
The Case for Event Streams and Event-
Driven Architectures
The new competitive requirements of big data in motion,
combined with modern cloud computing, require a rethink of how a
business creates, stores, moves, and uses data. The foundation of this new
data architecture is the event, the data quantum that represents real activity
within the business, as made available in the event stream. Together, event
streams lead to a central nervous system for unified data, enabling business
units to access and use fundamental, self-updating data building blocks.
These data building blocks join the ranks of containerization, infrastructure-
as-a-service, CI/CD pipelines, and monitoring solutions, the components on
which modern cloud applications are built.
Event streams are not new, and many past architectures have used them,
though not as extensively as outlined in this book. But the technological
limitations underpinning previous event-driven architectures are no longer a
concern in today’s world of easily scalable cloud computing. Modern multi-
tenant event brokers, complete with tiered storage, can store an unbounded
amount of data, removing the strict capacity restrictions that limited their
predecessors and prevented the storage of historical data. Producers write
their important business domain data to an event stream, enabling others to
couple on that stream and use the data building blocks for their own
applications. Finally, consumer applications can in turn create their own
event streams to share their own business facts with others, resulting in a
standardized communications mesh for all to use.
A Brief History of the Operational and
Analytical Plane
History has provided us with ample opportunity to learn from technological
choices that just didn’t work out as well as we’d hoped. In this section,
we’ll focus on the operational plane, the analytical plane (in the form of
“big data” solutions), and event streams. We’ll take a brief look at each of
these to get a better understanding of the forces that helped shape them, and
how those forces may still be relevant for a data communication system built
on event streams.
The Operational Plane
The year is 1970. The relational database model had just been proposed by
E.F. Codd. This model began to catch on over the next decade, with
implementations by IBM, Honeywell, and Oracle (to name a few) released
for general business usage. This model promoted two very important things
for the business world. One, the ability to relationally structure your data
models, such that you could get good real-time performance for insertions,
updates, deletions, and importantly, complex, multi-condition queries. Two,
the enablement of atomic, consistent, isolated, and durable (ACID)
transactions, which allow multiple actors to commit to the database without
inadvertently corrupting the underlying data. These two properties underpin
all modern relational databases, and have enabled the online transactional
processing (OLTP) of countless businesses across the globe.
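To make the ACID guarantee concrete, here is a minimal sketch using Python's built-in sqlite3 module; the inventory and orders tables are purely illustrative. Either both statements in the transaction become visible to other readers, or neither does.

import sqlite3

# A minimal sketch of an ACID transaction: the order insert and the inventory
# decrement commit together, or not at all.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item_id TEXT PRIMARY KEY, quantity INTEGER)")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, item_id TEXT, quantity INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 10)")
conn.commit()
try:
    with conn:  # opens a transaction; commits on success, rolls back on any exception
        conn.execute("INSERT INTO orders VALUES ('order-1', 'widget', 3)")
        conn.execute("UPDATE inventory SET quantity = quantity - 3 WHERE item_id = 'widget'")
except sqlite3.Error:
    pass  # on failure, neither statement is visible to other readers
print(conn.execute("SELECT quantity FROM inventory").fetchone())  # (7,)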
OLTP databases and applications form the basis of much of today’s
operational computing, often in the form of a monolith. This is for many
good reasons: A monolith is a great way to get a business venture started,
with low overhead for creating, testing, and evaluating business strategies.
There is a wide selection of technologies that enable it, as well as a large
pool of engineering talent to draw on, given its commonality. Monolithic
applications also provide useful data access frameworks and abstractions to
the underlying storage layer, which is most commonly a relational database
such as PostgreSQL or MySQL. The database itself provides the
aforementioned ACID transactions, high operational performance,
durability, and reliable error handling. Together, the application and
database demonstrate the monolith data principles:
The Database is the Source of Truth
The monolith relies upon the underlying database to be the durable store
of information for the application. Any new or updated records are first
recorded into the database, making it the definitive source of truth for
that business data.
Data is Strongly Consistent
The monolith’s data, when stored in a typical relational database, is
strongly consistent. This provides the business logic with strong read-
after-write consistency, and thanks to transactions, it will not
inadvertently access partially updated records.
Read-Only Data is readily available
The monolith code can also readily access data from anywhere in the
database. A modular monolith, where each module is responsible for
certain business functionality, will restrict write access, but tends
to allow other modules to read its underlying data as necessary.
These three properties form a binding force that make monolithic
architectures very powerful. Your application code has read-only access to
the entire span of data stored in the monolith’s database as a set of
authoritative, consistent, and accessible data primitives. This foundation
makes it easy to build new application functionality provided it is in the
same application. But what if you need to build a new application?
The Difficulties of Communicating Data Between
Operational Services
A new application cannot rely on the same easy access to data primitives
that it would have if it were built as part of the monolith. This would not be
a problem if the new service had no need for any of the business data in the
monolith. However, this is rarely the case, as businesses are effectively a set
of overlapping domains, particularly the common core, with the same data
serving multiple business requirements. For example, an e-commerce
retailer may rely on its monolith to handle its orders, sales, and inventory,
but requires a new service powered by a document-based search index to
give its customers fast and effective search results, including available
items in the stores. Figure 1-1 highlights the crux of the issue: how do we
get the data from Ol' Reliable into the new document database to
power search?
Figure 1-1. The new search service team must figure out how to get the data it needs out of the
monolith and keep it up to date
This puts the new search service team in a bit of a predicament. The service
needs access to the item, store, and inventory data in the monolith,
but it also needs to model it all as a set of documents for the search engine.
There are two common ways that teams attempt to resolve this. One is to
replicate and transform the data to the search engine, in an attempt to
preserve the three monolith data principles. The second is to use APIs to
restructure the service boundaries of the source system, such that the same
data simply isn’t copied out - but served completely from a single system.
Both can achieve some success, but are ultimately insufficient as a general
solution. Let’s take a look at these in more detail to see why.
Strategy 1: Replicate Data Between Services
There are several mechanisms that fall under this strategy. The first and
simplest is to just reach into the database and grab the data you need, when
you need it. A slightly more structured approach is to periodically query the
source database, and dump the set of results into your new structure. While
this gives you the benefit of selecting a different data store technology for
your new service, you remain coupled on the source database’s internal
model and rely exclusively on it to handle your query needs. Large datasets
and complex queries can grind the database to a halt, requiring an
alternative solution.
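A minimal sketch of this periodic query-and-dump approach follows, assuming hypothetical items and items_copy tables and an updated_at column; the coupling on the source's internal model is plain to see in the SELECT statement.

# A sketch of the periodic query-and-dump replication pattern. The table and
# column names (items, items_copy, updated_at) are hypothetical.
def copy_changed_rows(source_conn, target_conn, since_epoch):
    rows = source_conn.execute(
        "SELECT item_id, name, price, updated_at FROM items WHERE updated_at > ?",
        (since_epoch,),
    ).fetchall()
    target_conn.executemany("INSERT OR REPLACE INTO items_copy VALUES (?, ?, ?, ?)", rows)
    target_conn.commit()
    return len(rows)
# A scheduler might invoke this hourly:
# copied = copy_changed_rows(source, target, since_epoch=last_run_epoch)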
The next most common mechanism for the data replication strategy is a
read-only replica of the source database. While this may help alleviate
query performance issues, consumers remain coupled on the internal model.
Unfortunately, each additional external coupling on the internal data model
makes change more expensive, risky, and difficult for all involved
members.
WARNING
Coupling on the internal data model of a source system causes many problems. The
source model will change in the course of normal business evolution, which often
causes breakages in both the periodic queries and internal operations of all external
consumers. Each coupled service will need to refactor their copy of the model to match
what is available from the source, migrate data from the old model to the new model,
and update their business code accordingly. There is a substantial amount of risk in each
of these steps, as a failure to perform each one correctly can lead to misunderstandings
in the meaning of the models, divergent copies of the data, and ultimately incorrect
business results.
One final issue with data replication strategies is that replication becomes more
difficult with each new independent service. Authoritative data sets can be
difficult to locate, and it is not uncommon for a team to accidentally couple
on a copy of the original. Meanwhile, each new independent service may
become its own authoritative source of data, increasing the number of
point-to-point connections. Finally, data is no longer strongly consistent
between datasets, particularly as clock skew between services, along with
long query and copy times, makes it more difficult to reason about the
completeness of your data copies.
Strategy 2: Use APIs to Avoid Data Replication Needs
Another way of creating a new service alongside a monolith is to take a
look at directly-coupled request/response microservices, also sometimes
known as synchronous microservices. They use direct API calls between
one another to exchange small amounts of information and perform work
on each other’s behalf.
These are small, purpose built systems that address a specific domain of
business functionality. For example, you may have one microservice that
manages inventory related operations, while you have other
microservices dedicated to shipping and accounts. Each of these services handles
requests originating from the dedicated mobile frontend and web
frontend microservices, which stitch together operations and return a
seamless view to the users, as shown in Figure 1-2.
Figure 1-2. An example of a simple ecommerce microservice architecture
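To illustrate the stitching-together role of the frontend services, here is a minimal sketch using Python's requests library; the service hostnames and endpoints are hypothetical.

import requests

# A sketch of a frontend service making synchronous calls to two microservices
# and composing the results into a single view.
def get_product_view(item_id: str) -> dict:
    inventory = requests.get(f"http://inventory-service/api/items/{item_id}/stock", timeout=2).json()
    shipping = requests.get(f"http://shipping-service/api/estimates?item_id={item_id}", timeout=2).json()
    # The caller must also handle timeouts, retries, and partial failures from
    # each downstream service.
    return {"item_id": item_id, "stock": inventory, "shipping_estimate": shipping}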
Synchronous microservices have many benefits. Services are purpose-built
to serve the needs of the business domain, giving the owners a high level of
independence to use the tools, technologies, and models that work best for
their needs. Teams also have more control over the boundaries of their
domain, including control and decision making over how to expand it to
help serve other clients needs. There are numerous books written on this
subject that go into far more detail than I have space for, so I won’t delve
into it in much detail here.
The main downside of this strategy is the same as with a single service -
there is no easy and reliable mechanism for accessing data beyond the
mechanisms provided in the microservices’ API. Most synchronous
microservices are structured to offer up an API of business operations, and
not for serving reliable bulk data access to the underlying domain. Thus,
most teams resort to the same fallbacks as a single monolith: reach into the
database and pull out the data you need, when you need it, as shown in
Figure 1-3.
Figure 1-3. The microservice boundaries may not line up with the needs of the business problem
In this figure, the new service is reliant upon the inventory, accounts, and
shipping services just for access to the underlying business data - but not for
the execution of any business logic. While this form of data access can be
served via a synchronous API, it may not be suitable for all use-cases. For
example, large datasets, time-sensitive data, and complex models can
prevent this from becoming a reality, in addition to the operational burden
of providing the data access API and data serving performance on top of
that of the base microservice functionality.
Operational systems lack a generalized solution for communicating
important business data between services. But this isn’t something that’s
isolated to just operations. The big data domain, underpinning and
powering analytics, reporting, machine learning, AI, and other business
services, is a voracious consumer of full data sets from all across the
business. While domain boundary violations in the form of smash-and-grab
data access are the foundation on which big data engineering has been built (I
have been a part of such raids during the decade or so I have spent in this
space), fortunately for us, they have provided us with rich insights that we can
apply to build a better solution for all data users. But before we get to that,
let’s take a look at the big data domain requirements for accessing and using
data, and how this space evolved to where it is today.
A short opinionated history of big data - Hadoop, Data
Warehouse, Lakes, and Analytics
Whereas operational concerns focus primarily on OLTP and server-to-
server communication, analytical concerns are historically focused on
answering questions about the overall performance of the business. Online
analytical processing (OLAP) systems are purpose-built to help solve these
problems, and have been actively developed and released since the first
commercial offering in 1970 (the same year as E.F. Codd’s paper on
relational databases). In brief, these OLAP systems are used to store
restructured data in a multidimensional “cube” format, such that analysts
can evaluate data on different vectors. The cube could be used to answer
questions such as “how many items did we sell last year?” and “what was
our most popular item?”
Answering these questions requires remodeling operational data into a
model suitable for analytics, but also accounting for the vast amounts of
data where the answers are ultimately found. While OLAP cubes provided
early success, their scaling strategy to accommodate ever-increasing data
loads relied upon larger disk drives and more powerful processors, and
ultimately ran into the physical limits of computer hardware. Instead of
further scaling up, it became time to scale out. This was plainly evident to
Google, who published a paper about the Google File System in October,
2003 (https://research.google/pubs/pub51/). This is the era that saw the
birth of big data, and caused a massive global rethink of how we create,
store, process, and ultimately, use data.
Hadoop quickly caught on as the definitive way to solve the compute and
scale problems encountered by OLAP cubes and quickly transformed the
analytical domain. Its free and open-source model meant that it could be
used by any company, anywhere, provided you could figure out how to
manage the infrastructure requirements, among a number of other technical
hurdles. But it meant a new way to compute analytics, and one where you
were no longer constrained to a proprietary system limited by the resources
of a single computer.
Hadoop introduced the Hadoop Distributed File System (HDFS), a durable,
fault-tolerant file system that made it possible to create, store, and process
truly massive data sets spanning multiple commodity hardware nodes.
While this has now been largely supplanted by options such as Amazon’s
S3 and Google’s Cloud Storage, it paved the way for a bold new idea: Copy
all of the data you need into a single logical location, and apply processing
after the fact to derive important business analytics.
The Hadoop ecosystem was only as useful as the data it had available, and
integrating it with existing systems and data sources remained problematic.
The creators of Hadoop focused on making it easy to get data into HDFS,
and in 2009 introduced a tool named Sqoop for this very purpose. One of
Sqoop’s most common uses is to pull data from external sources
(databases) on a periodic basis. For example, a scheduler kicks off a Sqoop
job at the top of every hour, which in turn queries a source database for all
of the data that changed in the past hour, then loads it directly into an HDFS
folder as part of the incremental update of a larger data set. Once the Sqoop
job completes loading the new data into storage, you can then
execute your analytical processing job on the newly updated batch of data.
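The following is a minimal sketch of that hourly incremental-pull pattern, written in plain Python rather than Sqoop; the orders table, updated_at column, and output directory are hypothetical.

import csv
from datetime import datetime, timedelta

# A sketch of the hourly incremental extraction pattern that tools like Sqoop
# automate: query the source for rows changed in the previous hour and land
# them as a new file for downstream batch processing.
def hourly_extract(source_conn, output_dir: str) -> str:
    window_end = datetime.utcnow().replace(minute=0, second=0, microsecond=0)
    window_start = window_end - timedelta(hours=1)
    rows = source_conn.execute(
        "SELECT order_id, status, updated_at FROM orders "
        "WHERE updated_at >= ? AND updated_at < ?",
        (window_start.isoformat(), window_end.isoformat()),
    ).fetchall()
    path = f"{output_dir}/orders_{window_start:%Y%m%d_%H}.csv"
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path  # the newly landed file joins the larger data set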
The big data architecture introduced a significant shift in the mentality
towards data. While it did solve the capacity issues of existing OLAP
systems, it introduced a new concept into the data world: that writing
unstructured and semi-structured data was not only acceptable, but implicitly
the recommended way to get data into the Hadoop ecosystem! OLAP cubes
required an explicit schema for their table
definitions, and incoming data had to be transformed and resolved to fit that
schema. These schemas are carefully constructed, maintained, and
migrated, and any insertion conflicts had to be resolved at write time, or
else the insertion would simply fail, protecting the integrity of the data.
Such a restriction did not explicitly exist when using Hadoop. And in
practice, many data engineers were pleased to get rid of the schema on
write barriers that so often caused them problems when ingesting data. In
this new world of Big Data, you were free to write data with or without any
schema or structure, and resolve it all later at query time by applying a
schema on read. And this isn’t just my opinion - in fact, writing
unstructured data and resolving at read time was one of the prime features
of contemporary Hadoop marketing material and technical documents.
Consider Figure 1-4, comparing the data structures and use-cases between
the relational database and MapReduce. MapReduce is Hadoop’s first
processing framework, and is the system that would read the data, apply a
schema (on read), perform transformations and aggregations, and produce
the final result. MapReduce was essential for analytical processing of data
sourced from HDFS, though it has today largely been supplanted by newer,
more performant options.
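The difference between the two approaches can be shown in a few lines of Python; the records and field names below are purely illustrative.

import json

raw_records = [
    '{"item_id": "a1", "price": 12.99}',
    '{"item_id": "a2", "price": "n/a"}',  # bad data: price is not a number
    '{"itemid": "a3", "price": 4.50}',    # bad data: misspelled field name
]
# Schema on read: every record is accepted at write time, and each consumer
# applies its own schema later. The misspelled field slips through silently
# (item_id becomes None), while the non-numeric price fails only when a reader
# finally touches it.
def read_with_schema(record: str) -> dict:
    doc = json.loads(record)
    return {"item_id": doc.get("item_id"), "price": float(doc.get("price", 0) or 0)}
# Schema on write: bad records are rejected at the boundary, before any
# consumer ever sees them.
def validate_on_write(record: str) -> dict:
    doc = json.loads(record)
    if "item_id" not in doc or not isinstance(doc["price"], (int, float)):
        raise ValueError(f"record violates schema: {record}")
    return doc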
Figure 1-4. A comparison of RDBMS and Hadoop, circa 2009. Hadoop: The Definitive Guide By
Tom White (2009 - O’Reilly Media, Inc.)
Note how this definitive guide promotes MapReduce as a solution for
handling low integrity data with a dynamic schema. This supports the
notion that HDFS should be storing unstructured data with low integrity, or
with varying and possibly conflicting schemas to be resolved at run time. It
also points out that this data is Write once, read many times, which is
precisely the scenario in which you want a strong, consistent, enforced
schema; without one, there is ample opportunity for well-meaning but unconstrained
users of the data to apply a read-time schema that misinterprets or
invalidates the data.
At the time, I don’t think that I nor many others appreciated just how
difficult these precepts would end up making collecting and using data. We
believed in and supported the idea that it was okay to grab data as you need
it, and figure it out after the fact, restructuring, cleaning, and enforcing
schemas at a later date. This also made it very palatable for those
considering migrating to Hadoop to alleviate their analytical constraints, as
this move didn’t constrain your write processes nor bother you with strict
rules of data ingestion - after all, you can just fix the data once it’s copied
over!
Unfortunately, the fundamental principle of storing unstructured data to be
used with schema on read proved to be one of the costliest and most damaging
tenets introduced by the big data revolution.
The Organizational Impact of Schema on
Read
Enforcing a schema at read time, instead of at write time, leads to a
proliferation of what we call “bad data”. The lack of write-time checks
means that data written into HDFS may not adhere to the schemas that the
readers are using in their existing work, as shown in Figure 1-5. Some bad data will
cause consumers to halt processing, while other bad data may go silently
undetected. While both of these are problematic, silent failures can be
deadly and difficult to detect. We’ll see more on this later.
Figure 1-5. Examples of bad data in a dataset, discovered only at read time
To get a better understanding of the damaging influence of schema on read,
let’s take a look at three roles and their relationship to one another. While I
am limited to my own personal experiences, I have been fortunate to talk
with many other data people in my career, many from very different
companies and lines of business. I can say with confidence that while
responsibilities vary somewhat from organization to organization, this
summary of roles is by and large universal to most contemporary
organizations using big data:
The data analyst: is charged with answering business questions
related to big data sets. They query the data sets provided to them by
the data engineers.
The data engineer: is charged with obtaining the important business
data from systems around the organization, and putting it into a usable
format for the data analysts.
The application developer: is charged with developing an application
to solve business problems. That application’s database is also the
source of data required by the data analyst to do their job.
Historically, the most common way to adopt Hadoop was to establish a
dedicated data team, either as a subset of, or fully separate from, the regular
engineering team. The data engineer would reach into the application
developer’s database, grab the required data (often using Sqoop), and pull it
out to put into HDFS. Data scientists would help structure it and clean it up
(and possibly build machine-learning models off of it), before passing it on
to be used in analytics. Finally, the data analysts would then query and
process the captured datasets to produce answers to their analytical
questions. But this model led to many issues. Conversations between the
data team and the application developers would be infrequent, and usually
revolve around ensuring the data team’s query load does not affect the
production serving capabilities.
There are three main problems with this model. Let’s take a look at them.
Problem 1: Improper Data Model Boundaries
Data ingested into the analytics domain is coupled on the source’s internal
data model, and results in direct coupling by all downstream users of that
data. For very simple, seldom-changing sources, this may not be much of
a problem. But many models span multiple tables and are purpose built to
serve OLTP operations, and may become subject to substantial refactoring
as business use-cases change. Direct coupling on this internal model
exposes all downstream users to these changes.
OLAP cubes with schema on write forced the data engineers to reconcile
this incompatibility at write time, usually with the help of the application
developer who made the change. OLAP cube updates, and therefore new
processing results, would be halted until the new schema and data were
reconciled with the existing OLAP structure. However, with schema on
read, you can simply update the Sqoop job to pull from the newly defined
and altered tables, and push reconciling the horrible mess down to the data
scientists and analysts. While this is obviously not a true solution to the
problem at hand, it’s unfortunately one that I have commonly seen - push it
down and let the users of the data figure it out.
TIP
Schema on read purportedly gives you the freedom to define the schema any way you
need. However, implementing a schema on write for a data set doesn’t prevent you from
subsequently transforming it downstream, in another job, into the format that you need
for your business domain. Schema on write gives you a well-defined and clean interface
to base your work on, while still letting you transform it as you need. We’ll be relying
on schema on write for the majority of our solutions in this book.
WARNING
One example I have seen of a table modification that silently broke downstream jobs
involved changing a premium field from boolean to long. The original version
represented an answer to the question of “did the customer pay to promote this”. The
updated version represented the budget_id of the newly expanded domain, linking
this part of the model to the budget and its associated type (including the new trial type).
The business had adopted a “try before you buy” model where we would reserve, say,
several hundred dollars in advertising credits to showcase the effectiveness of
promotion, without counting it in our total gross revenue.
The jobs ingesting this data to HDFS didn’t miss a beat (no schema on write), but some
of our downstream jobs started to report odd values. A majority of these were python
jobs, easily evaluating the new long values as booleans, and resulted in over-attribution
of various user analytics. Unfortunately, because no jobs were actually broken, this
problem wasn’t actually detected until one of our customers started asking questions
about abnormal results in their reports. This is just one example of many that I have
encountered, where well-meaning, reasonable changes to a system’s data model have
unintended consequences on all of those who have coupled on it.
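The over-attribution in that incident comes down to a few lines of consumer code. Here is a minimal sketch (with invented values) of how a truthiness check written for the old boolean field keeps running, without error, against the new budget_id values:

# The field that used to answer "did the customer pay to promote this?" now
# carries a budget_id. The unchanged consumer logic below never fails; it just
# starts counting every record with any budget_id (including trials) as paid.
old_records = [{"item_id": "a1", "premium": True}, {"item_id": "a2", "premium": False}]
new_records = [{"item_id": "a1", "premium": 90210},   # budget_id of a paid campaign
               {"item_id": "a2", "premium": 10001}]   # budget_id of a free trial
def count_paid_promotions(records):
    return sum(1 for r in records if r["premium"])  # truthiness check written for a boolean
print(count_paid_promotions(old_records))  # 1 -- correct
print(count_paid_promotions(new_records))  # 2 -- silent over-attribution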
Problem 2: Ownership is spread across multiple teams
The application developer retains ownership of their data model, but is
typically unaware of the needs of anyone directly querying their
underlying database. As we saw with Problem 1, a change to this model can
have unintended consequences on the analysts relying on that data.
Application developers are the domain experts and masters of the source
data model, but their responsibility for communicating that data to other
teams (such as the big data team) is usually non-existent. Instead, their
responsibilities usually end at the boundaries of their application and
database.
Meanwhile, the data engineer is tasked with finding a way to get that data
out of the application developer’s database, in a timely manner, without
negatively affecting the production system. The data engineer is dependent
on the data sources, but often has little to no influence on what happens to
the data sources, making their role very reactive. This production / data
divide is a very real barrier in many organizations, and despite best-efforts,
agreements, integration checks and preventative tooling, breakages in data
ingestion pipelines remain a common theme.
Finally, the data analyst, responsible for actually using the data to derive
business value, remains two degrees of separation away from the domain
expert (application developer), and three degrees separated if you have a
layer of data scientists in there further munging the data. Both analysts and
data scientists have to deal with whatever data the data engineers were able
to extract, including resolving inconsistent data that doesn’t match their
existing read schemas. As data analysts often share their schemas with other
data analysts, they also need to ensure that their resolved schemas don’t
break each other’s work. This is increasingly difficult to do as an
organization and its data grows, and unfortunately, their resolution efforts
remain limited to only benefiting other analysts. Operational systems are on
their own.
Problem 3: Do-it-yourself & Custom Point-to-Point Data
Connections
While a data team in a small organization may consist of only a handful of
members, larger organizations have data teams that number in the hundreds
or thousands of members. For large data organizations, it is common to
have the same data sources pulled into different subdomains of the data
platform. For example, sales data may be pulled into the analytics
department, the consumer reporting department, and the accounts receivable
department. If we’re using Sqoop, each of these jobs may be independently
scheduled, unbeknownst to one another, resulting in multiple areas where
you have the same data. These types of workloads are commonly known as
ETL (extract, transform, and load) jobs, and are a staple in ingesting,
transforming, and loading data from operational systems to analytical data
warehouses and lakes.
Figure 1-6 shows what this tends to look like in practice. Each arrow in the
diagram is a dedicated periodic copy job, pulling data from one area to
another. Ad-hoc copying of data like this makes it difficult for the data
owners to track where their data is going, and who is using it. It’s common
that all jobs pull data using a single shared account, to avoid the hassle of
changing security credentials and limiting the scope of access. And even
when such restrictions are imposed, such as in the case of the Predictions
domain, oftentimes teams can simply circumvent the restrictions by
finding a copy of that data available somewhere in the grand repository of
HDFS, and copy it over to their own domain. While infosec and data
governance are important topics in their own right, my own experience (and
that of many trusted peers) has indicated to me that unless you make it very
easy for people to get the data they need, they will create their own clever
(if illicit) ways to get it.
Figure 1-6. Three typical analytical domains, each grabbing data from where it can to get its
work done
But are these copies really the same data? There are many factors that can
affect this, including the frequency at which the data is acquired,
transformation code, intermittent failures, and misunderstandings of what
you are copying. You may also be ingesting a dataset that is itself a copy of
the original, with its own idiosyncrasies, something that may not be
apparent unless you are intimately familiar with the domain of the source
system. The end result is that it’s possible to end up with data that is
supposed to be the same, but is actually different. A very simple example of
this would be two ingested datasets, one aggregated daily with boundaries
at UTC-0 time, while the other is aggregated daily with boundaries
based on local time. The format, partitioning, and ordering of data may
appear identical, yet these hard to detect, undocumented differences still
remain.
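A small worked example makes the divergence obvious; the timestamps and the UTC-5 offset below are invented for illustration.

from datetime import datetime, timezone, timedelta

# Two "daily" aggregations over the same events: one buckets by UTC date, the
# other by local (UTC-5) date. The outputs look equally valid, yet disagree.
events = [
    (datetime(2023, 3, 1, 2, 30, tzinfo=timezone.utc), 100.0),  # still Feb 28 locally
    (datetime(2023, 3, 1, 14, 0, tzinfo=timezone.utc), 50.0),
]
def daily_totals(events, tz):
    totals = {}
    for ts, amount in events:
        day = ts.astimezone(tz).date()
        totals[day] = totals.get(day, 0.0) + amount
    return totals
print(daily_totals(events, timezone.utc))                   # 2023-03-01 -> 150.0
print(daily_totals(events, timezone(timedelta(hours=-5))))  # 2023-02-28 -> 100.0, 2023-03-01 -> 50.0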
A domain modeling change in one of the data sources may require that the
downstream data engineers update their ETL pipelines to accommodate the
new format. However, three separate ETL jobs means three separate code
updates, each with its own independent changes, reviews, and commits. If
owned by different teams, as is often the case in a very large company with
lots of data sets and ETL jobs, these updates may be done many days apart -
or even not at all (you’d be surprised how many times a critical ETL job
can go unnoticed until it breaks). Thus, the results of each ETL job may
(and often do) report conflicting results with one another, despite being
sourced from the same original datasource.
Common ways to address this can include tribal knowledge (e.g., tell me
when you’re changing the domain model!) and automated pull request
checks (e.g., notify me if a pull request is made against the database
schema), but this is only a partial solution. All affected jobs and datasets
need to be identified and their owners need to be notified. At this point, a
case-by-case remediation of all affected downstream data can begin. This
tends to be a very expensive and complex task, and while it is possible,
preventing yourself from getting into this situation is a far better area of
investment than remediating your way out of it.
Bad Data: The Costs of Inaction
The reason that bad data costs so much is that it must be accommodated by
everyone consuming, using, and processing this data. This is further
complicated by the fact that not every consumer of bad data will “fix it” in
the same way, leading to divergent results with other consumers based on
their interpretation of the data. Tracking these divergent interpretations
down is fiendishly expensive, and is further complicated by mixing in data
from other domains that further muddies the interpretation of a dataset.
I have seen bad data created by well-meaning individuals trying their very
best, simply because of the point-to-point, reach in and grab it nature of the
existing data transfer tools. This has been further augmented by massive
scale, where a team discovers that not only is their copy of the dataset
wrong, but that it’s been wrong for several months, and the results of each
big data job computed on that dataset are also wrong. Some of these jobs
use hundreds or thousands of processing nodes, with 32 to 128 GB of RAM
each, churning through hundreds of terabytes of data on each nightly run.
This can easily amount to hundreds of thousands, or even millions, of dollars just
in processing costs, for jobs that not only need to be thrown away and rerun, but
that have also negatively affected all downstream jobs. In turn, those jobs
also need to be rerun, incurring their own processing costs.
But processing costs are certainly not the only factor. The business
decisions made off of that bad data and their subsequent effects can ruin a
business. While I will do my past employers a favor and explicitly state that
“this did not happen there”, I have been privy to details on one scenario
where a company had misbilled its customers collectively by several
million dollars, in some cases by too much, and in others by too little. The
cause of this was actually quite innocent and well-intentioned: a somewhat
long chain of data sets created by reaching in and grabbing data, coupled
with some schema changes that weren’t detected by the big data processing
teams that prepared the data for billing. It was only when a customer
noticed that their billing costs far exceeded their engagement costs that an
investigation was kicked off. And this is only one of many that I have been
witness to.
I will readily acknowledge that a single person’s experience is certainly
insufficient to sway a skeptic. I will, however, guarantee you that I am
thoroughly convicted that difficult to obtain data is one of the primary
causes of bad data. If you are someone in big data analytics, I suspect you
have found this to be evident in your own personal experience. If you are a
developer working on a critical operational business application, you may
want to check with your analytics team, along with other operational teams
who are dependent on your data, to see what their experiences are like in
accessing it. Fortunately for me, there are others who have researched this
field to figure out just how much bad data costs businesses. The results are
staggeringly high.
In 2016, a report by IBM, as highlighted by the Harvard Business
Review (HBR), estimated the financial impact of bad data at $3.1
trillion in the USA alone. Though the original report is
(frustratingly) no longer available, HBR has extracted some of the more
relevant numbers related to the time spent by those trying to use data. To
paraphrase:
50% — the amount of time that knowledge workers waste hunting for
data, finding and correcting errors, and searching for confirmatory
sources for data they don’t trust.
60% — the estimated fraction of time that data scientists spend
cleaning and organizing data.
As I was researching this subject, I came across an even older HBR article
from 2013:
“Studies show that knowledge workers waste up to 50% of time hunting for
data, identifying and correcting errors, and seeking confirmatory sources for
data they do not trust. … fixing the problem is not as hard as many might
think. The solution is not better technology: It’s better communication
between the creators of data and the data users;”
The problem of bad data has existed for a very long time. Data copies
diverge as their original source changes. Copies get stale. Errors detected in
one dataset are not fixed in other duplicate datasets. Domain knowledge
related to interpreting and understanding data remains patchy, as does
support from the owners of the original data.
By promoting data to a first-class citizen, as a product like any other, we
can eliminate many of the root causes of bad data. A data product with a
well-defined schema, domain documentation, standardized access
mechanisms, and service level agreements can substantially reduce the
impact of bad data right at the source. Consumers, once coupled on the data
product, may still make their own business logic mistakes - this is
unavoidable. They will, however, seldom make any more inadvertent
mistakes in merely trying to acquire, understand, and interpret the data they
need to solve their business problems. Inaction is not a solution.
One more problem to solve: Unifying
analytical and operational workflows
It’s clear that batch big data, as it’s conventionally set up, has some
problems that remain to be solved. But there’s one more problem that sits
at the heart of engineering: it’s not just the data team that has these data
access and quality problems. Every single OLTP application that needs data
stored in another database has the same data access problems as the data
team. How do you access important business data, locked away in another
service, for operational concerns?
There have been several previous attempts at enabling better operational
communication between services, including service-oriented architecture,
enterprise service buses, and of course, point-to-point request/response
microservices. But in each of these architectures, the service’s data is
encapsulated within its own database, and is out of reach to other services.
In one way this is good - the internal model is sheltered, and you have a
single source of truth. Applications provide operational APIs that other
applications can call to have work done on their behalf. But this also doesn’t
resolve the fundamental issue of wholesale access to definitive data sets for
teams to use for their own business purposes. Failing to provide this is also
not an option, as illustrated by the decades of big data’s “reach in and grab
it” strategy and the numerous substantial expenses that come with it.
A further complication is that many operational use-cases nowadays depend
on analytical results. Think machine learning, recommendation engines, AI,
etc. Some use cases, such as producing a monthly report of top selling
products, can very clearly be labelled as “analytical”, to be derived from a
Hadoop query. Other use cases are not so clear-cut. Consider an e-
commerce retailer that wants to advertise shoes based on current inventory
(operational), previous user purchases (analytical), and the user’s realtime
estimated shopping session intentions (analytical & operational). In
practice, the boundary between operational and analytical is seldom neatly
defined, and the exact same data set may be needed for a multitude of
purposes, analytical, operational, or somewhere in between.
Both big data analytics and conventional operational systems have
substantial difficulty in accessing data sets that are contained within other
databases. These difficulties are further exacerbated by the increasing
volume, velocity, and scale of data, while systems are simultaneously
forced to scale outwards instead of upwards as compute limitations of
individual services are reached. The data communication strategies of most
organizations are based on yesterday’s technology and fail to account for
the offerings of modern cloud storage, computing, and software as a
service. These tools and technologies have changed the way that data can be
modeled, stored, and communicated across an organization, which we will
examine in more detail throughout the remainder of this book.
NOTE
Silos form as teams grow. While we have seen that there is often a divide between the
engineering and data teams, divisions will also certainly exist within the larger team.
Take a look around your own organization, and go talk to people in other silos. Ask
them how they get the data they need to do their business operations, as well as if they
know all of the customers who are consuming their data. You may learn something from
their answers.
How Do We Resolve All Of These Data
Issues?
The premise of the solution is simple. Publish important business facts to
dedicated, durable, and replayable event streams. These streams become a
fundamental source of truth for operational, analytical, and all other forms
of workloads across the organization. Producers of this data are responsible
for the modeling, evolution, and quality of the data provided in the event
stream, treating it as a first-class citizen, on par with any other product in
the organization. Prospective consumers can explore, discover, and
subscribe to the event streams they need for their business use-cases. The
event streams should be well-described, easy to interpret, and form the basis
for a set of self-updating data primitives for powering both business
services and analytics.
This architecture is built by leveraging modern cloud computing and
Software-as-a-Service (SaaS) options, as we shall see when we cover
building a self-service platform. A good engineering stack makes it easy to
create and manage applications throughout their lifecycle, including
acquiring compute resources, providing scalability, logging, and monitoring
capabilities. Event streams provide the modern engineering stack with the
formalized and standardized access to the data it needs to get things done.
Let’s revisit the monolith data principles from earlier in this chapter
through the lens of this proposal. These three principles outline the major
influences for colocating new business functionality within a monolith.
How would a set of self-updating event streams relate to these principles?
The Database is the Source of Truth → The Event Stream is the Source
of Truth
The owner of the data domain is now responsible for composing an
external-facing model and writing it as a set of events to one (or more)
event streams. In exchange, other services can no longer directly access
and couple on the internal data model, and the producer is no longer
responsible for serving tailored business tasks on behalf of the querying
service, as is often the case in a microservices architecture. The event
stream becomes the main point of coupling between systems.
Downstream services consume events from the event stream, model it
for their purposes, and store it in their own dedicated data stores.
Data is Strongly Consistent → Data is Eventually Consistent
The event stream producer can retain strong read-after-write consistency
for its own internal state, along with other database benefits such as
local ACID transactions. Consumers of the event stream, however, are
independent in their processing of events and modeling of state, and
thus rely on their own eventually consistent view of the processed data.
A consumer does not have write-access to the event stream, and so
cannot modify the source of data. Consumer system designs must
account for eventual consistency, and we will be exploring this subject
in greater detail later in this book.
Read-Only Data is readily available (remains unchanged!)
Event streams provide the formalized mechanism for communicating
data in a read-only, self-updating format, and consumers no longer need
to create, manage, and maintain their own extraction mechanism. If a
consumer application needs to retain state, then it does so using its own
dedicated data store, completely independent of the producer’s database.
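As a minimal sketch of this third principle, the consumer below materializes item events from a stream into a local store it fully owns; stream stands in for any event-stream client, and the item fields are hypothetical.

import json
import sqlite3

# A sketch of a consumer building its own (eventually consistent) view from an
# event stream. Each event carries the latest state for one item, keyed by item_id.
def materialize(stream, db: sqlite3.Connection):
    db.execute("CREATE TABLE IF NOT EXISTS items (item_id TEXT PRIMARY KEY, name TEXT, price REAL)")
    for event in stream:
        item = json.loads(event)
        db.execute("INSERT OR REPLACE INTO items VALUES (?, ?, ?)",
                   (item["item_id"], item["name"], item["price"]))
        db.commit()  # this view trails the producer's database: eventual consistency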
Data mesh formalizes the ownership boundaries of data within an
organization and standardizes the mechanisms of storage and
communication. It also provides a reusable framework for producing,
consuming, modeling, and using data, not only for current systems, but also
for systems yet to be built.
Common Objections to an Event-Driven Data
Mesh
There are several common objections that I have frequently encountered
when discussing an event-driven data mesh with others. Though we
will cover these situations in more detail throughout the book, I wanted to
bring them up now to acknowledge that these objections do exist, but that each
one of them is manageable.
Producers cannot model data for everyone’s use cases
This argument is actually true, though it misses the point. The main duty of
the producer is to provide an accurate and reliable external public model of
their domain data for consumer use. These data models only need to expose
the parts of the domain that other teams can couple on; The remainder of
their internal model remains off limits. For example, an ecommerce domain
would have independent sales, item, and inventory models
and event streams, simply detailing the current properties and values of
each sale, item, and inventory level, whereas a shipping company may have
event streams for each of shipment, truck, and driver.
These models are deliberately simple and hyper focused around a single
domain definition, resulting in tight, modular data building blocks that other
systems can use to build their own data models. Consumers that ingest these
events can restructure them as needed, including joining them with events
from other streams or merging with existing state, to derive a model that
works for solving their business use-cases. Consumers can also engage the
producer teams, requesting additional information be added to the public
model, or for clarification on certain fields and values.
As the producer team owns the original data model, they are the most
qualified to decide what aspects of it they should expose and allow others to
couple on. In fact, there is no other team more qualified than the team that
actually creates the original source of data in defining what it actually
means, and how others should interpret what its fields, relationships and
values mean. This approach lets the data source owners abstract away their
internal complexities, such as their highly normalized relational model or
document store. Changes to the internal source model can be hidden from
consumers that would otherwise have coupled directly on it, reducing
breakages and errors.
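A minimal sketch of what composing such an external-facing model can look like is shown below; the field names are illustrative, not a prescribed schema.

import json

# The internal, normalized representation (several joined rows) is flattened
# into a small, self-describing sale event that consumers can safely couple on.
def to_public_sale_event(order_row, customer_row, line_items):
    return json.dumps({
        "sale_id": order_row["id"],
        "customer_id": customer_row["id"],  # internal customer details stay private
        "total": sum(li["price"] * li["qty"] for li in line_items),
        "currency": order_row["currency"],
        "occurred_at": order_row["created_at"],
    })
# The producer writes this event to its sales stream; refactoring the internal
# order, customer, or line-item tables never leaks to consumers.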
Making multiple copies of data is bad
This objection, ironically, is implicitly in opposition to the first argument.
Though just like the previous argument, it does have a grain of truth.
Multiple copies of the same data set can and do inadvertently get out of
sync, become stale, or otherwise provide a source of data that is in
disagreement with the original source. However, our proposal is not to
make copying data a free-for-all, but rather to make it a formalized and well-
supported process that establishes clear rules and responsibilities,
embracing this reality rather than hiding from it.
There are three main subtypes of this argument.
There should only be a single master copy of the data, and all
systems should reference it directly
This belief fails to account for the fact that big data analytics teams
worldwide have already been violating this principle since the dawn of the
big data movement (and really, OLAP in general), because their needs
cannot be met by a single master copy, stored in a single database
somewhere. It also fails to account for the various needs of other
operational systems, which follow the same boundary-breaching data
acquisition strategies. It’s simply untenable.
The inability of the source system to model its data for all business use-
cases is a prime reason why multiple copies of the same data set will
eventually exist. One system may need to support ACID transactions in a
relational model, whereas a second system must support a document store
for geo-location and plain-text search. A third consumer may need to write
these datasets to HDFS, to apply MapReduce style processing to yield
results from the previous 364 copies of that data it made, cross-referenced
to other annual datasets. All of these cannot be served from a single central
database: if not because of the modeling requirements, then because of the
impossibility of satisfactory performance for all use cases.
It’s too computationally expensive to create, store, and update
multiple copies of the same data
This argument is hyper-focused on the fact that moving and storing data
costs money, and thus storing a copy of the same data is wasteful
(disregarding factors such as remodeling and performance, of course). This
argument fails to account for the inexpensiveness of cloud computing,
particularly the exceptionally cheap storage and network costs of today’s
major cloud providers. It also fails to account for the developer-hours
necessary to build and support custom ETL pipelines, part of the multi-
trillion dollar inefficiencies in creating, finding, and using data.
Minimizing data transfer, application size, and disk usage
is no longer as important as it once was for the vast majority of
business applications. Instead, the priority should be on minimizing the
developer effort required to access data building blocks, with a focus on
operational flexibility.
Managing infosec policies across systems and distributed
datasets is too hard
I will acknowledge that it is challenging. However, the principles of data
mesh, which we will get into in the next chapter, acknowledge this and
make it the responsibility of the implementers to address it from the start.
One of the bigger difficulties in infosec management is in applying policies
to an already existing (and usually sprawling) distributed data architecture.
By formalizing these requirements ahead of time, creating necessary self-
service tooling, and making infosec adherence a minimal barrier to entry for
participation in the data mesh, we tame the complexity of this problem and
make it tenable.
Eventual Consistency is too difficult to manage
Data communicated through event streams does require consideration of
and planning for eventual consistency. However, the complaint that
eventual consistency is too much of a barrier is typically founded on a
misunderstanding of how much of an impact it can have on business
processes as a whole. We can properly define our system boundaries to
account for eventual consistency between systems, while having access to
strong consistency within a system. There’s no getting around it: if a
certain business process needs perfect consistency, then the creation and
usage of the data must occur within the same service boundary. But the
majority of business processes don’t need this, and for those that do,
nothing we’re proposing in this book precludes you from obtaining it. We’ll
be discussing how to handle eventual consistency in more detail later in this
book.
NOTE
Eventual consistency is a property of copying data from one data store to another.
Regardless of how the copy is made, it immediately begins to go stale. An event-stream
approach embraces this by providing a simple mechanism to see just how up to date a
data set really is: checking the topic to see if there are any new events left to process. If
there are none, then we’re up to date. Though this mechanism is subject to network
failures, outages, and other delays, it is far superior to periodically polling a system’s
database every 15 minutes for the rows updated since the last query.
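To make that mechanism concrete, here is a minimal sketch using the kafka-python client (the broker address, topic name, and consumer group are illustrative assumptions). It compares the broker's end offsets with the consumer's current positions; a total lag of zero means every published event has been processed.

from kafka import KafkaConsumer, TopicPartition

# Illustrative assumptions: broker address, topic name, and group id.
consumer = KafkaConsumer(
    bootstrap_servers="localhost:9092",
    group_id="inventory-materializer",
    enable_auto_commit=False,
)

# Assign all partitions of the stream to this consumer.
partitions = [
    TopicPartition("inventory", p)
    for p in consumer.partitions_for_topic("inventory")
]
consumer.assign(partitions)

# end_offsets(): the next offset the broker would write, per partition.
# position(): the next offset this consumer would read, per partition.
end_offsets = consumer.end_offsets(partitions)
total_lag = sum(end_offsets[tp] - consumer.position(tp) for tp in partitions)

if total_lag == 0:
    print("Caught up: the local copy reflects every published event.")
else:
    print(f"{total_lag} events still to process; the local copy is stale.")

consumer.close()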
Figure 1-7. Data is stale as soon as a copy is made. Cross referencing the copies will give you
inconsistent results
Chapter Summary
Existing data communication strategies fall flat in the face of real business
requirements. Breaching a service’s boundary by reaching in to grab its data
is not a sustainable practice, but it is extremely common, and often supports
multiple critical systems and analytics workflows. Restructuring your
systems into neat modular microservices does not solve the problem of data
access; other parts of your business, such as the big data analytics and
machine learning teams, will still require wholesale access to both current
and historical data from domains across the organization. One way or
another, copies of data will be created, and we can either fight this or
embrace this fact and work to make it better. In choosing the latter, we can
use event streams to standardize and simplify the communication of data
across the organization as self-updating single sources of truth.
Events form the basis of communication in event-driven architectures, and
fundamentally shape the space in which we solve our problems. Events, as
delivered through event streams, are the building blocks of
asynchronous and reactive systems. These building blocks are primitives
similar to synchronous APIs: other applications can discover them,
couple on them, and use them to build their own services. Eventual
consistency, consumer-specific models, read-only replicas, and stream
materializations are just some of the concepts we’ll explore in this book,
along with the roles that modern cloud compute, storage, and networking
resources have in this new data architecture.
The following chapters will dig deeper into building and using an event-
driven data mesh. We’ll explore how to design events, including state,
action, and notification events, as well as patterns for producing and
consuming them. This book covers handling events at scale, including
multi-cluster and multi-region deployments, best practices for privacy and regulatory
compliance, and principles for handling eventual consistency and
asynchronous communication. We’ll explore the social and cultural changes
necessary to accommodate an event-driven data mesh, and look at some
real-world case studies highlighting the successes and lessons learned by
others.
Finally, we’ll also look at the practical steps you can take to start building
towards this in your own organization. One of the best things about this
architecture is that it’s modular and incremental, and you can start
leveraging the benefits in one sector of your business at a time. While there
are some initial investments, modern cloud compute and software as a
service solutions have all but eliminated the barriers to entry, making it far
easier to get started and test if this is the right solution for you.
Chapter 2. Designing Events
A NOTE FOR EARLY RELEASE READERS
With Early Release ebooks, you get books in their earliest form—the
author’s raw and unedited content as they write—so you can take
advantage of these technologies long before the official release of these
titles.
This will be the 7th chapter of the final book.
If you have comments about how we might improve the content and/or
examples in this book, or if you notice missing material within this
chapter, please reach out to the editor at mpotter@oreilly.com.
As we saw in the self-service platform chapter, you can extract and publish
a data product to an event stream by using a self-service connector
framework, such as Debezium or Kafka Connect. But as you and your peers
become more comfortable using event streams, you’re going to come to a
point where you can start natively generating your own events instead of
using a connector. This chapter covers how to design events in such a way
that makes it easy for others to use and apply them, and how to avoid the
numerous pitfalls you will encounter along the way.
Introduction to Event Types
There are two main types of events that underpin all of event design: The
state event and the action event. As an example, Figure 2-1 shows a simple
square wave in steady state, periodically altered by an action to result in a
new state.
Figure 2-1. State and Action during a change
There are three stages to any occurrence in a system:
1. The initial state
2. The action that alters the initial state to produce the final state
3. The final state (which is also the initial state for the next change cycle)
The vast majority of events we encounter can be classified as either a “state”
event or an “action” event. Though there are nuances to this (such as a state
event that includes some elements of actions), this simple division of types
proves to be quite useful when discussing events with colleagues, coming
up with designs, and framing the purpose that events have in the solution
space in question.
State Events
These form the foundation of the majority of event streams used in
inter-domain communication. State events fully describe the state of an
entity at a given point in time. We first encountered state events when we
covered extracting domain data with a change-data capture service. In
this chapter we’ll build on what we’ve learned so far, and dig deeper
into precisely why state events remain the best choice for
communicating data products between domains.
Action Events
These describe the transition between states and are best reserved for the
purposes of event sourcing within the domain. They do, however, have
the distinction of being the event definition type that an inexperienced
developer or architect first tries to use for inter-domain communication.
We’ll explore why this is the case, and the pitfalls and hazards that
make them unsuitable for that purpose.
For completeness, we’re also going to look into a few more event types to
see if, and how, you should use them in your event stream data products.
These include:
Measurement events, used for collecting user behaviour metrics,
Internet of Things (IoT) communications, and system monitoring
Notification events, used to indicate that a process or event has
completed, with minimal information within the event itself
Let’s take a look at state events first.
State Events with Event-Carried State
Transfer
A state event contains the entire public state of a specific entity at the time
the event was created. It does not contain any state that is private to the
source domain, but only the data that has been publicly declared as part of
the data contract. State events enable Event-Carried State Transfer (ECST),
where a read-only model of the state can be recreated and processed by any
consumer who needs it.
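For illustration, a state event for an inventory item might look like the following sketch (the field names and values are assumptions, not a schema from this book). Only the publicly declared fields appear, while internal-only details stay inside the source domain.

# A hypothetical inventory state event, keyed by the entity identifier.
# The value carries the entire public state declared in the data contract.
inventory_state_event = {
    "key": "item-1234",
    "value": {
        "item_id": "item-1234",
        "name": "Wool Socks",
        "quantity_in_stock": 17,
        "warehouse_id": "YYZ-01",
        "updated_at": "2022-03-01T12:00:00Z",
    },
}

# Internal-only fields (say, supplier cost or reorder heuristics) are
# deliberately absent: they are not part of the public data contract.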
State events can contain just the “now” state, or they may contain the
“before/after” state, as we saw with change-data capture. Both of these
options have their own advantages and disadvantages, which we’ll examine
in turn. But for starters, let’s take a look at how each of these options affects
compaction of event streams.
There are three main design strategies for defining the structure and
contents of ECST events:
Current State
These contain the full public state at the moment the event was created.
Before/After State
These contain both the full public state before the event occurred and
the full public state after the event occurred.
Hybrid State with Action
These contain either the current state or the before/after state, but also
include some action information as to why the event happened.
Let’s look into each of these in detail to get a better understanding of their
tradeoffs.
Current State
In this design, the event contains only the current state of the entity, and it is
the most common form of ECST definition. For example, an inventory
event for a given itemId will contain only the latest value for the
quantity in stock. Previous values for that itemId in the event stream
will remain until they are periodically compacted away.
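As a sketch of what producing such an event might look like with the kafka-python client (the broker address, topic name, and fields are illustrative assumptions), the event is keyed by itemId so that a compacted stream retains only the latest state per item:

from kafka import KafkaProducer
import json

# Illustrative assumptions: broker address, topic name, and fields.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=str.encode,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# The event is keyed by itemId; a compacted stream keeps only the latest
# state event per key, so older quantities are eventually compacted away.
producer.send(
    "inventory",
    key="item-1234",
    value={"item_id": "item-1234", "quantity_in_stock": 17},
)
producer.flush()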
This design has several main benefits:
Lean: It takes up a minimal amount of space in the event stream.
Network traffic is also minimized.
Simple: It relies on the event broker to store previous state rather than
representing it in the event itself (we’ll cover this more in the next
section on before/after). You can set the compaction policies
independently for each event stream (see the sketch after this list), in case
you need a really long backlog of historical events.
Compactable: Deleting older state is as easy as publishing an updated
record for a given key.
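Since compaction policies are set per stream, here is a hedged sketch of creating a compacted topic with the kafka-python admin client (the topic name, partition count, and compaction settings are assumptions for illustration):

from kafka.admin import KafkaAdminClient, NewTopic

# Illustrative sketch: create the "inventory" stream as a compacted topic so
# the broker retains at least the latest state event per key.
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

inventory_topic = NewTopic(
    name="inventory",
    num_partitions=6,
    replication_factor=3,
    topic_configs={
        "cleanup.policy": "compact",
        # Tune per stream if you need a longer backlog before compaction.
        "min.cleanable.dirty.ratio": "0.5",
    },
)

admin.create_topics(new_topics=[inventory_topic])
admin.close()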
It also has a few nuances that I wouldn’t quite call drawbacks:
Agnostic to why the state changed:
The downstream consumer of the ECST event is not given a reason why the
data has changed, only the new public state. The reason for this is simple: it
removes the ability of consumers to couple on the internal state transitions
Discovering Diverse Content Through
Random Scribd Documents
Building an Event-Driven Data Mesh (Early Release) Adam Bellemare
INDEX TO THE CHAPTERS.
Schiltberger to the Reader 1
1. Of the first combat between King Sigmund and the Turks 1
2. How the Turkish king treated the prisoners 4
3. How Wyasit subjugated an entire country 6
4. How Wyasit made war on his brother-in-law, and killed him 7
5. How Weyasit drives away the king of Sebast 10
6. What sixty of us Christians had agreed upon 10
7. How Wyasit took the city of Samson 12
8. Of serpents and vipers 12
9. How the Infidels remain in the fields with their cattle, in winter and
summer 14
10. How Weyasit took a country that belonged to the Sultan 18
11. Of the King-Sultan 19
12. How Temerlin conquered the kingdom of Sebast 20
13. Weyasit conquers Lesser Armenia 20
14. How Tämerlin goes to war with the King-Sultan 22
15. How Tämerlin conquered Babiloni 24
16. How Tämerlin conquered Lesser India 24
17. How a vassal carried off riches that belonged to Tämerlin 26
18. How Tämerlin caused MMM children to be killed 27
19. Tämerlin wants to go to war with the Great Chan 28
20. Of Tämerlin’s death 29
21. Of the sons of Tämerlin 30
22. How Joseph caused Mirenschach to be beheaded, and took possession
of all his territory 31
23. How Joseph vanquished a king and beheaded him 32
24. How Schiltberger came to Aububachir 33
25. Of a king’s son 33
26. How one lord succeeds another lord 36
27. Of an Infidel woman, who had four thousand maidens 37
28. In what countries I have been 38
29. In which countries I have been, that lay between the Tonow and the sea 39
30. Of the castle of the sparrow-hawk, and how it is guarded 41
31. How a poor fellow watched the sparrow-hawk 42
32. xxxii More about the castle of the sparrow-hawk 42
33. In which countries silk is grown, and of Persia and of other kingdoms 44
34. Of the tower of Babilony that is of such great height 46
35. Of great Tartaria 48
36. The countries in which I have been, that belong to Tartary 49
37. How many kings-sultan there were, whilst I was amongst the Infidels 51
38. Of the mountain of St. Catherine 54
39. Of the withered tree 56
40. Of Jherusalem and of the Holy Sepulchre 57
41. Of the spring in Paradise, with IIII rivers 61
42. How pepper grows in India 61
43. Of Allexandria 62
44. Of a great giant 64
45. Of the many religions the Infidels have 65
46. How Machmet and his religion appeared 65
47. Of the Infidels’ Easter-day 70
48. Of the other Easter-day 71
49. Of the law of the Infidels 71
50. Why Machmet has forbidden wine to Infidels 72
51. Of a fellowship the Infidels have among themselves 73
52. How a Christian becomes an Infidel 74
53. What the Infidels believe of Christ 75
54. What the Infidels say of Christians 76
55. How Christians are said not to hold to their religion 77
56. How long ago it is, since Machmet lived 78
57. Of Constantinoppel 79
58. Of the Greeks 80
59. Of the Greek religion 81
60. How the city of Constantinoppel was built 83
61. How the Jassen have their marriages 85
62. Of Armenia 86
63. Of the religion of the Armenians 87
64. Of a Saint Gregory 89
65. Of a dragon and a unicorn 90
66. Why the Greeks and Armani are enemies 96
67. Through which countries I have come away 99
The Armenian Pater Noster 102
The Tartar Pater Noster 102
Building an Event-Driven Data Mesh (Early Release) Adam Bellemare
SCHILTBERGER TO THE READER.
I, Johanns Schiltberger, left my home near the city of Munich, situated
in Payren, at the time that King Sigmund of Hungary left for the land
of the Infidels. This was, counting from Christ’s birth, in the thirteen
hundred and ninety-fourth year,1 with a lord named Leinhart
Richartingen. And I came back again from the land of the Infidels,
counting from Christ’s birth, fourteen hundred and twenty seven. All
that I saw in the land of the Infidels, of wars, and that was
wonderful, also what chief towns and seas I have seen and visited,
you will find described hereafter, perhaps not quite completely, but I
was a prisoner and not independent. But so far as I was able to
understand and to note, so have I [noted] the countries and cities as
they are called in those countries, and I here make known and
publish many interesting and strange adventures, which are worth
listening to.
1Neumann states in a note that this date, through the
transcriber’s error, appears as 1344 in the Heidelberg MS.
1.—Of the first combat between King Sigmund and
the Turks.
From the first, King Sigmund appealed in the above-named year,
thirteen hundred and ninety-four, to Christendom for assistance, at
the time that the Infidels were doing great injury to Hungern. There
came many people from all countries to help him;(1) then he took
the people and led them to the Iron Gate, which separates Ungern
from Pulgery and Walachy, and he crossed the Tunow into Pulgary,
and made for a city called Pudem.(2) It is the capital of Pulgery.
Then came the ruler of the country and of the city, and gave himself
up to the king; then the king took possession of the city with three
hundred men, good horse and foot soldiers, and then went to
another city where were many Turks. There he remained five days,
but the Turks would not give up the city; but the fighting men
expelled them by force, and delivered the city to the king. Many
Turks were killed and others made prisoners. The king took
possession of this city also, with two hundred men, and continued
his march towards another city called Schiltaw, but called in the
Infidel tongue, Nicopoly.(3) He besieged it by water and by land for
XVI days, then came the Turkish king, called Wyasit, with two
hundred thousand men, to the relief of the city. When the king,
Sigmund, heard this, he went one mile to meet him with his people,
the number of whom were reckoned at sixteen thousand men. Then
came the Duke of Walachy, called Werterwaywod,1(4) who asked the
king to allow him to look at the winds.2 This the king allowed, and
he took with him one thousand men for the purpose of looking at
the winds, and returned to the king and told him that he had looked
at the winds, and had seen twenty banners, and that there were ten
thousand men under each banner, and each banner was separate
from the other. When the king heard this, he wanted to arrange the
order of battle. The Duke of Walachy asked that he might be the
first to attack, to which the king would willingly have consented.
When the Duke of Burguny heard this, he refused to cede this
honour to any other person, for the just reason that he had come a
great distance with six thousand men,(5) and had expended much
money in the expedition, and he begged the king that he should be
the first to attack. The king asked him to allow the Ungern to begin,
as they had already fought with the Turks, and knew better than
others how they were armed. This he would not allow to the
Ungern, and assembled his men, attacked the enemy, and fought his
way through two corps; and when he came to the third, he turned
and would have retreated, but found himself surrounded, and more
than half his horsemen were unhorsed, for the Turks aimed at the
horses only, so that he could not get away, and was taken prisoner.
When the king heard that the Duke of Burgony was forced to
surrender, he took the rest of the people and defeated a body of
twelve thousand foot soldiers that had been sent to oppose him.
They were all trampled upon and destroyed, and in this engagement
a shot killed the horse of my lord Lienhart Richartinger; and I, Hanns
Schiltberger his runner, when I saw this, rode up to him in the crowd
and assisted him to mount my own horse, and then I mounted
another which belonged to the Turks, and rode back to the other
runners. And when all the [Turkish] foot-soldiers were killed, the
king advanced upon another corps which was of horse. When the
Turkish king saw the king advancing, he was about to fly, but the
Duke of Iriseh, known as the despot,(6) seeing this, went to the
assistance of the Turkish king with fifteen thousand chosen men and
many other bannerets, and the despot threw himself with his people
on the king’s banner and overturned it; and when the king saw that
the banner was overturned and that he could not remain, he took to
flight.3 Then came he of Cily,4 and Hanns, Burgrave of Nuremberg,
who took the king and conducted him to a galley on board of which
he went to Constantinoppel. When the horse and foot soldiers saw
that the king had fled, many escaped to the Tünow and went on
board the shipping; but the vessels were so full that they could not
all remain, and when they tried to get on board they struck them on
the hands, so that they were drowned in the river; many were killed
on the mountain as they were going to the Tunow. My lord Lienhart
Richartinger, Wernher Pentznawer, Ulrich Kuchler, and little Stainer,
all bannerets, were killed in the fight, also many other brave knights
and soldiers. Of those who could not cross the water and reach the
vessels, a portion were killed; but the larger number were made
prisoners. Among the prisoners were the Duke of Burgony(7) and
Hanns Putzokardo,5 and a lord named Centumaranto.6 These were
two lords of France, and the Great Count of Hungern. And other
mighty lords, horsemen, and foot-soldiers, were made prisoners, and
I also was made a prisoner.
1This name appears as Martin in edition of 1814; Merter Waywod
in edition of 1475; and Merte Weydwod in that of 1549.
2To reconnoitre. In the edition of 1814 the term employed is “zu
recognosciren”.
3The battle of Nicopolis was fought September 28th, 1396.
4Herman of Cily. N.
5Boucicault, who has described the battle in his Memoirs. H.
6Saint Omer. F.
2.—How the Turkish king treated the prisoners.
And now when the King Weyasat had had the battle, he went near
the city where King Sigmund had encamped with his army, and then
went to the battle-field and looked upon his people that were killed;
and when he saw that so many of his people were killed, he was
torn by great grief, and swore he would not leave their blood
unavenged, and ordered his people to bring every prisoner before
him the next day, by fair means or foul. So they came the next day,
each with as many prisoners as he had made, bound with a cord. I
was one of three bound with the same cord, and was taken by him
who had captured us. When the prisoners were brought before the
king, he took the Duke of Burgony that he might see his vengeance
because of his people that had been killed. When the Duke of
Burgony saw his anger, he asked him to spare the lives of several he
would name; this was granted by the king. Then he selected twelve
lords, his own countrymen, also Stephen Synüher and the lord
Hannsen of Bodem.(1) Then each was ordered to kill his own
prisoners, and for those who did not wish to do so the king
appointed others in their place. Then they took my companions and
cut off their heads, and when it came to my turn, the king’s son saw
me and ordered that I should be left alive, and I was taken to the
other boys, because none under xx years of age were killed, and I
was scarcely sixteen years old. Then I saw the lord Hannsen Greiff,
who was a noble of Payern, and four others, bound with the same
cord. When he saw the great revenge that was taking place, he cried
with a loud voice and consoled the horse- and foot-soldiers who
were standing there to die. “Stand firm”, he said, “when our blood
this day is spilt for the Christian faith, and we by God’s help shall
become the children of heaven.” When he said this he knelt, and
was beheaded together with his companions. Blood was spilled from
morning until vespers, and when the king’s counsellors saw that so
much blood was spilled and that still it did not stop, they rose and
fell upon their knees before the king, and entreated him for the sake
of God that he would forget his rage, that he might not draw down
upon himself the vengeance of God, as enough blood was already
spilled. He consented, and ordered that they should stop, and that
the rest of the people should be brought together, and from them he
took his share and left the rest to his people who had made them
prisoners. I was amongst those the king took for his share, and the
people that were killed on that day were reckoned at ten thousand
men. The prisoners of the king were then sent to Greece to a chief
city called Andranopoli, where we remained prisoners for fifteen
days. Then we were taken by sea to a city called Kalipoli;(2) it is the
city where the Turks cross the sea, and there three hundred of us
remained for two months confined in a tower. The Duke of Burgony
also was there in the upper part of the tower with those prisoners he
had saved; and whilst we were there, the King Sigmund passed us
on his way to Windischy land.(3) When the Turks heard this, they
took us out of the tower and led us to the sea, and one after the
other they abused the king and mocked him, and called to him to
come out of the boat and deliver his people; and this they did to
make fun of him, and skirmished a long time with each other on the
sea. But they did not do him any harm, and so he went away.
3.—How Wyasit subjugated an entire country.
On the third day after the Turkish king had killed the people and
sent us prisoners to the above named city, he marched upon Ungern
and crossed the river called the Saw, at a city called Mittrotz, and
took it and all the country around; and then he went into the Duchy
of Petaw, and took with him from the said country sixteen thousand
men with their wives and children and all their property, and took
the city of the above name and burnt it; and the people he took
away and some he left in Greece.1(1) And after he passed the river
called the Saw, he sent orders to Karipoli that we were to be taken
across the sea; and when we were taken across the sea, we were
taken to the king’s capital called Wursa, where we remained until he
himself came. And when he arrived in the city he took the Duke of
Burgony and those the duke had saved, and lodged them in a house
near to his palace. The king then sent a lord named Hoder of
Ungern, with sixty boys, as a mark of honour to the king-sultan;(2)
and he would have sent me to the king-sultan, but I was severely
wounded, having three wounds, so for fear I might die on the way I
was left with the Turkish king. Other prisoners were sent as an
offering to the king of Babilony(3) and the king of Persia,(4) also into
White Tartary,2(5) into Greater Armenia,(6) and also into other
countries. I was taken to the palace of the Turkish king; there for six
years I was obliged to run on my feet with the others, wherever he
went, it being the custom that the lords have people to run before
them. After six years I deserved to be allowed to ride, and I rode six
years with him, so that I was twelve years with him; and it is to be
noted what the said Turkish king did during these twelve years, all of
which is written down piece by piece.
1Styrian historians have overlooked this statement of Schiltberger.
N.
2White Tartars, i.e., Free Tartars. White signifies free in the Tartar
and Russian tongues; black, on the contrary, signifies subject-
races or those that are tributary. N.
4.—How Wyasit made war on his brother-in-law, and
killed him.
From the first he was at war with his brother-in-law, who was called
Caraman, and this name he had because of his country. The capital
of the country is called Karanda,(1) and because he would not be
subject to him, he marched upon him with one hundred and fifty
thousand men. When he knew that King Weyasit had advanced, he
went to meet him with seventy thousand men, the best he had in
the land, and with whom he intended to resist the king. They met
each other on the plain in front of the city called Konia, which
belonged to the said lord, Caraman. Here they attacked each other
and began to fight, and had on the same day two encounters by
which one tried to overcome the other, and both sides had rest at
night, that one might not do harm to the other. That same night
Karaman made merry with trumpets, with drums, and with his
guards, with the object of causing alarm to Weyasit; but Weyasit
arranged with his people that they should not make a fire except for
cooking, and should immediately again put it out. At night he sent
thirty thousand men to the rear of the enemy, and said to them that
when he should attack in the morning they should also attack. When
the day broke, Weyasit went against the enemy, and the thirty
thousand men attacked in the rear as they were ordered, and when
Karaman saw that the enemy was attacking him in front and behind,
he fled into his city of Konia, and remained in it to defend himself.
Weyasit lay siege to the city for xi days without being able to take it;
then the citizens sent word to Weyasit that they would surrender the
city if he would secure to them their lives and property. To this he
agreed. Then they sent word to say that they would retire from the
walls when he came to storm, and thus he might take the city. And
this occurred. And when Karaman saw that Weyasit was entering the
city, he attacked him with his warriors, and fought with him in the
town, and if he had received the least assistance from the
inhabitants he would have forced Weyasit out of the city; but when
he saw that he had no assistance, he fled, but was taken before
Weyasit, who said to him: “Why wilt thou not be subject to me?”
Karaman answered, “Because I am as great a lord as thyself.”
Weyasit became angry, and asked three times if there was anybody
who would rid him of Karaman. At the third time came one who took
him aside and cut off his head and went back with it to Weyasit, who
asked what he had done with him? He answered, “I have beheaded
him.” Then he shed tears and ordered that another man should do
to him what he did to Karaman, and he was taken to the place
where he beheaded Karaman and he was also beheaded. This was
done because Weyasit thought that nobody should have killed so
mighty a lord, but should have waited until his lord’s anger had
passed away. He then ordered that the head of Karaman should be
fixed on a lance and carried about the country, so that other cities
might submit to him on hearing that their lord was killed. After this
he occupied the city of Konia with his people and marched upon the
city of Karanda, and called upon them to surrender as he was their
lord, and if they would not do so he would compel them with the
sword. Then the citizens sent out to him four of their most eminent
[fellow citizens], to beg that he would ensure to them their lives and
their property, and begged, as their lord Karaman was dead, and
they had two of his sons in the city, that he would appoint one of
them to be their lord; and should he do so, they would surrender to
him the city. He replied that he should spare their lives and property,
but when he would have possession of the city, he should know
what lord to appoint, whether the son of Karaman or one of his own
sons. And so they parted. When the citizens heard Weyasit’s answer
they would not give up the city, and said that although their lord was
dead he had left two sons, under whom they will recover or die. And
so they defended themselves against the king until the fifth day. And
as Weyasit saw that they continued to resist, he sent for more
people and ordered arquebuses to be brought, and platforms to be
constructed. When Karaman’s sons and their mother saw this, they
sent for the chief citizens and said to them: “You see plainly that we
cannot resist Weyasit, who is too powerful for us; we should be
sorry if you died for our sakes, and we have agreed with our mother
that we will trust to his mercy.” The citizens were pleased at this,
and the sons of Karaman and their mother, and the chief citizens of
the city, opened the gates and went out. And as they were
advancing, the mother took a son in each hand and went up to
Weyasit, who, when he saw his sister with her sons, went out of his
tent towards her, and when they were near him they threw
themselves at his feet, kissed them, and begged for mercy, and they
gave the keys of the gates and of the city. When the king saw this,
he ordered his lords who were near him to raise them. When this
was done he took possession of the city, and appointed one of his
lords to be governor, and he sent his sister and her two sons to his
capital called Wurssa.
5.—How Weyasit drives away the king of Sebast.(1)
There was a vassal named Mirachamad who resided in a city called
Marsüany; it was on the border of Karaman’s country. When
Mirachamad heard that King Weyasit had conquered Karaman’s
country, he sent to him to ask him to drive away also the king of
Sebast, who was called Wurthanadin, who had seized upon his
territory because he could not himself expel him, and he should give
him the territory in exchange for one in his own country. Weyasit
sent to his assistance his son Machamet with thirty thousand men,
and they forcibly expelled the king called Wurthanadin out of the
country.1 Then Mirachamad bestowed upon Machamet2 the capital
and all the territory, because his first engagement had been in its
behalf. Then Weyasit took Mirachamad with him to his own country,
and gave him another territory for his own.
1 1394.
3Mouhammed, a younger son of Bajazet.
6.—What sixty of us Christians had agreed upon.
And when Weyasit came to his capital, there were sixty of us
Christians agreed that we should escape, and made a bond between
ourselves and swore to each other that we should die or succeed
together; and each of us took time to get ready, and at the time we
met together, and chose two leaders from amongst ourselves by lot,
and whatever they ordered we were to obey. Then we rose after
midnight and rode to a mountain and came to it by daybreak. And
when we came to the mountain we dismounted, and let our horses
rest until sunrise, when we remounted and rode the same day and
night. And when Weyasit heard that we had taken to flight, he sent
five hundred horse with orders that we were to be found, that we
were to be caught, and brought to him. They overtook us near a
defile, and called to us to give ourselves up. This we would not do,
and we dismounted from our horses and defended ourselves against
them as well as we could. When their commander saw that we
defended ourselves, he came forward and asked for peace for one
hour. We consented. He came to us and asked us to give ourselves
up as prisoners; he would answer for the safety of our lives. We said
we would consult, and did consult, and gave him this answer: We
knew that so soon as we were made prisoners, we should die so
soon as we came before the king, and it would be better that we
should die here, with arms in our hands, for the Christian faith.
When the commander saw that we were determined, he again asked
that we should give ourselves as prisoners, and promised on his oath
that he would ensure our lives, and if the king was so angry as to
want to kill us, he would let them kill him first. He promised this on
his oath, and therefore we gave ourselves up as prisoners. He took
us before the king, who ordered that we should be killed
immediately; the commander went and knelt before the king, and
said that he had trusted in his mercy and had promised us our lives,
and asked him also that he should spare us because he had even
sworn that such would be the case. The king then asked him if we
had done any harm? He said: No. Then he ordered that we should
be put into prison; there we remained for nine months as prisoners,
during which time twelve of us died. And when it was the Easter-day
of the Infidels, his eldest son Wirmirsiana,1(1) begged for us, then
he set us free, and ordered that we should be brought to him; then
we were obliged to promise him that we should never try to escape
again, and he gave us back our horses and increased our pay.
1The Amir Souleiman. The other sons of Bajazet were
Mouhammed and Mousa.
7.—How Wyasit took the city of Samson.(1)
Afterwards, in the summer, Wyasit took eighty thousand men into a
country called Genyck, and lay siege to a capital called Samson. This
city was built by the strong man Samson, from whom it has its
name. The lord of the country was of the same name as the country,
Zymayd, and the king expelled the lord out of the land; and when it
was heard in the city that their lord was driven away, the people
gave themselves up to Weyasit, who occupied the city and all the
country with his people.
8.—Of serpents and vipers.
A great miracle is to be noted which took place near the said city of
Samson, during the time that I was with Weyasit. There came
around the city such a lot of vipers and serpents, that they took up
the space of a mile all round. There is a country called Tcyenick
which belongs to Sampson; it is a wooded country in which are
many forests. One part of the vipers came from the said forests, and
one part came out of the sea. The vipers remained for xi days, and
then they fought with each other, and nobody dared to leave the city
on account of the vipers, although they did no harm either to men or
to cattle. Then the lord of the city and of the country gave orders
that likewise no harm should be done to these reptiles, and said it
was a sign and a manifestation from Almighty God. And now on the
tenth day, the serpents and vipers fought with each other from
morning until the going down of the sun, and when the lord and the
people of the city saw what was done, the lord caused the gate to
be opened, and rode out with a few people out of the city, and
looked where the vipers were fighting, and saw that the vipers from
the sea had to succumb to those of the forests. And the next
morning early, the lord again rode out of the city to see if the
reptiles were still there; he found none but dead vipers, which he
ordered to be collected and counted. There were eight thousand. He
then ordered a pit to be made, and ordered all to be thrown in and
covered with earth, and he sent to Weyasit, who at that time was
lord in Turkey, to tell him of the marvel. He took it for a piece of
luck, as he had only just taken the city and country of Samson, and
almost rejoiced that the forest adders had succumbed to the sea
adders, and said it was a manifestation from Almighty God, and he
hoped that as he was a powerful lord and king of the sea-board, so
he would also, by the help of God the Almighty, become the
powerful lord and king of the sea. Samson consists of two cities
opposite to each other, and their walls are distant, one from the
other, an arrow’s flight. In one of these cities there are Christians,
and at that time the Italians of Genoa(1) possessed it. In the other
are Infidels to whom the country belongs. At that time the lord of
the city and country was a duke called Schuffmanes, son of [the
duke of] Middle Pulgrey, the chief city of which country is Ternowa,
(2) and who at that time had three hundred fortified towns, cities,
and castles. This country was conquered by Weyasit who took the
duke and his son. The father died in prison, and the son became
converted to the faith of the Infidels, so that his life might be
spared. Weyasit conquered Samson and the country, and conquered
Zyenick; and the city and the country he gave to him for his lifetime,
in place of his fatherland.
9.—How the Infidels remain in the fields with their
cattle, in winter and summer.
It is the custom among the Infidels for some lords to lead a
wandering life with their cattle, and when they come to a country
that has good pasturage, they rent it of the lord of the country for a
time. There was a Turkish lord called Otman, who wandered about
with his cattle, and in the summer came to a country called Tamast,
and the capital of the country is also so called. He asked the king of
Tamast, who was called Wurchanadin,(1) that he would lend him a
pasturage where he and his cattle might feed during the summer.
The king lent him such a pasturage, to which he went with his
dependants and cattle, and remained there the summer; and in
autumn he broke up and returned to his country, without the king’s
permission and knowledge; and when the king heard of this he
became very angry, and took one thousand men with him and went
to the pasturage that Otman had occupied, and encamped there,
and sent four thousand horsemen after Otman, and ordered that
they should bring back Otman alive, with all his belongings. And
when Otman heard that the king had sent after him, he hid himself
in a mountain, so that those who rode after him could not find him;
and they encamped on a meadow in front of the mountain where
Otman was with his people, and remained there that night without
troubling themselves about him. And when the day dawned, Otman
took one thousand of his best horsemen to look at the winds, and
when he saw that they were not on their guard, and were without
care, he rode towards them and suddenly took them by surprise, so
that they could not defend themselves, and many of them were
killed; the others took to flight. The king was told how Otman had
annihilated his expedition, but he would not believe it, and thought
that fun was being made of him, until some of them came running
to him. Even then he would not believe it, and sent one hundred
horsemen to see if such was the case; and when the hundred
horsemen went to see about it, Otman was on his way with his
people to attack the king; and when he saw the hundred horsemen
he overtook them, and came with them into the camp. And when
the king and his people saw that they were overtaken, and that they
could not defend themselves any more, they took to flight. The king
himself had scarcely time to mount his horse, and took to flight to a
mountain; but one of Otman’s servants saw him, and hastened after
him on the mountain; then the king could fly no farther, and the
soldier called upon him to surrender; but he would not give himself
up. Then he took his bow and would have shot him, when the king
made himself known and asked him to let him go, promising to give
him a fine castle, and he wanted to give him the ring he had on his
hand as a pledge. The soldier would not do so, and made him a
prisoner and brought him to his lord. And Otman pursued the people
all day until the evening, and killed many of them, and encamped
where the king had stayed, and sent for the people and cattle that
he had left to run about the mountains. And when the people came
with the cattle, he took the king, and went to the capital called
Tamastk, where he encamped with all his people, and sent word into
the city that he had captured the king, and that if they would deliver
to him the city, he would give peace and security. The city made this
answer: If he had their king, they had his son, and they had lords
enough, as he was too weak to be a lord. He then said to the king,
that if he wanted his life to be spared, he should speak to the
citizens that they give up the city. So they took him before the city,
and he asked the citizens that they should deliver him from death,
and give up the city to Otman. They replied: We will not give up the
city to Otman, because he is too feeble a lord for us; and if thou
shouldst no longer care to be our lord, we have thy son, whom we
will have for our lord. When Otman heard this, he was angry, and
seeing his anger, the king begged him to spare his life, promising to
give him the city of Gaissaria, with all its dependancies. This Otman
would not do, and he ordered the king to be beheaded in sight of
the people of the city, and ordered that afterwards he should be
quartered, each part being fixed on a stake stuck in the ground in
sight of the city, and the head on the point of a lance, together with
the four quarters. And whilst the king lay before the city, the king’s
son sent to his father-in-law, the powerful ruler of White Tartary,
that he should come to his assistance, because Otman had killed his
father and many others, and that he was before the city. And so
soon as his father-in-law heard this, he took with him all his people,
with their wives, children, and all their cattle, as is the custom of the
country, because he intended going to Tamast to deliver the country
from Otman, and his people were numbered at forty thousand men,
without including women and children. When Otman heard that the
Tartar king was approaching, he went with his people to the
mountains, where he encamped. The Tartar king encamped before
the city, and so soon as Otman heard of it, he took fifteen hundred
men and divided them into two parts, and when night came he
marched upon them on both sides with loud cries. When the Tartar
king heard of this, he thought they wanted to betray him, and fled
into the city, which, when his people heard, they also took to flight.
Otman pursued them and killed a great many, and captured much
booty. They returned to their country, and Otman took with him to
the mountain where he had left his cattle, the cattle and the booty
that he had taken from them. Before it was day, the Tartar king rode
after his people to make them turn back; this they would not do, so
he turned back again. Then Otman again lay siege to the city, and
invited them to give him the city, and he would do as he had
promised. This they would not do, and sent to beg Weyasit to come
and drive Otman out of the country, and they would surrender the
city to him. Weyasit sent his eldest son, with twenty thousand
horsemen and four thousand foot-soldiers, to the help of the town;
and I also was in this expedition. And when he heard that the son of
Weyasit was coming, he sent his property and cattle to the mountain
where he had been, and he himself remained in the plain with one
thousand horsemen. Then the king’s son sent two thousand
horsemen to see if they could find Otman; and when they saw
Otman, they attacked each other. And when they saw that they
could not overcome him, they sent for assistance. Then came
Weyasit’s son, with all his people. But when Otman saw him, he rode
against him, and would quickly have put him to flight, for the people
were not close together. The king’s son cried to his people, and they
began to fight, and they fought for three hours consecutively. And
when they were fighting with each other, four thousand foot-soldiers
attacked the tent of Otman, and when he heard this, he sent four
hundred horsemen, who, with the assistance of those who kept the
goods and cattle, expelled the foot-soldiers out of the tent. Otman
went with a force into the mountain, where his property was, and
sent it away, and remained during that time before the mountain.
Then the king’s son appeared before the city, and the citizens
opened the gates of Damastchk, and rode out and asked him to take
the city. This he would not do, and sent to his father, that he should
come and take the city and territory. He came with one hundred and
fifty thousand men, took the city and country, and gave them to his
son Machmet, and not to him who had expelled Otman from being
king of the city and country.(2)
10.—How Weyasit took a country that belonged to
the Sultan.
After Weyasit had installed his son in the kingdom, he sent to the
king-sultan in respect to a city called Malathea,(1) and the country
that belonged to the city, because the city and the country belonged
to the above-named kingdom which was in the possession of the
king-sultan, and therefore required that he should surrender the city
of Malathea and the territory, because he had conquered the
kingdom. The king-sultan sent word to him that he had won the
kingdom by the sword, and he who wished to have it must also win
it by the sword. When Weyasit received this answer, he went into
the country with two hundred thousand men, and lay siege to the
city for two months; and when he found that it would not surrender,
he filled up the ditches and surrounded the city with his people, and
began to storm. When they saw this they asked for mercy, and gave
themselves up. Then he took the city and the country, and occupied
it.
At about the same time, the White Tartars besieged the city called
Angarus, which belonged to Weyasit; and when he heard of this, he
sent to its assistance his eldest son with thirty-two thousand men.
He fought a battle, but he was obliged to return to Wyasit, who
ordered more men, and sent him back again. But he fought with
him, and took the Tartar lord and two vassals, and brought them as
prisoners to Weyasit, and thus the White Tartars gave themselves up
to Weyasit. He put another lord over them, took the three lords to
his capital, and then marched against another city called Adalia,1
which belonged to the sultan, and the city is not far from Zypern;
and in the country to which the city belongs, there are no other
cattle but camels. After Weyasit took the city and the country, the
country made him a present of ten thousand camels; and after he
occupied the city and the country, he took the camels into his own
country.
1Adalia or Satalia, on the sea-shore. William of Tyre so called the
chief city of Pamphylia. The town lies, as correctly stated,
opposite to Cyprus. N.
11.—Of the King-Sultan.
About this time died the king-sultan, named Warchhoch, and his son
named Joseph became king; but one of his father’s dependants went
to war with him for the kingdom. Then Joseph sent to Weyasit, and
became reconciled with him, and asked him that he should come to
help him. So he sent twenty thousand men to help him, in which
expedition I was also. Thus Joseph expelled his rival, and became a
powerful king.(1) After this it was told him, that five hundred of his
dependants were against him, and were in favour of his rival. He
ordered that they should be taken to a plain, where they were all cut
into two parts. Afterwards, we again returned to our lord, Weyasit.
12.—How Temerlin conquered the kingdom of
Sebast.
When Weyasit had expelled Otman from Tamast, as has already
been stated, he went to his lord named Tämerlin, to whom he was
subject, and complained of Weyasit, how he had driven him away
from the kingdom of Tamask, which he had conquered, and at the
same time asked him to help him to reconquer his kingdom.
Tämerlin said that he would send to Weyasit, to restore the country.
This he did, but Weyasit sent word that he would not give it up, for
as he had won it by the sword, it might as well be his as another’s.
So soon as Tämerlin heard this, he assembled ten hundred thousand
men, and conducted them into the kingdom of Sebast, and lay siege
to the capital, before which he remained XXI days, and he
undermined the walls of the city in several places, and took the city
by force, although there were in it five thousand horsemen sent by
Weyasit.(1) They were all buried alive in this way. When Tämerlin
took the city, the governor begged that he would not shed their
blood. To this he consented, and so they were buried alive. Then he
levelled the city, and carried away the inhabitants into captivity in his
own country. There were also nine thousand virgins taken into
captivity by Tämerlin to his own country.(2) Before he took the city,
he had at least three thousand men killed. Then he returned to his
own country.
13.—Weyasit conquers Lesser Armenia.
Scarcely had Tämerlin returned to his own country,(1) than Weyasit
assembled three hundred thousand men, and went into Lesser
Ermenia and took it from Tämerlin, and took the capital called
Ersingen, together with its lord who was named Tarathan,(2) and
then went back to his own country. So soon as Tämerlin heard that
Weyasit had conquered the said country, he went to meet him with
sixteen hundred thousand men; and when Weyasit heard this, he
went to meet him with fourteen hundred thousand men. They met
near a city called Augury, where they fought desperately. Weyasit
had quite thirty thousand men of White Tartary, whom he placed in
the van at the battle. They went over to Tämerlin; then they had
two encounters, but neither could overcome the other. Now Tämerlin
had thirty-two trained elephants at the battle, and ordered, after
mid-day, that they should be brought into the battle. This was done,
and they attacked each other; but Weyasit took to flight, and went
with at least one thousand horsemen to a mountain. Tämerlin
surrounded the mountain so that he could not move, and took him.1
Then he remained eight months in the country, conquered more
territory and occupied it, and then went to Weyasit’s capital and took
him with him, and took his treasure, and silver and gold, as much as
one thousand camels could carry; and he would have taken him into
his own country, but he died2 on the way3 (3). And so I became
Tämerlin’s prisoner, and was taken by him to his country. After this I
rode after him. What I have described took place during the time
that I was with Weyasit.
1July 20th, 1402.
2March 8th, 1403, at Aksheher.
3Schiltberger’s accounts agree perfectly with the statements
made by Byzantine and Eastern historians. We are forced to
conclude, after Hammer’s searching enquiries, that there is no
truth whatever in the story of Bajasid having been confined by
Timur in an iron cage. N.

More Related Content

PDF
Illustrator: Kate Dullea

March 2023: First Edition

Revision History for the Early Release
2022-07-25: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098127602 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Learning Events for Distributed Systems, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

The views expressed in this work are those of the author and do not represent the publisher's views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-098-12754-1
Chapter 1. Introducing Event Streams for Data Communication

A NOTE FOR EARLY RELEASE READERS

With Early Release ebooks, you get books in their earliest form—the author's raw and unedited content as they write—so you can take advantage of these technologies long before the official release of these titles. This will be the 1st chapter of the final book. If you have comments about how we might improve the content and/or examples in this book, or if you notice missing material within this chapter, please reach out to the editor at mpotter@oreilly.com.

The way that businesses relate to their data is changing rapidly. Gone are the days when all of a business' data would fit neatly into a single relational database. The big data revolution, which started more than two decades ago, has since evolved, and it is no longer sufficient to store your massive datasets in a big data lake for batch analysis. Speed and interconnectivity have emerged as the next major competitive business requirements, again transforming the way that businesses create, store, access, and share their important data.

The modern business faces three main problems relating to data. First, big data systems, underpinning a company's business analytics engine, have exploded in size and complexity. There have been many attempts to address and reduce this complexity, but they each fall short of the mark. Second, business operations for large companies have long since passed the point of
being served by a single monolithic deployment. Multi-service deployments are the norm, including microservice and service-oriented architectures. The boundaries of these modular systems are seldom easily defined, especially when many separate operational and analytical systems rely on read-only access to the same data sets. There is an opposing tension here: on one hand, co-locating business functions in a single application provides consistent access to all data produced and stored in that system. On the other, these business functions may have absolutely no relation to one another aside from needing common read-only access to important business data.

Both the analytical and operational domains suffer from a common third problem: the inability to access high-quality, well-documented, self-updating, and reliable data to apply to their own business use cases. The sheer volume of data that an organization deals with increases substantially year over year, fueling a need for better ways to sort, store, and use it. This pressure deals the final blow to the ideal of keeping everything in a single database, and forces developers to split up monolithic applications into separate deployments with their own databases. Meanwhile, the big data teams struggle to keep up with the fragmentation and refactoring of these operational systems, as they remain solely responsible for obtaining their own data.

Data has historically been treated as a second-class citizen, a form of exhaust or byproduct emitted by the business applications. This application-first thinking remains the major source of problems in today's computing environments. Important business data needs to be readily and reliably available as building block primitives for your applications, regardless of the runtime, environment, or code base of your application. We can accomplish this by treating data as a first-class citizen, complete with dedicated ownership, minimum quality guarantees, service-level agreements, and scalable mechanisms for clean and reliable access. Event streams are the ideal mechanism for serving this data, providing a simple yet powerful way of reliably communicating important business data across an organization and enabling each consumer to access and use the data primitives they need.
In this chapter we'll take a look at the forces that have shaped the operational and analytical tools and systems that we commonly use today, and the problems that go along with them. The massive inefficiencies of contemporary data architectures provide us with rich learnings that we will apply to our event-driven solutions, and will set the stage for the next chapter, when we talk about data mesh as a whole.

The Case for Event Streams and Event-Driven Architectures

The new competitive requirement of big data, in motion, combined with modern cloud computing, requires a rethink of how a business creates, stores, moves, and uses data. The foundation of this new data architecture is the event, the data quantum that represents real activity within the business, as made available in the event stream. Together, event streams lead to a central nervous system for unified data, enabling business units to access and use fundamental, self-updating data building blocks. These data building blocks join the ranks of containerization, infrastructure-as-a-service, CI/CD pipelines, and monitoring solutions, the components on which modern cloud applications are built.

Event streams are not new, and many past architectures have used them, though not as extensively as outlined in this book. But the technological limitations underpinning previous event-driven architectures are no longer a concern in today's world of easily scalable cloud computing. Modern multi-tenant event brokers, complete with tiered storage, can store an unbounded amount of data, removing the strict capacity restrictions that limited their predecessors and prevented the storage of historical data. Producers write their important business domain data to an event stream, enabling others to couple on that stream and use the data building blocks for their own applications. Finally, consumer applications can in turn create their own event streams to share their own business facts with others, resulting in a standardized communications mesh for all to use.
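To ground the idea, here is a minimal sketch of a producer writing a single business fact to an event stream. It uses the kafka-python client against a hypothetical local broker and a hypothetical stream named sales; the field names are illustrative assumptions, not a schema prescribed by this book.

import json
from kafka import KafkaProducer

# Connect to a (hypothetical) local broker and serialize event values as JSON.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# A business fact: a completed sale, keyed by its identifier so that all
# events for the same sale land in the same partition.
sale_event = {"sale_id": "s-1001", "item_id": "i-42", "quantity": 2, "price": 59.98}
producer.send("sales", key=b"s-1001", value=sale_event)
producer.flush()  # block until the broker has acknowledged the event

Any number of downstream consumers can then read this stream independently, each building its own model of sales without reaching into the producer's database.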
A Brief History of the Operational and Analytical Plane

History has provided us with ample opportunity to learn from technological choices that just didn't work out as well as we'd hoped. In this section, we'll focus on the operational plane, the analytical plane (in the form of "big data" solutions), and event streams. We'll take a brief look at each of these to get a better understanding of the forces that helped shape them, and how those forces may still be relevant for a data communication system built on event streams.

The Operational Plane

The year is 1970. The relational database model had just been proposed by E.F. Codd. This model began to catch on over the next decade, with implementations by IBM, Honeywell, and Oracle (to name a few) released for general business usage. This model promoted two very important things for the business world. One, the ability to relationally structure your data models, such that you could get good real-time performance for insertions, updates, deletions, and, importantly, complex multi-condition queries. Two, the enablement of atomic, consistent, isolated, and durable (ACID) transactions, which allow multiple actors to commit to the database without inadvertently corrupting the underlying data. These two properties underpin all modern relational databases, and have enabled the online transactional processing (OLTP) of countless businesses across the globe.

OLTP databases and applications form the basis of much of today's operational computing, often in the form of a monolith. This is for many good reasons: a monolith is a great way to get a business venture started, with low overhead for creating, testing, and evaluating business strategies. There is a wide selection of technologies that enable it, as well as a large pool of engineering talent to draw on, given its commonality. Monolithic applications also provide useful data access frameworks and abstractions to the underlying storage layer, which is most commonly a relational database such as PostgreSQL or MySQL. The database itself provides the
aforementioned ACID transactions, high operational performance, durability, and reliable error handling. Together, the application and database demonstrate the monolith data principles:

The Database is the Source of Truth
The monolith relies upon the underlying database to be the durable store of information for the application. Any new or updated records are first recorded into the database, making it the definitive source of truth for that business data.

Data is Strongly Consistent
The monolith's data, when stored in a typical relational database, is strongly consistent. This provides the business logic with strong read-after-write consistency, and thanks to transactions, it will not inadvertently access partially updated records.

Read-Only Data is readily available
The monolith code can also readily access data from anywhere in the database. A modular monolith, where each module is responsible for certain business functionality, will have restricted write access, but modules tend to allow other modules to read their underlying data as necessary.

These three properties form a binding force that makes monolithic architectures very powerful. Your application code has read-only access to the entire span of data stored in the monolith's database as a set of authoritative, consistent, and accessible data primitives. This foundation makes it easy to build new application functionality, provided it is in the same application. But what if you need to build a new application?

The Difficulties of Communicating Data Between Operational Services

A new application cannot rely on the same easy access to data primitives that it would have if it were built as part of the monolith. This would not be
a problem if the new service had no need for any of the business data in the monolith. However, this is rarely the case, as businesses are effectively a set of overlapping domains, particularly the common core, with the same data serving multiple business requirements. For example, an e-commerce retailer may rely on its monolith to handle its orders, sales, and inventory, but requires a new service, powered by a document-based search index, to give its customers fast and effective search results, including available items in the stores. Figure 1-1 highlights the crux of the issue: how do we get the data from Ol' Reliable into the new document database to power search?

Figure 1-1. The new search service team must figure out how to get the data it needs out of the monolith and keep it up to date
This puts the new search service team in a bit of a predicament. The service needs access to the item, store, and inventory data in the monolith, but it also needs to model it all as a set of documents for the search engine. There are two common ways that teams attempt to resolve this. One is to replicate and transform the data to the search engine, in an attempt to preserve the three monolith data principles. The second is to use APIs to restructure the service boundaries of the source system, such that the same data simply isn't copied out, but served completely from a single system. Both can achieve some success, but are ultimately insufficient as a general solution. Let's take a look at these in more detail to see why.

Strategy 1: Replicate Data Between Services

There are several mechanisms that fall under this strategy. The first and simplest is to just reach into the database and grab the data you need, when you need it. A slightly more structured approach is to periodically query the source database and dump the set of results into your new structure. While this gives you the benefit of selecting a different data store technology for your new service, you remain coupled on the source database's internal model and rely exclusively on it to handle your query needs. Large datasets and complex queries can grind the database to a halt, requiring an alternative solution.

The next most common mechanism for the data replication strategy is a read-only replica of the source database. While this may help alleviate query performance issues, consumers remain coupled on the internal model. Unfortunately, each additional external coupling on the internal data model makes change more expensive, risky, and difficult for everyone involved.
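As a concrete illustration of the periodic query-and-dump approach, here is a minimal sketch, assuming a PostgreSQL source reachable with psycopg2 and a hypothetical items table; note that the query is written directly against the monolith's internal model, which is exactly the coupling described above.

import json
import psycopg2

# Connect directly to the monolith's database (hypothetical credentials).
conn = psycopg2.connect(host="monolith-db", dbname="shop", user="etl", password="secret")

def index_document(document):
    print(json.dumps(document))  # stand-in for writing to the search engine

def dump_changed_items(since):
    """Pull rows changed since the last run and hand them to the search indexer."""
    with conn.cursor() as cur:
        # Coupled to the source's internal tables and columns.
        cur.execute(
            "SELECT id, name, price, updated_at FROM items WHERE updated_at > %s",
            (since,),
        )
        for item_id, name, price, updated_at in cur.fetchall():
            index_document({"id": item_id, "name": name, "price": float(price)})

Any rename or refactor of the items table silently breaks this job, and every additional consumer that copies the same pattern multiplies the cost of that change.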
WARNING

Coupling on the internal data model of a source system causes many problems. The source model will change in the course of normal business evolution, which often causes breakages in both the periodic queries and the internal operations of all external consumers. Each coupled service will need to refactor its copy of the model to match what is available from the source, migrate data from the old model to the new model, and update its business code accordingly. There is a substantial amount of risk in each of these steps, as a failure to perform each one correctly can lead to misunderstandings in the meaning of the models, divergent copies of the data, and ultimately incorrect business results.

One final issue with data replication strategies is that replication becomes more difficult with each new independent service. Authoritative data sets can be difficult to locate, and it is not uncommon for a team to accidentally couple on a copy of the original. Meanwhile, each new independent service may become its own authoritative source of data, increasing the number of point-to-point connections. Finally, data is no longer strongly consistent between datasets, particularly as clock skew between services, along with long query and copy times, makes it more difficult to reason about the completeness of your data copies.

Strategy 2: Use APIs to Avoid Data Replication Needs

Another way of creating a new service alongside a monolith is to take a look at directly coupled request/response microservices, also sometimes known as synchronous microservices. They use direct API calls between one another to exchange small amounts of information and perform work on each other's behalf. These are small, purpose-built systems that address a specific domain of business functionality. For example, you may have one microservice that manages inventory-related operations, while other microservices are dedicated to shipping and accounts. Each of these services handles requests originating from the dedicated mobile frontend and web frontend microservices, which stitch together operations and return a seamless view to the users, as shown in Figure 1-2.
Figure 1-2. An example of a simple ecommerce microservice architecture

Synchronous microservices have many benefits. Services are purpose-built to serve the needs of the business domain, giving the owners a high level of independence to use the tools, technologies, and models that work best for
their needs. Teams also have more control over the boundaries of their domain, including control and decision making over how to expand it to help serve other clients' needs. There are numerous books written on this subject that go into far more detail than I have space for, so I won't delve into it in much detail here.

The main downside of this strategy is the same as with a single service: there is no easy and reliable mechanism for accessing data beyond the mechanisms provided in the microservices' API. Most synchronous microservices are structured to offer up an API of business operations, not to serve reliable bulk data access to the underlying domain. Thus, most teams resort to the same fallbacks as with a single monolith: reach into the database and pull out the data you need, when you need it, as shown in Figure 1-3.
Figure 1-3. The microservice boundaries may not line up with the needs of the business problem

In this figure, the new service is reliant upon the inventory, accounts, and shipping services just for access to the underlying business data, but not for
the execution of any business logic. While this form of data access can be served via a synchronous API, it may not be suitable for all use cases. For example, large datasets, time-sensitive data, and complex models can prevent this from becoming a reality, in addition to the operational burden of providing the data access API and data-serving performance on top of the base microservice functionality.

Operational systems lack a generalized solution for communicating important business data between services. But this isn't something that's isolated to just operations. The big data domain, underpinning and powering analytics, reporting, machine learning, AI, and other business services, is a voracious consumer of full data sets from all across the business. While domain boundary violations in the form of smash-and-grab data access are the foundation on which big data engineering has been built (I have been a part of such raids during the decade or so I have spent in this space), fortunately for us, this history has provided us with rich insights that we can apply to build a better solution for all data users. But before we get to that, let's take a look at the big data domain's requirements for accessing and using data, and how this space evolved to where it is today.

A Short Opinionated History of Big Data: Hadoop, Data Warehouses, Lakes, and Analytics

Whereas operational concerns focus primarily on OLTP and server-to-server communication, analytical concerns have historically focused on answering questions about the overall performance of the business. Online analytical processing (OLAP) systems are purpose-built to help solve these problems, and have been actively developed and released since the first commercial offering in 1970 (the same year as E.F. Codd's paper on relational databases). In brief, these OLAP systems store restructured data in a multidimensional "cube" format, such that analysts can evaluate data on different vectors. The cube could be used to answer questions such as "How many items did we sell last year?" and "What was our most popular item?"
Answering these questions requires remodeling operational data into a model suitable for analytics, but also accounting for the vast amounts of data where the answers are ultimately found. While OLAP cubes provided early success, their scaling strategy for accommodating ever-increasing data loads relied upon larger disk drives and more powerful compute, and ultimately ran into the physical limits of computer hardware. Instead of further scaling up, it became time to scale out. This was plainly evident to Google, who published a paper about the Google File System in October 2003 (https://research.google/pubs/pub51/). This is the era that saw the birth of big data, and it caused a massive global rethink of how we create, store, process, and ultimately use data.

Hadoop quickly caught on as the definitive way to solve the compute and scale problems encountered by OLAP cubes, and it quickly transformed the analytical domain. Its free and open source model meant that it could be used by any company, anywhere, provided you could figure out how to manage the infrastructure requirements, among a number of other technical hurdles. But it meant a new way to compute analytics, one where you were no longer constrained to a proprietary system limited by the resources of a single computer. Hadoop introduced the Hadoop Distributed File System (HDFS), a durable, fault-tolerant file system that made it possible to create, store, and process truly massive data sets spanning multiple commodity hardware nodes. While HDFS has now been largely supplanted by options such as Amazon's S3 and Google's Cloud Storage, it paved the way for a bold new idea: copy all of the data you need into a single logical location, and apply processing after the fact to derive important business analytics.

The Hadoop ecosystem was only as useful as the data it had available, and integrating it with existing systems and data sources remained problematic. The creators of Hadoop focused on making it easy to get data into HDFS, and in 2009 introduced a tool named Sqoop for this very purpose. One of Sqoop's most common uses is to pull data from external sources (databases) on a periodic basis. For example, a scheduler kicks off a Sqoop job at the top of every hour, which in turn queries a source database for all
of the data that changed in the past hour, then loads it directly into an HDFS folder as part of the incremental update of a larger data set. Once the Sqoop job has completed loading the new data into cloud storage, you can then execute your analytical processing job on the newly updated batch of data.

The big data architecture introduced a significant shift in the mentality towards data. While it did solve the capacity issues of existing OLAP systems, it introduced a new concept into the data world: it was not only acceptable, but implicitly implied, that writing unstructured and semi-structured data was the recommended way to get data into the Hadoop ecosystem. OLAP cubes required an explicit schema for their table definitions, and incoming data had to be transformed and resolved to fit that schema. These schemas are carefully constructed, maintained, and migrated, and any insertion conflicts had to be resolved at write time, or else the insertion would simply fail to protect the integrity of the data. Such a restriction did not explicitly exist when using Hadoop. And in practice, many data engineers were pleased to get rid of the schema-on-write barriers that so often caused them problems when ingesting data. In this new world of big data, you were free to write data with or without any schema or structure, and resolve it all later at query time by applying a schema on read. And this isn't just my opinion; in fact, writing unstructured data and resolving it at read time was one of the prime features of contemporary Hadoop marketing material and technical documents.
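To make the distinction concrete, here is a minimal PySpark sketch contrasting schema on read with an explicitly declared schema. The file path and field names are hypothetical; the point is that the inferred-schema version accepts whatever happens to be in the files, while the declared schema states the reader's expectations up front and can be made to fail fast on nonconforming records.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, BooleanType, LongType

spark = SparkSession.builder.appName("schema-on-read-vs-write").getOrCreate()

# Schema on read: whatever is in the files determines the types.
inferred = spark.read.json("hdfs:///raw/ad_campaigns/")  # hypothetical path
inferred.printSchema()

# Declared schema: the expected types are explicit, and FAILFAST mode raises
# on records that do not conform, instead of silently reinterpreting them.
expected = StructType([
    StructField("campaign_id", StringType(), nullable=False),
    StructField("premium", BooleanType(), nullable=True),
    StructField("impressions", LongType(), nullable=True),
])
declared = spark.read.schema(expected).option("mode", "FAILFAST").json("hdfs:///raw/ad_campaigns/")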
Consider Figure 1-4, comparing the data structures and use cases between the relational database and MapReduce. MapReduce is Hadoop's first processing framework, and is the system that would read the data, apply a schema (on read), perform transformations and aggregations, and produce the final result. MapReduce was essential for analytical processing of data sourced from HDFS, though it has today largely been supplanted by newer, more performant options.

Figure 1-4. A comparison of RDBMS and Hadoop, circa 2009. Hadoop: The Definitive Guide by Tom White (2009, O'Reilly Media, Inc.)

Note how this definitive guide promotes MapReduce as a solution for handling low-integrity data with a dynamic schema. This supports the notion that HDFS should be storing unstructured data with low integrity, or with varying and possibly conflicting schemas to be resolved at run time. It also points out that this data is "write once, read many times," which is precisely the scenario in which you want a strong, consistent, enforced schema; instead, it provides ample opportunity for well-meaning but unconstrained users of the data to apply a read-time schema that misinterprets or invalidates the data.

At the time, I don't think that I, nor many others, appreciated just how difficult these precepts would end up making collecting and using data. We believed in and supported the idea that it was okay to grab data as you need it and figure it out after the fact, restructuring, cleaning, and enforcing schemas at a later date. This also made it very palatable for those considering migrating to Hadoop to alleviate their analytical constraints, as this move didn't constrain your write processes nor bother you with strict rules of data ingestion; after all, you can just fix the data once it's copied over! Unfortunately, the fundamental principle of storing unstructured data to be used with schema on read proved to be one of the costliest and most damaging tenets introduced by the big data revolution.

The Organizational Impact of Schema on Read

Enforcing a schema at read time, instead of at write time, leads to a proliferation of what we call "bad data". The lack of write-time checks means that data written into HDFS may not adhere to the schemas that the readers are using in their existing work, as shown in Figure 1-5. Some bad data will cause consumers to halt processing, while other bad data may go silently
undetected. While both of these are problematic, silent failures can be deadly and difficult to detect. We'll see more on this later.

Figure 1-5. Examples of bad data in a dataset, discovered only at read time

To get a better understanding of the damaging influence of schema on read, let's take a look at three roles and their relationship to one another. While I am limited to my own personal experiences, I have been fortunate to talk with many other data people in my career, many from very different companies and lines of business. I can say with confidence that while responsibilities vary somewhat from organization to organization, this summary of roles is by and large universal to most contemporary organizations using big data:

The data analyst: is charged with answering business questions related to big data sets. They query the data sets provided to them by the data engineers.

The data engineer: is charged with obtaining the important business data from systems around the organization, and putting it into a usable
format for the data analysts.

The application developer: is charged with developing an application to solve business problems. That application's database is also the source of the data required by the data analyst to do their job.

Historically, the most common way to adopt Hadoop was to establish a dedicated data team, either as a subset of, or fully separate from, the regular engineering team. The data engineer would reach into the application developer's database, grab the required data (often using Sqoop), and pull it out to put into HDFS. Data scientists would help structure it and clean it up (and possibly build machine-learning models off of it), before passing it on to be used in analytics. Finally, the data analysts would then query and process the captured datasets to produce answers to their analytical questions.

But this model led to many issues. Conversations between the data team and the application developers would be infrequent, and usually revolved around ensuring the data team's query load did not affect the production serving capabilities. There are three main problems with this model. Let's take a look at them.

Problem 1: Improper Data Model Boundaries

Data ingested into the analytics domain is coupled on the source's internal data model, and this results in direct coupling by all downstream users of that data. For very simple, seldom-changing sources this may not be much of a problem. But many models span multiple tables, are purpose-built to serve OLTP operations, and may become subject to substantial refactoring as business use cases change. Direct coupling on this internal model exposes all downstream users to these changes.

OLAP cubes with schema on write forced the data engineers to reconcile this incompatibility at write time, usually with the help of the application developer who made the change. OLAP cube updates, and therefore new processing results, would be halted until the new schema and data were reconciled with the existing OLAP structure. However, with schema on
read, you can simply update the Sqoop job to pull from the newly defined and altered tables, and push reconciling the horrible mess down to the data scientists and analysts. While this is obviously not a true solution to the problem at hand, it's unfortunately one that I have commonly seen: push it down and let the users of the data figure it out.

TIP

Schema on read purportedly gives you the freedom to define the schema any way you need. However, implementing schema on write for a data set doesn't prevent you from subsequently transforming it downstream, in another job, into the format that you need for your business domain. Schema on write gives you a well-defined and clean interface to base your work on, while still letting you transform it as you need. We'll be relying on schema on write for the majority of our solutions in this book.

WARNING

One example I have seen of a table modification that silently broke downstream jobs involved changing a premium field from boolean to long. The original version represented the answer to the question "did the customer pay to promote this?" The updated version represented the budget_id of the newly expanded domain, linking this part of the model to the budget and its associated type (including the new trial type). The business had adopted a "try before you buy" model where we would reserve, say, several hundred dollars in advertising credits to showcase the effectiveness of promotion, without counting it in our total gross revenue. The jobs ingesting this data to HDFS didn't miss a beat (no schema on write), but some of our downstream jobs started to report odd values. A majority of these were Python jobs, which happily evaluated the new long values as booleans, resulting in over-attribution of various user analytics. Unfortunately, because no jobs actually broke, this problem wasn't detected until one of our customers started asking questions about abnormal results in their reports. This is just one example of many that I have encountered, where well-meaning, reasonable changes to a system's data model have unintended consequences on all of those who have coupled on it.
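A minimal sketch of how such a silent breakage plays out, with hypothetical field values standing in for the scenario above: truthiness-based checks keep "working" after the type change, they just no longer mean what the author intended.

# Before the change: premium is a boolean flag.
record_before = {"campaign_id": "c-1", "premium": True}

# After the change: the same field now carries a budget_id (a long).
record_after = {"campaign_id": "c-1", "premium": 4821}

def is_premium(record):
    # Truthiness check written against the old schema. With no schema on
    # write, nothing forces this code to be revisited when the type changes.
    return bool(record["premium"])

print(is_premium(record_before))  # True, as intended
print(is_premium(record_after))   # True for any nonzero budget_id, even trials

Every nonzero budget_id, including the trial budgets, is now counted as premium, which is exactly the kind of over-attribution described in the warning above.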
Problem 2: Ownership Is Spread Across Multiple Teams

The application developer retains ownership of their data model, but is typically unaware of the needs of anyone directly querying their underlying database. As we saw with Problem 1, a change to this model can have unintended consequences on the analysts relying on that data. Application developers are the domain experts and masters of the source data model, but their responsibility for communicating that data to other teams (such as the big data team) is usually nonexistent. Instead, their responsibilities usually end at the boundaries of their application and database.

Meanwhile, the data engineer is tasked with finding a way to get that data out of the application developer's database, in a timely manner, without negatively affecting the production system. The data engineer is dependent on the data sources, but often has little to no influence on what happens to them, making their role very reactive. This production/data divide is a very real barrier in many organizations, and despite best efforts, agreements, integration checks, and preventative tooling, breakages in data ingestion pipelines remain a common theme.

Finally, the data analyst, responsible for actually using the data to derive business value, remains two degrees of separation away from the domain expert (the application developer), and three degrees separated if you have a layer of data scientists in there further munging the data. Both analysts and data scientists have to deal with whatever data the data engineers were able to extract, including resolving inconsistent data that doesn't match their existing read schemas. As data analysts often share their schemas with other data analysts, they also need to ensure that their resolved schemas don't break each other's work. This is increasingly difficult to do as an organization and its data grow, and unfortunately, their resolution efforts remain limited to only benefiting other analysts. Operational systems are on their own.

Problem 3: Do-It-Yourself and Custom Point-to-Point Data Connections

While a data team in a small organization may consist of only a handful of members, larger organizations have data teams that number in the hundreds or thousands of members. For large data organizations, it is common to
have the same data sources pulled into different subdomains of the data platform. For example, sales data may be pulled into the analytics department, the consumer reporting department, and the accounts receivable department. If we're using Sqoop, each of these jobs may be independently scheduled, unbeknownst to one another, resulting in multiple areas where you have the same data. These types of workloads are commonly known as ETL (extract, transform, and load) jobs, and are a staple in ingesting, transforming, and loading data from operational systems into analytical data warehouses and lakes.

Figure 1-6 shows what this tends to look like in practice. Each arrow in the diagram is a dedicated periodic copy job, pulling data from one area to another. Ad hoc copying of data like this makes it difficult for the data owners to track where their data is going and who is using it. It's common for all jobs to pull data using a single shared account, to avoid the hassle of changing security credentials and limiting the scope of access. And even when such restrictions are imposed, such as in the case of the Predictions domain, teams can often simply circumvent the restrictions by finding a copy of that data available somewhere in the grand repository of HDFS and copying it over to their own domain. While infosec and data governance are important topics in their own right, my own experience (and that of many trusted peers) has indicated to me that unless you make it very easy for people to get the data they need, they will create their own clever (if illicit) ways to get it.
Figure 1-6. Three typical analytical domains, each grabbing data from where it can to get their work done

But are these copies really the same data? There are many factors that can affect this, including the frequency at which the data is acquired, transformation code, intermittent failures, and misunderstanding exactly what it is you are copying. You may also be ingesting a dataset that is itself a copy of the original, with its own idiosyncrasies, something that may not be apparent unless you are intimately familiar with the domain of the source system. The end result is that it's possible to end up with data that is supposed to be the same, but is actually different. A very simple example of this would be two ingested datasets, one aggregated daily with boundaries
at UTC-0 time, while the other is aggregated daily with boundaries based on local time. The format, partitioning, and ordering of the data may appear identical, yet these hard-to-detect, undocumented differences still remain.

A domain modeling change in one of the data sources may require that the downstream data engineers update their ETL pipelines to accommodate the new format. However, three separate ETL jobs means three separate code updates, each with its own independent changes, reviews, and commits. If they are owned by different teams, as is often the case in a very large company with lots of data sets and ETL jobs, these updates may be done many days apart, or even not at all (you'd be surprised how many times a critical ETL job can go unnoticed until it breaks). Thus, the results of each ETL job may (and often do) conflict with one another, despite being sourced from the same original data source.

Common ways to address this include tribal knowledge (e.g., "tell me when you're changing the domain model!") and automated pull request checks (e.g., "notify me if a pull request is made against the database schema"), but these are only partial solutions. All affected jobs and datasets need to be identified and their owners need to be notified. At this point, a case-by-case remediation of all affected downstream data can begin. This tends to be a very expensive and complex task, and while it is possible, preventing yourself from getting into this situation is a far better area of investment than remediating your way out of it.
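To make the UTC-versus-local-time example above concrete, here is a minimal sketch using Python's standard library and a hypothetical timestamp, showing how two "daily" aggregations can assign the very same event to different days.

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A single hypothetical event, recorded late in the evening US-Eastern time.
event_time = datetime(2022, 3, 15, 23, 30, tzinfo=ZoneInfo("America/New_York"))

# Pipeline A buckets by UTC calendar day; pipeline B buckets by local day.
utc_day = event_time.astimezone(timezone.utc).date()
local_day = event_time.date()

print(utc_day)    # 2022-03-16
print(local_day)  # 2022-03-15: the "same" daily aggregate now differs by a day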
Bad Data: The Costs of Inaction

The reason that bad data costs so much is that it must be accommodated by everyone consuming, using, and processing this data. This is further complicated by the fact that not every consumer of bad data will "fix it" in the same way, leading to results that diverge from those of other consumers based on their interpretation of the data. Tracking these divergent interpretations down is fiendishly expensive, and is further complicated by mixing in data from other domains that further muddies the interpretation of a dataset.

I have seen bad data created by well-meaning individuals trying their very best, simply because of the point-to-point, reach-in-and-grab-it nature of the existing data transfer tools. This has been further compounded by massive scale, where a team discovers that not only is their copy of the dataset wrong, but that it's been wrong for several months, and the results of each big data job computed on that dataset are also wrong. Some of these jobs use hundreds or thousands of processing nodes, with 32 to 128 GB of RAM each, churning through hundreds of terabytes of data on each nightly run. This can easily amount to hundreds of thousands, or millions, of dollars just in processing costs, for jobs that not only need to be thrown away and rerun, but that have also negatively affected all downstream jobs. In turn, those jobs also need to be rerun, incurring their own processing costs.

But processing costs are certainly not the only factor. The business decisions made off of that bad data, and their subsequent effects, can ruin a business. While I will do my past employers a favor and explicitly state that "this did not happen there", I have been privy to details on one scenario where a company had misbilled its customers collectively by several million dollars, in some cases by too much, and in others by too little. The cause of this was actually quite innocent and well-intentioned: a somewhat long chain of data sets created by reaching in and grabbing data, coupled with some schema changes that weren't detected by the big data processing teams that prepared the data for billing. It was only when a customer noticed that their billing costs far exceeded their engagement costs that an investigation was kicked off. And this is only one of many such cases that I have witnessed.

I will readily acknowledge that a single person's experience is certainly insufficient to sway a skeptic. I will, however, guarantee you that I am thoroughly convinced that difficult-to-obtain data is one of the primary causes of bad data. If you are someone in big data analytics, I suspect you have found this to be evident in your own personal experience. If you are a developer working on a critical operational business application, you may want to check with your analytics team, along with other operational teams who are dependent on your data, to see what their experiences are like in
accessing it.

Fortunately for me, there are others who have researched this field to figure out just how much bad data costs businesses. The results are staggeringly high. In 2016, one report by IBM, as highlighted by the Harvard Business Review (HBR), put an estimate of the financial impact of bad data at 3.1 trillion US dollars in the USA alone. Though the original report is (frustratingly) no longer available, HBR has extracted some of the more relevant numbers related to the time spent by those trying to use data. To paraphrase:

50% — the amount of time that knowledge workers waste hunting for data, finding and correcting errors, and searching for confirmatory sources for data they don't trust.

60% — the estimated fraction of time that data scientists spend cleaning and organizing data.

As I was researching this subject, I came across an even older HBR article from 2013:

"Studies show that knowledge workers waste up to 50% of time hunting for data, identifying and correcting errors, and seeking confirmatory sources for data they do not trust. … fixing the problem is not as hard as many might think. The solution is not better technology: It's better communication between the creators of data and the data users;"

The problem of bad data has existed for a very long time. Data copies diverge as their original source changes. Copies get stale. Errors detected in one dataset are not fixed in other duplicate datasets. Domain knowledge related to interpreting and understanding data remains patchy, as does support from the owners of the original data.

By promoting data to a first-class citizen, as a product like any other, we can eliminate many of the root causes of bad data. A data product with a well-defined schema, domain documentation, standardized access mechanisms, and service-level agreements can substantially reduce the impact of bad data right at the source.
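As an illustration of what "well-defined and documented" can look like in practice, here is a hedged sketch of an Avro-style schema for a hypothetical sales event stream, expressed as a Python dictionary; the fields, doc strings, and naming are assumptions made for the example, not a schema prescribed by this book.

# An Avro-style record schema for a hypothetical "sale" data product.
# Every field carries documentation, and the schema itself is the contract
# that producers validate against before publishing to the event stream.
sale_schema = {
    "type": "record",
    "name": "Sale",
    "namespace": "com.example.sales",
    "doc": "A completed sale, published by the sales domain as a data product.",
    "fields": [
        {"name": "sale_id", "type": "string", "doc": "Unique identifier of the sale."},
        {"name": "item_id", "type": "string", "doc": "Identifier of the item sold."},
        {"name": "quantity", "type": "int", "doc": "Number of units sold."},
        {"name": "price", "type": "double", "doc": "Total sale price in the account currency."},
        {"name": "occurred_at", "type": "long", "doc": "Event time, epoch milliseconds (UTC)."},
    ],
}

Paired with a schema registry and schema-on-write enforcement, a nonconforming record is rejected at the producer instead of surfacing months later in a consumer's reports.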
Consumers, once coupled on the data product, may still make their own business logic mistakes; this is unavoidable. They will, however, seldom make any more inadvertent mistakes in merely trying to acquire, understand, and interpret the data they need to solve their business problems. Inaction is not a solution.

One More Problem to Solve: Unifying Analytical and Operational Workflows

It's clear that batch big data, as it's conventionally set up, has some problems that remain to be solved. But there's one more problem that sits at the heart of engineering: it's not just the data team that has these data access and quality problems. Every single OLTP application that needs data stored in another database has the same data access problems as the data team. How do you access important business data, locked away in another service, for operational concerns?

There have been several previous attempts at enabling better operational communication between services, including service-oriented architecture, enterprise service buses, and of course, point-to-point request/response microservices. But in each of these architectures, the service's data is encapsulated within its own database, and is out of reach of other services. In one way this is good: the internal model is sheltered, and you have a single source of truth. Applications provide operational APIs that other applications can call to do work on their behalf. But this also doesn't resolve the fundamental issue of wholesale access to definitive data sets for teams to use for their own business purposes. Failing to provide this is also not an option, as illustrated by the decades of big data's "reach in and grab it" strategy and the numerous substantial expenses that come with it.

A further complication is that many operational use cases nowadays depend on analytical results. Think machine learning, recommendation engines, AI, and so on. Some use cases, such as producing a monthly report of top-selling products, can very clearly be labelled as "analytical", to be derived from a Hadoop inquiry. Other use cases are not so clear cut. Consider an e-commerce retailer that wants to advertise shoes based on current inventory
(operational), previous user purchases (analytical), and the user's real-time estimated shopping session intentions (analytical and operational). In practice, the boundary between operational and analytical is seldom neatly defined, and the exact same data set may be needed for a multitude of purposes: analytical, operational, or somewhere in between.

Both big data analytics and conventional operational systems have substantial difficulty in accessing data sets that are contained within other databases. These difficulties are further exacerbated by the increasing volume, velocity, and scale of data, while systems are simultaneously forced to scale outwards instead of upwards as the compute limitations of individual services are reached. The data communication strategies of most organizations are based on yesterday's technology and fail to account for the offerings of modern cloud storage, computing, and software as a service. These tools and technologies have changed the way that data can be modeled, stored, and communicated across an organization, which we will examine in more detail throughout the remainder of this book.

NOTE

Silos form as teams grow. While we have seen that there is often a divide between the engineering and data teams, divisions will also certainly exist within the larger team. Take a look around your own organization, and go talk to people in other silos. Ask them how they get the data they need to do their business operations, as well as whether they know all of the customers who are consuming their data. You may learn something from their answers.

How Do We Resolve All of These Data Issues?

The premise of the solution is simple. Publish important business facts to dedicated, durable, and replayable event streams. These streams become a fundamental source of truth for operational, analytical, and all other forms of workloads across the organization. Producers of this data are responsible for the modeling, evolution, and quality of the data provided in the event
stream, treating it as a first-class citizen, on par with any other product in the organization. Prospective consumers can explore, discover, and subscribe to the event streams they need for their business use cases. The event streams should be well described, easy to interpret, and form the basis for a set of self-updating data primitives for powering both business services and analytics.

This architecture is built by leveraging modern cloud computing and Software-as-a-Service (SaaS) options, as we shall see when we cover building a self-service platform. A good engineering stack makes it easy to create and manage applications throughout their lifecycle, including acquiring compute resources and providing scalability, logging, and monitoring capabilities. Event streams provide the modern engineering stack with the formalized and standardized access to the data it needs to get things done.

Let's revisit the monolith data principles from earlier in this chapter through the lens of this proposal. These three principles outline the major influences for colocating new business functionality within a monolith. How would a set of self-updating event streams relate to these principles?

The Database is the Source of Truth → The Event Stream is the Source of Truth
The owner of the data domain is now responsible for composing an external-facing model and writing it as a set of events to one (or more) event streams. In exchange, other services can no longer directly access and couple on the internal data model, and the producer is no longer responsible for serving tailored business tasks on behalf of the querying service, as is often the case in a microservices architecture. The event stream becomes the main point of coupling between systems. Downstream services consume events from the event stream, model them for their purposes, and store them in their own dedicated data stores.
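Here is a minimal sketch of what that downstream side can look like, again using the kafka-python client and the hypothetical sales stream from earlier: the consumer builds and maintains its own materialized view of the data, entirely separate from the producer's database.

import json
from kafka import KafkaConsumer

# Subscribe to the (hypothetical) sales stream from the beginning, so the
# consumer can rebuild its state at any time by replaying the stream.
consumer = KafkaConsumer(
    "sales",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# The consumer's own materialized view: latest state per sale_id.
sales_by_id = {}

for message in consumer:
    event = message.value
    sales_by_id[event["sale_id"]] = event
    # ...apply whatever domain-specific modeling this service needs here.

This view is eventually consistent with the producer's state, which is exactly the trade-off addressed by the next principle.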
Data is Strongly Consistent → Data is Eventually Consistent
The event stream producer can retain strong read-after-write consistency for its own internal state, along with other database benefits such as local ACID transactions. Consumers of the event stream, however, are independent in their processing of events and modeling of state, and thus rely on their own eventually consistent view of the processed data. A consumer does not have write access to the event stream, and so cannot modify the source of data. Consumer system designs must account for eventual consistency, and we will be exploring this subject in greater detail later in this book.

Read-Only Data is readily available (remains unchanged!)
Event streams provide the formalized mechanism for communicating data in a read-only, self-updating format, and consumers no longer need to create, manage, and maintain their own extraction mechanisms. If a consumer application needs to retain state, then it does so using its own dedicated data store, completely independent of the producer's database.

Data mesh formalizes the ownership boundaries of data within an organization and standardizes the mechanisms of storage and communication. It also provides a reusable framework for producing, consuming, modeling, and using data, not only for current systems, but also for systems yet to be built.

Common Objections to an Event-Driven Data Mesh

There are several common objections that I have frequently encountered when discussing an event-driven data mesh with others. Though we will cover these situations in more detail throughout the book, I want to bring them up now to acknowledge that these objections do exist, but that each one of them is manageable.

Producers cannot model data for everyone's use cases
This argument is actually true, though it misses the point. The main duty of the producer is to provide an accurate and reliable external public model of their domain data for consumer use. These data models only need to expose the parts of the domain that other teams can couple on; the remainder of the internal model remains off limits. For example, an ecommerce domain would have independent sales, item, and inventory models and event streams, simply detailing the current properties and values of each sale, item, and inventory level, whereas a shipping company may have event streams for each of shipment, truck, and driver. These models are deliberately simple and hyper-focused around a single domain definition, resulting in tight, modular data building blocks that other systems can use to build their own data models. Consumers that ingest these events can restructure them as needed, including joining them with events from other streams or merging them with existing state, to derive a model that works for solving their business use cases. Consumers can also engage the producer teams, requesting that additional information be added to the public model, or asking for clarification on certain fields and values.

As the producer team owns the original data model, they are the most qualified to decide what aspects of it they should expose and allow others to couple on. In fact, there is no team more qualified than the team that actually creates the original source of data to define what it actually means, and how others should interpret its fields, relationships, and values. This approach lets the data source owners abstract away their internal complexities, such as a highly normalized relational model or a document store. Changes to the internal source model can be hidden from consumers that would otherwise have coupled directly on it, reducing breakages and errors.
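A minimal sketch of that abstraction, with hypothetical internal tables and field names: the producer projects its normalized internal rows into a deliberately simple public event before publishing, so internal refactoring stays invisible to consumers.

# Hypothetical rows from the producer's internal, normalized model.
internal_sale = {"id": 1001, "item_fk": 42, "cust_fk": 7, "price_cents": 5998, "promo_budget_id": None}
internal_item = {"id": 42, "sku": "SHOE-RED-10", "title": "Red Running Shoe"}

def to_public_sale_event(sale, item):
    """Project internal rows into the external-facing model of the sales stream."""
    return {
        "sale_id": f"s-{sale['id']}",
        "item_sku": item["sku"],
        "item_name": item["title"],
        "price": sale["price_cents"] / 100.0,  # expose dollars, hide the internal cents column
    }

public_event = to_public_sale_event(internal_sale, internal_item)
# public_event is what gets written to the event stream; internal foreign keys
# and promo bookkeeping never leave the producer's boundary.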
Making multiple copies of data is bad

This objection, ironically, stands in implicit opposition to the first argument. Just like the previous argument, though, it does have a grain of truth. Multiple copies of the same data set can and do inadvertently get out of sync, become stale, or otherwise come to disagree with the original source. However, our proposal is not to make copying data a free-for-all, but rather to make it a formalized and well-supported process that establishes clear rules and responsibilities, embracing this reality rather than hiding from it. There are three main subtypes of this argument.

There should only be a single master copy of the data, and all systems should reference it directly

This belief fails to account for the fact that big data analytics teams worldwide have already been violating this principle since the dawn of the big data movement (and really, OLAP in general), because their needs cannot be met by a single master copy stored in a single database somewhere. It also fails to account for the various needs of other operational systems, which follow the same boundary-breaching data acquisition strategies. It's simply untenable.

The insufficiency of the source system to model its data for all business use cases is a prime reason why multiple copies of the same data set will eventually exist. One system may need to support ACID transactions in a relational model, whereas a second system must support a document store for geolocation and plain-text search. A third consumer may need to write these datasets to HDFS to apply MapReduce-style processing, yielding results from the previous 364 copies of that data it made, cross-referenced against other annual datasets. All of these cannot be served from a single central database, if not just for the modeling, then for the impossibility of satisfactory performance for all use cases.

It's too computationally expensive to create, store, and update multiple copies of the same data

This argument is hyper-focused on the fact that moving and storing data costs money, and thus storing a copy of the same data is wasteful (disregarding factors such as remodeling and performance, of course). It fails to account for the inexpensiveness of cloud computing, particularly the exceptionally cheap storage and network costs of today's
major cloud providers. It also fails to account for the developer hours necessary to build and support custom ETL pipelines, part of the multi-trillion-dollar inefficiencies in creating, finding, and using data. Optimizing to minimize data transfer, application size, and disk usage is no longer as important as it once was for the vast majority of business applications. Instead, the priority should be on minimizing developer effort for accessing data building blocks, with a focus on operational flexibility.

Managing infosec policies across systems and distributed datasets is too hard

I will acknowledge that it is challenging. However, the principles of data mesh, which we will get into in the next chapter, acknowledge this and make it a responsibility of the implementers to address from the start. One of the bigger difficulties in infosec management is applying policies to an already existing (and usually sprawling) distributed data architecture. By formalizing these requirements ahead of time, creating the necessary self-service tooling, and making infosec adherence a minimal barrier to entry for participation in the data mesh, we tame the complexity of this problem and make it tenable.

Eventual consistency is too difficult to manage

Data communicated through event streams does require consideration of and planning for eventual consistency. However, the complaint that eventual consistency is too much of a barrier is typically founded on a misunderstanding of how much of an impact it can have on business processes as a whole. We can properly define our system boundaries to account for eventual consistency between systems, while retaining access to strong consistency within a system. There's no getting around it: if a certain business process needs perfect consistency, then the creation and usage of the data must be within the same service boundary. But the majority of business processes don't need this, and for those that do, nothing we're proposing in this book precludes you from obtaining it. We'll
be discussing how to handle eventual consistency in more detail later in this book.

NOTE
Eventual consistency is a property of copying data from one data store to another. Regardless of how the copy is made, it immediately begins to go stale. An event-stream approach embraces this by providing a simple mechanism for seeing just how up to date a data set really is: check the topic for any new events to process. If there are none left to process, you're up to date. Though this mechanism is subject to network failures, outages, and other delays, it is far superior to periodically polling a system's database every 15 minutes for the rows updated since the last periodic query.
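As a rough illustration of that check, the following sketch compares a Kafka consumer's position against the end of each assigned partition. The topic name and consumer group are illustrative assumptions, not details from this book.

    # Minimal sketch of the "am I caught up?" check described in the note above:
    # compare the consumer's position to the high watermark of each assigned
    # partition. Topic name and group id are illustrative assumptions.
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "inventory-materializer",
    })
    consumer.subscribe(["inventory"])
    consumer.poll(1.0)  # trigger partition assignment before checking positions

    def caught_up(consumer) -> bool:
        """True when no unprocessed events remain on any assigned partition."""
        for tp in consumer.assignment():
            _low, high = consumer.get_watermark_offsets(tp, timeout=5.0)
            position = consumer.position([tp])[0].offset
            if position < 0:      # no position established yet for this partition
                return False
            if position < high:   # events remain between our position and the end
                return False
        return True

    print("up to date" if caught_up(consumer) else "still processing a backlog")
    consumer.close()

In practice the same information is surfaced as consumer lag by most Kafka monitoring tools; the point is simply that the stream itself tells you how stale your copy is.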
Figure 1-7. Data is stale as soon as a copy is made; cross-referencing the copies will give you inconsistent results.
Chapter Summary

Existing data communication strategies fall flat in the face of real business requirements. Breaching a service's boundary by reaching in to grab its data is not a sustainable practice, but it is extremely common and often supports multiple critical systems and analytics workflows. Restructuring your systems into neat, modular microservices does not solve the problem of data access; other parts of your business, such as the big data analytics and machine learning teams, will still require wholesale access to both current and historical data from domains across the organization. One way or another, copies of data will be created, and we can either fight this fact or embrace it and work to make it better. In choosing the latter, we can use event streams to standardize and simplify the communication of data across the organization as self-updating single sources of truth.

Events form the basis of communication in event-driven architectures, and fundamentally shape the space in which we solve our problems. Events, as delivered through event streams, form the building blocks for building asynchronous and reactive systems. These building blocks are primitives similar to synchronous APIs: other applications can discover them, couple on them, and use them to build their own services. Eventual consistency, consumer-specific models, read-only replicas, and stream materializations are just some of the concepts we'll explore in this book, along with the roles that modern cloud compute, storage, and networking resources have in this new data architecture.

The following chapters will dig deeper into building and using an event-driven data mesh. We'll explore how to design events, including state, action, and notification events, as well as patterns for producing and consuming them. This book covers handling events at scale, including multi-cluster and multi-region deployments, best practices for privacy and regulatory compliance, and principles for handling eventual consistency and asynchronous communication. We'll explore the social and cultural changes necessary to accommodate an event-driven data mesh, and look at some real-world case studies highlighting the successes and lessons learned by others.
Finally, we'll look at the practical steps you can take to start building toward this in your own organization. One of the best things about this architecture is that it's modular and incremental, so you can start leveraging the benefits in one sector of your business at a time. While there are some initial investments, modern cloud compute and software-as-a-service solutions have all but eliminated the barriers to entry, making it far easier to get started and test whether this is the right solution for you.
Chapter 2. Designing Events

A NOTE FOR EARLY RELEASE READERS

With Early Release ebooks, you get books in their earliest form (the author's raw and unedited content as they write) so you can take advantage of these technologies long before the official release of these titles. This will be the 7th chapter of the final book. If you have comments about how we might improve the content and/or examples in this book, or if you notice missing material within this chapter, please reach out to the editor at mpotter@oreilly.com.

As we saw in the self-service platform chapter, you can extract and publish a data product to an event stream by using a self-service connector framework, such as Debezium or Kafka Connect. But as you and your peers become more comfortable using event streams, you're going to come to a point where you can start natively generating your own events instead of using a connector. This chapter covers how to design events in a way that makes it easy for others to use and apply them, and how to avoid the numerous pitfalls that you will encounter along the way.
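For reference, a connector-based publication step of the kind mentioned above might look roughly like the following sketch, which registers a Debezium MySQL source connector through a Kafka Connect REST endpoint. The connector name, connection details, and table list are purely illustrative, the exact configuration keys vary by Debezium version, and a real deployment needs additional settings (such as schema history configuration); consult the Debezium documentation for the authoritative options.

    # Rough sketch: registering a Debezium MySQL source connector via the Kafka
    # Connect REST API, one way to publish database changes to an event stream.
    # Hostnames, credentials, and table names are illustrative assumptions, and
    # the configuration shown here is not complete for a production setup.
    import json
    import requests

    connector = {
        "name": "inventory-source",
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "database.hostname": "mysql.internal",
            "database.port": "3306",
            "database.user": "debezium",
            "database.password": "example-password",
            "database.server.id": "184054",
            "topic.prefix": "inventory-service",
            "table.include.list": "inventory.items,inventory.stock_levels",
        },
    }

    response = requests.post(
        "http://connect.internal:8083/connectors",
        headers={"Content-Type": "application/json"},
        data=json.dumps(connector),
    )
    response.raise_for_status()
    print("Registered connector:", response.json()["name"])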
Introduction to Event Types

There are two main types of events that underpin all of event design: the state event and the action event. As an example, Figure 2-1 shows a simple square wave in steady state, periodically altered by an action to result in a new state.

Figure 2-1. State and Action during a change

There are three stages to any occurrence in a system:

1. The initial state
2. The action that alters the initial state to produce the final state
3. The final state (which is also the initial state for the next change cycle)

The vast majority of events we encounter can be ascribed to either a "state" event or an "action" event. Though there are nuances to this (such as a state event that includes some elements of actions), this simple division of types proves to be quite useful when discussing events with colleagues, coming up with designs, and framing the purpose that events have in the solution space in question.

State Events
These form the foundation of the majority of event streams used in inter-domain communication. State events fully describe the state of an entity at a given point in time. We first encountered state events when we covered extracting domain data with a change-data capture service. In this chapter we'll build on what we've learned so far, and dig deeper into precisely why state events remain the best choice for communicating data products between domains.

Action Events
These describe the transition between states, and are best reserved for the purposes of event sourcing within the domain. They do, however, have the distinction of being the event definition type that an inexperienced developer or architect first tries to use for inter-domain communication. We'll explore why this is the case, and the pitfalls and hazards that make these unsuitable for such choices.
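To make the distinction concrete, here is a hypothetical pair of events describing the same inventory change, one as an action event and one as a state event. The field names and values are assumptions made for illustration only.

    # Illustrative only: the same inventory change expressed two ways.
    # Field names and values are assumptions, not definitions from this book.

    # Action event: describes the transition ("what happened") and implicitly
    # assumes the reader knows, or can reconstruct, the state it applied to.
    stock_reserved = {
        "item_id": "item-123",
        "action": "StockReserved",
        "quantity_reserved": 2,
        "occurred_at": "2022-05-01T12:00:00Z",
    }

    # State event: describes the full public state after the change ("what is"),
    # so a consumer needs only the latest event per key to know the current state.
    item_stock_state = {
        "item_id": "item-123",
        "in_stock_quantity": 7,   # was 9 before the 2 units above were reserved
        "reserved_quantity": 2,
        "updated_at": "2022-05-01T12:00:00Z",
    }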
For completeness, we're also going to look into a few more event types to see if, and how, you should use them in your event stream data products. These include:

Measurement events, as used for collecting user behaviour metrics, Internet of Things (IoT) communications, and system monitoring

Notification events, as used for indicating that a process or event has completed, with minimal information within the event itself

Let's take a look at state events first.

State Events with Event-Carried State Transfer

A state event contains the entire public state of a specific entity at the time the event was created. It does not contain any state that is private to the source domain, but only the data that has been publicly declared as part of the data contract. State events enable Event-Carried State Transfer (ECST), where a read-only model of the state can be recreated and processed by any consumer who needs it.

State events can contain just the "now" state, or they may contain the "before/after" state, as we saw with change-data capture. Both of these options have their own advantages and disadvantages, which we'll examine in turn. But for starters, let's take a look at how each of these options affects compaction of event streams. There are three main design strategies for defining the structure and contents of ECST events:

Current State
These contain the full public state at the moment the event was created.

Before/After State
These contain both the full public state before the event occurred, and the full public state after the event occurred.
Hybrid State with Action
These contain either the current state or the before/after state, but also include some action information as to why the event happened.

Let's look into each of these in detail to get a better understanding of their tradeoffs.

Current State

In this design the event contains only the current state of the entity, and it is the most common form of ECST definition. For example, an inventory event for a given itemId will contain only the latest value for the quantity in stock. Previous values for that itemId will remain in the event stream until they are periodically compacted away. This design has several main benefits:

Lean: It takes up a minimal amount of space in the event stream, and network traffic is also minimized.

Simple: It relies on the event broker to store previous state rather than representing it in the event itself (we'll cover this more in the next section on before/after). You can also set the compaction policies independently for each event stream, in case you need a really long backlog of historical events.

Compactable: Deleting older state is as easy as publishing an updated record for a given key.

It also has a few nuances that I wouldn't quite call drawbacks:

Agnostic to why the state changed: The downstream consumer of the ECST event is not given a reason why the data has changed, only the new public state. The reason for this is simple: it removes the ability of consumers to couple on the internal state transitions.
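As a rough, Kafka-flavored sketch of how current-state events and compaction interact, consider the following producer snippet. The topic name, configuration, and values are illustrative assumptions; the essential point is that events are keyed by entity, and compaction eventually retains only the latest event per key.

    # Minimal sketch: publishing current-state inventory events keyed by item id.
    # With log compaction enabled on the topic (cleanup.policy=compact), the
    # broker eventually retains only the latest event per key, so a new consumer
    # can rebuild current state without replaying every historical value.
    # Topic name, settings, and values are illustrative assumptions.
    import json
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "localhost:9092"})

    updates = [
        {"item_id": "item-123", "in_stock_quantity": 9},
        {"item_id": "item-123", "in_stock_quantity": 7},  # supersedes the event above
        {"item_id": "item-456", "in_stock_quantity": 40},
    ]

    for state in updates:
        producer.produce(
            topic="inventory",
            key=state["item_id"],  # the entity key that compaction operates on
            value=json.dumps(state).encode("utf-8"),
        )

    # A null value (a tombstone) marks the key for eventual removal by compaction,
    # which is how an entity is deleted from the stream.
    producer.produce(topic="inventory", key="item-456", value=None)
    producer.flush()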