Elements Of Causal Inference Foundations And Learning Algorithms Jonas Peters Dominik Janzing Bernhard Scholkopf


Adaptive Computation and Machine Learning
Francis Bach, Editor
Christopher Bishop, David Heckerman, Michael Jordan, and Michael
Kearns, Associate Editors
A complete list of books published in The Adaptive Computation and
Machine Learning series appears at the back of this book.

Elements of Causal Inference
Foundations and Learning Algorithms
Jonas Peters, Dominik Janzing, and Bernhard
Schölkopf
The MIT Press
Cambridge, Massachusetts
London, England

© 2017 Massachusetts Institute of Technology
This work is licensed to the public under a Creative Commons
Attribution-NonCommercial-NoDerivatives 4.0 license
(international):
http://creativecommons.org/licenses/by-nc-nd/4.0/
All rights reserved except as licensed pursuant to the Creative
Commons license identified above. Any reproduction or other use
not licensed as above, by any electronic or mechanical means
(including but not limited to photocopying, public distribution, online
display, and digital information storage and retrieval) requires
permission in writing from the publisher.
This book was set in LaTeX by the authors.
Printed and bound in the United States of America.
Library of Congress Cataloging-in-Publication Data
Names: Peters, Jonas. | Janzing, Dominik. | Schölkopf, Bernhard.
Title: Elements of causal inference : foundations and learning
algorithms / Jonas Peters, Dominik Janzing, and Bernhard
Schölkopf.
Description: Cambridge, MA : MIT Press, 2017. | Series: Adaptive
computation and machine learning series | Includes bibliographical
references and index.
Identifiers: LCCN 2017020087 | ISBN 9780262037310 (hardcover :
alk. paper)
Subjects: LCSH: Machine learning. | Logic, Symbolic and
mathematical. | Causation. | Inference. | Computer algorithms.

Classification: LCC Q325.5 .P48 2017 | DDC 006.3/1–dc23
LC record available at https://lccn.loc.gov/2017020087

To all those who enjoy the pursuit of causal insight

Contents
Preface
Notation and Terminology
1 Statistical and Causal Models
1.1 Probability Theory and Statistics
1.2 Learning Theory
1.3 Causal Modeling and Learning
1.4 Two Examples
2 Assumptions for Causal Inference
2.1 The Principle of Independent Mechanisms
2.2 Historical Notes
2.3 Physical Structure Underlying Causal Models
3 Cause-Effect Models
3.1 Structural Causal Models
3.2 Interventions
3.3 Counterfactuals
3.4 Canonical Representation of Structural Causal Models
3.5 Problems
4 Learning Cause-Effect Models
4.1 Structure Identifiability
4.2 Methods for Structure Identification

4.3 Problems
5 Connections to Machine Learning, I
5.1 Semi-Supervised Learning
5.2 Covariate Shift
5.3 Problems
6 Multivariate Causal Models
6.1 Graph Terminology
6.2 Structural Causal Models
6.3 Interventions
6.4 Counterfactuals
6.5 Markov Property, Faithfulness, and Causal Minimality
6.6 Calculating Intervention Distributions by Covariate
Adjustment
6.7 Do-Calculus
6.8 Equivalence and Falsifiability of Causal Models
6.9 Potential Outcomes
6.10 Generalized Structural Causal Models Relating Single
Objects
6.11 Algorithmic Independence of Conditionals
6.12 Problems
7 Learning Multivariate Causal Models
7.1 Structure Identifiability
7.2 Methods for Structure Identification
7.3 Problems
8 Connections to Machine Learning, II
8.1 Half-Sibling Regression
8.2 Causal Inference and Episodic Reinforcement Learning
8.3 Domain Adaptation
8.4 Problems
9 Hidden Variables

9.1 Interventional Sufficiency
9.2 Simpson’s Paradox
9.3 Instrumental Variables
9.4 Conditional Independences and Graphical Representations
9.5 Constraints beyond Conditional Independence
9.6 Problems
10 Time Series
10.1 Preliminaries and Terminology
10.2 Structural Causal Models and Interventions
10.3 Learning Causal Time Series Models
10.4 Dynamic Causal Modeling
10.5 Problems
Appendices
Appendix A Some Probability and Statistics
A.1 Basic Definitions
A.2 Independence and Conditional Independence Testing
A.3 Capacity of Function Classes
Appendix B Causal Orderings and Adjacency Matrices
Appendix C Proofs
C.1 Proof of Theorem 4.2
C.2 Proof of Proposition 6.3
C.3 Proof of Remark 6.6

Bibliography
Index
List of Figures
Figure I: This figure depicts the stronger dependences among the
chapters (there exist many more less-pronounced relations). We
suggest that the reader begins with Chapter 1, 3, or 6.
Figure 1.1: Terminology used by the present book for various
probabilistic inference problems (bottom) and causal inference
problems (top); see Section 1.3. Note that we use the term
“inference” to include both learning and reasoning.
Figure 1.2: Reichenbach’s common cause principle establishes a
link between statistical properties and causal structures. A statistical
dependence between two observables X and Y indicates that they
are caused by a variable Z, often referred to as a confounder (left).
Here, Z may coincide with either X or Y, in which case the figure
simplifies (middle/right). The principle further argues that X and Y are
statistically independent, conditional on Z. In this figure, direct
causation is indicated by arrows; see Chapters 3 and 6.
Figure 1.3: Two structural causal models of handwritten digit data
sets. In the left model (i), a human is provided with class labels Y
and produces images X. In the right model (ii), the human decides
which class to write (Z) and produces both images and class labels.
For suitable functions f, g, h and noise variables NX, MX, MY, Z, the

two models produce the same observable distribution PX,Y, yet they
are interventionally different; see Section 1.4.1.
Figure 1.4: The activity of two genes (top: gene A; bottom: gene B)
is strongly correlated with the phenotype (black dots). However, the
best prediction for the phenotype when deleting the gene, that is,
setting its activity to 0 (left), depends on the causal structure (right).
If a common cause is responsible for the correlation between gene
and phenotype, we expect the phenotype to behave under the
intervention as it usually does (bottom right), whereas the
intervention clearly changes the value of the phenotype if it is
causally influenced by the gene (top right). The idea of this figure is
based on Peters et al. [2016].
Figure 2.1: The left panel shows a generic view of the (separate)
parts comprising a Beuchet chair. The right panel shows the illusory
percept of a chair if the parts are viewed from a single, very special
vantage point. From this accidental viewpoint, we perceive a chair.
(Image courtesy of Markus Elsholz.)
Figure 2.2: The principle of independent mechanisms and its
implications for causal inference (Principle 2.1).
Figure 2.3: Early path diagram; dam and sire are the female and
male parents of a guinea pig, respectively. The path coefficients
capture the importance of a given path, defined as the ratio of the
variability of the effect to be found when all causes are constant
except the one in question, the variability of which is kept
unchanged, to the total variability. (Reproduced from Wright [1920].)
Figure 2.4: Simple example of the independence of initial state and
dynamical law: beam of particles that are scattered at an object. The
outgoing particles contain information about the object while the
incoming do not.
Figure 4.1: Joint density over X and Y for an identifiable example.
The blue line is the function corresponding to the forward model Y :=
0.5·X+NY, with uniformly distributed X and NY; the gray area indicates
the support of the density of (X, Y). Theorem 4.2 states that there
cannot be any valid backward model since the distribution of (X, NY)

is non-Gaussian. The red line characterized by (b, c) is the
least-squares fit minimizing 𝔼[(X − bY − c)^2]. This is not a valid backward
model X = bY + c + NX since the resulting noise NX would not be
independent of Y (the size of the support of NX would differ for
different values of Y).
Figure 4.2: Joint density over X and Y for two non-identifiable
examples. The left panel shows the linear Gaussian case and the
right panel shows a slightly more complicated example, with “fine-
tuned” parameters for function, input, and noise distribution (the
latter plot is based on kernel density estimation). The blue function fY
corresponds to the forward model Y := fY (X)+NY, and the red function
fX to the backward model X := fX (Y)+NX.
Figure 4.3: Only carefully chosen parameters allow ANMs in both
directions (radii correspond to probability values); see Theorem 4.6.
The sets described by the theorem are C0 = {a1, a2,…, a8} and C1 =
{b1,b2,…, b8}. The function f takes the values c0 and c1 on C0 and C1,
respectively.
Figure 4.4: Visualization of the idea of IGCI: Peaks of pY tend to
occur in regions where f has small slope and f−1 has large slope
(provided that pX has been chosen independently of f). Thus pY
contains information about f−1. IGCI can be generalized to non-
differentiable functions f [Janzing et al., 2015].
Figure 4.5: We are given a sample from the underlying distribution
and perform a linear regression in the directions X → Y (left) and Y
→ X (right). The fitted functions are shown in the top row, the
corresponding residuals are shown in the bottom row. Only the
direction X → Y yields independent residuals; see also Figure 4.1.
Figure 4.6: Relation between average temperature in degrees
Celsius (Y) and altitude in meters (X) of places in Germany. The data
are taken from “Deutscher Wetterdienst,” see also Mooij et al.
[2016]. A nonlinear function (which is close to linear in the regime far
away from sea level) with additive noise fits these empirical
observations reasonably well.

Figure 5.1: Top: a complicated mechanism ϕ called the ribosome
translates mRNA information X into a protein chain Y. Predicting the
protein from the mRNA is an example of a causal learning problem,
where the direction of prediction (green arrow) is aligned with the
direction of causation (red). Bottom: In handwritten digit recognition,
we try to infer the class label Y (i.e., the writer’s intention) from an
image X produced by a writer. This is an anticausal problem.
Figure 5.2: The benefit of SSL depends on the causal structure.
Each column of points corresponds to a benchmark data set from
the UCI repository and shows the performance of six different base
classifiers augmented with self-training, a generic method for SSL.
Performance is measured by percentage decrease of error relative
to the base classifier, that is, (error(base) – error(self-
train))/error(base). Self-training overall does not help for the causal
data sets, but it does help for some of the anticausal/confounded
data sets [from Schölkopf et al., 2012].
Figure 5.3: In this example, SSL reduces the loss even in the causal
direction. Since for every x, the label zero is a priori more likely than
the label one, the expected number of errors is minimized when a
function is chosen that attains one at a point x where p(x) is minimal
(here: x = 3).
Figure 5.4: Example where PX changes in a way that suggests
that PY has changed and PX|Y remained the same. When Y is binary
and known to be the cause of X, observing that PX is a mixture of two
Gaussians makes it plausible that the two modes correspond to the
two different labels y = 0,1. Then, the influence of Y on X consists
just in shifting the mean of the Gaussian (which amounts to an ANM
— see Section 4.1.4), which is certainly a simple explanation for the
joint distribution. Observing furthermore that the weights of the
mixture changed from one data set to another one makes it likely
that this change is due to the change of PY.
Figure 5.5: Example where X causes Y and, as a result, PY and PX|Y
contain information about each other. Left: PX is a mixture of sharp
peaks at the positions s1, s2, s3. Right: PY is obtained from PX by

convolution with Gaussian noise with zero mean and thus consists of
less sharp peaks at the same positions s1, s2, s3. Then PX|Y also
contains information about s1, s2, s3 (see Problem 5.1).
Figure 6.1: Example of an SCM (left) with corresponding graph
(right). There is only one causal ordering π (that satisfies 3 ↦ 1, 1 ↦
2, 2 ↦ 3, 4 ↦ 4).
Figure 6.2: Causal models as SCMs do not only model an
observational distribution P (Proposition 6.3) but also intervention
distributions (Section 6.3) and counterfactuals (Section 6.4).
Figure 6.3: Simplified description of randomized studies. T denotes
the treatment, P and B the patient’s psychology and some
biochemical state, and R indicates whether the patient recovers. The
randomization over T removes the influence of any other variable on
T, and thus there cannot be any hidden common cause between T
and R. We distinguish between two different effects: the placebo
effect via P and the biochemical effect via B.
Figure 6.4: Two Markov equivalent DAGs (left and center); these are
the only two DAGs in the corresponding Markov equivalence class
that can be represented by the CPDAG on the right-hand side.
Figure 6.5: Only the path X ← A → K → Y is a “backdoor path” from
X to Y. The set Z = {K} satisfies the backdoor criterion (see
Proposition 6.41 (ii)); but Z = {F,C,K} is also a valid adjustment set
for (X,Y); see Proposition 6.41 (iii).
Figure 7.1: The figure summarizes two approaches for the
identification of causal structures. Independence-based methods
(top) test for conditional independences in the data; these properties
are related to the graph structure by the Markov condition and
faithfulness. Often, the graph is not uniquely identifiable; the method
may therefore output different graphs 𝒢 and 𝒢′. Alternatively, one
may restrict the model class and fit the SCM directly (bottom).
Figure 8.1: The causal structure that applies to the exoplanet search
problem. The underlying signal of interest Q can only be measured
as a noisy version Y. If the same noise source also corrupts
measurements of other signals that are independent of Q, those

measurements can be used for denoising. In our example, the
telescope N constitutes systematic noise that affects measurements
X and Y of independent light curves.
Figure 8.2: Every time a planet occludes a part of the star, the light
intensity decreases. If the planet orbits the star, this phenomenon
occurs periodically. (Image courtesy of Nikola Smolenski,
https://en.wikipedia.org/wiki/File:Planetary_transit.svg, [CC BY-SA
3.0]. Image has been edited for clarity and style.)
Figure 8.3: The graph describes an episodic reinforcement learning
problem. The action variables Ai influence the system’s next state
Si+1. The variable Y describes the output or return that we receive
after one episode. This return Y may depend on the actions, too
(edges omitted for clarity); it is often modelled as the (possibly
weighted) sum of rewards that are received after each decision; see
Section 8.2.3. The whole system can be confounded by an
unobserved variable H. The bold, red edges indicate the conditionals
that the player can influence, that is, the strategy. Equation (8.4)
estimates the expected outcome 𝔼[Y] under a new strategy from data
obtained using strategy π. The equation still holds when there are
additional edges from the actions A to H and/or Y.
Figure 8.4: Here, there exist variables F1,…,F4 that contain all
relevant information about the states S1,…,S4 in the sense that
Equations (8.5) and (8.6) hold. Equation (8.6) is not represented in
the graph. Then, it suffices if the actions Aj depend on Fj–1 (red, solid
lines) rather than Sj–1 (red, dashed lines). In the blackjack example,
the Sj’s encode the dealer’s hand and player’s hand including suits,
while the Fj encode the same information except for suits (suits do
not have an influence on the outcome of blackjack). Since Fj take
fewer values than Sj, the optimal strategy becomes easier to learn.
Figure 8.5: Example for the placement of advertisements. The target
variable Y indicates whether a user has clicked on one of the shown
ads. H (unknown) and S (known) are state variables and the action A
corresponds to the mainline reserve, a real-valued parameter that
determines how many ads are shown in the mainline. F is a discrete

variable indicating the (known) number of ads placed in the mainline.
Although the conditional p(a | s) is randomized over, we may use p(f
| s) for the reweighting (see Proposition 8.2).
Figure 9.1: Both graphs represent interventionally equivalent SCMs
for the model described in Example 9.2. While only the second
representation renders X and Y causally sufficient, X and Y are
interventionally sufficient independently of the representation.
Figure 9.2: Left: setting of an instrumental variable (Section 9.3). A
famous example is a randomized clinical trial with non-compliance: Z
is the treatment assignment, X the treatment and Y the outcome.
Right: Y-structure; see Section 9.4.1.
Figure 9.3: Starting with an SCM on the left-hand side, the three
graphs on the right encode the set of conditional independences (A
⫫ C). Due to an erroneous causal interpretation, the DAG is not
desirable as an output of a causal learning method. In this example,
the IPG and the latent projection (ADMG) are equal to the MAG.
Figure 9.4: This example is taken from Richardson and Spirtes
[2002, Figure 2(i)]. It shows that DAGs are not closed under
marginalization. There is no DAG over nodes O = {A,B,C,D} that
encodes all conditional independences from the graph including H.
Figure 9.5: Any distribution that is Markovian with respect to this
graph satisfies the Verma constraint (9.3), a non-independence
constraint that appears in the marginal distribution over A, B, C, and
D; the dashed variable H is unobserved [Verma and Pearl, 1991].
Figure 9.6: Two important examples of latent structures that entail
inequality constraints.
Figure 9.7: DAG that is not able to generate a joint distribution over
X, Y, and Z, for which all three observed variables attain
simultaneously 0 or 1 with probability 1/2 each.
Figure 9.8: If the graph corresponds to a linear SCM, the entailed
distribution will satisfy the tetrad constraints (9.12)–(9.14).
Figure 9.9: The figure shows a scatter plot for PX,Y. The red line
describes the manifold M; see Equation (9.18).

Figure 9.10: Detecting low-complexity intermediate variables: if the
path between X and Y is blocked by some variable U that attains
only a few values, PY|X often shows typical properties as a
“fingerprint” of U.
Figure 9.11: Visualization of a pure and a non-pure conditional.
Figure 9.12: Another example of a non-pure conditional: the line
connecting and can be extended without leaving the
simplex.
Figure 10.1: Example of a time series with no instantaneous effects.
Figure 10.2: Example of a time series with instantaneous effects.
Figure 10.3: Summary graph of the full time graphs shown in
Figures 10.1 and 10.2.
Figure 10.4: Example of a subsampled time series: only the
variables in the shaded areas are observed.
Figure 10.5: Two DAGs that are not Markov equivalent although
they coincide up to instantaneous effects.
Figure 10.6: Typical scenario, in which Granger causality works: if all
arrows from X to Y were missing, Yt would be conditionally
independent of the past values of X, given its own past. Here, Yt
does depend on the past values of X, given its own past. Thus,
condition (10.4) proves the existence of an influence from X to Y.
Figure 10.7: In these examples, Granger causality infers an
incorrect graph structure.
Figure 10.8: Two scenarios with instantaneous effects, one where
Granger causality fails to detect them (a) and one where it does not
(b).
List of Tables

Table 1.1: A simple taxonomy of models. The most detailed model
(top) is a mechanistic or physical one, usually involving sets of
differential equations. At the other end of the spectrum (bottom), we
have a purely statistical model; this model can be learned from data,
but it often provides little insight beyond modeling associations
between epiphenomena. Causal models can be seen as descriptions
that lie in between, abstracting away from physical realism while
retaining the power to answer certain interventional or counterfactual
questions. See Mooij et al. [2013] for a discussion of the link
between physical models and structural causal models, and Section
6.3 for a discussion of interventions.
Table 6.1: A classic example of Simpson’s paradox. The table
reports the success rates of two treatments for kidney stones [Bottou
et al., 2013, Charig et al., 1986, tables I and II]. Although the overall
success rate of treatment b seems better (any bold number is largest
in its column), treatment b performs worse than treatment a on both
patients with small kidney stones and patients with large kidney
stones (see Example 6.37 and Section 9.2).
Table 6.2: This table presents Example 3.4 using potential
outcomes. For each patient (or unit), we observe only one of the two
potential outcomes. The observed information has a gray
background. The treatment T is helpful for almost all patients. In only
2 of 200 cases does the treatment harm the patient and blind him (B =
1). Although assigning the treatment (T = 1) is a good idea in most
cases, for patient u = 120 it was exactly the wrong decision.
Table 7.1: Summary of some known identifiability results for
Gaussian noise. Identifiability results for non-Gaussian noise
are available, too, but they are more technical.
Table 8.1: In domain generalization, the test data come from an
unseen domain, whereas in multi-task learning, some data in the test
domain(s) are available.
Table 9.1: Consider an SCM over (observed) variables O and
(hidden) variables H that induces a distribution PO,H. How do we
model the observed distribution PO? We would like to use an SCM

with (arbitrarily many) latent variables. This model class, however,
has bad properties for causal learning. This table summarizes some
alternative model classes (current research focuses especially on
MAGs and ADMGs).
Table B.1: The number of DAGs depending on the number d of
nodes, taken from http://oeis.org/A003024 [OEIS Foundation Inc.,
2017]. The length of the numbers grows faster than any linear term.

Preface
Causality is a fascinating topic of research. Its mathematization has
only relatively recently started, and many conceptual problems are
still being debated — often with considerable intensity.
While this book summarizes the results of spending a decade
assaying causality, others have studied this problem much longer
than we have, and there already exist books about causality,
including the comprehensive treatments of Pearl [2009], Spirtes et
al. [2000], and Imbens and Rubin [2015]. We hope that our book is
able to complement existing work in two ways.
First, the present book represents a bias toward a subproblem of
causality that may be considered both the most fundamental and the
least realistic. This is the cause-effect problem, where the system
under analysis contains only two observables. We have studied this
problem in some detail during the last decade. We report much of
this work, and try to embed it into a larger context of what we
consider fundamental for gaining a selective but profound
understanding of the issues of causality. Although it might be
instructive to study the bivariate case first, following the sequential
chapter order, it is also possible to directly start reading the
multivariate chapters; see Figure I.

Figure I: This figure depicts the stronger dependences among the chapters (there
exist many more less-pronounced relations). We suggest that the reader begins
with Chapter 1, 3, or 6.
And second, our treatment is motivated and influenced by the
fields of machine learning and computational statistics. We are
interested in how methods thereof can help with the inference of
causal structures, and even more so whether causal reasoning can
inform the way we should be doing machine learning. Indeed, we
feel that some of the most profound open issues of machine learning
are best understood if we do not take a random experiment
described by a probability distribution as our starting point, but
instead we consider causal structures underlying the distribution.
We try to provide a systematic introduction to the topic that is
accessible to readers familiar with the basics of probability theory
and statistics or machine learning (for completeness, the most
important concepts are summarized in Appendices A.1 and A.2).
While we build on the graphical approach to causality as
represented by the work of Pearl [2009] and Spirtes et al. [2000], our
personal taste influenced the choice of topics. To keep the book
accessible and focus on the conceptual issues, we were forced to
devote regrettably little space to a number of significant issues in
causality, be it advanced theoretical insights for particular settings or

various methods of practical importance. We have tried to include
references to the literature for some of the most glaring omissions,
but we may have missed important topics.
Our book has a number of shortcomings. Some of them are
inherited from the field, such as the tendency that theoretical results
are often restricted to the case where we have infinite amounts of
data. Although we do provide algorithms and methodology for the
finite data case, we do not discuss statistical properties of such
methods. Additionally, in some places we neglect measure-theoretic
issues, often by assuming the existence of densities. We find all of
these questions both relevant and interesting but made these
choices to keep the book short and accessible to a broad audience.
Another disclaimer is in order. Computational causality methods
are still in their infancy, and in particular, learning causal structures
from data is only doable in rather limited situations. We have tried to
include concrete algorithms wherever possible, but we are acutely
aware that many of the problems of causal inference are harder than
typical machine learning problems, and we thus make no promises
as to whether the algorithms will work on the reader’s problems.
Please do not feel discouraged by this remark — causal learning is a
fascinating topic and we hope that reading this book may convince
you to start working on it.
We would not have been able to finish this book without the
support of various people.
We gratefully acknowledge support for a Research in Pairs stay of
the three authors at the Mathematisches Forschungsinstitut
Oberwolfach, during which a substantial part of this book was
written.
We thank Michel Besserve, Peter Bühlmann, Rune Christiansen,
Frederick Eberhardt, Jan Ernest, Philipp Geiger, Niels Richard
Hansen, Alain Hauser, Biwei Huang, Marek Kaluba, Hansruedi
Künsch, Steffen Lauritzen, Jan Lemeire, David Lopez-Paz, Marloes
Maathuis, Nicolai Meinshausen, Søren Wengel Mogensen, Joris
Mooij, Krikamol Muandet, Judea Pearl, Niklas Pfister, Thomas
Richardson, Mateo Rojas-Carulla, Eleni Sgouritsa, Carl Johann
Simon-Gabriel, Xiaohai Sun, Ilya Tolstikhin, Kun Zhang, and Jakob

Zscheischler for many helpful comments and interesting discussions
during the time this book was written. In particular, Joris and Kun
were involved in much of the research that is presented here.
We thank various students at Karlsruhe Institute of Technology,
Eidgenössische Technische Hochschule Zürich, and University of
Tübingen for proofreading early versions of this book and for asking
many inspiring questions.
Finally, we thank the anonymous reviewers and the copyediting
team from Westchester Publishing Services for their helpful
comments, and the staff from MIT Press, in particular Marie Lufkin
Lee and Christine Bridget Savage, for providing kind support during
the whole process.
København and Tübingen, August 2017
Jonas Peters
Dominik Janzing
Bernhard Schölkopf

Notation and Terminology
X, Y, Z           random variable; for noise variables, we use N, NX, Nj,…
x                 value of a random variable X
P                 probability measure
PX                probability distribution of X
                  an i.i.d. sample of size n; sample index is usually i
PY|X=x            conditional distribution of Y given X = x
PY|X              collection of PY|X=x for all x; for short: conditional of Y given X
p                 density (either probability mass function or probability density function)
pX                density of PX
p(x)              density of PX evaluated at the point x
p(y|x)            (conditional) density of PY|X=x evaluated at y
𝔼[X]              expectation of X
var[X]            variance of X
cov[X, Y]         covariance of X, Y
X ⫫ Y             independence between random variables X and Y
X ⫫ Y | Z         conditional independence
X = (X1,…, Xd)    random vector of length d; dimension index is usually j
C                 structural causal model
                  intervention distribution
                  counterfactual distribution
𝒢                 graph
                  parents, descendants, and ancestors of node X in graph 𝒢

1 Statistical and Causal Models
Using statistical learning, we try to infer properties of the
dependence among random variables from observational data. For
instance, based on a joint sample of observations of two random
variables, we might build a predictor that, given new values of only
one of them, will provide a good estimate of the other one. The
theory underlying such predictions is well developed, and —
although it applies to simple settings — already provides profound
insights into learning from data. For two reasons, we will describe
some of these insights in the present chapter. First, this will help us
appreciate how much harder the problems of causal inference are,
where the underlying model is no longer a fixed joint distribution of
random variables, but a structure that implies multiple such
distributions. Second, although finite sample results for causal
estimation are scarce, it is important to keep in mind that the basic
statistical estimation problems do not go away when moving to the
more complex causal setting, even if they seem small compared to
the causal problems that do not appear in purely statistical learning.
Building on the preceding groundwork, the chapter also provides a
gentle introduction to the basic notions of causality, using two
examples, one of which is well known from machine learning.
1.1 Probability Theory and Statistics

Probability theory and statistics are based on the model of a random
experiment or probability space (Ω, 𝓕, P). Here, Ω is a set
(containing all possible outcomes), 𝓕 is a collection of events A ⊆ Ω,
and P is a measure assigning a probability to each event. Probability
theory allows us to reason about the outcomes of random
experiments, given the preceding mathematical structure. Statistical
learning, on the other hand, essentially deals with the inverse
problem: We are given the outcomes of experiments, and from this
we want to infer properties of the underlying mathematical structure.
For instance, suppose that we have observed data
$$(x_1, y_1), \ldots, (x_n, y_n) \tag{1.1}$$
where xi ∈ 𝒳 are inputs (sometimes called covariates or cases)
and yi ∈ 𝒴 are outputs (sometimes called targets or labels). We
may now assume that each (xi, yi), i = 1,…,n, has been generated
independently by the same unknown random experiment. More
precisely, such a model assumes that the observations (x1, y1),…,
(xn, yn) are realizations of random variables (X1,Y1),…, (Xn, Yn) that
are i.i.d. (independent and identically distributed) with joint
distribution PX,Y. Here, X and Y are random variables taking values
in metric spaces 𝒳 and 𝒴.¹ Almost all of statistics and machine
learning builds on i.i.d. data. In practice, the i.i.d. assumption can be
violated in various ways, for instance if distributions shift or
interventions in a system occur. As we shall see later, some of these
are intricately linked to causality.
We may now be interested in certain properties of PX,Y, such as:
(i) the expectation of the output given the input, f (x) = 𝔼[Y | X = x],
called regression, where often 𝒴 = ℝ,
(ii) a binary classifier assigning each x to the class that is more
likely, $f(x) = \operatorname{argmax}_{y \in \mathcal{Y}} P(Y = y \mid X = x)$, where 𝒴 = {±1},
(iii) the density pX,Y of PX,Y (assuming it exists).

In practice, we seek to estimate these properties from finite data
sets, that is, based on the sample (1.1), or equivalently an empirical
distribution that puts a point mass of equal weight on each
observation.
This constitutes an inverse problem: We want to estimate a
property of an object we cannot observe (the underlying distribution),
based on observations that are obtained by applying an operation (in
the present case: sampling from the unknown distribution) to the
underlying object.
1.2 Learning Theory
Now suppose that just like we can obtain f from PX,Y, we use the
empirical distribution to infer empirical estimates fn. This turns out to
be an ill-posed problem [e.g., Vapnik, 1998], since for any values of
x that we have not seen in the sample (x1,y1),…, (xn, yn), the
conditional expectation is undefined. We may, however, define the
function f on the observed sample and extend it according to any
fixed rule (e.g., setting f to +1 outside the sample or by choosing a
continuous piecewise linear f). But for any such choice, small
changes in the input, that is, in the empirical distribution, can lead to
large changes in the output. No matter how many observations we
have, the empirical distribution will usually not perfectly approximate
the true distribution, and small errors in this approximation can then
lead to large errors in the estimates. This implies that without
additional assumptions about the class of functions from which we
choose our empirical estimates fn, we cannot guarantee that the
estimates will approximate the optimal quantities f in a suitable
sense. In statistical learning theory, these assumptions are
formalized in terms of capacity measures. If we work with a function
class that is so rich that it can fit most conceivable data sets, then it
is not surprising if we can fit the data at hand. If, however, the
function class is a priori restricted to have small capacity, then there
are only a few data sets (out of the space of all possible data sets)
that we can explain using a function from that class. If it turns out
that nevertheless we can explain the data at hand, then we have
reason to believe that we have found a regularity underlying the
data. In that case, we can give probabilistic guarantees for the
solution’s accuracy on future data sampled from the same
distribution PX, Y.
Another way to think of this is that our function class has
incorporated a priori knowledge (such as smoothness of functions)
consistent with the regularity underlying the observed data. Such
knowledge can be incorporated in various ways, and different
approaches to machine learning differ in how they handle the issue.
In Bayesian approaches, we specify prior distributions over function
classes and noise models. In regularization theory, we construct
suitable regularizers and incorporate them into optimization
problems to bias our solutions.
The complexity of statistical learning arises primarily from the fact
that we are trying to solve an inverse problem based on empirical
data; if we were given the full probabilistic model, then all these
problems would go away. When we discuss causal models, we will see that
in a sense, the causal learning problem is harder in that it is ill-posed
on two levels. In addition to the statistical ill-posed-ness, which is
essentially because a finite sample of arbitrary size will never contain
all information about the underlying distribution, there is an ill-posed-
ness due to the fact that even complete knowledge of an
observational distribution usually does not determine the underlying
causal model.
Let us look at the statistical learning problem in more detail,
focusing on the case of binary pattern recognition or classification
[e.g., Vapnik, 1998], where 𝒴 = {±1}. We seek to learn f : 𝒳 → 𝒴
based on observations (1.1), generated i.i.d. from an unknown PX,Y.
Our goal is to minimize the expected error or risk²
$$R[f] = \int \frac{1}{2} |f(x) - y| \, dP_{X,Y}(x, y) \tag{1.2}$$
over some class of functions 𝓕. Note that this is an integral with
respect to the measure PX,Y; however, if PX,Y allows for a density p(x,
y) with respect to Lebesgue measure, the integral reduces to
$$\int \frac{1}{2} |f(x) - y| \, p(x, y) \, dx \, dy.$$
Since PX,Y is unknown, we cannot compute (1.2), let alone
minimize it. Instead, we appeal to an induction principle, such as
empirical risk minimization. We return the function minimizing the
training error or empirical risk
$$R^n_{\mathrm{emp}}[f] = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{2} |f(x_i) - y_i|$$
over f ∈ 𝓕. From the asymptotic point of view, it is important to ask
whether such a procedure is consistent, which essentially means
that it produces a sequence of functions whose risk converges
towards the minimal possible within the given function class 𝓕 (in
probability) as n tends to infinity. In Appendix A.3, we show that this
can only be the case if the function class is “small.” The Vapnik-
Chervonenkis (VC) dimension [Vapnik, 1998] is one possibility of
measuring the capacity or size of a function class. It also allows us
to derive finite sample guarantees, stating that with high probability,
the risk (1.2) is not larger than the empirical risk plus a term that
grows with the size of the function class 𝓕.
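As a small illustration of empirical risk minimization, the following sketch picks, from a deliberately small class of threshold classifiers, the one with the lowest training error, using the loss ½|f(x) − y| for labels in {±1}. The data-generating process (uniform inputs, 10% label noise), the function names, and the grid of 101 thresholds are assumptions made purely for illustration; they are not taken from the text.

```python
import random

def sample(n, rng, flip=0.1):
    """Draw n i.i.d. pairs (x, y): x ~ Uniform(-1, 1), y = sign(x), label flipped w.p. `flip`."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 1 if x > 0 else -1
        if rng.random() < flip:
            y = -y
        data.append((x, y))
    return data

def predict(theta, x):
    """Threshold classifier f_theta(x) with values in {-1, +1}."""
    return 1 if x > theta else -1

def empirical_risk(theta, data):
    """Average of the loss 1/2 * |f(x) - y| over the given sample."""
    return sum(0.5 * abs(predict(theta, x) - y) for x, y in data) / len(data)

def erm(data, thetas):
    """Return the threshold from the finite class with the smallest empirical risk."""
    return min(thetas, key=lambda theta: empirical_risk(theta, data))

rng = random.Random(0)
train = sample(500, rng)
fresh = sample(20000, rng)                   # large fresh sample, a stand-in for the true risk
thetas = [i / 50 - 1 for i in range(101)]    # hypothetical function class: thresholds in [-1, 1]

theta_hat = erm(train, thetas)
train_risk = empirical_risk(theta_hat, train)
fresh_risk = empirical_risk(theta_hat, fresh)
print(f"theta_hat={theta_hat:.2f} train={train_risk:.3f} fresh={fresh_risk:.3f}")
```

Because the class is small, the training error of the selected threshold should be a reasonable proxy for its error on fresh data from the same distribution; with a much richer class, this gap would be expected to grow.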
Such a theory does not contradict the existing results on
universal consistency, which refers to convergence of a learning
algorithm to the lowest achievable risk with any function. There are
learning algorithms that are universally consistent, for instance
nearest neighbor classifiers and Support Vector Machines [Devroye
et al., 1996, Vapnik, 1998, Schölkopf and Smola, 2002, Steinwart
and Christmann, 2008]. While universal consistency essentially tells
us everything can be learned in the limit of infinite data, it does not
imply that every problem is learnable well from finite data, due to the
phenomenon of slow rates. For any learning algorithm, there exist
problems for which the learning rates are arbitrarily slow [Devroye et
al., 1996]. It does tell us, however, that if we fix the distribution, and
gather enough data, then we can get arbitrarily close to the lowest
risk eventually.
In practice, recent successes of machine learning systems seem
to suggest that we are indeed sometimes already in this asymptotic
regime, often with spectacular results. A lot of thought has gone into
designing the most data-efficient methods to obtain the best possible
results from a given data set, and a lot of effort goes into building
large data sets that enable us to train these methods. However, in all
these settings, it is crucial that the underlying distribution does not
differ between training and testing, be it by interventions or other
changes. As we shall argue in this book, describing the underlying
regularity as a probability distribution, without additional structure,
does not provide us with the right means to describe what might
change.
1.3 Causal Modeling and Learning
Causal modeling starts from another, arguably more fundamental,
structure. A causal structure entails a probability model, but it
contains additional information not contained in the latter (see the
examples in Section 1.4). Causal reasoning, according to the
terminology used in this book, denotes the process of drawing
conclusions from a causal model, similar to the way probability
theory allows us to reason about the outcomes of random
experiments. However, since causal models contain more
information than probabilistic ones do, causal reasoning is more
powerful than probabilistic reasoning, because causal reasoning
allows us to analyze the effect of interventions or distribution
changes.
Just like statistical learning denotes the inverse problem to
probability theory, we can think about how to infer causal structures
from their empirical implications. The empirical implications can be
purely observational, but they can also include data under
interventions (e.g., randomized trials) or distribution changes.
Researchers use various terms to refer to these problems, including
structure learning and causal discovery. We refer to the closely
related question of which parts of the causal structure can in
principle be inferred from the joint distribution as structure
identifiability. Unlike the standard problems of statistical learning
described in Section 1.2, even full knowledge of P does not make
the solution trivial, and we need additional assumptions (see
Chapters 2, 4, and 7). This difficulty should not distract us from the
fact, however, that the ill-posed-ness of the usual statistical problems
is still there (and thus it is important to worry about the capacity of
function classes also in causality, such as by using additive noise
models — see Section 4.1.4 below), only confounded by an
additional difficulty arising from the fact that we are trying to estimate
a richer structure than just a probabilistic one. We will refer to this
overall problem as causal learning. Figure 1.1 summarizes the
relationships between the preceding problems and models.
Figure 1.1: Terminology used by the present book for various probabilistic
inference problems (bottom) and causal inference problems (top); see Section
1.3. Note that we use the term “inference” to include both learning and reasoning.

To learn causal structures from observational distributions, we
need to understand how causal models and statistical models relate
to each other. We will come back to this issue in Chapters 4 and 7
but provide an example now. A well-known topos holds that
correlation does not imply causation; in other words, statistical
properties alone do not determine causal structures. It is less well
known that one may postulate that while we cannot infer a concrete
causal structure, we may at least infer the existence of causal links
from statistical dependences. This was first understood by
Reichenbach [1956]; we now formulate his insight (see also Figure
1.2).³
Figure 1.2: Reichenbach’s common cause principle establishes a link between
statistical properties and causal structures. A statistical dependence between two
observables X and Y indicates that they are caused by a variable Z, often referred
to as a confounder (left). Here, Z may coincide with either X or Y, in which case
the figure simplifies (middle/right). The principle further argues that X and Y are
statistically independent, conditional on Z. In this figure, direct causation is
indicated by arrows; see Chapters 3 and 6.
Principle 1.1 (Reichenbach’s common cause principle) If two
random variables X and Y are statistically dependent (X ⫫̸ Y), then
there exists a third variable Z that causally influences both. (As a
special case, Z may coincide with either X or Y.) Furthermore, this
variable Z screens X and Y from each other in the sense that given
Z, they become independent, X ⫫ Y | Z.
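To make the screening-off property concrete, here is a minimal simulation sketch. It assumes a linear Gaussian common cause Z with hypothetical coefficients, and it uses partial correlation (correlation of regression residuals) as a stand-in for conditional independence, which is adequate only in this linear Gaussian toy setting. All names and numbers are illustrative assumptions.

```python
import random

def corr(u, v):
    """Pearson correlation of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

def residuals(target, regressor):
    """Residuals of a least-squares fit of target on regressor (with intercept)."""
    n = len(target)
    mt, mr = sum(target) / n, sum(regressor) / n
    slope = (sum((r - mr) * (t - mt) for r, t in zip(regressor, target))
             / sum((r - mr) ** 2 for r in regressor))
    return [(t - mt) - slope * (r - mr) for t, r in zip(target, regressor)]

rng = random.Random(1)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]          # common cause Z
x = [2.0 * zi + rng.gauss(0, 1) for zi in z]     # X depends on Z plus independent noise
y = [-1.5 * zi + rng.gauss(0, 1) for zi in z]    # Y depends on Z plus independent noise

marginal = corr(x, y)                                  # dependence induced by Z
partial = corr(residuals(x, z), residuals(y, z))       # dependence left after accounting for Z
print(f"corr(X, Y)={marginal:.3f}  partial corr given Z={partial:.3f}")
```

In this toy setting, X and Y should be clearly correlated, while the correlation of the residuals should be close to zero.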
In practice, dependences may also arise for a reason different
from the ones mentioned in the common cause principle, for
instance: (1) The random variables we observe are conditioned on
others (often implicitly by a selection bias). We shall return to this
issue; see Remark 6.29. (2) The random variables only appear to be
dependent. For example, they may be the result of a search
procedure over a large number of pairs of random variables that was
run without a multiple testing correction. In this case, inferring a
dependence between the variables does not satisfy the desired type
I error control; see Appendix A.2. (3) Similarly, both random variables
may inherit a time dependence and follow a simple physical law,
such as exponential growth. The variables then look as if they
depend on each other, but because the i.i.d. assumption is violated,
there is no justification for applying a standard independence test. In
particular, arguments (2) and (3) should be kept in mind when
reporting “spurious correlations” between random variables, as is
done on many popular websites.
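Point (3) can be illustrated with a short sketch: two series that are generated without any link between them, each following its own exponential growth law, look strongly correlated. The growth rates, the noise level, and the log-linear detrending step are our own illustrative choices, not a procedure from the text.

```python
import math
import random

def corr(u, v):
    """Pearson correlation of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

def detrend(values):
    """Residuals of a least-squares line fitted to values against the time index."""
    n = len(values)
    mt, mv = (n - 1) / 2, sum(values) / n
    slope = (sum((t - mt) * (v - mv) for t, v in enumerate(values))
             / sum((t - mt) ** 2 for t in range(n)))
    return [(v - mv) - slope * (t - mt) for t, v in enumerate(values)]

rng = random.Random(2)
T = 200
# Two unrelated quantities, each with its own exponential growth and independent noise.
u = [math.exp(0.02 * t + 0.05 * rng.gauss(0, 1)) for t in range(T)]
v = [3.0 * math.exp(0.03 * t + 0.05 * rng.gauss(0, 1)) for t in range(T)]

raw = corr(u, v)
detrended = corr(detrend([math.log(a) for a in u]),
                 detrend([math.log(b) for b in v]))
print(f"raw correlation={raw:.3f}  after removing the time trend={detrended:.3f}")
```

The raw correlation should be close to one even though the two series share nothing but time, whereas the detrended residuals should show little correlation.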
1.4 Two Examples
1.4.1 Pattern Recognition
As the first example, we consider optical character recognition, a
well-studied problem in machine learning. This is not a run-of-the-mill
example of a causal structure, but it may be instructive for readers
familiar with machine learning. We describe two causal models
giving rise to a dependence between two random variables, which
we will assume to be handwritten digits X and class labels Y. The
two models will lead to the same statistical structure, using distinct
underlying causal structures.
Model (i) assumes we generate each pair of observations by
providing a sequence of class labels y to a human writer, with the
instruction to always produce a corresponding handwritten digit
image x. We assume that the writer tries to do a good job, but there
may be noise in perceiving the class label and executing the motor
program to draw the image. We can model this process by writing
the image X as a function (or mechanism) f of the class label Y
(modeled as a random variable) and some independent noise NX
(see Figure 1.3, left). We can then compute PX,Y from PY, PNX, and f.
This is referred to as the observational distribution, where the
word “observational” refers to the fact that we are passively
observing the system without intervening. X and Y will be dependent
random variables, and we will be able to learn the mapping from x to
y from observations and predict the correct label y from an image x
better than chance.
Figure 1.3: Two structural causal models of handwritten digit data sets. In the left
model (i), a human is provided with class labels Y and produces images X. In the
right model (ii), the human decides which class to write (Z) and produces both
images and class labels. For suitable functions f, g, h and noise variables NX, MX,
MY, Z, the two models produce the same observable distribution PX,Y, yet they are
interventionally different; see Section 1.4.1.
There are two possible interventions in this causal structure,
which lead to intervention distributions.⁴ If we intervene on the
resulting image X (by manipulating it, or exchanging it for another
image after it has been produced), then this has no effect on the
class labels that were provided to the writer and recorded in the data
set. Formally, changing X has no effect on Y since Y := NY.
Intervening on Y, on the other hand, amounts to changing the class
labels provided to the writer. This will obviously have a strong effect
on the produced images. Formally, changing Y has an effect on X
since X := f (Y, NX). This directionality is visible in the arrow in the
figure, and we think of this arrow as representing direct causation.
In alternative model (ii), we assume that we do not provide class
labels to the writer. Rather, the writer is asked to decide himself or
herself which digits to write, and to record the class labels alongside.

In this case, both the image X and the recorded class label Y are
functions of the writer’s intention (call it Z and think of it as a random
variable). For generality, we assume that not only the process
generating the image is noisy but also the one recording the class
label, again with independent noise terms (see Figure 1.3, right).
Note that if the functions and noise terms are chosen suitably, we
can ensure that this model entails an observational distribution PX,Y
that is identical to the one entailed by model (i).⁵
Let us now discuss possible interventions in model (ii). If we
intervene on the image X, then things are as we just discussed and
the class label Y is not affected. However, if we intervene on the
class label Y (i.e., we change what the writer has recorded as the
class label), then unlike before this will not affect the image.
In summary, without restricting the class of involved functions and
distributions, the causal models described in (i) and (ii) induce the
same observational distribution over X and Y, but different
intervention distributions. This difference is not visible in a purely
probabilistic description (where everything derives from PX,Y).
However, we were able to discuss it by incorporating structural
knowledge about how PX,Y comes about, in particular graph
structure, functions, and noise terms.
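The contrast between the two models can be sketched in a few lines of simulation. Everything specific here is a simplifying assumption of ours: the "image" X is reduced to a single number, labels are coded as ±1, the mechanisms are additive with Gaussian noise, and the recording noise in model (ii) is switched off so that both models entail the same observational distribution. The sketch compares only the mean of X, not the full distributions.

```python
import random

def model_i(rng, do_y=None):
    """Model (i): the label causes the image feature."""
    y = rng.choice([-1, 1]) if do_y is None else do_y   # Y := N_Y, or Y set by intervention
    x = y + rng.gauss(0, 0.5)                            # X := f(Y, N_X)
    return x, y

def model_ii(rng, do_y=None):
    """Model (ii): the writer's intention Z causes both image feature and label."""
    z = rng.choice([-1, 1])                              # Z := N_Z
    x = z + rng.gauss(0, 0.5)                            # X := g(Z, M_X)
    y = z if do_y is None else do_y                      # Y := h(Z, M_Y), with M_Y switched off
    return x, y

def mean_x(model, rng, n, given_y=None, do_y=None):
    """Average of X, optionally restricted to samples with Y == given_y."""
    xs = [x for x, y in (model(rng, do_y) for _ in range(n))
          if given_y is None or y == given_y]
    return sum(xs) / len(xs)

rng = random.Random(3)
n = 20000
obs_i = mean_x(model_i, rng, n, given_y=1)    # observational: E[X | Y = 1] in model (i)
obs_ii = mean_x(model_ii, rng, n, given_y=1)  # observational: E[X | Y = 1] in model (ii)
do_i = mean_x(model_i, rng, n, do_y=1)        # interventional: E[X] under do(Y := 1) in (i)
do_ii = mean_x(model_ii, rng, n, do_y=1)      # interventional: E[X] under do(Y := 1) in (ii)
print(f"observational: {obs_i:.2f} vs {obs_ii:.2f}; interventional: {do_i:.2f} vs {do_ii:.2f}")
```

Conditioning on Y = 1 should give nearly the same average image feature in both models, whereas setting Y to 1 by intervention should shift X only in model (i).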
Models (i) and (ii) are examples of structural causal models
(SCMs), sometimes referred to as structural equation models
[e.g., Aldrich, 1989, Hoover, 2008, Pearl, 2009, Pearl et al., 2016]. In
an SCM, all dependences are generated by functions that compute
variables from other variables. Crucially, these functions are to be
read as assignments, that is, as functions as in computer science
rather than as mathematical equations. We usually think of them as
modeling physical mechanisms. An SCM entails a joint distribution
over all observables. We have seen that the same distribution can
be generated by different SCMs, and thus information about the
effect of interventions (and, as we shall see in Section 6.4,
information about counterfactuals) may be lost when we make the
transition from an SCM to the corresponding probability model. In
this book, we take SCMs as our starting point and try to develop
everything from there.
We conclude with two points connected to our example:
First, Figure 1.3 nicely illustrates Reichenbach’s common cause
principle. The dependence between X and Y admits several causal
explanations, and X and Y become independent if we condition on Z
in the right-hand figure: The image and the label share no
information that is not contained in the intention.
Second, it is sometimes said that causality can only be discussed
when taking into account the notion of time. Indeed, time does play
a role in the preceding example, for instance by ruling out that an
intervention on X will affect the class label. However, this is perfectly
fine, and indeed it is quite common that a statistical data set is
generated by a process taking place in time. For instance, in model
(i), the underlying reason for the statistical dependence between X
and Y is a dynamical process. The writer reads the label and plans a
movement, entailing complicated processes in the brain, and finally
executes the movement using muscles and a pen. This process is
only partly understood, but it is a physical, dynamical process taking
place in time whose end result leads to a non-trivial joint distribution
of X and Y. When we perform statistical learning, we only care about
the end result. Thus, not only causal structures, but also purely
probabilistic structures may arise through processes taking place in
time — indeed, one could hold that this is ultimately the only way
they can come about. However, in both cases, it is often instructive
to disregard time. In statistics, time is often not necessary to discuss
concepts such as statistical dependence. In causal models, time is
often not necessary to discuss the effect of interventions. But both
levels of description can be thought of as abstractions of an
underlying more accurate physical model that describes reality more
fully than either; see Table 1.1. Moreover, note that variables in a
model may not necessarily refer to well-defined time instances. If, for
instance, a psychologist investigates the statistical or causal relation
between the motivation and the performance of students, both
variables cannot easily be assigned to specific time instances.

Measurements that refer to well-defined time instances are rather
typical for “hard” sciences like physics and chemistry.
Table 1.1: A simple taxonomy of models. The most detailed model (top) is a
mechanistic or physical one, usually involving sets of differential equations. At the
other end of the spectrum (bottom), we have a purely statistical model; this model
can be learned from data, but it often provides little insight beyond modeling
associations between epiphenomena. Causal models can be seen as descriptions
that lie in between, abstracting away from physical realism while retaining the
power to answer certain interventional or counterfactual questions. See Mooij et al.
[2013] for a discussion of the link between physical models and structural causal
models, and Section 6.3 for a discussion of interventions.
1.4.2 Gene Perturbation
We have seen in Section 1.4.1 that different causal structures lead to
different intervention distributions. Sometimes, we are indeed
interested in predicting the outcome of a random variable under such
an intervention. Consider the following, in some ways oversimplified,
example from genetics. Assume that we are given activity data from
gene A and, correspondingly, measurements of a phenotype; see
Figure 1.4 (top left) for a toy data set. Clearly, both variables are
strongly correlated. This correlation can be exploited for classical
prediction: If we observe that the activity of gene A lies around 6, we
expect the phenotype to lie between 12 and 16 with high probability.
Similarly, for a gene B (bottom left). On the other hand, we may also
be interested in predicting the phenotype after deleting gene A, that
is, after setting its activity to 0.⁶ Without any knowledge of the causal
structure, however, it is impossible to provide a non-trivial answer. If
gene A has a causal influence on the phenotype, we expect to see a
drastic change after the intervention (see top right). In fact, we may
still be able to use the same linear model that we have learned from
the observational data. If, alternatively, there is a common cause,
possibly a third gene C, influencing both the activity of gene B and
the phenotype, the intervention on gene B will have no effect on the
phenotype (see bottom right).

Figure 1.4: The activity of two genes (top: gene A; bottom: gene B) is strongly
correlated with the phenotype (black dots). However, the best prediction for the
phenotype when deleting the gene, that is, setting its activity to 0 (left), depends on
the causal structure (right). If a common cause is responsible for the correlation
between gene and phenotype, we expect the phenotype to behave under the
intervention as it usually does (bottom right), whereas the intervention clearly
changes the value of the phenotype if it is causally influenced by the gene (top
right). The idea of this figure is based on Peters et al. [2016].
As in the pattern recognition example, the models are again
chosen such that the joint distribution over gene A and the
phenotype equals the joint distribution over gene B and the
phenotype. Therefore, there is no way of telling the top and bottom
situations apart from just observational data, even if the sample size
goes to infinity. Summarizing, if we are not willing to employ
concepts from causality, we have to answer “I do not know” to the
question of predicting a phenotype after deletion of a gene.
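The gene example can be mimicked with a toy simulation. The numbers below (activity centered at 5, slope 2, unit noise) and the choice to make gene B a noise-free readout of a hidden common cause C are assumptions made only so that both situations entail the same joint distribution of gene activity and phenotype; they are not the data behind Figure 1.4.

```python
import random

def fit_line(xs, ys):
    """Least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def sample_top(rng, n, do_gene=None):
    """Gene A causally influences the phenotype."""
    out = []
    for _ in range(n):
        a = rng.gauss(5, 1) if do_gene is None else do_gene
        out.append((a, 2 * a + rng.gauss(0, 1)))
    return out

def sample_bottom(rng, n, do_gene=None):
    """A hidden common cause C drives both gene B and the phenotype."""
    out = []
    for _ in range(n):
        c = rng.gauss(5, 1)
        b = c if do_gene is None else do_gene
        out.append((b, 2 * c + rng.gauss(0, 1)))
    return out

rng = random.Random(4)
n = 20000
slope_a, pred_top = fit_line(*zip(*sample_top(rng, n)))        # intercept = prediction at activity 0
slope_b, pred_bottom = fit_line(*zip(*sample_bottom(rng, n)))

# Actual mean phenotype after deleting the gene, that is, under do(gene := 0).
actual_top = sum(p for _, p in sample_top(rng, n, do_gene=0)) / n
actual_bottom = sum(p for _, p in sample_bottom(rng, n, do_gene=0)) / n
print(f"top: predicted {pred_top:.2f}, actual {actual_top:.2f}")
print(f"bottom: predicted {pred_bottom:.2f}, actual {actual_bottom:.2f}")
```

Both regressions should recover nearly the same line from observational data, but only in the top situation should the line's prediction at activity 0 agree with the phenotype observed after the deletion.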
Notes
¹ A random variable X is a measurable function Ω → 𝒳, where the metric space
𝒳 is equipped with the Borel σ-algebra. Its distribution PX on 𝒳 can be obtained
from the measure P of the underlying probability space (Ω, 𝓕, P). We need not
worry about this underlying space, and instead we generally start directly with the
distribution of the random variables, assuming the random experiment directly
provides us with values sampled from that distribution.
² This notion of risk, which does not always coincide with its colloquial use, is
taken from statistical learning theory [Vapnik, 1998] and has its roots in statistical
decision theory [Wald, 1950, Ferguson, 1967, Berger, 1985]. In that context, f (x) is
thought of as an action taken upon observing x, and the loss function measures
the loss incurred when the state of nature is y.
³ For clarity, we formulate some important assumptions as principles. We do not
take them for granted throughout the book; in this sense, they are not axioms.
⁴ We shall see in Section 6.3 that a more general way to think of interventions is
that they change functions and random variables.
⁵ Indeed, Proposition 4.1 implies that any joint distribution PX,Y can be entailed
by both models.
⁶ Let us for simplicity assume that we have access to the true activity of the gene
without measurement noise.

2
Assumptions for Causal Inference
Now that we have encountered the basic components of SCMs, it is
a good time to pause and consider some of the assumptions we
have seen, as well as what these assumptions imply for the purpose
of causal reasoning and learning.
A crucial notion in our discussion will be a form of independence,
and we can informally introduce it using an optical illusion known as
the Beuchet chair. When we see an object such as the one on the
left of Figure 2.1, our brain makes the assumption that the object and
the mechanism by which the information contained in its light
reaches our brain are independent. We can violate this assumption
by looking at the object from a very specific viewpoint. If we do that,
perception goes wrong: We perceive the three-dimensional structure
of a chair, which in reality is not there. Most of the time, however, the
independence assumption does hold. If we look at an object, our
brain assumes that the object is independent from our vantage point
and the illumination. So there should be no unlikely coincidences, no
separate 3D structures lining up in two dimensions, or shadow
boundaries coinciding with texture boundaries. This is called the
generic viewpoint assumption in vision [Freeman, 1994].

Figure 2.1: The left panel shows a generic view of the (separate) parts comprising
a Beuchet chair. The right panel shows the illusory percept of a chair if the parts
are viewed from a single, very special vantage point. From this accidental
viewpoint, we perceive a chair. (Image courtesy of Markus Elsholz.)
The independence assumption is more general than this, though.
We will see in Section 2.1 below that the causal generative process
is composed of autonomous modules that do not inform or influence
each other. As we shall describe below, this means that while one
module’s output may influence another module’s input, the modules
themselves are independent of each other.
In the preceding example, while the overall percept is a function of
object, lighting, and viewpoint, the object and the lighting are not
affected by us moving about — in other words, some components of
the overall causal generative model remain invariant, and we can
infer three-dimensional information from this invariance. This is the
basic idea of structure from motion [Ullman, 1979], which plays a
central role in both biological vision and computer vision.

2.1 The Principle of Independent Mechanisms
We now describe a simple cause-effect problem and point out
several observations. Subsequently, we shall try to provide a unified
view of how these observations relate to each other, arguing that they
derive from a common independence principle.
Suppose we have estimated the joint density p(a, t) of the altitude
A and the average annual temperature T of a sample of cities in
some country (see Figure 4.6 on page 65). Consider the following
ways of expressing p(a, t):
$$p(a, t) = p(a \mid t) \, p(t) = p(t \mid a) \, p(a) \tag{2.1}$$
The first decomposition describes T and the conditional A|T. It
corresponds to a factorization of p(a,t) according to the graph T →
A.¹ The second decomposition corresponds to a factorization
according to A → T (cf. Definition 6.21). Can we decide which of the
two structures is the causal one (i.e., in which case would we be able
to think of the arrow as causal)?
A first idea (see Figure 2.2, left) is to consider the effect of
interventions. Imagine we could change the altitude A of a city by
some hypothetical mechanism that raises the grounds on which the
city is built. Suppose that we find that the average temperature
decreases. Let us next imagine that we devise another intervention
experiment. This time, we do not change the altitude, but instead we
build a massive heating system around the city that raises the
average temperature by a few degrees. Suppose we find that the
altitude of the city is unaffected. Intervening on A has changed T, but
intervening on T has not changed A. We would thus reasonably
prefer A → T as a description of the causal structure.

Figure 2.2: The principle of independent mechanisms and its implications for
causal inference (Principle 2.1).
Why do we find this description of the effect of interventions
plausible, even though the hypothetical intervention is hard or
impossible to carry out in practice?
If we change the altitude A, then we assume that the physical
mechanism p(t|a) responsible for producing an average temperature
(e.g., the chemical composition of the atmosphere, the physics of
how pressure decreases with altitude, the meteorological
mechanisms of winds) is still in place and leads to a changed T. This
would hold true independent of the distribution from which we have
sampled the cities, and thus independent of p(a). Austrians may
have founded their cities in locations subtly different from those of
the Swiss, but the mechanism p(t|a) would apply in both cases.²
If, on the other hand, we change T, then we have a hard time
thinking of p(a|t) as a mechanism that is still in place — we probably
do not believe that such a mechanism exists in the first place. Given
a set of different city distributions p(a,t), while we could write them all
as p(a|t) p(t), we would find that it is impossible to explain them all
using an invariant p(a|t).
Our intuition can be rephrased and postulated in two ways: If A →
T is the correct causal structure, then

(i) it is in principle possible to perform a localized intervention
on A, in other words, to change p(a) without changing p(t|a),
and
(ii) p(a) and p(t|a) are autonomous, modular, or invariant
mechanisms or objects in the world.
Interestingly, while we started off with a hypothetical intervention
experiment to arrive at the causal structure, our reasoning ends up
suggesting that actual interventions may not be the only way to
arrive at causal structures. We may also be able to identify the
causal structure by checking, for data sources p(a,t), which of the
two decompositions (2.1) leads to autonomous or invariant terms.
Sticking with the preceding example, let us denote the joint
distributions of altitude and temperature in Austria and Switzerland
by p^ö(a, t) and p^s(a, t), respectively. These may be distinct since
Austrians and Swiss founded their cities in different places (i.e., p^ö(a)
and p^s(a) are distinct). The causal factorizations, however, may still
use the same conditional, i.e., p^ö(a, t) = p(t|a) p^ö(a) and p^s(a, t) = p(t|a)
p^s(a).
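The invariance argument can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the linear mechanism, the noise level, and the altitude ranges for the two countries are hypothetical, and comparing least-squares slopes is a crude stand-in for comparing full conditional distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

def temperature_mechanism(altitude_km, rng):
    """Assumed mechanism p(t|a): linear decrease with altitude plus Gaussian noise."""
    noise = rng.normal(scale=2.0, size=altitude_km.shape)
    return 15.0 - 6.5 * altitude_km + noise

n = 50_000
# Hypothetical, country-specific altitude distributions p(a).
a_austria = rng.uniform(0.1, 1.0, size=n)
a_switzerland = rng.uniform(0.3, 2.5, size=n)

# The same mechanism is applied to both ensembles of cities.
t_austria = temperature_mechanism(a_austria, rng)
t_switzerland = temperature_mechanism(a_switzerland, rng)

def slope(x, y):
    """Least-squares slope of y regressed on x."""
    return np.polyfit(x, y, 1)[0]

def relative_gap(u, v):
    """Relative difference between two fitted slopes."""
    return abs(u - v) / max(abs(u), abs(v))

# Causal direction: regress T on A in each country and compare.
causal_gap = relative_gap(slope(a_austria, t_austria),
                          slope(a_switzerland, t_switzerland))
# Anticausal direction: regress A on T in each country and compare.
anticausal_gap = relative_gap(slope(t_austria, a_austria),
                              slope(t_switzerland, a_switzerland))

print(f"relative slope gap, T given A: {causal_gap:.3f}")
print(f"relative slope gap, A given T: {anticausal_gap:.3f}")
```

In this toy setting the fit of T on A is nearly the same in both countries, whereas the fit of A on T changes with the altitude distribution, which is the asymmetry the text describes.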
We next describe an idea (see Figure 2.2, middle), closely related
to the previous example, but different in that it also applies for
individual distributions. In the causal factorization p(a, t) = p(t|a) p(a),
we would expect that the conditional density p(t|a) (viewed as a
function of t and a) provides no information about the marginal
density function p(a). This holds true if p(t|a) is a model of a physical
mechanism that does not care about what distribution p(a) we feed
into it. In other words, the mechanism is not influenced by the
ensemble of cities to which we apply it.
If, on the other hand, we write p(a, t) = p(a|t)p(t), then the
preceding independence of cause and mechanism does not
apply. Instead, we will notice that to connect the observed p(t) and
p(a, t), the mechanism p(a|t) would need to take a rather peculiar
shape constrained by the equation p(a, t) = p(a|t)p(t). This could be
empirically checked, given an ensemble of cities and temperatures.
We have already seen several ideas connected to independence,
autonomy, and invariance, all of which can inform causal inference.

We now turn to a final one (see Figure 2.2, right), related to the
independence of noise terms and thus best explained when rewriting
(2.1) as a distribution entailed by an SCM with graph A → T, realizing
the effect T as a noisy function of the cause A,

    A := N_A,
    T := f_T(A, N_T),

where N_T and N_A are statistically independent noises, N_T ⫫ N_A.
Making suitable restrictions on the functional form of f_T (see Sections
4.1.3–4.1.6 and 7.1.2) allows us to identify which of two causal
structures (A → T or T → A) has entailed the observed p(a, t)
(without such restrictions though, we can always realize both
decompositions (2.1)). Furthermore, in the multivariate setting and
under suitable conditions, the assumption of jointly independent
noises allows the identification of causal structures by conditional
independence testing (see Section 7.1.1).
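As a rough illustration of how a restriction on the functional form can break the symmetry, the following sketch assumes an additive-noise model (effect = f(cause) + independent noise) and compares both directions by how strongly the regression residuals depend on the regressor. The polynomial regression, the HSIC dependence score with a median-heuristic bandwidth, and the toy data are all choices made for this example rather than a procedure taken from the text, and comparing raw scores is not a calibrated statistical test.

```python
import numpy as np

def gaussian_gram(x):
    """Gaussian kernel Gram matrix with a median-heuristic bandwidth."""
    sq_dists = (x[:, None] - x[None, :]) ** 2
    bandwidth_sq = np.median(sq_dists[sq_dists > 0])
    return np.exp(-sq_dists / (2.0 * bandwidth_sq))

def hsic(x, y):
    """Biased empirical HSIC; larger values indicate stronger dependence."""
    n = len(x)
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    return np.trace(gaussian_gram(x) @ centering @ gaussian_gram(y) @ centering) / n**2

def residual_dependence(cause, effect, degree=3):
    """Fit effect = f(cause) + residual and score dependence of residual on cause."""
    coeffs = np.polyfit(cause, effect, degree)
    residual = effect - np.polyval(coeffs, cause)
    return hsic(cause, residual)

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-1.0, 1.0, size=n)            # toy cause
y = x**3 + rng.uniform(-0.3, 0.3, size=n)     # toy effect with additive noise

score_x_to_y = residual_dependence(x, y)      # hypothesis: x causes y
score_y_to_x = residual_dependence(y, x)      # hypothesis: y causes x

print(f"residual dependence, x -> y: {score_x_to_y:.5f}")
print(f"residual dependence, y -> x: {score_y_to_x:.5f}")
```

On data generated this way, the residuals in the true direction are close to independent of the input, while the reverse fit leaves residuals whose spread varies with the input, so the direction with the lower score would be preferred.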
We like to view all these observations as closely connected
instantiations of a general principle of (physically) independent
mechanisms.
Principle 2.1 (Independent mechanisms) The causal generative
process of a system’s variables is composed of autonomous
modules that do not inform or influence each other.
In the probabilistic case, this means that the conditional
distribution of each variable given its causes (i.e., its mechanism)
does not inform or influence the other conditional distributions. In
case we have only two variables, this reduces to an independence
between the cause distribution and the mechanism producing the
effect distribution.
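For concreteness, the probabilistic reading can be written out as follows. The tilde for a post-intervention distribution and the symbol pa_j for the direct causes of x_j are notation assumed for this sketch.

```latex
\begin{align*}
  % Two variables: cause distribution times mechanism
  p(a, t) &= p(t \mid a)\, p(a) \\
  % A localized intervention replaces p(a) and leaves the mechanism untouched
  \tilde{p}(a, t) &= p(t \mid a)\, \tilde{p}(a) \\
  % Several variables: one mechanism per variable given its direct causes
  p(x_1, \dots, x_d) &= \prod_{j=1}^{d} p(x_j \mid \mathrm{pa}_j)
\end{align*}
```

Under the principle, each factor on the right-hand side can be changed without informing or altering the remaining factors.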
The principle is plausible if we conceive our system as being
composed of modules comprising (sets of) variables such that the
modules represent physically independent mechanisms of the world.
The special case of two variables has been referred to as
independence of cause and mechanism (ICM) [Daniušis et al., 2010,
Shajarisales et al., 2015]. It is obtained by thinking of the input as the
result of a preparation that is done by a mechanism that is
independent of the mechanism that turns the input into the output.
Before we discuss the principle in depth, we should state that not
all systems will satisfy it. For instance, if the mechanisms that an
overall system is composed of have been tuned to each other by
design or evolution, this independence may be violated.
We will presently argue that the principle is sufficiently broad to
cover the main aspects of causal reasoning and causal learning (see
Figure 2.2). Let us address three aspects, corresponding, from left to
right, to the three branches of the tree in Figure 2.2.
1. One way to think of these modules is as physical machines that
incorporate an input-output behavior. This assumption implies
that we can change one mechanism without affecting the
others — or, in causal terminology, we can intervene on one
mechanism without affecting the others. Changing a
mechanism will change its input-output behavior, and thus the
inputs other mechanisms downstream might receive, but we are
assuming that the physical mechanisms themselves are
unaffected by this change. An assumption such as this one is
often made implicitly to justify the possibility of interventions in the first
place, but one can also view it as a more general basis for
causal reasoning and causal learning. If a system allows such
localized interventions, there is no physical pathway that would
connect the mechanisms to each other in a directed way by
“meta-mechanisms.” The latter makes it plausible that we can
also expect a tendency for mechanisms to remain invariant with
respect to changes within the system under consideration and
possibly also to some changes stemming from outside the
system (see Section 7.1.6). This kind of autonomy of
mechanisms can be expected to help with transfer of
knowledge learned in one domain to a related one where some
of the modules coincide with the source domain (see Sections
5.2 and 8.3).

2. While the discussion of the first aspect focused on the physical
aspect of independence and its ramifications, there is also an
information theoretic aspect that is implied by the above. A time
evolution involving several coupled objects and mechanisms
can generate statistical dependence. This is related to our
discussion from page 10, where we considered the dependence
between the class label and the image of a handwritten digit.
Similarly, mechanisms that are physically coupled will tend to
generate information that can be quantified in terms of statistical
or algorithmic information measures (see Sections 4.1.9 and
6.10 below).
Here, it is important to distinguish between two levels of
information: obviously, an effect contains information about its
cause, but — according to the independence principle — the
mechanism that generates the effect from its cause contains no
information about the mechanism generating the cause. For a
causal structure with more than two nodes, the independence
principle states that the mechanisms generating each node from
its direct causes contain no information about each other.
3. Finally, we should discuss how the assumption of independent
noise terms, commonly made in structural equation modeling, is
connected to the principle of independent mechanisms. This
connection is less obvious. To this end, consider a variable E :=
f(C, N) where the noise N is discrete. For each value s taken by
N, the assignment E := f(C, N) reduces to a deterministic
mechanism E := f_s(C) that turns an input C into an output E.
Effectively, this means that the noise randomly chooses
between a number of mechanisms f_s (where the number equals
the cardinality of the range of the noise variable N). Now
suppose the noise variables for two mechanisms at the vertices
X_j and X_k were statistically dependent. Such a dependence
could ensure, for instance, that whenever one mechanism is
active at node j, we know which mechanism is active at node
k. This would violate our principle of independent mechanisms.

The preceding paragraph uses the somewhat extreme view
of noise variables as selectors between mechanisms (see also
Section 3.4). In practice, the role of the noise might be less
pronounced. For instance, if the noise is additive (i.e., E := f(C)
+ N), then its influence on the mechanism is restricted. In this
case, it can only shift the output of the mechanism up or down,
so it selects between a set of mechanisms that are very similar
to each other. This is consistent with a view of the noise
variables as variables outside the system that we are trying to
describe, representing the fact that a system can never be
totally isolated from its environment. In such a view, one would
think that a weak dependence of noises may be possible
without invalidating the principle of independent mechanisms.
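The view of a discrete noise variable as a selector between mechanisms can be made concrete with a toy sketch. The three mechanisms, the sample size, and the variable names below are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical deterministic mechanisms f_s, one per noise value s in {0, 1, 2}.
MECHANISMS = (
    lambda c: c,         # f_0
    lambda c: -c,        # f_1
    lambda c: c ** 2,    # f_2
)

def f(c, n):
    """Structural assignment E := f(C, N): the noise value n[i] selects f_{n[i]}."""
    outputs = np.stack([f_s(c) for f_s in MECHANISMS])  # shape (3, len(c))
    return outputs[n, np.arange(len(c))]

size = 10_000
c_j = rng.normal(size=size)   # input at node j
c_k = rng.normal(size=size)   # input at node k

n_j = rng.integers(0, 3, size=size)               # noise at node j
n_k_independent = rng.integers(0, 3, size=size)   # independent noise at node k
n_k_dependent = n_j.copy()                        # fully dependent noise at node k

x_j = f(c_j, n_j)
x_k = f(c_k, n_k_dependent)

# Fixing the noise to a value s reduces f to the deterministic mechanism f_s.
fixed_to_two = f(c_j, np.full(size, 2))

# How often does knowing the mechanism at node j reveal the one at node k?
agreement_independent = np.mean(n_j == n_k_independent)
agreement_dependent = np.mean(n_j == n_k_dependent)

print(f"agreement with independent noises: {agreement_independent:.3f}")
print(f"agreement with dependent noises:   {agreement_dependent:.3f}")
```

With independent noises the active mechanism at node j says nothing about the one at node k beyond chance, whereas in the fully dependent case it determines it, which is the kind of coupling the principle rules out.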
All of the above-mentioned aspects of Principle 2.1 may help with the
problem of causal learning; in other words, they may provide
information about causal structures. It is conceivable, however, that
this information may in some cases be conflicting, depending on which
assumptions hold true in any given situation.
2.2 Historical Notes
The idea of autonomy and invariance is deeply engrained in the
concept of structural equation models (SEMs) or SCMs. We prefer
the latter term, since the term SEM has been used in a number of
contexts where the structural assignments are used as algebraic
equations rather than assignments. The literature is wide ranging,
with overviews provided by Aldrich [1989], Hoover [2008], and Pearl
[2009].
An intellectual antecedent to SEMs is the concept of a path model
pioneered by Wright [1918, 1920, 1921] (see Figure 2.3). Although
Wright was a biologist, SEMs are nowadays most strongly
associated with econometrics. Following Hoover [2008], pioneering
work on structural econometric models was done in the 1930s by
Jan Tinbergen, and the conceptual foundations of probabilistic

Random documents with unrelated
content Scribd suggests to you:

ihres Alters, ihrer Würde. Aber der immer beschäftigte, an alles
gebundene, unfreie, begehrliche, immer strebende Geistesdemokrat
gelangte nicht zum Humor, weil nie zum Verzicht.
Der Demokrat Freiligrath begriff nicht die Inkongruenz seines
bärtig deutschen Wesens mit seiner Löwenrittsphantasie, die uns
heute so lächerlich vorkommt. Andrerseits sah der Revolutionär
Friedrich Reuter sich genötigt, die Tragik seines Lebens in
humoristischen Fabeln und Figuren zu gestalten — mit Hülfe der
Bauernsprache.
Wenn gar nichts andres, so macht der Humor zum Herrn der Lage.
Ein Lord in einer adligen Tischgesellschaft bekam einen zu heißen
Bissen in den Mund und sagte, genötigt, ihn auf den Teller zurück zu
speien, zu den entsetzten Umsitzenden mit vollkommener
Seelenruhe: „A fool, who would swallowed it.“ Herr der Lage.
Um näher auf das Wesen des Humors eingehen zu können, wäre
zuvörderst eine Entwickelung des Wesens von Tragik vonnöten, die
nicht gegeben werden kann. Soviel sei gesagt:
Humoristen in diesem, im tragischen Sinne gab es: Jean Paul,
Dickens, Wilhelm Raabe, Thackeray, Gottfried Keller, Wilhelm Busch,
Christian Morgenstern, Sterne.
Humor ist eine germanische Angelegenheit.
Auch Tragik ist — seit den Griechen — eine germanische
Angelegenheit.
Die Welt ist nunmehr eine germanische Angelegenheit.
Schluß!
Privates Nachwort für meinen Papa:
Solange ich dergleichen Dinge denken kann, ist mir nie etwas
natürlicher erschienen als der Umstand, daß, da Du schon ein
Herzog bist, demokratischen Neigungen huldigst.
Folgerichtige Erweiterung aus diesem:

Der Natur-Aristokrat habe demokratische Neigungen; der Natur-
Demokrat, der Geistmensch, der Dichter habe aristokratischen Kern.
Renate an Magda
Am 24. Nov.
Mein Krankes:
Dein Vater sagte mir heut am Telephon, daß Deine Augen
angegriffen seien. So darf ich Dir wohl nur ein paar Zeilen und alle
liebenden Wünsche schicken. Onkel Augustin fügt die seinen und
seine letzten Rosen hinzu. Alles Gute, mein Herz! Ich rufe jeden Tag
an und frage, wie Dirs geht. Mir ist schon wieder ganz wohl, aber ich
muß noch mehrere Tage das Haus hüten. Soll ich dann kommen? —
Immer zärtlich bei Dir Deine
Renate
P. S. Weißt Du, daß ich Deinen Maler kenne? Nach einem
abscheulichen, mißverständlichen Telephongespräch hatten wir
neulich ein andres am Abend, und das war schön. Ich mußte daran
denken, wie Du neulich Dein Zimmer mit einer Schiffskajüte
verglichen hattest; auch ich in meinem kleinen, leuchtenden Raum
kam mir so viel tiefer in der Nacht vor als sonst; als säße ich auf
einem Leuchtturm und hätte Verbindung mit solch einem
dauerhaften, alten Segler, der nach der Einfahrt zum Hafen fragte.
Was hat er doch für eine ruhige, gedämpfte Stimme! Heute abend
ist Verlobung bei Herzbruchs, da werde ich seine Mutter
kennenlernen. Nun adieu!
R.
Noch eine Nachschrift? — —
Sage mir — —
Ich weiß nicht, soll ich weiter schreiben oder nicht?
Also sage mir, ob Bogner vielleicht die Gewohnheit hat, lautlos zu
lachen, indem er das Kinn anhebt und den Mund leicht öffnet ...

Hör zu! Das, wovon ich Dir früher einmal erzählte, ohne es Dir
beweisen zu können, — es hat sich wieder gezeigt: mein
phantastisches Gesicht. Ganz schwach schon einmal — so daß ich
darüber hinglitt — in den ersten Tagen meines Hierseins, wo ich an
einem Abend, anstatt daß ich, wie ich glaubte, mein Zimmer betrat,
aus einer Waldöffnung über ein düstres, abenddämmriges Tal von
ungeheurer Größe hinaustrat, in dessen Ferne grauschwarze Berge
sich zusammenschoben, zwischen denen ein erschreckendes Rot, ein
trübes, qualmiges, wahres Weltuntergangsrot brannte; am Grunde
des Tals tanzten undeutliche Gestalten — sonderbar blitzten die
Goldschellen in den Rändern ihrer Tamburine, während im
blaugrauen Himmel ein völlig grüner Stern aufleuchtete, mit Josefs
Augen lächelte und entschwand samt der Landschaft.
Nun heute wieder, es macht mich doch nachdenklich ...
Denn wenn ich denke, daß ich diese Gesichte immer nur daheim,
im Haus oder Garten hatte, nie auf einer Reise, in keiner noch so
geringen Ferne — darum auch in der ganzen Zeit unseres
Zusammenseins nicht — und nun hier im Hause von meines Vaters
Bruder wieder, — welche Vernietung des Blutes! — Da fällt mir auch
ein, daß ich nach Papas Begräbnis sieben Tage, ohne Lebenszeichen,
wie ein Steinbild auf meinem Bett gelegen habe, während meine
Seele — — nun, erwachend wußte ich freilich nicht mehr, wo sie
gewesen war, doch waren es wohl die Toten.
Gute Na— nein, da sehe ich ja, daß ich die heutige Erscheinung zu
beschreiben vergaß!
Ich kam am Nachmittag in mein Zimmer und fand einen schönen,
mächtig großen Adler darin, der ganz goldenbraun war. Er saß auf
der Lehne eines Sessels, seine kleinen Augen waren blau mit
goldenen Linien wie Lapislazuli. Ich trete sehr erfreut auf ihn zu und
schiebe gleich meine rechte Hand zwischen seinen Leib und die linke
Flügelschulter tief ins warme Gefieder, indem ich denke, das wird er
gern haben ... Ah wie das lebte, warm war und so — mehlig! Wie die
weich übereinandergeschichteten Federn sich zusammendrücken
ließen! Indem aber fühle ich schon, den Kopf langsam senkend, daß
wir fliegen. Ich hebe wieder den Kopf und sehe ihn über mir fliegen,
— ah wie das wundervoll war, einen großen Vogel einmal so nahe

fliegen zu sehen, wie die gewaltigen Fittiche nach vorn ausgreifen
und sich spannen, Feder um Feder, wie sie krachend und knatternd,
in immer demselben langen, steten Schwunge ausholend, nach
hinten schlagen, die biegsamen Schwingenenden peitschend ganz
an seinen Leib sich anpressen, daß er wie flügellos vorwärts schießt!
Und ich dicht unter ihm, sitze in starken Seilen, die seine Krallen
halten, in den Tiefen unter mir rollen goldene Länder, von
Wolkenschatten, von den Hieben der Goldspeichen am Sonnenrade
riesig durchfegt, und wie ich jetzt, ganz lusterfüllt, zu dem Adler
aufsehe, öffnet er den Schnabel und lacht lautlos. Ich senke das
Gesicht, denke an Bogner, und darüber vergeht alles.
Renate
Renate von Montfort an Benvenuto Bogner
Waldhausen bei Altenrepen, Güntherstr. 5, 24. Nov.
Lieber Herr Bogner,
Hier sind Magdas Briefe. Sie dürfen annehmen, daß mein Vertrauen
zu Ihnen nicht geringer ist als das Magdas, wie Sie es in diesen
Briefen erkennen werden.
Ein gar nicht unfeiner Mensch, nämlich mein Vetter Josef, sagte
einmal zu mir, meine Erscheinung sei von solcher Art, daß alles
andre vor ihr seine Gültigkeit verlöre. Das mag Sinn haben oder
Unsinn: es gesagt zu hören reizt, und das Gleichgewicht wird
gestört. Daher, seit ich wußte, daß ich einmal so mit Ihnen zu reden
haben würde, wie ich es tat, bin ich seelenfroh, diesen
Telephonausweg gefunden zu haben und Ihnen in Unsichtbarkeit
gegenübertreten zu können. Ich fand, wir waren Beide unangreifbar
in jener Stunde, außer dort, wo wir selber es zuließen.
Aber Sie erinnern mich an den einzigen Mann, der mich einmal
angerührt hat. Es war in Tirol, und der Mann war ein Kardinal, ein
schöner, weißer Schädel mit funkelnden, braunen Augen. Er legte
mir zwei Finger unter das Kinn und sagte etwas von einer holden
Alpenrose. Damals mußte ich mich so zusammenreißen wie vor

Ihnen am Telephon. Soll ich Ihnen sagen, wie ich damals
triumphierte? O königlich! Indem ich einen Ring vom Finger zog, ihn
fest und mit tiefster Verneigung in seine Hand drückte und auf
seinen höchlich erstaunten Blick in meiner bescheidensten Haltung
antwortete: „Umsonst, Eminenz, pflegen Diener Ihrer Kirche doch
nie ihre segnende Hand aufzulegen!“
Gestern abend sprach ich mit Ihrer Mutter, sah auch Ihren Vater,
ohne daß er mich hätte sehen können freilich, denn nach einem
jahrelangen Augenleiden ist er nun am Erblinden. In einiger Zeit soll
noch eine Operation versucht werden, auf die aber — es ist grüner
Star — niemand Hoffnung setzt.
Ihre Mutter soll sich seit vielen Jahren kaum verändert haben, so
brauche ich sie nicht zu beschreiben. Ihr Haar ist fast weiß
geworden freilich, dafür hat sie aber die lebhaftesten braunen Augen
und scheint sich eine unaufhörliche Bereitschaft zur Heiterkeit
zugeschworen zu haben. Ich konnte sie allein sprechen und Ihre
Grüße ausrichten. Sie erschrak ein wenig, schwieg aber und ließ
mich erklären, auf welche Weise ich Sie kennenlernte. Als ich sagte,
Sie dächten daran, im Frühjahr zu kommen, sagte sie nur: „So? Ja,
es ist ja nun spät geworden.“ — Ich blickte nach Ihrem Vater hin
und fragte, es sei wohl höchste Zeit. — „Ach,“ meinte sie ruhig, „vor
zehn Jahren wäre es Zeit gewesen. Nun hat man sich ja daran
gewöhnt.“ — Ich habe natürlich nicht all ihre Worte behalten,
erinnere mich aber, daß sie auf mein Befragen anfing zu erzählen
und bald sagte, daß es für sie nur in den ersten Jahren hart
gewesen sei. „Er war doch noch ein Junge, der nichts von der Welt
wußte, und sicher hat er gehungert,“ sagte sie, „und das war wohl
das Schrecklichste für mich, neben seinem Vater wach liegen zu
müssen und zu hören, wie er selber wach lag und oft stöhnte, und
nicht weinen zu dürfen. Dann gewöhnte man sich schon daran, und
dann starb unsere kleine Erika — die Nachricht erhielten Sie wohl?
— und da fing alles wieder von vorn an, und als dann auch Herbert
sein Examen gemacht hatte und fortging, waren wir ganz allein.“
Aber dann, sagte sie, hätte sie ja wohl merken müssen, daß
zwischen Ihnen und Ihrem Bruder gar kein Unterschied sei, denn der
sei die ersten Jahre wohl noch in den Ferien gekommen, dann aber

immer seltner und zuletzt nur noch Weihnachten, und so sei es wohl
mit den Söhnen. Freilich hätte sie für Herbert ja immer noch sorgen
können, seine Wäsche besorgen und ihm das und jenes schicken,
auch hätte er ja immer fleißig geschrieben. Nein, sie dürfe sich gar
nicht beklagen, nur ihr Mann wäre zu bemitleiden, weil er von
hartem Charakter wäre, alles in sich verschlossen hätte und nur
immer stiller geworden sei. „Wer hätte auch gedacht, daß er ganz
fortbleiben würde, Vater hätte ihn ja gerne wieder aufgenommen,
wenn er ihm nur gezeigt hätte, daß es ihm Ernst war mit dem
Malen.“ Nein, sie selber habe gewiß am wenigsten gelitten, und
schließlich sagte sie, „wenn man weiß, er ist am Leben und kommt
vorwärts und ist zufrieden mit seinem Dasein, — was kann man
denn mehr wünschen?“ Übrigens habe sie auch Nachforschungen
nach Ihnen angestellt, gestand sie; sie lachte herzlich, als sie
erzählte, wie sie sich das Geld dafür von ihrem Haushaltsgeld habe
absparen müssen.
Ach, Bogner, glauben Sie, ich wollte Sie rühren mit alledem? Sie
haben aber wohl recht mit Ihren ‚romantischen Vorstellungen‘, denn
wenn man eine alte Frau sagen hört: „Das Leben hat es wohl immer
anders mit uns vor, als man so träumt, und wenn man sie
herumlaufen sieht, wenn sie klein sind und nach allem fragen
müssen und schreien, wenn sie sich gestoßen haben, dann meint
man ja, es bliebe ewig so und man müßte immer hinterher sein,
aber das wollen sie freilich gar nicht.“ Wenn man sie dann lachen
hört, so ist allerdings alles nur einfach und gut und etwas
kümmerlich, und so natürlich sieht es aus, daß man es fast nicht
begreifen kann, wenn dieselbe Stimme nach einem Schweigen sagt:
„Manchmal muß ich allerdings jetzt noch mitten in der Nacht
aufwachen und denken, er steht vielleicht am Zaun draußen, oder
man glaubt, seinen Schritt zu hören, aber es ist nur sein Vater, der
noch ein Glas Bier getrunken hat, und nun muß man aufpassen und
darf doch nicht aufstehen, weil er kaum noch sieht und doch nicht
will, daß ihm jemand hilft.“
Ihre Mutter läßt Sie wieder grüßen, und Sie sollten kommen, wann
Sie es für richtig hielten; Ihr Bett stünde auf dem Boden und könnte
jeden Augenblick heruntergeholt werden.

Ich hab Ihnen dieses aufgeschrieben, weil ich viel zu gut weiß, wie
sehr Sie schuldlos sind. Wenn hier Fehler sind, so liegen sie in der
menschlichen Natur, die will, daß alles Leidende sich in sich selbst
verhärtet. Da Sie einen einsamen Weg gingen, so schadete Ihre
Selbstverhärtung niemandem sonst, und Ihnen selber war sie ja
notwendig. Wenn aber Ihr Vater hier die Schuld tragen sollte, so
trug er auch die ganze Last, da er zu einfach war, um nicht nur nach
innen zu wachsen, und Ihre Mutter hatte es zu leiden. Dies sollten
Sie wissen. Gott befohlen!
Renate Montfort
Benvenuto Bogner an Renate von Montfort
Helenenruh, 29. November
Liebes gnädiges Fräulein!
Freiwillig und gerne gestehe ich Ihnen ein, daß ich Ihnen nicht
durchaus wohlgesinnt bin. Nicht durchaus. Können Sie folgenden
Unterschied machen: Sie hören, daß ein fremder Mensch dies und
jenes über Sie geäußert hat. Nichts, das vielleicht ungünstig wäre.
Aber Sie hören, daß ein andrer sich eine Meinung über Sie gebildet
hat. Auf Grund seiner vollständigen Unkenntnis. Sie würden ihm
nicht wohlgesinnt sein.
Dagegen ich, ich kenne Ihre Züge, hörte Ihre sanfte Stimme und
vernahm Wohlmeinendes. Ihr kluger Vetter übrigens scheint mir so
sehr recht zu haben, daß auch die telephonische Unsichtbarkeit die
Richtigkeit seines Satzes nur erhärten konnte. Übrigens vergaßen
Sie, daß ich Ihr Bild kenne, wie unsre Kleine Ihnen schrieb. Es steht
fest, daß Ihnen kaum zu widerstehn ist, und so verlasse ich meine
Zehnjahrsschweigsamkeit und rüste mich, zu reden.
Sie kamen mit einer fertigen Meinung. Ein solcher Mensch mußte
einmal kommen, einer, gewissermaßen, der mich zur letzten
Nachprüfung meiner selbst zu bewegen wußte. Jedem Fragenden
hätte ich Auskunft gegeben. Meine Natur ist friedlich. Sie kamen mit
Müttern und unzählbaren Erinnerungen.

Haben Sie unrecht? Habe ich unrecht? Sie nötigen mich, alles von
vorn zu bedenken. Dies aber scheint mir nicht notwendig. Sie
meinen, von dem, was Sie über meine Mutter aussagten, hätte ich
nichts gewußt? Wohlan!
Sie erinnern mich an einen Tag, kurz bevor ich ins Kadettenkorps
abmarschieren sollte, weil ich Ostern um Ostern meiner Versetzung
den heftigsten Widerstand leistete. In seinem Zimmer zwischen den
Apparaten und Glasschränken mit ärztlichem Handwerkszeug hatte
mein Vater den schönsten Kasten mit Ölfarben, die herrlichsten
Pinsel und die größte, braunglänzende Holzpalette ausgebreitet.
Meine unwahrscheinlichsten Träume funkelten da. Er sprach liebreich
und gütig zu mir. Am Ende bat er mich um mein Ehrenwort, daß ich
diese Gegenstände nur an Sonntagen berühren würde.
Früher hatte ich doch kein Gewissen vor ihm gehabt und ihn
hundertmal hintergangen und belogen. Warum stand es nun auf und
sagte: Er will dich verlocken! und verweigerte das Versprechen?
Warum diese Ehrlichkeit, da ich doch entschlossen war, aus dem
Korps zu entspringen?
Hatte er unrecht? Er verfuhr wie der liebe Gott, sagte: von diesem
Baum darfst du nur Sonntags essen, wollte aus mir einen ruhigen
Bürger machen, der sich zu bezähmen weiß.
Da erinnern Sie mich nun an die Stunde, Jahre danach, wo das
Bild dieser sich wiederum aufstellte, Arztzimmer, Malgerät und der
schwere, grauhaarige Mann mit ausgestreckten Händen, aber nun
heißt es nicht: er will dich verlocken, sondern: er hat dir doch eine
Freude machen wollen ... So heißt es nun.
Was denken Sie sich dabei: „Er hätte ihn ja gern wieder
aufgenommen, wenn er nur gesehen hätte, daß es ihm Ernst war
mit der Malerei“? Sie denken, was mein Vater dachte — nun weiß ich
es freilich längst —, daß Ernst nicht von einem Menschen unter
zwanzig Jahren zu erwarten ist, denn dies ist bürgerliche Meinung;
nicht von einem Siebzehnjährigen, der seit seinem ersten
Weihnachtsfarbenkasten, den er mit sieben Jahren bekam, nicht von
ihm wegzubringen war, nicht mit Prügeln, nicht mit Hunger, nicht mit
Einsperren, nicht mit Taschengeldentziehung, da er vielmehr hinging

und die Kasse seiner Mutter bestahl. Armes Kind, vor zehn Jahren
hab ich es wirklich nicht besser gewußt.
Am Telephon sagte ich Ihnen Dank für eine schöne Stunde. Bin ich
nun erzürnt auf Sie? — Ich bin verwundert. Herzlich sehr
verwundert.
Vielleicht wünschen Sie, fünfzehn Jahre wie eine Kugel
zurückzurollen, hin woher sie kam. Mädchen, was für Gedanken! Gut
und weich ist Ihr Herz, Sie denken, alles hätte auf andre Weise auch
geschehen können.
Ja, Sie haben ein Herz für Andre, sind gut und hülfreich. Wer
wollte das nicht sehn? Es ist zu sehn, wie ein angenehmer Wind,
wenn er im Fernen durch ein Kornfeld geht. Man empfindet ihn nicht
am eignen Leibe. Den trug man fünfzehn Jahre für sich alleine
umher. Kein Wind kam.
Aus Ihnen redet das Mütterliche. Sie sorgen schon um eigene
Kinder.
Zweiunddreißig Jahre bin ich nun alt geworden, ohne heut zu
wissen, ob ich gerecht gehandelt habe. Mit Neunzigen werde ich es
nicht sichrer wissen. Denn unser Recht und unser Unrecht liegt nicht
in unsrer Meinung, und wenn wir sie dem Teufel abgekämpft hätten.
Aber in uns brennt eine riesige Notwendigkeit, die uns das Einzige
vorschreibt, und vor der keine Meinungen gelten. In Menschenleben
wie in dem meines Vaters gilt der Zufall und Zwang, sich
zurechtzufinden, es sich leicht zu machen, den Platz zu finden, wo
am wenigsten Kümmernisse zu hausen scheinen. All dies umgekehrt
trifft auf mich zu.
Herzlich grüßend der Ihrige
Bogner

Fünftes Kapitel: Dezember
Renate von Montfort an Benvenuto Bogner
den 8. Dezember 1911
Was tat ich Ihnen, Fremder? Niemals werde ich mich hindern lassen,
bei einem Menschen einzudringen, wenn mein wahrhaftiges Herz
mich dazu treibt. Beschämt und verwundet, schweige ich auch heute
nicht still, sondern wehre mich kräftig. Sollte ich mich so leicht
enttäuschen lassen, möchte ich nicht lange mehr leben. Da wir uns
fremd sind, wie sollte es ohne Irrwege abgehn? Auch nach dem
ersten Telephongespräch war ich es, die es zum zweiten Mal
versuchte. Freilich ist es beklagenswert! Unser jeder weiß um die
Stelle, wo alles einfach ist und verständlich, aber wir haben uns
zugeschworen, sie nicht preiszugeben ohne die sonderbarsten
Ehrenhändel. Man muß Mitleid haben mit Ihnen, denn Sie sind der
Ungeschicktere, als Mann, und weil Sie immer allein waren.
Ich will Ihnen sagen, was Sie vergaßen.
Dort lebten Sie, Ihre Eltern hier. Als Sie fortgingen, fingen Sie das
Labyrinth an, die viel hundert Verwandlungen. Alltäglich ein neues
Gesicht, eine neue Haut, ein neuer Blick, eine neue Welt. Immer
stand Ihnen das Auge einwärts gerichtet in die Schar der
unzählbaren Visionen, Möglichkeiten, Aufgaben. Sie lebten unter
Gottes Augen. Sie schlugen sich mit unzählbaren Verwirrungen. Sie
schliefen in einem Hammerwerk. Sie wälzten den Block, Sie hatten
um sich Luft vom Tartarus, Sie zwängten sich in sich selbst wie einen
Keil in einen Baum, Sie vergaßen Ihr eigenes Aussehn hinter
zehntausend Vermummungen, Sie waren sich immer rätselhaft,
immer unvollkommen, immer widerspänstig, immer wunderbar. Sie
lernten Unschätzbares kennen. Den Hunger und die Ohnmacht, die
Schlaflosigkeit und das Auge des Gottes im Finstern. Sie erledigten
Tausendfaches. Sie hafteten am Einen.
Nein, Sie konnten nicht an Andre denken. Habe ich ein Bild von
Ihnen oder habe ich nicht? — Nun eines von Ihren Eltern.

Sie hafteten am Einen: an Ihrem unhörbaren Schritt. Sie lernten
die Schlaflosigkeit mit dem ewigen Nachtgebet: Auch heut ist er
nicht gekommen. Sie litten die unaufhörliche Ohnmacht, nicht
zurücknehmen zu können. Und sonst? —
Alltäglich ein altes Gesicht, ein altes Tun, eine alte Sorge, eine alte
Bitterkeit. Alltäglich ein unveränderliches Ich, ein vollkommenes,
fertiges, unverstandenes und doch einfachstes, immer gleiches. Sie
wurden alt, sonst nichts. Sie blieben auf ihrer Stelle wie sanfte Tiere
und hatten nichts als ihr Älterwerden. Sie hofften jahrlang und
hofften dann nicht mehr. Sie vergaßen am Ende. Sie hatten nur das
endlose Einschlafen, sie wurden immer schläfriger wach, sie konnten
sich an nichts messen, sie waren die Einsamsten. Sie hatten ja noch
einen Sohn, sie hatten Arbeit, Sorge, Freuden, das tägliche Leben.
Ist es nicht elend, Freund, entsetzlich elend, nichts zu haben als das
eigene Leben? — Sie hatten die Stille, wo nichts laut ist als das
eigene Atemholen. Sie hatten auch den Zorn vielleicht, die Bitterkeit,
denn: sie waren immer die Unterlegenen. Oh sie hatten das
Allerschlimmste: sie konnten nicht verstehn. Sie wußten von sich
selber nur wenig, und wenn sie einmal nach oben fragten, so gab es
immer nur die eine Frage: warum ist dies so? Warum ist es denn
nicht anders? Wäre es anders nicht besser? — Ihnen war es nicht
gegeben, sich selbst zu bezwingen, denn — oh Ihre Weisheit! — sie
kannten das Auge der Notwendigkeit nicht! Sie hatten nur gelernt,
daß alte Menschen erfahrener sind als junge. Daß man sich nach
ihnen richten müßte. Sie hatten einfache Dinge gelernt. Nie hatten
sie gehört, was Ausnahmen sind, wie sollten sie eine erkennen, wie
sollten sie einwilligen? Unter ihren Lehrsätzen war der kostbarste
der: Wenn du einmal so alt bist wie wir, dann wirst du uns recht
geben.
Mache ich Ihnen Vorwürfe? Klage ich an? Ach, ich wollte, ich
könnte es, ich wollte, ich könnte Ihnen vorwerfen, warum Sie nicht
noch besser geworden sind, warum Sie niemals das Eine bedacht
haben, daß Sie in sich selber alles besitzen, daß Sie mit Leichtigkeit
der Unterlegene hätten scheinen und nachgeben können, da es
Ihnen nichts verschlug, ob Ihre Eltern glaubten, recht und gesiegt zu

haben! Wieviel größer wären Sie, wenn Sie das Unrecht an sich
gerissen hätten!
Nun vergeben Sie mir! Heute nur, heute sage ich Ihnen all dies,
damit Sie wissen, zu wem Sie kommen, wenn Sie heimgehn, wen Sie
zurückließen und wen Sie wiederfinden. Alte Menschen, augenlose,
die arme und eingelernte Worte murmeln, die mit zwanzig Jahren
alles auswendig wußten, die immer nur die paar alten Bücher in
neuen Auflagen, in ihrer armen Blindenschrift jahraus jahrein
nachtasten, und da stehn Sie nun in Ihrer Sonne und sind nicht
zufrieden.
Weiß ich zuviel? Jedes meiner Worte stand im Gesicht Ihrer Mutter
leserlich. Ich habe, auch ich, nur gelesen mit meinen ‚sehenden
Augen‘, die — nach Ihnen — eine Gabe sind — und ein Gesetz.
Gott befohlen!
Renate Montfort
Benvenuto Bogner to Renate von Montfort
Helenenruh, 14 December
Dear unknown Fräulein:
I have always found the figure of that maiden touching, Pallas
Athena, who parted the mist around Odysseus on the shore and showed
him his island, which he did not recognize. Many years ago I knew a
woman whom I compared to the goddess; I was at my beginning then, and
it was a different country into which she wanted to lead me. It was my
true homeland. To Odysseus the goddess had always remained invisible,
although she helped him; only when the toil was over, when he arrived,
did she make herself known. I, to be sure, am not going home for my
own sake, for I am not at home there; but on this day, and in view of
your beautiful, glowing emotion, it does seem to me as if this alone
was the reason why I did not go there long ago: all that was lacking
was someone to tell me, someone to bid me reflect, someone to entice
me.
This is what is so lovely in you, you strange creatures: your
eternal readiness. You gladly disregard yourselves, yet you are always
standing before some gate that you want to throw open for someone. You
always want to entice toward something, always to help, always to open
everything, always to invite, always to soothe. Without hesitation you
tackle the hardest thing as though it were the very easiest; as though
it were, at any rate, the only thing that had to happen at this
moment, and as though you commanded divine powers. For you are always
strong for others, you who for yourselves are mostly helpless,
unknowing, and defeated from the start.
Need I say more? Your words all did me good, and the admission that
until now I have made only to myself I gladly make to you as well:
that I could have been more considerate. It is of no use, of course,
but I believe it will please you to hear it.
Now I must write a few things about Magda.
The letters are returned herewith, with many thanks. Of all that has
befallen the poor little one I did know something: the story of the
fortune-telling, though not Magda's final conclusions regarding al
Manach. (He is still pouring himself out; it is quite a torment to
watch, one has the impression that he spends his nights no
differently, and all the while he is growing thin as a white thread.
And all this not entirely without comedy ...) No, it is best that I
keep silent about everything; we must wait.
She is thoroughly unwell, I must admit. The illness is cured, she
has been free of fever for a few days, but she is as limp as a
rain-soaked cabbage white, wants neither to get up nor to lie down,
has become ill-humored, cannot bear the light, and is always tired.
When I came here she was a pure child, childishly wise and lark-like;
now she looks like an old maid, sallow, with cruel lines around her
mouth. The only exertion of strength I noticed in her was when she
charged me to forbid you, in the most definite terms, to come. To this
I must add for my own part that at present I expect little effect from
your coming. Much in her may be only the physical exhaustion of an
illness barely overcome; you would therefore be more welcome at a
later hour, unless some other should prove more necessary. She often
speaks of you.
The following took place yesterday:
I had been sitting a little while by her bed, cutting silhouettes of
roses in the half-dusk; she likes to watch that. When I then stood at
the window, I suddenly heard her say quite loudly: "Georg!" I waited
half a minute and then said "Well?" in my ordinary voice, whereupon, a
little later, I heard her answer with a soft sigh: "You know, Georg,
it is harder after all than one thinks." Now I went to her and said as
kindly as I could: "Georg is not here, my child; would you like to
tell him something?" She looked at me long and doubtfully and asked:
"Don't you think, painter Bogner, that the Prince is a good person?"
"Certainly," I said, and now she cried, with a triumphant look as if
she had caught me out: "Then why isn't he here to help me?" Then she
bethought herself and added precociously: "But he has to be diligent
and become Duke, so of course he can't come, can he?" I confirmed this
for her, and now she said nothing more.
This, however, gave me the idea that it might perhaps be useful for
me to write to the Prince and suggest that he send Magda some sign of
himself. What do you think?
One thing more: I believe the season is to blame for a great deal.
She wrote, after all, that she is afraid of the garden; that is also
why she so resists the window, behind which it storms and whirls. It
will be best if we let Christmas pass first; after that my time at
Helenenruh is up, and you can then try what is to be done. If it
proves possible for you to go to Italy with her, you will also succeed
better than I in convincing her father of the necessity of such a
journey, since at present he believes her close to recovery, stands at
her bedside for a minute each day, and says: "It will come right!"
With heartfelt thanks, and once again "thoroughly well disposed,"
Bogner

Renate to Bogner
22 December
Dear friend,
Your two sketches of Magda have frightened me more than you can
imagine. Is this what has become of her? One could despair if nothing
occurs to one to set it right again. And what a hand you have! Do you
remember what Magda wrote, how she was startled by a drawn feature of
her face, as though it had been taken away from it? I can understand
that now, for these two faces look so real, as though they could exist
only here, on this paper, as though you had lifted them off her own.
Oh, would to God you really had done so and she were rid of them
forever!
To your idea about the Prince I can say neither yes nor no. There is
that poem he sent her back then ... Well, I may be mistaken, and would
gladly be so; but more loving than this ready, so to speak
by-return-of-post understanding and assent, this prophesying at random
(or is so much intuition credible?), would seem to me less
understanding and more pain, less joy in renunciation and more
reluctance. Altogether, this nimble conversion of feeling into sound
and proof, this finding of comparisons and so on: poetic it may well
be, and do not think that I find it humanly unworthy! It only makes me
doubt his feelings for Magda, for which he is certainly answerable to
no one, not even to her, since, as the saying goes, where there is
nothing the Jew has lost his claim, but. But. Full stop.
I have taken great pains to read out of your account of Magda's
condition that she has also grown tired of brooding, yet I am not
entirely convinced. It made me reflect: to pass from our childhood
into the realm of the soul, to turn from children into children of
God, or however one wants to put it, that is surely our task. Now
there are many among us who can accomplish such tasks only in terribly
hard stages, and one of them is our Magda, who could leave the
unselfconscious land of youth, where the lightly clouded spirit
spreads its blue over everything and lets itself be shone upon, only
by crossing this knife-edge bridge of thought, of brooding suffering.
And where does it lead? Into the iron house, the arsenal, where souls
are handed out like clothes? Our imagination never suffices for
matters of the soul, and what I am saying probably sounds absurd. I
should have asked Saint-Georges first. (He is a new friend of mine,
who is distinguished by knowing everything.) You too are a wise
gentleman and perhaps grasp what I meant to say.
Tomorrow is Christmas Eve. My heart is somewhat anxious at that, for
my father died shortly before last year's. What Christmas is you will
hardly know, but perhaps you will nonetheless allow me to think of you
warmly and to bring an old woman a flower.
When are you coming?
Renate Montfort
Renate to Magda
22 December
My dear, dear Magda!
You are getting a Christmas letter, although you have, it seems,
sworn not to write to me. In haste, admittedly, for there is an
indescribable amount to do. All the staff are given presents and have
their celebrations and have countless little children who want
presents and celebrations, and then there are the poor and the
children of the poor, and all of them together want to tear my head
off. I am glad that you are at least out of bed again. If you do not
come to the telephone yourself tomorrow, it will have been the last
time I called, do you hear?
The Heider portfolio from the Kunstanwärter (Josef's bon mot!) will
perhaps arouse the painter's ire, but when I found it in the bookshop
it seemed very beautiful to me, and you will find a little shepherdess
in it who looks exactly as you did when you arrived in Geneva.
Oh, dearest, God grant only that you can feel that it is Christmas!
I have written you foolish stuff merely to keep my eyes from growing
wet again, as they always do when I think of you. I do not know what
is the matter with me either. For many days now I have had a strange
feeling of dread, in the midst of all the bustle and the preparations,
for you of course, for what else, and an indescribable longing for you
and your poor troubled eyes. Bogner is to get you a frame for the
shepherdess picture, so that you have it before you every day and
learn how you must look if she is not to be all too sad,
your Renate, who kisses you tenderly a thousand times
The gloves are all from Uncle, with a kiss 'upon the daintiest
hand'; I hope I remembered the size correctly.
Bogner to Georg
Helenenruh, 22 December
Dear Prince:
I wish to write to you that Magda Chalybäus has been ill for some
time and, from this and still more from various emotional upheavals of
late, seems so worn and exhausted that it is as if she meant to refuse
any further part in existence. You know me a little and can judge what
it means when I draw your attention to the fact that a word, a sign of
life from you, might have an effect that is perhaps not healing but
nonetheless beneficial; whether you are in a position to do such a
thing is for you to judge.
Shall I see you at the Trassenburg? I expect to be there for a few
days between Christmas and New Year, and then to paint the assembly
hall of the new convalescent home in Trassenberg.
With cordial greetings and goodwill toward you,
Bogner
Magda to Renate
Friday
Yes, Renate, from me you get nothing this time. My piece of work did
not get finished; it is bad of me, I suppose, but you will have to
forgive me, I am simply too tired. Dear old Jason lulls one to sleep
so nicely; since yesterday we have also had snow and a still wind; I
am always on the verge of falling asleep, and only refrain because I
am afraid of waking up.
So forgive my negligence, if you can.
But I do love you all the same.
Are you fond of the painter? You write to each other daily, I
suppose; oh, I am not jealous, you are both much cleverer than I am
and suit each other. If you then want to marry, the wedding can be
held together with Irene's.
I must stop. Papa has his toddy ready and is eager to hear the
continuation of 'Jettchen Gebert'. He has managed to get Jason to
recite nothing but novels now, and he sends his regards. Give my
regards to your uncle too! With love and a kiss, your friend
M.
Bogner to Renate
Helenenruh, 24 December
Christmas Eve. Christmas Eve? Christmas Eve. Funny.
Apples were eaten. Marzipan was eaten. A great tree blazed full of
candles. Little shortbread cakes were eaten. Macaroons were eaten.
Presents lay everywhere. Wide-eyed servants, embarrassed. Spekulatius
was eaten. Chocolate was eaten. Good Lord above! Herring salad was
eaten. Once more apples, marzipan, Spekulatius, brown cakes, nuts,
dates, and gingerbread were eaten, and someone in a heavenly
silver-gray kimono sang very softly: Es ... ist ... ein Ros ... ent
... spru ... u ... gen ...
I recall that this is a feast of stomachs and of childhood. It
cannot be avoided. It puts the soul in a kindly mood. Here I sit in a
pleasantly lit room full of many small portraits of horses; it strikes
eleven; I am preparing my fourth glass of hot toddy; I see Jason al
Manach sitting in the corner of the sofa, and that his eyes are
shining and that he is smoking like a chimney. I believe he has a fine
character. I am eating Pfeffernüsse; never in my life have I eaten so
many different things one after another. I must be a happy man. I
asked the Almanac for a quotation about childhood: "O childhood! O
comparisons slipping away!"
Someone was unpacking an enormous crate with weak fingers; there was
no end to the paper. At last something large and gray did fall to the
floor. Soon afterward someone stood beneath the chandelier, turning
and trying, past a gigantic tree of lights, to catch her reflection in
the large pier glass, where it appeared quite distant and strange
behind the same mirrored tree, clad in a silver-gray kimono on whose
back a life-size white peacock was embroidered in silver and white, in
such a way that its head lay between the shoulders of the garment
while the outspread tail, with a hundred feathers and snow-white eyes,
hung down over the wide sleeves and to the hem. A present from the
painter Bogner; from his own possessions; fetched specially from
Berlin.
The painter Bogner, too, possesses a not unhandsome character. He
hit upon the right thing. A longish telegram from Prince Georg was
also his doing and had considerable effect. Still, I am inclined to
ascribe the chief effect to the kimono. It was irresistible. It could
only be stroked smooth. That was its unsurpassable quality. With that,
everything was settled. We all returned, beaming with joy, to
childhood and sang, entirely out of tune but lovingly: Sti-ille Nacht!
Hei-lige Nacht! Oh, good gracious!
You should have seen it! How the kimono slowly unfolded in B.
Bogner's hands. How two ill-tempered eyes and a tearful mouth grew
attentive and became still. How a hand emerged from under a travelling
rug and timidly took hold. How the mouth, wholly lost in devotion,
brought forth the one blissful word: "Silk!" How the eyes begged for
mercy, but the mouth would not. How the mouth could no longer resist,
and the eyes were stubborn. How the white peacock shone. How on brow,
mouth, and cheeks the word childhood burst open like a room full of
candles and presents. How the travelling rug flew off, and much grief
and sorrow vanished before a white peacock's tail.
It is lovely to celebrate feasts when the occasion brings them. I
remember a Christmas when I sat by a tallow candle over copper plates,
my eyes watering, knowing nothing of the evening's possibilities; a
Christmas when I lay in bed with my overcoat on and, by the light of a
street lamp, watched beetles running about on the bright wall above
me; Christmas in a laundry room, wrapped in white steam, a coffee pot
in my hand; Christmas in a pleasant lady's chamber beside a standing
lamp and a dark-haired girl's head bent over woodcuts by Schongauer
and Dürer, over black-and-white prints by Beardsley; she has now lain
for twelve years, still and at peace, in Weißensee beneath a splendid
monument, but I do not forget her for that. Christmas, too, in the
blue-and-white checkered Aschinger before a pair of Bierwürste with
salad, and Christmas with a small suitcase in my hand in a handsome
office before an angry old white-bearded gentleman beside an open
safe. Yes, there I stand in the seventeenth year of my life, and with
me my comrade, the son of the angry merchant, who has delivered me a
tremendous, thundering speech, asking whether I imagined that he would
support my mad pranks behind my parents' backs, whereupon he unlocked
the safe, indicated to his son that he should keep an eye on it until
he returned, and disappeared. So the son robbed his own father, and B.
Bogner rode out into the world that night to become a painter.
I see that one can keep Christmas in as many ways as there are
things to eat on that day.
This I felt I had to tell you. Take it kindly! In the next few days
I shall travel to the Duke. Depending on how the work proceeds, I
shall come to Altenrepen in April or May.
Good night, kind soul!
Bogner
Georg to Magda
(Telegram)
24 December
This is Georg, Anna, standing outside the door and not knowing whether
anyone is inside. Will you allow him to knock very softly, because it
is Christmas? I am in Trassenberg; this half of the semester was
dreadful. Mama and Papa wish you and your father a happy holiday, as
does your lonely old Georg.
Magda to Renate
Christmas Day
Renate, there is snow! It came down softly overnight while I slept
sound and warm, and in the morning it shone in so brightly through a
gap in the curtain that I ran to it at once, and outside everything
was white and still, far and wide, with no difference anymore between
our orchard and the park; everywhere the black trees stood in thick
white furs, and I had the greatest wish to run straight down to the
Christmas room (did you do that too, back when we were still small, to
check whether everything was still there?) to look at my heavenly fur
and above all at the kimono. But you must let Bogner tell you about
that. You see, I was outdoors today for the first time, only across to
the little château to visit him, and he was just putting a letter for
you into its envelope. I took it away from him, and in the end he
allowed me to read it. There were truly no secrets in it. My, what
droll letters he can write!
You know, Renate, let us by all means leave him in the belief that I
owe everything to his kimono idea, and really, I am even a little
ashamed that he outwitted me so. I did not know myself how gladly and
for how long I had wanted to be completely well again, but even these
last few days I felt so strange! All at once it was so lovely to be
tired, and then I could no longer recall anything. Everything had
faded away or grown so foreign that it no longer seemed to have
anything to do with me; everything was as if covered over, yes, like
the land by the blanket of snow, and oh, I would so dearly, so dearly
like to hope that in spring, when the blanket vanishes, everything
will have become new and different!
I can no longer grasp any thoughts at all. It seems to me as if I
had thought my way through the whole world, and it had taken years
upon years, but now I stand at the other exit and hardly know how I
got there. Help me now, dear one, help me so that I need not start
again to think always and only of what is dark and oppressive! Help me
to be as simple and trusting as you; I am weak and lonely, after all,
and it is so hard to walk under such a load of thoughts. Would you not
like me to come to you for a few weeks? I would like best to come
tomorrow, but Father will not allow it before New Year; later he will
probably be quite glad to do without me. If only I knew what is to
become of poor Jason! Perhaps he can be taken along; I must ask him
sometime where he actually lives. It came to Bogner overnight how he
must paint me; he has forgotten it again, to be sure, but there is no
question, he says, that he will find it again. For that, however, he
no longer needs me, and now he intends to go to Trassenberg to the
Duke in the next few days. Just think how touching he is! He has made
me a splendid lampshade of light-green paper with a host of circles
and in them black silhouettes of all sorts of merry figures:
Harlequins and Pantaloons and Columbines, fantastic birds and monkeys,
and all of this he has
