Lecture Notes on Numerical Methods in Engineering and Sciences

Genki Yagawa · Atsuya Oishi

Computational Mechanics with Deep Learning: An Introduction

Lecture Notes on Numerical Methods in Engineering and Sciences
Series Editor
Eugenio Oñate, Jordi Girona 1, Edifici C1 - UPC, Universitat Politècnica de Catalunya, Barcelona, Spain
Editorial Board
Charbel Farhat, Department of Mechanical Engineering, Stanford University,
Stanford, CA, USA
C. A. Felippa, Department of Aerospace Engineering Sciences, University of Colorado, College of Engineering & Applied Science, Boulder, CO, USA
Antonio Huerta, Universitat Politècnica de Catalunya, Barcelona, Spain
Thomas J. R. Hughes, Institute for Computational Engineering, University of Texas
at Austin, Austin, TX, USA
Sergio Idelsohn, CIMNE - UPC, Barcelona, Spain
Pierre Ladevèze, Ecole Normale Supérieure de Cachan, Cachan Cedex, France
Wing Kam Liu, Evanston, IL, USA
Xavier Oliver, Campus Nord UPC, International Center of Numerical Methods,
Barcelona, Spain
Manolis Papadrakakis, National Technical University of Athens, Athens, Greece
Jacques Périaux, CIMNE - UPC, Barcelona, Spain
Bernhard Schrefler, Mechanical Sciences, CISM - International Centre for
Mechanical Sciences, Padua, Italy
Genki Yagawa, School of Engineering, University of Tokyo, Tokyo, Japan
Mingwu Yuan, Beijing, China
Francisco Chinesta, Ecole Centrale de Nantes, Nantes Cedex 3, France
This series publishes text books on topics of general interest in the field of
computational engineering sciences.
The books will focus on subjects in which numerical methods play a fundamental
role for solving problems in engineering and applied sciences. Advances in finite
element, finite volume, finite differences, discrete and particle methods and their
applications to classical single discipline fields and new multidisciplinary domains
are examples of the topics covered by the series.
The main intended audience is the first year graduate student. Some books define
the current state of a field to a highly specialised readership; others are accessible to
final year undergraduates, but essentially the emphasis is on accessibility and clarity.
The books will also be useful for practising engineers and scientists interested in
state of the art information on the theory and application of numerical methods.
Genki Yagawa · Atsuya Oishi
Computational Mechanics
with Deep Learning
An Introduction
Genki Yagawa
Professor Emeritus
University of Tokyo and Toyo University
Tokyo, Japan
Atsuya Oishi
Graduate School of Technology
Industrial and Social Sciences
Tokushima University
Tokushima, Japan
ISSN 1877-7341 ISSN 1877-735X (electronic)
Lecture Notes on Numerical Methods in Engineering and Sciences
ISBN 978-3-031-11846-3 ISBN 978-3-031-11847-0 (eBook)
https://doi.org/10.1007/978-3-031-11847-0
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Computational Mechanics
It is well known that various physical, chemical, and mechanical phenomena in
nature and the behaviors of artificially created structures and devices are described
by partial differential equations. While partial differential equations can rarely be solved by analytical methods except under idealized and special conditions, numerical methods such as the finite element method (FEM), the finite difference method (FDM), and the boundary element method (BEM) can approximately solve most partial differential equations using a grid or elements that spatially subdivide the
object. Thus, the development of these numerical methods has been a central issue
in computational mechanics. In these methods, the fineness of the grid or elements is
directly related to the accuracy of the solution. Therefore, many research resources
have been devoted to solving larger simultaneous equations faster, and together with
the remarkable advances in computers, it has become possible to solve very large
problems that were considered to be unsolvable some decades ago.
Nowadays, it has become possible to analyze a variety of complex phenomena,
expanding the range of applications of computational mechanics.
Deep Learning
On the other hand, the advances of computers have brought about significant developments in machine learning, which aims at classifying and making decisions through the process of finding inherent rules and trends in large amounts of data based on algorithms rather than human impressions and intuition.
Feedforward neural networks are one of the most popular machine learning algorithms. They have the ability to approximate arbitrary continuous functions and have been applied to various fields since the development of the error back propagation learning algorithm in 1986. Since the beginning of the 21st century, it has become possible to use many hidden layers, an approach called deep learning. Their areas of application have been further expanded due to the performance improvement obtained by using more hidden layers.
Computational Mechanics with Deep Learning
Although the development of computational mechanics including the FEM has made
it possible to analyze various complex phenomena, there still remain many problems
that are difficult to deal with. Specifically, numerical solution methods such as the FEM are solution methods firmly based on mathematical equations (partial differential equations), so they are useful when finding solutions to partial differential equations from given boundary and initial conditions. However, this is not the case when estimating boundary and initial conditions from the solutions. In fact, the latter is often encountered in the design phase of artifacts.
In addition, as deep learning and neural networks can discover mapping relations
between data without explicit mathematical formulas, it is possible to find inverse
mappings only by swapping the input and output. For this reason, deep learning and
neural networks have been accepted in the field of computational mechanics as an
important method to deal with the weak points of conventional numerical methods
such as the FEM.
They were mainly applied to such limited areas as the estimation of constitu-
tive laws of nonlinear materials and non-destructive evaluation, but with the recent
development of deep learning, their applicability has been expanded dramatically. In
other words, a fusion has started between deep learning and computational mechanics
beyond the conventional framework of computational mechanics.
Readership
The authors' previous book, titled Computational Mechanics with Neural Networks and published by Springer in 2021, covers most of the applications of neural networks and deep learning in computational mechanics from its early days to the present, together with applications of other machine learning methods. Its concise descriptions of individual applications make it suitable for researchers and engineers to get an overview of this field.
On the other hand, the present book, Computational Mechanics with Deep
Learning: An Introduction, is intended to select carefully some recent applications of
deep learning and to discuss each application in detail, but in an easy-to-understand
manner. Sample programs are included for the readers to try out in practice. This
book is therefore useful not only for researchers and engineers, but also for a wide
range of readers who are interested in this field.
Structure of This Book
The present book is written from the standpoint of integrating computational
mechanics and deep learning, consisting of three parts: Part I (Chaps. 1–3) covers the
basics, Part II (Chaps. 4–8) covers several applications of deep learning to computa-
tional mechanics with detailed descriptions of the fields of computational mechanics
to which deep learning is applied, and Part III (Chaps. 9–10) describes programming,
where the program codes for both computational mechanics and deep learning are
discussed in detail. The authors have tried to make the program not a black box, but
a useful tool for readers to fully understand and handle the processing. The contents
of each chapter are summarized as follows:
Part I Fundamentals:
In Chap. 1, the importance of deep learning in computational mechanics is given
first and then the development process of deep learning is reviewed. In addition,
various new methods used in deep learning are introduced in an easy-to-understand
manner.
Chapter 2 is devoted to the mathematical aspects of deep learning. It discusses the forward and backward propagations of typical network structures in deep learning, such as fully connected feedforward neural networks and convolutional neural networks, using mathematical formulas with examples, and also covers learning acceleration and regularization methods.
Chapter 3 discusses the current research trends in this field based on articles
published in several journals. Many of these articles are compiled in the reference
list, which may be useful for further study.
Part II Case Study:
Chapter 4 presents an application of deep learning to the elemental integration
process of the finite element method. It is shown that a general-purpose numerical
integration method can be optimized for each integrand by deep learning to obtain
better results.
Chapter 5 introduces a method for improving the accuracy of the finite element
solutions by deep learning, showing how deep learning can break the common
knowledge that a fine mesh is essential to obtain an accurate solution.
Chapter 6 is devoted to an application of deep learning to the contact point search
process in contact analysis. It deals with contact between smooth contact surfaces
defined by NURBS and B-spline basis functions, showing how deep learning helps
to accelerate and stabilize the contact analysis.
Chapter 7 presents an application of deep learning to fluid dynamics. A convolutional neural network is used to predict the flow field, showing that its calculation is far faster than that of conventional computational fluid dynamics (CFD).
Chapter 8 discusses further applications of deep learning to solid and fluid
analysis.
Part III Computational Procedures:
Chapter 9 describes some programs to be used for the application problems:
Sect. 9.1 programs in the field of computational mechanics, such as the element
stiffness matrix calculation program, and Sect. 9.2 those in the field of deep learning,
such as the feedforward neural network, both of which are given with background
mathematical formulas.
Chapter 10 presents programs for the application of deep learning to the elemental
integration discussed in Chap. 4. With these programs and those presented in Chap. 9,
the readers of the present book could easily try “Computational Mechanics with Deep
Learning” by themselves.
Tokyo, Japan
Tokushima, Japan
May 2022
Genki Yagawa
Atsuya Oishi
Acknowledgements
We would like to express our gratitude to Y. Tamura, M. Masuda, and Y. Nakabayashi
for providing the data for Chap. 7. We also express our cordial thanks to all the
colleagues and students who have collaborated with us over several decades in the
field of computational mechanics with neural networks/deep learning: S. Yoshimura,
M. Oshima, H. Okuda, T. Furukawa, N. Soneda, H. Kawai, R. Shioya, T. Horie, Y.
Kanto, Y. Wada, T. Miyamura, G. W. Ye, T. Yamada, A. Yoshioka, M. Shirazaki, H.
Matsubara, T. Fujisawa, H. Hishida, Y. Mochizuki, T. Kowalczyk, A. Matsuda, C.
R. Pyo, J. S. Lee, and K. Yamada.
We are particularly grateful to Prof. E. Oñate (CIMNE/Technical Univ. of
Catalonia, Spain) for his kind and important suggestions and encouragements during
the publication process of this book.
Tokyo, Japan
Tokushima, Japan
Genki Yagawa
Atsuya Oishi
Contents
Part I Fundamentals
1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 Deep Learning: New Way for Problems Unsolvable
by Conventional Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Progress of Deep Learning: From McCulloch–Pitts Model
to Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 New Techniques for Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3.1 Numerical Precision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3.2 Adversarial Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.3.3 Dataset Augmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.3.4 Dropout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.3.5 Batch Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.3.6 Generative Adversarial Networks . . . . . . . . . . . . . . . . . . . 31
1.3.7 Variational Autoencoder . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.3.8 Automatic Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . 39
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2 Mathematical Background for Deep Learning . . . . . . . . . . . . . . . . . . . 49
2.1 Feedforward Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.2 Convolutional Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.3 Training Acceleration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.3.1 Momentum Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.3.2 AdaGrad and RMSProp . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
2.3.3 Adam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
2.4 Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.4.1 What Is Regularization? . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.4.2 Weight Decay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.4.3 Physics-Informed Network . . . . . . . . . . . . . . . . . . . . . . . . . 72
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3 Computational Mechanics with Deep Learning . . . . . . . . . . . . . . . . . . 75
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.2 Recent Papers on Computational Mechanics with Deep
Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Part II Case Study
4 Numerical Quadrature with Deep Learning . . . . . . . . . . . . . . . . . . . . . 95
4.1 Summary of Numerical Quadrature . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.1.1 Legendre Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.1.2 Lagrange Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.1.3 Formulation of Gauss–Legendre Quadrature . . . . . . . . . . 98
4.1.4 Improvement of Gauss–Legendre Quadrature . . . . . . . . . 101
4.2 Summary of Stiffness Matrix for Finite Element Method . . . . . . . 103
4.3 Accuracy Dependency of Stiffness Matrix on Numerical
Quadrature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.4 Search for Optimal Quadrature Parameters . . . . . . . . . . . . . . . . . . . 114
4.5 Search for Optimal Number of Quadrature Points . . . . . . . . . . . . . 122
4.6 Deep Learning for Optimal Quadrature of Element
Stiffness Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.6.1 Estimation of Optimal Quadrature Parameters
by Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.6.2 Estimation of Optimal Number of Quadrature
Points by Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.7 Numerical Example A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.7.1 Data Preparation Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.7.2 Training Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
4.7.3 Application Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.8 Numerical Example B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.8.1 Data Preparation Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.8.2 Training Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
4.8.3 Application Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
5 Improvement of Finite Element Solutions with Deep Learning . . . . . 139
5.1 Accuracy Versus Element Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5.2 Computation Time versus Element Size . . . . . . . . . . . . . . . . . . . . . 141
5.3 Error Estimation of Finite Element Solutions . . . . . . . . . . . . . . . . . 148
5.3.1 Error Estimation Based on Smoothing of Stresses . . . . . 148
5.3.2 Error Estimation Using Solutions Obtained
by Various Meshes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
5.4 Improvement of Finite Element Solutions Using Error
Information and Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
5.5 Numerical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.5.1 Data Preparation Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.5.2 Training Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
5.5.3 Application Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
6 Contact Mechanics with Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . 167
6.1 Basics of Contact Mechanics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
6.2 NURBS Basis Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
6.3 NURBS Objects Based on NURBS Basis Functions . . . . . . . . . . . 180
6.4 Local Contact Search for Surface-to-Surface Contact . . . . . . . . . . 188
6.5 Local Contact Search with Deep Learning . . . . . . . . . . . . . . . . . . . 192
6.6 Numerical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
6.6.1 Data Preparation Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
6.6.2 Training Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
6.6.3 Application Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
7 Flow Simulation with Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
7.1 Equations for Flow Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
7.2 Finite Difference Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
7.3 Flow Simulation of Incompressible Fluid with Finite
Difference Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
7.3.1 Non-dimensional Navier–Stokes Equations . . . . . . . . . . . 218
7.3.2 Solution Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
7.3.3 Example: 2D Flow Simulation of Incompressible
Fluid Around a Circular Cylinder . . . . . . . . . . . . . . . . . . . 221
7.4 Flow Simulation with Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . 222
7.5 Neural Networks for Time-Dependent Data . . . . . . . . . . . . . . . . . . 225
7.5.1 Recurrent Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . 225
7.5.2 Long Short-Term Memory . . . . . . . . . . . . . . . . . . . . . . . . . 230
7.6 Numerical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
7.6.1 Data Preparation Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
7.6.2 Training Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
7.6.3 Application Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
8 Further Applications with Deep Learning . . . . . . . . . . . . . . . . . . . . . . . 241
8.1 Deep Learned Finite Elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
8.1.1 Two-Dimensional Quadratic Quadrilateral
Element . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
8.1.2 Improvement of Accuracy of [B] Matrix Using
Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
8.2 FEA-Net . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
8.2.1 Finite Element Analysis (FEA) With Convolution . . . . . 253
8.2.2 FEA-Net Based on FEA-Convolution . . . . . . . . . . . . . . . . 259
8.2.3 Numerical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
8.3 DiscretizationNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
8.3.1 DiscretizationNet Based on Conditional
Variational Autoencoder . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
8.3.2 Numerical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
8.4 Zooming Method for Finite Element Analysis . . . . . . . . . . . . . . . . 269
8.4.1 Zooming Method for FEA Using Neural Network . . . . . 269
8.4.2 Numerical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
8.5 Physics-Informed Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . 275
8.5.1 Application of Physics-Informed Neural Network
to Solid Mechanics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
8.5.2 Numerical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
Part III Computational Procedures
9 Bases for Computer Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
9.1 Computer Programming for Data Preparation Phase . . . . . . . . . . . 285
9.1.1 Element Stiffness Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
9.1.2 Mesh Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
9.1.3 B-Spline and NURBS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
9.2 Computer Programming for Training Phase . . . . . . . . . . . . . . . . . . 325
9.2.1 Sample Code for Feedforward Neural Networks
in C Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
9.2.2 Sample Code for Feedforward Neural Networks
in C with OpenBLAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
9.2.3 Sample Code for Feedforward Neural Networks
in Python Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
9.2.4 Sample Code for Convolutional Neural Networks
in Python Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
10 Computer Programming for a Representative Problem . . . . . . . . . . . 381
10.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
10.2 Data Preparation Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
10.2.1 Generation of Elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
10.2.2 Calculation of Shape Parameters . . . . . . . . . . . . . . . . . . . . 385
10.2.3 Calculation of Optimal Numbers of Quadrature
Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
10.3 Training Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
10.4 Application Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
Part I
Fundamentals
Chapter 1
Overview
Abstract In this chapter, we provide an overview of deep learning. Firstly, in
Sect. 1.1, the differences between deep learning and conventional methods and also
the special role of deep learning are explained. Secondly, Sect. 1.2 shows a histor-
ical view of the development of deep learning. Finally, Sect. 1.3 gives various new
techniques used in deep learning.
1.1 Deep Learning: New Way for Problems Unsolvable
by Conventional Methods
Deep learning could be said as a revision of feedforward neural networks. Both of
them have been adopted in various fields of computational mechanics since their
emergence due to the reason that these techniques have a potential to compensate
for the weakness of conventional computational mechanics methods.
Let us consider a simple problem to know the role of deep learning in
computational mechanics as follows:
Problem 1 Assume a square plate, its bottom fixed at both ends, and
loaded partially at the top (Fig. 1.1a). Let us find the displacements
(u1, v1), . . . , (u4, v4) at the four points at the top (Fig. 1.1b).
The first solution method, which will be the simplest, is to actually apply a load
to the plate and measure the displacements, which may give the most reliable results
if it is easy to set up the experimental conditions and measure the physical quantity
of interest. This method can be called an experiment-based solution method.
The second method that comes to mind is to calculate the displacements by the
finite element analysis [32]. This problem can be solved by the two-dimensional
finite element stress analysis based on the following three kinds of equations.
Equations of balance of forces in an analysis region:
Fig. 1.1 Square plate under tensile force: (a) loading condition, (b) four points at the top
$$\begin{cases} \dfrac{\partial \sigma_x}{\partial x} + \dfrac{\partial \tau_{xy}}{\partial y} = 0 \\[2mm] \dfrac{\partial \tau_{xy}}{\partial x} + \dfrac{\partial \sigma_y}{\partial y} = 0 \end{cases} \quad \text{in } \Omega \tag{1.1.1}$$
Equations of equilibrium at the load boundary:
$$\begin{cases} \sigma_x n_x + \tau_{xy} n_y = T_x \\ \tau_{xy} n_x + \sigma_y n_y = T_y \end{cases} \quad \text{on } \Gamma_\sigma \tag{1.1.2}$$
Equations at the displacement boundary:
$$\begin{cases} u = \bar{u} \\ v = \bar{v} \end{cases} \quad \text{on } \Gamma_u \tag{1.1.3}$$
Equation (1.1.1) is solved under the conditions Eqs. (1.1.2) and (1.1.3). Equa-
tion (1.1.2), which describes equilibrium at the load boundary, is called the Neumann
boundary condition, and Eq. (1.1.3), which describes the fixed displacements, the
Dirichlet boundary condition.
Based on the finite element method, Eqs. (1.1.1), (1.1.2) and (1.1.3) are formulated
as a set of linear equations as follows [72]:
[K]{U} = {F} (1.1.4)
where [K] in the left-hand side is called the coefficient matrix or the global stiffness
matrix, {U} the vector of nodal displacements, and {F} the right-hand side vector
calculated from the nodal equivalent load. The nodal displacements of all the nodes
in the domain to be solved are obtained by solving the simultaneous linear equations,
Eq. (1.1.4). For each of the four points specified in the problem, the displacements
of the point can be directly obtained as the nodal displacements if the point is a node,
or by interpolating the displacements of the surrounding nodes if it is not a node.
This solution method, which is based on the numerical solution of partial differential
equations, is called a computational method based on differential equation or, simply,
equation-based numerical solution method.
Then, we consider the following problem.
Problem 2 Assume the same square plate, its bottom fixed at both ends, and
loaded at one side of the top as Problem 1 (Fig. 1.1a). But, as shown in Fig. 1.2,
there is a hole inside the plate. Find the displacements (u1, v1), . . . , (u4, v4) at
the four points at the top (Fig. 1.1b).
The experiment for this problem may be more difficult than for the previous case.
Especially, if the domain is not a plate but a cube with a void being embedded, it will
be very time consuming to prepare for the experiment.
On the other hand, the equation-based numerical solution method can solve
Problem 2 without any difficulty by using a mesh divided according to the given
shape. This versatility of the equation-based numerical solution methods such as
the finite element method is a great advantage over the experiment-based solution
methods.
Supported by this advantage, it has become possible for numerical methods to
deal with almost all kinds of applied mechanics problems. Nowadays, the methods
are taken as the first choice for solving various problems.
However, it is clear that even the equation-based numerical solution method is
not a panacea if we consider the following problem.
Problem 3 Assume a square plate, its bottom fixed at both ends, loaded at one side of the top (Fig. 1.1a), and the displacements at the four points at the top known as (u1, v1), . . . , (u4, v4). Find the shape and the position of an unknown hole in the plate (Fig. 1.3).

Fig. 1.2 Square plates with embedded holes (round, triangular, and square holes)

Fig. 1.3 Estimation of shape and location of an embedded hole
Apparently, neither the experiment-based nor the equation-based numerical solu-
tion methods can solve this problem. So, what is the difference between Problems 1
and 2, and Problem 3?
In the equation-based numerical solution method for Problems 1 and 2, the governing equations for displacements are solved under a given load condition called the Neumann condition, a fixation condition called the Dirichlet condition, and additional boundary conditions, such as the shape and the position of the hole, where the displacements of all the nodes in the domain are obtained as the solution.
This is equivalent to an approach to achieve a mapping relation as follows:
$$g : \begin{Bmatrix} \text{Dirichlet boundary condition} \\ \text{Neumann boundary condition} \\ \text{Hole parameters} \end{Bmatrix} \rightarrow \begin{Bmatrix} u_1 \\ v_1 \\ \vdots \\ u_4 \\ v_4 \end{Bmatrix} \tag{1.1.5}$$
To solve the direct problem with the equation-based numerical method is equivalent to finding this kind of mapping.
On the other hand, the mapping relation to solve Problem 3 is expressed as
$$h : \begin{Bmatrix} \text{Dirichlet boundary condition} \\ \text{Neumann boundary condition} \\ \begin{Bmatrix} u_1 \\ v_1 \\ \vdots \\ u_4 \\ v_4 \end{Bmatrix} \end{Bmatrix} \rightarrow \text{Hole parameters} \tag{1.1.6}$$
Solving the inverse problem is equal to finding the above kind of mapping. In this case, it is to find the mapping from the displacements, which would usually be results obtained by solving the governing equations, to the hole parameters (shape and position of the hole), which are conditions usually considered as input to solve the governing equations [41].
It is clear that the inverse problem is much more difficult to handle than the direct problem, where the solution can be achieved directly through the routine operation of solving equations. It is noted that an inverse problem such as Problem 3 is a type of problem that we encounter often when we design an artifact, asking “How can we satisfy this condition?” This means that solving inverse problems efficiently is one of the most important issues for applied mechanics.
Now, omitting the parameters used in Eq. (1.1.5), we have
$$g : \text{HoleParams} \rightarrow \begin{Bmatrix} u_1 \\ v_1 \\ \vdots \\ u_4 \\ v_4 \end{Bmatrix} \quad \text{or} \quad \begin{Bmatrix} u_1 \\ v_1 \\ \vdots \\ u_4 \\ v_4 \end{Bmatrix} = g(\text{HoleParams}) \tag{1.1.7}$$
Similarly, Eq. (1.1.6) can be written in concise form as follows:
$$h : \begin{Bmatrix} u_1 \\ v_1 \\ \vdots \\ u_4 \\ v_4 \end{Bmatrix} \rightarrow \text{HoleParams} \quad \text{or} \quad \text{HoleParams} = h\left( \begin{Bmatrix} u_1 \\ v_1 \\ \vdots \\ u_4 \\ v_4 \end{Bmatrix} \right) \tag{1.1.8}$$
Employing repeatedly the equation-based method to find the displacements
(u1, v1), . . . , (u4, v4) of the four points on the top of the plate for various hole
parameters, we can get a lot of data pairs of the hole parameters, HoleParams(i) =
(p1(i), p2(i), . . . , pn(i)), and the displacements calculated using the parameters,
(u1(i), v1(i)), . . . , (u4(i), v4(i)), shown as
$$\begin{aligned} &\{\text{HoleParams}(1),\ ((u_1(1), v_1(1)), \ldots, (u_4(1), v_4(1)))\} \\ &\{\text{HoleParams}(2),\ ((u_1(2), v_1(2)), \ldots, (u_4(2), v_4(2)))\} \\ &\qquad\vdots \\ &\{\text{HoleParams}(N),\ ((u_1(N), v_1(N)), \ldots, (u_4(N), v_4(N)))\} \end{aligned} \tag{1.1.9}$$
Now, let H() be an arbitrary function (mapping) with
(u1(i), v1(i)), . . . , (u4(i), v4(i)) as input and the approximate values of
HoleParams(i) as output, which is written as
$$\left( p^H_1(i), p^H_2(i), \ldots, p^H_n(i) \right) = H(u_1(i), v_1(i), \ldots, u_4(i), v_4(i)) \tag{1.1.10}$$
Then, let us find the H(), among all the admissible candidates, that minimizes
$$L = \sum_{i=1}^{N} \sum_{j=1}^{n} \left( p_j(i) - p^H_j(i) \right)^2 \tag{1.1.11}$$
As H(), which minimizes L, corresponds to the map h in Eq. (1.1.8), it is expected
that given the displacements (u1, v1), . . . , (u4, v4) as input, a set of values of the hole
parameters corresponding to the input data is estimated.
$$\left( p^H_1, p^H_2, \ldots, p^H_n \right) = H(u_1, v_1, \ldots, u_4, v_4) \tag{1.1.12}$$
This approach is considered to be a solution method that attempts to derive a
solution by utilizing a large number of data as shown in Eq. (1.1.9) and can be called
a computational method based on data or, simply, a data-based solution method, one
of the most powerful solution methods for inverse problems.
Here, we discuss how we find the mapping H(), which minimizes L. It is known
that feedforward neural networks [27] and deep learning [22], an extension of feed-
forward neural networks, are able to construct a mapping H() from data pairs. Specif-
ically, H(), which corresponds to the mapping h in Eq. (1.1.8), can be constructed
by the error back propagation learning using the data in Eq. (1.1.9) as training data,
u1(i), v1(i), . . . , u4(i), v4(i) as input data, and p1(i), p2(i), . . . , pn(i) as teacher
signals (see Sect. 2.1).
As described above, feedforward neural networks and their advanced form, deep learning, are powerful techniques of data-based solution methods that can deal with
inverse problems difficult with conventional computational mechanics methods.
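As a rough sketch of how such a mapping H() can be constructed in practice (a simplified stand-in for the error back propagation learning described above, not the book's implementation; the random arrays below are placeholders for data pairs of the form of Eq. (1.1.9) that would come from finite element runs, and the network size and learning coefficient are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder training pairs: 8 displacement components (u1, v1, ..., u4, v4)
# as input, n = 3 hole parameters as teacher signals.
X = rng.normal(size=(200, 8))     # displacements (input data)
P = rng.normal(size=(200, 3))     # hole parameters (teacher signals)

# One-hidden-layer network H(): 8 -> 16 -> 3, trained by gradient descent on
# the squared error L of Eq. (1.1.11).
W1 = rng.normal(scale=0.1, size=(8, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 3)); b2 = np.zeros(3)
alpha = 0.01

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for epoch in range(2000):
    h = sigmoid(X @ W1 + b1)              # hidden-layer output
    Y = h @ W2 + b2                       # estimated hole parameters
    err = Y - P                           # residual against teacher signals
    # back propagation of the squared-error gradient
    gW2 = h.T @ err / len(X); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * h * (1.0 - h)
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(axis=0)
    W2 -= alpha * gW2; b2 -= alpha * gb2
    W1 -= alpha * gW1; b1 -= alpha * gb1

# The trained network plays the role of H() in Eq. (1.1.12):
estimate = sigmoid(X[:1] @ W1 + b1) @ W2 + b2
print(estimate)
```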
Finally, let us consider the following problem.
Problem 4 Assume a square plate with its bottom fixed at both ends, loaded
at one side of the top (Fig. 1.1a), and the displacements at the four points
on the top being measured as (u1, v1), . . . , (u4, v4). Then, find the shape and
the position of a hole in the domain that minimize (v1 − v4)^2 + (v2 − v3)^2 (Fig. 1.4).

Fig. 1.4 Estimation of shape and location of an embedded hole by minimizing a function
The above problem, minimizing or maximizing some value, is that we often
encounter in designing artifacts, where we must seek an optimization point.
A possible way to solve Problem 4 is to repeatedly use the equation-based
methods to calculate (u1(i), v1(i)), . . . , (u4(i), v4(i)) and then the value (v1(i) − v4(i))^2 + (v2(i) − v3(i))^2 to be minimized for all possible HoleParams(i), which is the so-
called “brute-force” method. However, the process of calculating the displacements
using the equation-based solution method, i.e., the analysis process using the finite
element method, is considered unpractical due to the enormous computational load.
On the other hand, evolutionary computation algorithms, such as the genetic algorithms [46], are often employed because they can reduce the number of calculation cycles of the finite element analyses, resulting in high efficiency in solving optimization problems. Specifically, the genetic algorithm is known to be efficient in finding an optimal HoleParams(i) as the search area can be narrowed.
In addition, the data-based solution methods such as the feedforward neural
networks can also be used to dramatically reduce the huge computational load.
Based on the data in Eq. (1.1.9), let G() be an arbitrary function (mapping) with
HoleParams(i) as input and (u1(i), v1(i)), . . . , (u4(i), v4(i)) as output. Thus, we
have
$$\left( u^G_1(i), v^G_1(i) \right), \ldots, \left( u^G_4(i), v^G_4(i) \right) = G(\text{HoleParams}(i)) \tag{1.1.13}$$
Then, among the broad range of admissible G()s, we find the G() that minimizes
the equation as follows:
$$\sum_{i=1}^{N} \sum_{j=1}^{4} \left\{ \left( u_j(i) - u^G_j(i) \right)^2 + \left( v_j(i) - v^G_j(i) \right)^2 \right\} \rightarrow \min \tag{1.1.14}$$
Here, G() outputs the displacements (u1, v1), . . . , (u4, v4) for the given
HoleParams, which is almost equivalent to the finite element analysis. In other words,
the finite element analysis, which is an equation-based solution method with high
computational load, can be replaced by a neural network with low computational
load, which is constructed by a data-based solution method. This kind of neural
network is often called a surrogate model of the original finite element analysis,
which is an example of the application of the data-based solution method to direct
problems.
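A minimal sketch of how such a surrogate could be used for Problem 4 is given below; the function `surrogate` is a hypothetical stand-in for a trained network G(), its internals and the candidate hole parameters are arbitrary, and the point is only the structure: inside the search loop, the cheap surrogate is evaluated instead of a full finite element run.

```python
import numpy as np

def surrogate(hole_params):
    # Hypothetical trained surrogate G(): returns (u1, v1, ..., u4, v4) for
    # given hole parameters; an arbitrary smooth function keeps this runnable.
    x = np.asarray(hole_params)
    return np.tanh(np.linspace(0.1, 0.8, 8) * x.sum())

def objective(hole_params):
    # The quantity to be minimized in Problem 4: (v1 - v4)^2 + (v2 - v3)^2.
    u = surrogate(hole_params)
    v1, v2, v3, v4 = u[1], u[3], u[5], u[7]
    return (v1 - v4) ** 2 + (v2 - v3) ** 2

# Brute-force search over placeholder candidates (center x, center y, radius):
# each evaluation costs almost nothing compared with a finite element analysis.
candidates = [np.array([cx, cy, r])
              for cx in np.linspace(0.2, 0.8, 13)
              for cy in np.linspace(0.2, 0.8, 13)
              for r in np.linspace(0.05, 0.2, 4)]
best = min(candidates, key=objective)
print(best, objective(best))
```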
As is well recognized, the equation-based numerical solution methods, such as
the finite element analysis, have been the major tool of conventional computational
mechanics, expanding their area of application to various problems. As a result,
the majority of mechanical phenomena can now be solved by the equation-based numerical solution method, replacing the experiment-based solution method. However, they
are still insufficient for solving inverse and optimization problems, both of which are
important in many fields. In contrast, the data-based solution methods such as neural
networks and deep learning can tackle rather easily the above problems. In other
words, the data-based solution method will remedy the weakness of the equation-
based numerical solution method, becoming a powerful way for tackling inverse and
optimization problems.
1.2 Progress of Deep Learning: From McCulloch–Pitts
Model to Deep Learning
In this section, the development of deep learning and its predecessor, feedforward
neural networks, is studied.
First, let us review a feedforward neural network, the predecessor of deep learning,
which is a network consisting of layers of units with connections between units in
adjacent layers. A unit performs multiple-input, single-output nonlinear transforma-
tion, similarly to a biological neuron (Fig. 1.5). In a feedforward neural network with
n layers, the first layer is called the input layer, the second to (n − 1)th layers the
intermediate or hidden layers, and the nth layer the output layer. Figure 1.6 shows
the structure of a feedforward neural network. The signal input to the input layer
is sequentially passed through the hidden layers and becomes the output signal at
the output layer. Here, the input signal undergoes a nonlinear transformation in each
layer. A feedforward neural network is considered “deep,” if it has five or more
nonlinear transformation layers [43].
A brief chronology of feedforward neural networks and deep learning is shown
as follows:
1943 McCulloch–Pitts model [45]
1958 Perceptron [56]
1967 Stochastic gradient descent [1]
1969 Perceptrons [47]
Fig. 1.5 Unit
Fig. 1.6 Feedforward neural network
1980 Neocognitron [18]
1986 Back propagation algorithm [58]
1989 Universal approximator [19, 29]
1989 Convolutional neural network [42]
2006 Pretraining with restricted Boltzmann machine [28]
2006 Pretraining with autoencoders [4]
2012 AlexNet [40]
2016 AlphaGo [61]
2017 AlphaGo Zero [62]
The McCulloch–Pitts model was proposed as a mathematical model of biological
neurons [45], where inputs I1, . . . , In are the outputs of different neurons, each input
is multiplied by weights w1, . . . , wn and summed, and then bias θ is added to the
input u of the activation function as shown in Fig. 1.7. The neuron outputs a single
value f (u) as the output value O as follows:
Fig. 1.7 Mathematical model of a neuron
$$O = f(u) = f\left( \sum_{i=1}^{n} w_i I_i + \theta \right) \tag{1.2.1}$$
In this model, the output of the neuron is binary (0 or 1), and the Heaviside function
is used as the activation function as
$$O = f(u) = \begin{cases} 1 & (u \geq 0) \\ 0 & (u < 0) \end{cases} \tag{1.2.2}$$
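A minimal sketch of this neuron in code follows Eqs. (1.2.1)–(1.2.2) directly; the AND-gate example is ours, chosen only to show that fixed weights and a fixed bias already realize a simple logical function.

```python
import numpy as np

def mcculloch_pitts(inputs, weights, theta):
    # Weighted sum of the inputs plus the bias, passed through the Heaviside
    # function, as in Eqs. (1.2.1)-(1.2.2).
    u = np.dot(weights, inputs) + theta
    return 1 if u >= 0 else 0

# Example (ours): weights (1, 1) and bias -1.5 realize a logical AND.
for I in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(I, mcculloch_pitts(I, weights=[1, 1], theta=-1.5))
```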
Later, the perceptron was introduced in 1958, demonstrating the ability of
supervised learning for pattern recognition [56]. Here, we discuss a two-class
(C1, C2) classification problem for d-dimensional data $\mathbf{x}_i = (x_{i1}, \ldots, x_{id})^T$ using this model (Fig. 1.8). If we expand the dimension of the data by one and set $\mathbf{x}_i = (1, x_{i1}, \ldots, x_{id})^T$, and set $\mathbf{w} = (w_0, w_1, \ldots, w_d)^T$ for the weights, then u in Eq. (1.2.1) can be described as
$$u = \mathbf{w}^T \mathbf{x}_i = w_0 + w_1 x_{i1} + w_2 x_{i2} + \cdots + w_d x_{id} \tag{1.2.3}$$
Fig. 1.8 Perceptron
where w0 corresponds to θ in Eq. (1.2.1). Then, the classification rule with the
perceptron is written as
$$\begin{cases} \mathbf{x}_i \in C_1 & \left( f\left( \mathbf{w}^{T} \mathbf{x}_i \right) \geq 0 \right) \\ \mathbf{x}_i \in C_2 & \left( f\left( \mathbf{w}^{T} \mathbf{x}_i \right) < 0 \right) \end{cases} \tag{1.2.4}$$
As learning in the perceptron model can be regarded as the process of learning
weights w that enable correct classification, the correct weights can be found auto-
matically by repeating the iterative update of the values. When the weights in the
k-th step of the iterative updates are given as w(k)
, the learning rule of the perceptron
model leaves w(k)
unchanged if the classification using w(k)
is correct for a certain
input data xi , and updates w(k)
by xi if it is not the case, as
$$\begin{cases} \mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} & \text{(for the case of correct classification)} \\ \mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} - \alpha \mathbf{x}_i & \left( f\left( \mathbf{w}^{(k)T} \mathbf{x}_i \right) \geq 0 \ \text{for} \ \mathbf{x}_i \in C_2 \right) \\ \mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} + \alpha \mathbf{x}_i & \left( f\left( \mathbf{w}^{(k)T} \mathbf{x}_i \right) < 0 \ \text{for} \ \mathbf{x}_i \in C_1 \right) \end{cases} \tag{1.2.5}$$
where α is a positive constant. For xi ∈ C1, we have
$$\begin{cases} \mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} & \left( f\left( \mathbf{w}^{(k)T} \mathbf{x}_i \right) \geq 0 \right) \\ \mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} + \alpha \mathbf{x}_i & \left( f\left( \mathbf{w}^{(k)T} \mathbf{x}_i \right) < 0 \right) \end{cases} \tag{1.2.6}$$
If $f\left( \mathbf{w}^{(k)T} \mathbf{x}_i \right) < 0$ holds, we have
$$f\left( \mathbf{w}^{(k+1)T} \mathbf{x}_i \right) = f\left( \left( \mathbf{w}^{(k)} + \alpha \mathbf{x}_i \right)^T \mathbf{x}_i \right) = f\left( \mathbf{w}^{(k)T} \mathbf{x}_i + \alpha |\mathbf{x}_i|^2 \right) \tag{1.2.7}$$
Equation (1.2.7) suggests that the weights are updated so that the value $f\left( \mathbf{w}^{(k+1)T} \mathbf{x}_i \right)$ moves toward the positive side.
Iteratively applying this learning rule to all the input data, weights w that can
correctly classify all input data are determined. This learning rule was proven to
converge in a finite number of learning iterations, called the perceptron convergence
theorem [57]. The perceptron has attracted a great deal of attention, and the first
boom of neural networks occurred with it.
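A compact sketch of this learning rule is given below; the two-cluster data set, the learning coefficient α = 0.1, and the iteration limit are assumptions made only for illustration. The loop stops when a full pass over the data produces no update, in line with the perceptron convergence theorem for linearly separable data.

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up, linearly separable two-class data (20 samples per class).
X1 = rng.normal(loc=[+2, +2], size=(20, 2))   # class C1
X2 = rng.normal(loc=[-2, -2], size=(20, 2))   # class C2

# Augment each sample with a leading 1 so that w0 plays the role of the bias.
X = np.vstack([np.hstack([np.ones((20, 1)), X1]),
               np.hstack([np.ones((20, 1)), X2])])
labels = np.array([1] * 20 + [0] * 20)        # 1 for C1, 0 for C2

w = np.zeros(3)
alpha = 0.1
for _ in range(100):
    updated = False
    for x, c in zip(X, labels):
        fired = 1 if w @ x >= 0 else 0        # Heaviside output f(w^T x)
        if c == 1 and fired == 0:             # x in C1 misclassified: w += alpha*x
            w += alpha * x; updated = True
        elif c == 0 and fired == 1:           # x in C2 misclassified: w -= alpha*x
            w -= alpha * x; updated = True
    if not updated:                           # converged: all samples classified
        break
print(w)
```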
In 1969, however, the limitations of the perceptron were theoretically demon-
strated [47], that it was theoretically proven that a simple single-layer perceptron
could be applied only to linearly separable problems (Fig. 1.9), which cast doubt
on its applicability to practical classification problems. The hope for the perceptron
dropped drastically, and the first neural network boom calmed down.
The weakness of the perceptron, which was only effective for linearly separable
problems, was solved by making it multilayered, but causing a new demand for a
suitable learning algorithm.
Fig. 1.9 Linearly separable and inseparable data (two classes, Class 1 and Class 2): (a) linearly separable, (b) linearly inseparable
In 1986, the back propagation algorithm was introduced as a new learning algo-
rithm for multilayer feedforward neural networks, as shown in Fig. 1.6 [58], which
is known as an algorithm based on the steepest descent method that modifies the
connection weights between units in the direction of decreasing the error, which is
defined as the square of the difference between the output from the output layer unit
and the corresponding teacher data as follows:
$$E = \frac{1}{2} \sum_{p=1}^{n_P} \sum_{j=1}^{n_L} \left( {}^{p}O^{L}_{j} - {}^{p}T_{j} \right)^2 \tag{1.2.8}$$
where
${}^{p}O^{L}_{j}$: the output of the jth unit in the Lth layer (output layer) for the pth training pattern,
${}^{p}T_{j}$: the teacher signal corresponding to the output of the jth unit in the output layer for the pth training pattern,
$n_P$: the total number of training patterns,
$n_L$: the total number of output units.
Let $w^{(k)}_{ji}$ be the connection weight between the ith unit of the kth layer and the jth unit of the (k + 1)th layer in Fig. 1.6, then the back propagation algorithm successively modifies $w^{(k)}_{ji}$ as follows:
$$w^{(k)}_{ji} \leftarrow w^{(k)}_{ji} - \alpha \frac{\partial E}{\partial w^{(k)}_{ji}} \tag{1.2.9}$$
Fig. 1.10 Sigmoid function Sigmoid(x), plotted together with the Heaviside function Heaviside(x) for comparison
Here, α is a positive constant called the learning coefficient.
Since differentiation frequently appears in the back propagation, the sigmoid function, which is nonlinear and continuously differentiable everywhere, has been used as one of the most popular activation functions, given as
$$f(x) = \frac{1}{1 + e^{-x}} \tag{1.2.10}$$
Figure 1.10 shows the Heaviside function and the sigmoid function. It is seen that
the latter is a smoothed version of the former. (see Sect. 2.1 for details of the back
propagation algorithm.)
In 1989, it was shown that feedforward neural networks can approximate arbitrary
continuous functions [19, 29]. However, this theoretical proof is a kind of existence
theorem, and provides little answer to important practical questions such as how large a neural network should be (number of layers, number of units in each layer, etc.), what training parameters should be used, and how many training cycles are required for convergence. Accordingly, determination of such meta-parameters is usually made
by trial and error.
With the advent of the back propagation algorithm in 1986, multilayer feedfor-
ward neural networks were put to practical use, and the application range of neural
networks was greatly expanded, resulting in the second neural network boom. It
should be noted that almost twenty years earlier than the advent of the back propaga-
tion algorithm, the prototype of the algorithm was proposed [1], but its importance
was not widely recognized at that time.
After a while, the second neural network boom that had started with the advent
of the back propagation algorithm gradually calmed down. This was due to the fact
Fig. 1.11 Development of supercomputers: peak performance (GFLOPS) versus year, from Cray-1 and Cray-2 through SX-3, ASCI Red, Earth Simulator, Roadrunner, K-computer, and TaihuLight to Fugaku
that when the scale of a feedforward neural network was increased to improve its
function and performance, the learning process became too slow or often did not
proceed at all. There were two main reasons for this: one being the speed of computers and the other the vanishing gradient problem.
Let us consider first the speed of computers. Figure 1.11 shows the history of the
fastest supercomputers, where the vertical axis is the computation speed, defined
by the number of floating-point operations per second (FLOPS: Floating-point
Operations Per Second). The unit used here is Giga FLOPS (10^9 FLOPS).
It is seen from the figure that, in 1986, when the back propagation algorithm was started, the speed of supercomputers was about 2 GFLOPS, it was 220 GFLOPS in 1996, 280 TFLOPS (TeraFLOPS: 10^12 FLOPS) in 2006, and 415 PFLOPS (PetaFLOPS: 10^15 FLOPS) in 2021.
A simple calculation suggests that the training time of a feedforward neural
network, which takes only one minute on a current computer (415 PFLOPS), took
1482 min (about one day) on a computer (280 TFLOPS) in 2006, 1,886,364 min
(about three and a half years) on a computer (220 GFLOPS) in 1996, and
207,500,000 min (about 400 years) on a computer (2 GFLOPS) in 1986. In reality,
this calculation is not necessarily true because of the effects of parallel processing
and other factors, but it still shows the speed of progress in computing speed, in other
words the slowness of the computers of old time, suggesting that it was necessary to
wait for the progress of computers in order to apply the back propagation algorithm
to relatively large neural networks.
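The scaling above can be checked with a few lines of arithmetic (parallel efficiency and the other factors mentioned in the text are ignored here, as they are in the text itself):

```python
# A training run that takes one minute at 415 PFLOPS, rescaled to the peak
# speeds quoted for 2006, 1996 and 1986.
current = 415e15                      # 415 PFLOPS
for year, flops in [(2006, 280e12), (1996, 220e9), (1986, 2e9)]:
    minutes = 1.0 * current / flops   # 1 minute times the speed ratio
    print(year, round(minutes), "minutes")
```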
As discussed above, the calculation speed of computers has been the big issue for
the back propagation algorithm. In addition to that, another barrier to the applica-
tion of large-scale multilayer feedforward neural networks to practical problems is
the vanishing gradient problem. This problem exists in multilayer neural networks,
where learning does not proceed in layers far away from the output layer, preventing
performance improvement by increasing the number of hidden layers. The cause of
the vanishing gradient is that the amount of correction by the back propagation algorithm, given as
$$\Delta w^{(k)}_{ji} = -\alpha \frac{\partial E}{\partial w^{(k)}_{ji}} \tag{1.2.11}$$
becomes small in the deeper layers (layers close to the input layer) due to the
small derivative of the sigmoid function that was most commonly employed as the
activation function. (see Sect. 2.1 for details.)
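The effect can be illustrated numerically: the derivative of the sigmoid never exceeds 0.25, so a gradient that passes through many sigmoid layers is attenuated by at most that factor per layer (the influence of the weights is ignored in this rough sketch).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = 0.0
ds = sigmoid(x) * (1.0 - sigmoid(x))   # = 0.25, the maximum of the derivative
for n_layers in (2, 5, 10, 20):
    # Upper bound on the attenuation of the correction of Eq. (1.2.11)
    # after passing through n_layers sigmoid layers.
    print(n_layers, "layers -> gradient factor <=", ds ** n_layers)
```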
Because of these issues, feedforward neural networks, while having the back
propagation learning algorithm and the versatility of being able to simulate arbitrary
nonlinear continuous functions, were “applied only to problems that a relatively
small network could handle.”
The serious situation described above changed in 2006, when methods to avoid the vanishing gradient problem by layer-by-layer pretraining [4, 28], and thereby to train multilayer feedforward neural networks without this issue, were proposed.
Here, we discuss how the autoencoder is used to pretrain multilayer feedforward
neural networks. The structure of autoencoder is shown in Fig. 1.12, which is a
feedforward neural network with one hidden layer, and the number of units in the
input layer is the same as the number of units in the output layer. The autoencoder
is trained to output the same data as the input data by the error back propagation
learning using the input data as the teacher data. After the training is completed,
the autoencoder simply outputs the input data, which seems to be a meaningless
operation, but in fact it corresponds to the conversion of the input data into a different
representation format in the hidden layer. For example, if the number of hidden layer
units is less than the number of input layer units, a compressed representation of the
input data is obtained.
Fig. 1.12 Autoencoder
In pretraining with autoencoders, the connection weights of the multilayer feed-
forward neural network are initialized by autoencoders. For the case of a five-layer
network shown in Fig. 1.13, three autoencoders are prepared; the autoencoder A
with the first layer of the original network as input layer and the second layer of
the original network as hidden layer, the autoencoder B with the second layer as
input layer and the third layer as hidden layer, and the autoencoder C with the third
layer as input layer and the fourth layer as hidden layer. Note that for each of these
autoencoders, the output layer has as many units as corresponding input layer. The
pretraining using the autoencoders above is performed as follows:
(1) First, the autoencoder A is trained using the input data of the original five-layer
neural network. After the training is completed, the connection weights between
the input and the hidden layers of the autoencoder A are set to the initial values
of the connection weights between the first and second layers of the original
five-layer neural network.
(2) Then, the autoencoder B is trained, where the output of the hidden layer of
the autoencoder A is used as the input data for training. After the training is
completed, the connection weights between the input and the hidden layers of
the autoencoder B are set to the initial values of the connection weights between
the second and third layers of the original five-layer neural network.
Fig. 1.13 Pretraining using autoencoders (autoencoders A, B, and C initialize the connection weights between layers 1–2, 2–3, and 3–4 of the five-layer network)
(3) Third, the autoencoder C is trained, where the output of the hidden layer of
the autoencoder B is used as the input data for training. After the training is
completed, the connection weights between the input and the hidden layers of
the autoencoder C are set to the initial values of the connection weights between
the third and fourth layers of the original five-layer neural network.
(4) Finally, after initializing the connection weights between each layer with the
values obtained by the autoencoders in (1), (2) and (3) above, the error back
propagation learning of the five-layer feedforward neural network is performed
using the original input and the teacher data.
Thus, one can solve the vanishing gradient problem by setting the initial values of
the connection weights between layers starting from those closest to the input layer
with autoencoders.
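A compact sketch of this greedy layer-wise procedure, again assuming the Keras API, is given below; the five layer sizes and the dummy data are illustrative assumptions. Each autoencoder is trained on the hidden-layer output of the previous one, and its encoder weights initialize the corresponding layer of the five-layer network.

# Layer-wise pretraining sketch for steps (1)-(4) (assumed Keras API; sizes illustrative).
import numpy as np
import tensorflow as tf

sizes = [64, 32, 16, 32, 64]          # units in layers 1-5 of the original network
x = np.random.rand(1000, sizes[0]).astype("float32")   # dummy training data

pretrained_weights = []
data = x
for n_in, n_hid in zip(sizes[:-2], sizes[1:-1]):        # autoencoders A, B, C
    ae = tf.keras.Sequential([
        tf.keras.layers.Dense(n_hid, activation="sigmoid", input_shape=(n_in,)),
        tf.keras.layers.Dense(n_in, activation="sigmoid"),
    ])
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(data, data, epochs=10, batch_size=32, verbose=0)
    pretrained_weights.append(ae.layers[0].get_weights())
    # the hidden-layer output becomes the input data for the next autoencoder
    data = tf.keras.Model(ae.input, ae.layers[0].output).predict(data, verbose=0)

# Build the original five-layer network and set the pretrained initial weights.
model = tf.keras.Sequential(
    [tf.keras.layers.Dense(n, activation="sigmoid") for n in sizes[1:]])
model.build(input_shape=(None, sizes[0]))
for layer, w in zip(model.layers[:3], pretrained_weights):
    layer.set_weights(w)
# Ordinary back propagation training with the original input and teacher data follows.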
Another factor that made deeply multilayered feedforward neural networks possible is the significant improvement in computer performance, meaning that learning can now be completed in a practical computing time. As a result, the restrictions on the construction of feedforward neural networks have been relaxed, and the scale of the neural network can be increased according to the complexity of the practical problem. In addition, CUDA [39], a language for using graphics processing units (GPUs) for numerical computation, was introduced in 2007; GPUs have since become widely used as accelerators for training and inference of feedforward neural networks, further improving computer performance.
With the development of the pretraining method and the significant improve-
ment of computer performance above, the third neural network boom has started
with the emergence of a multilayer large-scale neural network, the so-called deep
learning. The necessity of pretraining is, however, decreasing due to improvements
of activation functions, training methods and computer performance.
The success of deep learning also owes much to the development of convolutional neural networks, in which the units of each layer are arranged in a two-dimensional grid. In other words, in a conventional feedforward neural network, all units in adjacent layers are connected to each other, whereas in a convolutional neural network, a unit in a layer is connected to only some units in the preceding layer. Figure 1.14 shows
the structure and function of a convolutional neural network. The input to the (k,l)th unit in the pth layer, U_{k,l}^{p}, is given using the outputs of units in the (p − 1)th layer, O_{i,j}^{p-1}, as follows:

U_{k,l}^{p} = \sum_{s=0}^{S-1} \sum_{t=0}^{T-1} h_{s,t}^{p-1} \cdot O_{k+s,l+t}^{p-1} + \theta_{k,l}^{p}    (1.2.12)

where \theta_{k,l}^{p} is the bias of the (k,l)th unit in the pth layer, h_{s,t}^{p-1} the weight at (s, t) in the (p − 1)th layer, which, unlike the weights in a fully connected feedforward neural network, is identical between units in the same layer, and S and T are the range of contributions to the input; Fig. 1.14 shows the case of S = T = 3.
Fig. 1.14 Convolutional neural network
The weights h_{s,t} can be expressed in matrix form as in Fig. 1.15 for the case of S = T = 3. The operation of Eq. (1.2.12) with h_{s,t} is the same as the filter
operation in image processing [21]. Figure 1.16 shows examples of filters used in
image processing, Fig. 1.16a the Laplacian mask used for image sharpening, and
both Figs. 1.16b and c for edge detection, where the direction of the edge to be
detected is different between them. Note that the convolution operation represented
by Eq. (1.2.12) is similar to feature extraction in image processing, and when the
input data is an image, it can be interpreted as an operation to extract the features of
the input image. For details on the calculation in the convolutional layer, see Sect. 2.2.
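As a minimal illustration of the operation in Eq. (1.2.12), the following NumPy sketch computes the inputs of the pth layer for a single channel with S = T = 3; the array sizes, the weight values, and the bias are illustrative assumptions.

# Direct NumPy sketch of Eq. (1.2.12): one channel, S = T = 3, no padding.
import numpy as np

S, T = 3, 3
O_prev = np.random.rand(8, 8)     # outputs O^{p-1}_{i,j} of the (p-1)th layer
h = np.random.rand(S, T)          # shared weights h^{p-1}_{s,t} (same for all units)
theta = 0.1                       # bias theta^{p}_{k,l} (taken constant here)

K, L = O_prev.shape[0] - S + 1, O_prev.shape[1] - T + 1
U = np.zeros((K, L))              # inputs U^{p}_{k,l} of the pth layer
for k in range(K):
    for l in range(L):
        for s in range(S):
            for t in range(T):
                U[k, l] += h[s, t] * O_prev[k + s, l + t]
        U[k, l] += theta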
From a historical point of view, the introduction of locality such as convolutional layers into feedforward neural networks had already been done in the Neocognitron [18], which was inspired by the hierarchical structure of visual information processing [31]. Figure 1.17 shows the structure of the Neocognitron. The prototype of the current convolutional layer was proposed in 1989 [42] and is known to be very useful when images are employed as input. In addition to images, convolutional
Fig. 1.15 Matrix of weights in convolutional neural network
Fig. 1.16 Examples of filtering masks
Fig. 1.17 Neocognitron. Reprinted from [18] with permission from Springer
neural networks have become widely used for various multidimensional data such
as voice or speech.
The ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [59] is an
image recognition contest using ImageNet, a large image dataset of over 10 million
images. In 2012, deep learning showed dominant performance in ILSVRC [40].
Since then, deep learning has been the best performer, and the winning systems of
the contest since 2012 are given as follows:
2012 AlexNet [40]
2013 ZFNet [69]
2014 GoogLeNet [66]
2015 ResNet [26]
2016 CUImage [70]
2017 SENet [30]
As the performance improvement was known to be achieved by adding more
layers, AlexNet in 2012 above used five convolutional layers, GoogLeNet in 2014
more than 20 layers, and ResNet in 2015 more than 100 layers.
The effect of adding more convolutional layers has been verified in VGG [63],
which has a scalable structure and is considered a standard deep convolutional neural
network. It is shown that increasing the number of convolutional layers in VGG
improves the performance in classification.
In addition to image recognition, deep learning has been applied to game program-
ming. AlphaGO, a Go program using deep learning, defeated the world champion
[61]. AlphaGO has continued to evolve since then. While the original AlphaGO used actual game records played by human players as training data, AlphaGO Zero [62], an improved version of AlphaGO, adopted reinforcement learning, learning by playing against other AlphaGO Zeros, and achieved performance superior to that of its predecessor AlphaGO.
In addition to the above areas, deep learning has been applied to a wide range
of fields, including automatic driving such as traffic sign identification [10], natural
language processing such as machine translation [65] and language models GPT-3
[8] and BERT [16], and speech recognition [11, 54].
As deep learning has been applied to various fields, computational power has
been enhanced to deal with deep learning-specific processing. Large-scale and high-
performance computers based on CPUs and GPUs are used for the training process
of deep learning, which has made it possible to construct large-scale deep neural
networks. As a result, the amount of computation required for the inference process
using trained neural networks has increased rapidly. Usually, the inference process is performed under the user's control on computers much less powerful than those used for training, for example, on computers embedded in mobile devices such as smartphones, home appliances, and industrial machines; this is called edge computing.
Since real-time performance is needed in this inference process, embedded GPUs
such as Nvidia’s Jetson are often used, and accelerators that specialize in accelerating
the computation of deep learning inference are also being developed [6, 15].
Deep learning continues to develop, achieving remarkable results in various fields.
1.3 New Techniques for Deep Learning
In this section, some of new and increasingly important techniques in deep learning
are discussed.
1.3.1 Numerical Precision
First, let us study the numerical accuracy required for deep learning.
It is well known that basic numbers employed in computers are binary, and there
are several formats for floating-point real numbers, among which we choose one
depending on the necessary precision level [67]. The floating-point real number is
represented by a series of binary digits, which consist of a sign part representing whether the number is positive or negative, an exponent part for the order of magnitude, and a mantissa part for significant
digits, with the total length (number of bits) varying according to the precision of
the number. The IEEE754 standard specifies three types of real numbers, a double-
precision real number (FP64) with about 16 decimal significant digits, a single-
precision real number (FP32) with about 7 decimal digits, and a half-precision real
number (FP16) with about 3 decimal digits, which occupy 64, 32, and 16 bits of
memory, respectively. In the field of computational mechanics, FP64 is usually used,
and some problems even need quadruple-precision real numbers.
In contrast to the above, it has been shown that deep learning can achieve sufficient accuracy even when using real numbers with relatively low precision [12–14, 25]. Although training in deep learning usually requires higher numerical precision than inference with trained neural networks due to the calculation of derivatives, it has been shown that FP32 and FP16 are sufficient even for training.
For this reason, new low-precision floating-point real number formats are also
being used for deep learning, including BFloat16 (BF16) and Tensor Float 32 (TF32).
The former, proposed by Google, has more exponent bits than FP16, the same number as FP32. Since the number of bits in the mantissa part is reduced, the number of significant digits is also reduced, but the range of numbers that it can express is almost the same as that of FP32. The latter, proposed by Nvidia, has the same number of digits in the exponent part as FP32, and the same number of digits in the mantissa part as FP16. The TF32 format has 19 bits in total, making it a special format whose length is not a power of 2. The major floating-point number formats are summarized
in Fig. 1.18.
Fig. 1.18 Floating-point real number formats (each has a 1-bit sign part):
FP64: exponent 11 bits, fraction 52 bits
FP32: exponent 8 bits, fraction 23 bits
FP16: exponent 5 bits, fraction 10 bits
BF16: exponent 8 bits, fraction 7 bits
TF32: exponent 8 bits, fraction 10 bits
Fig. 1.19 Pseudo-low-precision method
Since deep learning requires relatively low numerical precision, it is possible to further improve the performance of deep learning by using integer [20] or fixed-point real number formats [25], or by implementing dedicated arithmetic hardware for deep learning on field programmable gate arrays (FPGAs).
In practice, it is difficult to know the numerical precision required for each
problem. However, a rough estimate can be made using the pseudo-low-precision
(PLP) method [50]. For example, a single-precision real number (FP32) has a
mantissa part of 23 bits out of a total length of 32 bits, but if we shift the entire
32 bits to the right by n bits and then to the left by n bits, the last n digits of the
mantissa part are filled with zeros, and the number of digits in the mantissa part is
reduced to (23 − n) bits. Figure 1.19 shows an example of PLP with 8-bit shifting,
and List 1.3.1 the sample code to verify the operation of PLP, where the int type
is assumed to be 32 bits. List 1.3.2 shows an example of the execution using gcc
4.8.5 on CentOS 7.9. Note that all the arithmetic operations above are performed as
single-precision real numbers (FP32) and the results are stored as single-precision
real numbers (23-bit mantissa), so it is necessary to reduce the precision by PLP
again immediately after each arithmetic operation.
List 1.3.1 PLP test code

#include <stdio.h>

typedef union {
    float f;
    int   i;    /* the int type is assumed to be 32 bits */
} u_fi32;

int main(void){
    int nsh;
    float f1, f2;
    u_fi32 d1, d2, d3;
    f1 = 2.718281828;
    f2 = 3.141592653;
    for(nsh = 1; nsh < 23; nsh++){
        d1.f = f1;
        d1.i = d1.i >> nsh;    /* fill the last nsh bits of the mantissa with zeros */
        d1.i = d1.i << nsh;
        d2.f = f2;
        d2.i = d2.i >> nsh;
        d2.i = d2.i << nsh;
        d3.f = d1.f * d2.f;    /* multiplication is performed in FP32,   */
        d3.i = d3.i >> nsh;    /* so the precision is reduced again here */
        d3.i = d3.i << nsh;
        printf("%2d %f %f %f\n", nsh, d1.f, d2.f, d3.f);
    }
    return 0;
}
List 1.3.2 Results of the PLP test code (CentOS 7.9, gcc 4.8.5)
1 2.718282 3.141593 8.539734
2 2.718282 3.141592 8.539730
3 2.718281 3.141592 8.539726
4 2.718281 3.141590 8.539719
5 2.718277 3.141586 8.539673
6 2.718277 3.141586 8.539673
7 2.718262 3.141571 8.539551
8 2.718262 3.141541 8.539307
9 2.718262 3.141479 8.539062
10 2.718262 3.141357 8.538086
11 2.718262 3.141113 8.537109
12 2.717773 3.140625 8.535156
13 2.716797 3.140625 8.531250
14 2.714844 3.140625 8.515625
15 2.710938 3.140625 8.500000
16 2.703125 3.140625 8.437500
17 2.687500 3.125000 8.375000
18 2.687500 3.125000 8.250000
19 2.625000 3.125000 8.000000
20 2.500000 3.000000 7.500000
21 2.500000 3.000000 7.000000
22 2.000000 3.000000 6.000000
1.3.2 Adversarial Examples
Deep learning has shown very good performance in image recognition and is said to
surpass human ability of discrimination in some areas. However, it has been reported
that deep learning can misidentify images that can be easily identified by humans.
Goodfellow et al. [23], employing an image that should be judged to be a panda
on which a small noise is superimposed, show that the superimposed image looks
almost identical to the original image to the human eye or can be easily identified
as a panda, whereas the convolutional neural network GoogLeNet [66] judges it as
a gibbon. This kind of input data is called an adversarial example.
The mechanism by which an adversarial example occurs in a neural network can
be explained as follows.
Let the input to the neural network be x = (x_1, \ldots, x_n)^T and the weights w_j = (w_{j1}, \ldots, w_{jn})^T; then the input to the jth unit of the next layer can be written as

u_j = \sum_{i} w_{ji} x_i = w_j^T x    (1.3.1)
When a small noise Δx is added to the input, the variation of u_j, Δu_j, is given by

\Delta u_j = w_j^T \Delta x    (1.3.2)

Equation (1.3.2) shows that Δu_j is the inner product of w_j and Δx, and therefore the variation Δu_j takes the maximum value when Δx = k w_j, showing that among various noises of similar magnitude, the variation of the input to a unit, and also of the output of the unit, becomes the largest for a noise vector Δx with the specific direction, i.e., parallel to w_j.
For a well-trained multilayer feedforward neural network, we can also make the output fluctuate greatly with small fluctuations of the input as follows. When the error function of the neural network is represented as E = E(x), the input noise vector Δx = (Δx_1, \ldots, Δx_n)^T, and ε a small positive constant, then adding to the input vector the noise vector Δx generated by

\Delta x_i = \varepsilon \cdot \mathrm{sgn}\left( \frac{\partial E}{\partial x_i} \right)    (1.3.3)

produces a significant difference between the output of the neural network and the teacher data. Here, sgn(x) is defined as follows:

\mathrm{sgn}(x) = \begin{cases} 1 & (x \geq 0) \\ -1 & (x < 0) \end{cases}    (1.3.4)
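A minimal sketch of generating such a noise vector according to Eqs. (1.3.3) and (1.3.4) is given below, assuming a Keras model and loss function as stand-ins for E(x); the function name, its arguments, and the value of ε are illustrative assumptions.

# Sketch of adversarial-noise generation per Eqs. (1.3.3)-(1.3.4) (assumed Keras API).
import tensorflow as tf

def adversarial_noise(model, x, teacher, loss_fn, eps=0.01):
    # Returns eps * sgn(dE/dx_i) for the input batch x.
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        error = loss_fn(teacher, model(x))       # E(x)
    grad = tape.gradient(error, x)               # dE/dx_i
    sgn = tf.where(grad >= 0.0, 1.0, -1.0)       # Eq. (1.3.4): sgn(0) = 1
    return eps * sgn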
Adversarial examples being created as above, we discuss here how to make a feedforward neural network that can identify an adversarial example as correctly as possible. One method is to use the adversarial example as a regularization term [27]. Though the usual back propagation algorithm minimizes E(x), this method adds E(x + Δx^{adv}) as a regularization term and minimizes the following:

\alpha \cdot E(x) + (1 - \alpha) \cdot E(x + \Delta x^{adv})    (1.3.5)

where α is a positive constant, and Δx^{adv} a noise vector generated by Eq. (1.3.3).
1.3.3 Dataset Augmentation
In neural networks and deep learning, the number of training patterns is one of the
most important issues. If it is small, overtraining [27] is likely to occur. To avoid
this, it is necessary to have as many training patterns as possible. In many situations,
however, it is not easy to collect a sufficient number of training patterns as seen in
the case of medical imaging (X-ray, MRI, etc.).
For this reason, the original training patterns (images) are processed to make new training patterns, which is called dataset augmentation. As an example, in the deep learning library Keras, the ImageDataGenerator class provides images that have been processed in various ways, such as rotation, translation, inversion, and shear deformation, to increase the number of training patterns. Table 1.1 shows the main parameters of ImageDataGenerator and their effects.
Even when the input data are audio data, the data augmentation described above has been performed and proved to be effective [34, 53].
Data augmentation for images as described above is considered to be difficult in such cases as a fully connected feedforward neural network. Then, another data augmentation method, the superimposition of noise, has been studied [60], which is based on

x_i^{input} = (1 + r_i)\, x_i^{original}, \quad r_i \in [-\varepsilon, \varepsilon]    (1.3.6)

where ε is a small positive constant. During training, a small noise is superimposed on each component x_i^{original} of the original input data x^{original} by using a random number generated each time, and the result is used as the component x_i^{input} of the input data x^{input}. Note that superimposing noise on the input data is reported to be effective in preventing overtraining in various applications [17, 51, 52].
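As an example, the noise superimposition of Eq. (1.3.6) can be sketched in NumPy as follows, applied anew at every presentation of a training pattern; the value of ε is an illustrative assumption.

# Noise superimposition per Eq. (1.3.6): x_input = (1 + r) * x_original, r in [-eps, eps].
import numpy as np

rng = np.random.default_rng()

def superimpose_noise(x_original, eps=0.02):
    r = rng.uniform(-eps, eps, size=x_original.shape)   # fresh random numbers each call
    return (1.0 + r) * x_original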
Table 1.1 Image data augmentation in Keras

Parameter of ImageDataGenerator    Functionality
rotation_range                     Rotation
width_shift_range                  Horizontal translation
height_shift_range                 Vertical translation
shear_range                        Shear transform
zoom_range                         Zooming
horizontal_flip                    Horizontal flip
vertical_flip                      Vertical flip
fill_mode                          Points outside of the generated image are filled according to the given mode
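For illustration, such augmented images can be obtained with ImageDataGenerator roughly as follows; the parameter values and the dummy image batch are illustrative assumptions, not recommendations from the text.

# Keras ImageDataGenerator sketch; parameter values are illustrative only.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=20,          # rotation
    width_shift_range=0.1,      # horizontal translation
    height_shift_range=0.1,     # vertical translation
    shear_range=0.2,            # shear transform
    zoom_range=0.2,             # zooming
    horizontal_flip=True,       # horizontal flip
    fill_mode="nearest",        # how to fill points outside the generated image
)

images = np.random.rand(10, 32, 32, 3)                  # dummy image batch
augmented = next(datagen.flow(images, batch_size=10))   # one batch of augmented images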
1.3.4 Dropout
A method to improve the accuracy of classification by constructing multiple neural networks and averaging or majority voting over them has been studied, called ensemble learning or model averaging.
Suppose a highly idealized situation, where we have 2N + 1 neural networks, each of which is different from the others, and all of them have the same accuracy of classification p (p > 1/√2). In this case, if the classification results of each neural network are independent, then the following equation holds for the accuracy of classification P_{2N+1} by majority vote of the 2N + 1 neural networks:

P_{2N+1} < P_{2(N+1)+1}, \quad \lim_{N \to \infty} P_{2N+1} = 1    (1.3.7)

This is referred to as Condorcet's theorem. In practice, the assumption that the classification results of each classifier (e.g., neural network) are independent is unreasonable, and it is often the case that many classifiers make the same misclassification for a given input data.
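Under the independence assumption, P_{2N+1} in Eq. (1.3.7) is simply the probability that at least N + 1 of the 2N + 1 classifiers are correct, which can be checked numerically as in the following sketch; the value p = 0.8 and the values of N are arbitrary examples.

# Numerical check of Eq. (1.3.7) under the independence assumption (p is illustrative).
from math import comb

def majority_accuracy(p, N):
    n = 2 * N + 1
    # probability that at least N + 1 of the n independent classifiers are correct
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(N + 1, n + 1))

for N in (1, 5, 25, 125):
    print(N, majority_accuracy(0.8, N))   # increases toward 1 as N grows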
To relax this, for example, in random forests [7, 49], where ensemble learning is introduced to decision trees and classification is done by majority voting of a large number of decision trees, the structure of the training patterns of individual decision trees is changed, and the parameters for discrimination are also changed among trees. This, nevertheless, still does not result in the construction of a fully independent set of classifiers. Even so, the method of preparing multiple classifiers is reported to be effective in improving the accuracy of classification in many cases.
Dropout [64] is equivalent to averaging multiple neural networks while using a
single neural network. Figure 1.20 shows the schematic diagram of the dropout.
Figure 1.20a shows the original four-layer feedforward neural network. During
training, a uniform random number rnd of the range [0, 1] is generated for each
unit in each epoch, and if rnd > r for a predetermined dropout rate r(0 < r < 1),
the output of the unit is fixed to 0. This is equivalent to using a feedforward neural
network with a different structure for each epoch (Fig. 1.20b). After the training is
completed, the neural network with the original structure using all units is employed
for inference, but the output of the units is multiplied by the dropout rate r (Fig. 1.20c). Dropout is almost equivalent to taking the average value of many neural networks with different structures, and is considered to suppress overtraining and improve the accuracy of estimation.
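A NumPy sketch of this unit-wise masking, following the convention in the text that a unit's output is fixed to 0 when rnd > r during training and that outputs are multiplied by r at inference, is shown below; the value of r is an illustrative assumption.

# Dropout sketch following the text's convention: a unit is kept when rnd <= r.
import numpy as np

rng = np.random.default_rng()

def dropout_forward(outputs, r=0.8, training=True):
    if training:
        rnd = rng.uniform(0.0, 1.0, size=outputs.shape)   # one random number per unit
        return np.where(rnd > r, 0.0, outputs)            # fix dropped outputs to 0
    return r * outputs                                    # inference: scale by r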
DropConnect [68] has also been proposed, which drops individual couplings
between units, whereas the dropout does individual units.
Fig. 1.20 Dropout
1.3.5 Batch Normalization
For data pairs that are to be used as input and teacher data for neural networks to learn a mapping, it is common to transform the data into a certain range of values, or to process the data to align the mean and variance. Batch normalization is a technique that dynamically performs the above transformation in the hidden layers as well.
As mentioned above, transformation operations on the input data of a neural network are usually performed. Let the number of input data be n, and the pth input data x^p = (x_1^p, x_2^p, \ldots, x_d^p). The maximum value, minimum value, mean, and standard deviation of each component are, respectively, calculated as follows:

x_i^{max} = \max_p \{ x_i^p \}    (1.3.8)

x_i^{min} = \min_p \{ x_i^p \}    (1.3.9)

\mu_i = \frac{1}{n} \sum_{p=1}^{n} x_i^p    (1.3.10)

\sigma_i = \sqrt{ \frac{1}{n} \sum_{p=1}^{n} \left( x_i^p - \mu_i \right)^2 }    (1.3.11)
The most commonly employed transformation operations are the 0–1 transformation and the standardization. Assuming that the input data to the neural network after transformation are \tilde{x}^p = (\tilde{x}_1^p, \tilde{x}_2^p, \ldots, \tilde{x}_d^p), the 0–1 transformation of the input data is given by

\tilde{x}_i^p = \frac{ x_i^p - x_i^{min} }{ x_i^{max} - x_i^{min} }    (1.3.12)

Similarly, the standardization of the input data is given as

\tilde{x}_i^p = \frac{ x_i^p - \mu_i }{ \sigma_i }    (1.3.13)
The above transformation can mitigate the negative effects of large difference in
numerical ranges between individual parameters.
The batch normalization [33] performs the same operations on the inputs of each layer as on the input data. When the input value of the ith unit of the lth layer in the pth learning pattern is x_{l,i}^p and the output is y_{l,i}^p, the input–output relationship is expressed by

y_{l,i}^p = f\left( x_{l,i}^p \right) = f\left( \sum_{j} w_{i,j}^{l} \, y_{l-1,j}^p + \theta_{l,i} \right)    (1.3.14)
where \theta_{l,i} is the bias of the ith unit in the lth layer, and w_{i,j}^{l} the connection weight between the ith unit in the lth layer and the jth unit in the (l − 1)th layer.
The batch normalization is used to standardize the input values of each unit in a mini-batch of size m. That is, if the input values of the units in each training pattern in a mini-batch are \{ x_{l,i}^{k+1}, x_{l,i}^{k+2}, \ldots, x_{l,i}^{k+m-1}, x_{l,i}^{k+m} \}, then we employ as input values the transformations \{ \tilde{x}_{l,i}^{k+1}, \tilde{x}_{l,i}^{k+2}, \ldots, \tilde{x}_{l,i}^{k+m-1}, \tilde{x}_{l,i}^{k+m} \} given as

\tilde{x}_{l,i}^p = \gamma \, \frac{ x_{l,i}^p - \mu_{l,i} }{ \sigma_{l,i} } + \beta, \quad (k + 1 \leq p \leq k + m)    (1.3.15)
Here, both γ and β are parameters that are updated by learning, and \mu_{l,i} and \sigma_{l,i} are, respectively, calculated by

\mu_{l,i} = \frac{1}{m} \sum_{p=k+1}^{k+m} x_{l,i}^p    (1.3.16)

\sigma_{l,i} = \sqrt{ \frac{1}{m} \sum_{p=k+1}^{k+m} \left( x_{l,i}^p - \mu_{l,i} \right)^2 + \varepsilon }    (1.3.17)
where ε is a small constant that prevents division by zero.
It is noted that the batch normalization has been shown to be effective in improving
learning speed in many cases and has become widely employed.
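A NumPy sketch of Eqs. (1.3.15)–(1.3.17) applied to the inputs of one layer over a mini-batch is given below; γ and β are taken as fixed scalars here for simplicity, although in batch normalization they are parameters updated by learning, and the values of γ, β, and ε are illustrative assumptions.

# Batch normalization sketch per Eqs. (1.3.15)-(1.3.17); gamma, beta, eps illustrative.
import numpy as np

def batch_normalize(x_batch, gamma=1.0, beta=0.0, eps=1e-5):
    # x_batch: inputs x^p_{l,i} of one layer, shape (m, units), m = mini-batch size
    mu = x_batch.mean(axis=0)                                   # Eq. (1.3.16)
    sigma = np.sqrt(((x_batch - mu) ** 2).mean(axis=0) + eps)   # Eq. (1.3.17)
    return gamma * (x_batch - mu) / sigma + beta                # Eq. (1.3.15)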
1.3.6 Generative Adversarial Networks
Generative adversarial networks (GANs) [24], one of the most innovative techniques
developed for deep learning, consist of two neural networks: the generator and the
discriminator.
The former is a neural network that generates data satisfying certain conditions, and the latter is one that judges whether the input data is true data or not. The generator
takes arbitrary data (for example, arbitrary data generated by random numbers) as
input and is trained to output data that satisfies certain conditions. The discriminator
is trained to correctly discriminate between data output by the generator (called fake
data) and data prepared in advance that truly satisfies certain conditions (called real
data). In the early stages of learning, the fake data output by the generator can be
easily detected as “fake” by the discriminator, but as the training of the generator
progresses, it may output fake data that cannot be detected even by the discriminator.
The goal of GAN is to build a generator that outputs real data that satisfies certain
conditions, or those that cannot be detected as “fake data” by the discriminator. The
training process of GAN can be summarized as follows:
(1) Two neural networks, the generator and the discriminator, are prepared as shown
in Fig. 1.21. The number of output units of the generator should be the same as
the number of input units of the discriminator, and the number of output units
of the discriminator is 1 because the discriminator determines whether the input
data is real or fake only.
Fig. 1.21 Generator and discriminator in GAN
(2) Prepare a large number of true data that satisfies certain conditions, called real
data.
(3) The generator takes a lot of arbitrary data generated by random numbers as
input to collect a lot of output data of the generator called fake data (Fig. 1.22).
(4) Training of the discriminator is performed using the real data prepared in (2) and
the fake data collected in (3) as input data. As shown in Fig. 1.23, the teacher
data should be real (e.g., 1) for real data and fake (e.g., 0) for fake data. In this
way, the discriminator is trained to correctly discriminate between real and fake
data.
(5) After the training of the discriminator, the generator is trained by connecting the generator and the discriminator in series, as shown in Fig. 1.24. The input data to the connected network are those of the generator, and the teacher data is "real." The back propagation algorithm is used to train the connected network, where all the parameters (e.g., connection weights) of the discriminator part are fixed, and only the parameters of the generator part are updated. In this way, the generator is trained to output data that is judged to be real by the discriminator.
(6) Return to (3) after the training of the generator is completed.

Fig. 1.22 Generation of fake data in GAN
Fig. 1.23 Training of discriminator in GAN
Fig. 1.24 Training of generator in GAN
Repeating the training of the discriminator and the generator alternately, the
trained generator finally becomes able to output data that is indistinguishable from
the real data by the discriminator. The GAN can use convolutional neural networks
for the generator and the discriminator.
The training process of GAN is written as a min–max problem as follows [24]:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_d(x)} [\log D(x)] + \mathbb{E}_{z \sim p_z(z)} [\log(1 - D(G(z)))]    (1.3.18)

where V(D, G) is the objective function, D and G the discriminator and the generator, respectively, D(x) the output of the discriminator for input data x, G(z) the output of the generator for input data z, p_d(x) the probability distribution of x, and p_z(z) the probability distribution of z. The training process (4) above is understood to be the max operation on the left-hand side of Eq. (1.3.18), that is, the maximization of the right-hand side by updating the discriminator, while the training process (5) above is the min operation on the left-hand side of Eq. (1.3.18), meaning the minimization of the second term on the right-hand side by updating the generator.
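The alternating steps (3)–(5) can be sketched as follows, assuming the Keras API; the network shapes, optimizer, batch size, number of iterations, and dummy data are all illustrative assumptions rather than prescriptions from the text.

# GAN training-loop sketch for steps (3)-(5) (assumed Keras API; sizes illustrative).
import numpy as np
import tensorflow as tf

noise_dim, data_dim = 8, 16
real_data = np.random.rand(1000, data_dim).astype("float32")    # dummy real data

generator = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(noise_dim,)),
    tf.keras.layers.Dense(data_dim, activation="sigmoid"),
])
discriminator = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(data_dim,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),              # real (1) or fake (0)
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Connected network for training the generator: the discriminator part is fixed.
discriminator.trainable = False
gan = tf.keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

batch = 64
for step in range(1000):
    # (3) generate fake data from arbitrary (random) input
    noise = np.random.rand(batch, noise_dim).astype("float32")
    fake = generator.predict(noise, verbose=0)
    real = real_data[np.random.randint(0, len(real_data), batch)]
    # (4) train the discriminator on real (teacher 1) and fake (teacher 0) data
    discriminator.train_on_batch(
        np.vstack([real, fake]),
        np.vstack([np.ones((batch, 1)), np.zeros((batch, 1))]))
    # (5) train the generator through the connected network with teacher "real"
    gan.train_on_batch(noise, np.ones((batch, 1)))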
GANs based on convolutional neural networks are used for various problems
including generation of images. For example, it can generate an image of a dog from
a noisy image with random numbers as input. However, it is known to be difficult to
control what kind of dog image the generator creates.
Conditional generative adversarial network (CGAN) [48] is a modified version
of GAN that can control the images generated by the generator, where both the
generator and the discriminator accept the same input data as in GAN as well as data
about the attributes of the data called label data. The training process of CGAN is
summarized as follows:
(1) Two neural networks, the generator and the discriminator, are prepared as shown
in Fig. 1.25. Both the generator and the discriminator use information about the
attributes of the data (Label data) as input data in addition to the standard GAN
input data. Thus, the number of input units of the discriminator is the sum of
two numbers: one is the number of output units of the generator and the other
the number of label data. The number of output units of the discriminator is 1
because the discriminator determines whether the input data is real or fake only.
(2) Prepare a large number of true data that satisfies certain conditions, called real
data, which are accompanied by label data indicating their attributes.
(3) The generator takes a lot of arbitrary data generated by random numbers and
their attribute information (label data) as input to collect a lot of output data
(fake data) (Fig. 1.26).
(4) Training of the discriminator is performed using the real data prepared in (2)
and the fake data collected in (3) as input data. The corresponding label data
are also used as input. As shown in Fig. 1.27, the teacher data are set real (e.g.,
1) for real data and fake (e.g., 0) for fake data, respectively. In this way, the
discriminator is trained to correctly discriminate between real and fake data.
(5) After the discriminator is trained, the training of the generator is performed by
connecting the generator and the discriminator in series, as shown in Fig. 1.28.
Fig. 1.25 Generator and discriminator in CGAN
Fig. 1.26 Generation of fake data in CGAN
Fig. 1.27 Training of discriminator in CGAN
The input data to the connected network are the input data of the generator
and its label data with the teacher data being “real.” The back propagation
algorithm is used to train the connected network, where all the parameters (e.g.,
connection weights) of the discriminator part are fixed, and only the parameters
of the generator part are updated. In this way, the generator is trained to output
data judged to be real by the discriminator.
(6) Return to (3) after the training of the generator is completed.
Fig. 1.28 Training of generator in CGAN
The learning process of CGAN is also formulated as a min–max problem as follows [48]:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_d(x)} [\log D(x|y)] + \mathbb{E}_{z \sim p_z(z)} [\log(1 - D(G(z|y)))]    (1.3.19)

where y is the attribute information (label data). Equation (1.3.19) can be regarded as the modified version of Eq. (1.3.18) conditioned with respect to y.
GANs have attracted much attention, especially for their effectiveness in image
generation and speech synthesis, and various improved GANs have been proposed.
Research on them is still active: for example, DCGAN [55] using a convolutional neural network, InfoGAN [9] with an improved loss function, LSGAN [44] with a loss function based on the least square error, CycleGAN [71] with doubled generators and discriminators, WGAN [2] with a loss function based on the Wasserstein distance, ProgressiveGAN [35] with hierarchically high resolution, and StyleGAN [36] with an improved generator.
1.3.7 Variational Autoencoder
It is known that the variational autoencoder performs a function similar to that of the generative adversarial network (GAN) described in Sect. 1.3.6.
Figure 1.29 shows the basic schematic diagram of the autoencoder. Let the number of training data be N, the kth training data (input) x^k = (x_1^k, x_2^k, \ldots, x_n^k)^T, the encoder output y^k = (y_1^k, y_2^k, \ldots, y_m^k)^T (m < n), and the decoder output \tilde{x}^k = (\tilde{x}_1^k, \tilde{x}_2^k, \ldots, \tilde{x}_n^k)^T. Then, the objective function E to be minimized in the training process of the autoencoder is given by

E = \frac{1}{2} \sum_{k=1}^{N} \left\| \tilde{x}^k - x^k \right\|^2    (1.3.20)

Here, it is assumed that the input data are used as the teacher data also. This means that the output y^k of the encoder can be considered a compressed representation of the input data x^k.
Unlike the conventional autoencoders, the variational autoencoders [37, 38] learn probability distributions. Figure 1.30 shows a schematic diagram of the operation of a variational autoencoder. The encoder is assumed to represent the probability distribution of Eq. (1.3.21). Here, z = (z_1, z_2, \ldots, z_m)^T is called a latent variable and is usually of much lower dimension (m \ll n) than the input x = (x_1, x_2, \ldots, x_n)^T. Note that N(z \mid \mu, \sigma^2) is a multidimensional normal distribution with mean \mu = (\mu_1, \mu_2, \ldots, \mu_m)^T and variance \sigma^2 (standard deviation \sigma = (\sigma_1, \sigma_2, \ldots, \sigma_m)^T).

q_\phi(z \mid x) = N(z \mid \mu, \sigma^2)    (1.3.21)
Fig. 1.29 Encoder and decoder in an autoencoder
Random documents with unrelated
content Scribd suggests to you:
JALOKIVIKAUPPIAS. Serviteur, herrani! Rouva Staabi käski minun
tulla tänne; olen jalokivikauppias.
LEERBEUTEL. Näyttäkääpäs, mitä teillä on! Nämä miellyttävät
minua enimmin. Kenties kreivi ostaa ne, mutta nyi ei hänellä ole
aikaa niitä tarkastaa, sillä me odotamme tänne raatiherroja.
Tahdotteko jättää ne tänne huomiseen saakka, niin saatte silloin
vastauksen?
JALOKIVIKAUPPIAS. Minä tuon tänne jalokivet huomisaamulla.
LEERBEUTEL. Monsieur! Minä annan teille paremman neuvon.
Pysykää vain poissa, koska niin kovin pelkäätte! Täällä kävi äskettäin
toinenkin tarjoomassa meille tavaroitaan ja olisi jättänyt näytteeksi
niin monta kuin vain olisimme halunneet. Mutta minä annoin hänen
mennä koska kerran sama vaimo oli puhunut minulle teistä.
JALOKIVIKAUPPIAS. Oi, herra hovimestari! En minä sitä
tarkoittanut!
LEERBEUTEL. Ei, monsieur. En tahdo johdattaa teitä enkä ketään
muutakaan kiusaukseen. Ehkä nukkuisitte yönne levottomasti, jos
uskoisitte pari jalokiveä pyhän roomalaisen valtakunnan
palatsikreiville.
JALOKIVIKAUPPIAS. Oi, älkää suuttuko, kunnianarvoisa herra
hovimestari!
LEERBEUTEL. En minä ollenkaan suutu, kehun vain
varovaisuuttanne.
JALOKIVIKAUPPIAS. Oi, teidän jalosukuisuutenne!
LEERBEUTEL. Monsieur, minä sanon sen teille, en ole ollenkaan
vihoissani, mutta te ette myöskään voi moittia minua siitä, että teen
kauppoja kenen kanssa tahansa.
JALOKIVIKAUPPIAS. Oi, jalosukuinen herra hovimestari!
LEERBEUTEL. Tässä saatte jalokivenne. Toivon, etteivät ne minun
käsissäni ole pilaantuneet.
JALOKIVIKAUPPIAS. Kautta kunniani, minä en ota niitä takaisin.
Pyydän nöyrimmästi, pitäkää ne huomiseen saakka.
LEERBEUTEL. Minä en ota niitä!
JALOKIVIKAUPPIAS. Pyydän ainoastaan, älköön kreivi olko minulle
epäsuosiollinen.
(Menee).
LEERBEUTEL. Kuulkaas, monsieur, koska huomaan, ettette tehnyt
tätä epäluulosta, niin saatte sitte jättää ne tänne!
JALOKIVIKAUPPIAS. Kiitän teitä, hovimestari.
LEERBEUTEL. Voitte tulla tänne huomisaamuna kello yhdeksän
JALOKIVIKAUPPIAS. Kuten käskette!
Kohtaus 6.
3 raatiherraa. Leerbeutel.
1 RAATIHERRA. Nöyrin palvelijanne! En tiedä, kuulutteko
palatsikreivin seurueeseen.
LEERBEUTEL. Minä olen hänen hovimestarinsa, teidän
palvelijanne. Palatsikreivi on ottanut itselleen vapauden kutsua teidät
illallisille, ettehän siitä pahastu.
2 RAATIHERRA. Me olemme palatsikreivin alamaisia palvelijoita ja
me kiitämme meille osoitetusta kunniasta.
LEERBEUTEL. Hänen isänsä, vanha palatsikreivi sanoo nuorena
ulkomailla matkustellessaan aina kutsuneensa pitoihin kaupungin
esivallan. Samoin toivoo hän rakkaan poikansa tekevän.
Raatiherroihin tutustuminen on sekä opiksi että hyödyksi. Ei pidä
matkustaa ulkomaille katsomaan vain taloja ja rakennuksia vaan
keskustelemaan kunnon ihmisten kanssa.
1 RAATIHERRA. Toivomme jollakin lailla huvittavamme hänen
armoansa. Mutta mitä hyötyä on meidän seurustelustamme niin
ylhäiselle, oppineelle ja paljon matkustaneelle herralle? Herra
hovimestarin kohteliaisuudesta, viisaudesta ja hienosta käytöstavasta
päättäen voimme arvata millainen teidän korkea-arvoisa isäntänne
on.
LEERBEUTEL. Kiitän teitä nöyrimmästi siitä, että ajattelette
minusta niin hyvää. Minun ansioni ovat hyvin pienet ja isäntäni… sen
pahempi… Oi, hyvät herrat, minä en voi sitä sanoa.
1 RAATIHERRA. Miksi ette, herra hovimestari? Eihän herrallenne
vain ole tapahtunut mitään onnettomuutta.
LEERBEUTEL. Hyvät herrat! Luonto on hyvin oikullinen jakaessaan
lahjojansa. Mitä ruumiiseen tulee niin herraltani ei puutu mitään, sillä
siinä on kaikki kuten olla pitää, hänellä on hyvä terveys, hän on
sitäpaitsi rikas ja hänellä on hyvä toimeentulo. Mutta, hyvät herrat…
(Leerbeutel itkee katkerasti). Oi, minun sydämmeni on murtua sitä
ajatellessani.
1 RAATIHERRA. Ehkä palatsikreivi on hiukan huikentelevainen,
useilla nuorilla herroillahan on se vika.
LEERBEUTEL (itkee taas).
1 RAATIHERRA. Mutta se katoaa ajan pitkään.
LEERBEUTEL. Ei, herrani, oi, jospa hän olisikin hiukan raju ja
huikentelevainen, sillä se on hyvä merkki nuorista ihmisistä.
1 RAATIHERRA. Ehkä hän on taipuvainen synkkämielisyyteen.
LEERBEUTEL. Oi, jospa hän olisikin hiukan synkkämielinen, sillä
siinä voi olla jotain hyvääkin.
1 RAATIHERRA. Ehkä hän on liian intohimoinen rakastelija ja
antaa naisten vietellä itseänsä.
LEERBEUTEL. Oi, jospa hän olisikin rakastelija, siliä
rakastelemisesta johtuu paljon hyvää, jos pahaakin.
1 RAATIHERRA. Sitte emme voi tietää mikä häntä vaivaa. Ehkä
hän kohtelee alamaisiaan ankarasti?
LEERBEUTEL. Ei hän ole ankara. Jospa hän olisikin hiukan ankara,
sillä se saattaa toisinaan olla sellaisille ihmisille hyödyksi.
1 RAATIHERRA. Sitte mahtaa hänen armollaan olla päähänpistoja.
LEERBEUTEL. Ei, hyvät herrat. Ei suuremmassa eikä
pienemmässäkään määrässä. Hänellä ei ole mitään päähänpistoja.
Kun hänet näette, niin voitte päättää, millainen hän on, luulette
hänet pikemmin talonpojaksi kuin palatsikreiviksi. Hän on kuin
puupölkky, ei ole hänellä älyä eikä muistoa ja kaikki hänen herra
isänsä puuhat ovat menneet myttyyn, (itkee taas). Oi, sinä jalo,
vanha palatsikreivi, sydämmeni on murtua ajatellessani niitä
suolaisia kyyneleitä, joita olet vuodattanut, niitä huokauksia, jotka
ovat rinnastasi lähteneet! Tämä vanha kunnon herra on tehnyt
lapsensa tähden kaiken voitavansa, hän on valinnut hänelle maan
paraimmat kotiopettajat, paraimmat harjoitusmestarit, hän on
antanut hänen vihdoin matkustaa ulkomaille mutta kaikki on ollut
turhaa työtä.
1 RAATIHERRA. Kuinka vanha palatsikreivi nyt on?
LEERBEUTEL. Yhdeksäntoista vuotta.
1 RAATIHERRA. Kyllä hänestä sitte, herra hovimestari, vielä voi
paljon toivoa, sillä siitä on nähty paljon esimerkkejä.
LEERBEUTEL. Ha, ha, jospa asian laita olisi niin! Mutta anteeksi,
hyvät herrat, minä jätän teidät nyt hetkeksi! Lähden saattamaan
herraa alas hänen huoneestaan.
1 RAATIHERRA. Nöyrin palvelijanne!
Kohtaus 7.
Raatiherrat kahdenkesken.
1 RAATIHERRA. Kovin herttainen mies tämä hovimestari, eikö niin,
herra virkaveli! Ansioittensa puolesta tulisi hänen olla sitä, mitä
hänen herransa on.
2 RAATIHERRA. Olen aivan samaa mieltä; sillä olenpa tosiaankin
kovin ihastunut häneen.
3 RAATIHERRA. Olen hyvin utelias näkemään, onko nuori
palatsikreivi todella niin tyhmänsekainen.
1 RAATIHERRA. Kyllä mahtaa vanhemmilla olla surua
huomatessaan, ettei mikään kuri pysty lapseen. Mutta kas tuolta hän
varmaan tuleekin!
Kohtaus 8.
Palatsikreivi. Leerbeutel. Raatiherrat.
1 RAATIHERRA. Alamaisimmat palvelijanne! Kiitämme teitä,
palatsikreivi, armosta, jonka olette meille osoittanut kutsuessanne
halvat palvelijanne tänne.
2 RAATIHERRA. Jos olisimme tienneet palatsikreivin tulosta, niin
olisi kunniamme vaatinut jo aikoja sitten käymään alamaisimmalla
kunniatervehdyksellä sekä onnittelemaan teidän armoanne tulonne
johdosta.
PALATSIKREIVI. Eikös teillä, reilut miehet, ole tupakanpurua,
raukasee niin riivatusti.
1 RAATIHERRA (hiljaa). Voi taivas, onko tämä palatsikreivin
tervehdys!
LEERBEUTEL. Hyvät herrat. Palatsikreivi oli makuulla ja nukahti,
maattuansa näin iltapäivällä on hän niin torruksissa, että tulee
tajuihinsa vasta puolen tunnin kuluttua. Pyydän nöyrimmästi, olkaa
hyvä ja istukaa, silloin myöskin herrani istuu!
1 RAATIHERRA. Kuinka tämä taitava mies osaakaan peitellä
herransa vikoja!
LEERBEUTEL. Hyvät herrat, suvaitkaa, istukaa, sillä hänen
armonsa ei istu ennen!
(Palatsikreivi istuutuu ensiksi, sitten muut, Leerbeutel
jää seisomaan hänen tuolinsa viereen).
1 RAATIHERRA. Teidän armonne on joutunut hyvin
epäterveelliselle paikkakunnalle. Parasta että teidän armonne alussa
on hiukan varovainen.
(Palatsikreivi röykäisee).
LEERBEUTEL. Palatsikreivillä on hirveän huono vatsa ja hän pyytää
nöyrimmästi anteeksi, että hän teidän läsnäollessanne käyttää
hyväkseen mukavuuksiansa; sillä pitkiin aikoihin ei hän ole saanut
ilmaa sisäänsä ja sentähden ottaa hän itsellensä oikeuden tehdä
sellaista, jota hän ei koskaan tekisi, jollei äärimmäinen hätä häntä
siihen pakoiltaisi.
1 RAATIHERRA. Teidän armonne käyttäköön vain hyväkseen
vapauttansa, sillä terveys on kallein asia maailmassa. Onko teidän
armoanne muuten jo kauvankin vaivannut tuollainen umpitauti?
PALATSIKREIVI. Kysykää hovimestariltani!
1 RAATIHERRA (hovimestarille). Onko hänen armoansa jo
kauvankin vaivannut tällainen tauti.
LEERBEUTEL. Jo muutama vuosi.
1 RAATIHERRA. Minulla on vallan erinomaisia vatsatippoja, jos
teidän armonne suvaitsee niitä koittaa, vakuutan, ettei mikään ole
vatsalle terveellisempää.
PALATSIKREIVI. Eihän kahdestatoista sellaisesta lääkepullosta tule
edes korttelin pulloa täyteen. Ei sillä saa janoansa sammumaan.
LEERBEUTEL. Hyvät herrat, palatsikreivi ei ole tottunut tippoihin.
Hän käyttää vain dekoktia, sitä hän juo isot määrät. Hän luuli teidän
tarkoittavan dekoktia.
1 RAATIHERRA. Teidän palatsikreivillinen armonne, en tarkoittanut
dekoktia. Vain kymmenen tippaa kerrallaan.
PALATSIKREIVI. Kysykää hovimestariltani!
1 RAATIHERRA. Tunteeko herra hovimestari näitä tippoja?
LEERBEUTEL. Minä repostelen hiukan lääketieteenkin alalla. Niin,
minä tunnen heti hajusta, mitä lajin lääkkeet ovat. Tämä on
väkevätä liuvosta, sitä ei voi kerralla nauttia enempää kuin
kymmenen tippaa.
1 RAATIHERRA. Mutta kuinka kaupunkimme muuten miellyttää
teidän palatsikreivillistä armoanne?
PALATSIKREIVI. Kysykää hovimestariltani!
LEERBEUTEL. Palatsikreivi tietää, että minä jo hiukan kävin
katselemassa kaupunkia, siksi on minun velvollisuuteni kertoa siitä,
sillä hän itse ei ole vielä nähnyt mitään. Minusta täällä on monta
kaunista ja kallista rakennusta.
1 RAATIHERRA. Niin, kaupunki on kyllä kaunis. Se on kasvanut
muutamassa vuodessa. Jos palatsikreivi tahtoo käydä katsomassa
kaupunkia, niin tarjoudumme alamaisesti oppaaksi kaikkialle.
LEERBEUTEL. Kohteliaisuutenne on niin suuri, ettei hänen
armonsa kiireessä löydä vastaukseksi kyllin voimakkaita sanoja. Niin,
hänen vaitiolonsakin osoittaa kuinka syvästi hänen sydämmensä on
liikutettu.
1 RAATIHERRA. Sitä ei voi ollenkaan sanoa kohteliaisuudeksi, sillä
meidän velvollisuutemme on mikäli mahdollista huvittaa
palatsikreiviä.
LEERBEUTEL. Hyvät herrat! Palatsikreivi ei puhu montaa sanaa,
mutta hän ajattelee sitä enemmän. Sen on hän perinyt isäherraltaan,
sillä kun joku tekee hänen isälleen hyvänteon, ei hän kiitä sanoilla,
vaan osoittaa kiitollisuuttansa teoilla.
1 RAATIHERRA. Onko teidän armonne isäherra vielä hyvissä
voimissa?
PALATSIKREIVI. Kysykää hovimestariltani!
LEERBEUTEL. Hyvät herrat, saan teille ilmoittaa, ettei viime
postissa saapunut vanhalta palatsikreiviltä kellekään muulle kirjettä
paitsi minulle. Sentähden on hänen armonsa hiukan huonolla
tuulella. Kiitän muuten hänen armonsa puolesta kohteliaasta
kysymyksestä. Vanha palatsikreivi niinkuin palatsikreivitärkin ovat
hyvissä voimissa.
1 RAATIHERRA. Vai niin, elääkö teidän armonne rouva äitikin?
PALATSIKREIVI. Kysykää hovimestariltani!
LEERBEUTEL. Ha, ha, hyvät herrat, huomaattehan, että herrani on
huonolla tuulella sen johdosta, ettei hän viime postissa saanut
yhtään kirjettä kotoaan. Ah, rauhoittukaa teidän armonne, ensi
postissa saatte taas kirjeitä, enkä minä saa ollenkaan.
ISÄNTÄ (tulee). Nyt on kaikki valmiina, herra hovimestari, ja ruoka
on pöydällä jos suvaitsette astua huoneeseen!
(Palatsikreivi tahtoo kulkea edellä, mutta Leerbeutel pidättää
häntä takinliepeestä ja pakoittaa muut käymään edellä).
LEERBEUTEL. Isäntä, ovatko jo tilaamani soittajat tulleet?
ISÄNTÄ. Ovat. Saatte kuulla kaupungin parainta musiikkia.
LEERBEUTEL. Antaa niiden soittaa muutamia kauniita kappaleita
sillä aikaa kuin herrat ovat aterialla!
(Isäntä pyytää soittajia soittamaan. Soitetaan valituita
kappaleita. Tanssija tulee esittäen kauniin tanssin).
Kolmas näytös.
Kohtaus 1.
Rouva Staabi. Isäntä.
ROUVA STAABI. Täällä on tavattoman hiljaisia tänään. Eikä ihmisiä
näy missään vaikka kello jo on yhdeksän. Enhän voi uskoa että kaikki
vielä nukkuvat. Minun täytyy koputtaa isännän huoneen ovelle.
ISÄNTÄ (tulee puettuna yönuttuun ja henkseleihin, hieroen
silmiään).
Huomenta! Tepäs olette tänään aikaisin liikkeellä.
ROUVA STAABI. Onko nyt aikaista? Kello on yhdeksän.
ISÄNTÄ (haukoitellen). Hyvänen aika, onko kello jo yhdeksän? Sitä
en olisi uskonut. Joimme eilen jotenkin vahvasti, niin että pääni on
kuin rikkirevitty. Palatsikreivi kestitsi meitä niin, että olimme kaikki
hyvässä hutikassa. Ja sen sanon, että hänen hovimestarinsa on
herra kiireestä kantapäähän ja niin armollinen, niin armollinen. Voi,
rouva Staabi, en voi sitä kuvailla! Tämä kunnon mies täytti itse lasini
ja kehoitti minua juomaan palatsikreivin maljan. Uskokaa minua,
nuoresta palatsikreivistä tulee vielä hyvä herra! Hän joi toisen maljan
toisensa jälkeen ja kilisteli vanhojen raatiherrojen kanssa ja sen
sanon teille, rakas rouva Staabi, että hän lopulta kuitenkin joutui
alakynteen. Palatsikreivi on vielä nuori mies eikä ole vielä
kahdenkaankymmenen vuoden vanha, kuinka hän siis voisi kestää
juomingeissa miesten seurassa, jotka ovat olleet raadissa jo monta
vuota.
ROUVA STAABI. Millainen herra hän muuten on?
ISÄNTÄ. Kovin hiljainen. Pöydän ääressä en kuullut hänen sanovan
luotua sanaa, mutta hovimestari johti sen sijaan yksin puhetta.
ROUVA STAABI. Niinpä niin, hovimestari näyttää hyvin kohteliaalta
mieheltä.
ISÄNTÄ. En ole koskaan nähnyt sellaista miestä. Kiitollisuudella
olen häntä aina muisteleva. Mutta anteeksi, minun täytyy mennä
hiukan peseytymään ja kampaamaan tukkaani, sillä, käyttääkseni
hienoa kieltä, nousinpa vasta äsken vuoteeltani.
Kohtaus 2.
Jalokivikauppias. Soittaja. Rouva Staabi.
ROUVA STAABI. Palvelijattarenne, herra jalokivikauppias. Olette
ehkä samoilla asioilla kuin minäkin.
JALOKIVIKAUPPIAS. Niin, hovimestari pyysi minua tulemaan tänne
tähän aikaan. Madame, kiitän teitä muuten siitä, että suosittelitte
minua hovimestarille, sillä luulenpa hänen ostavan ne kaksi jalokiveä,
jotka hänelle eilen toin.
SOITTAJA. Minulla myöskin on häneltä saatavaa eilisestä soitosta.
JALOKIVIKAUPPIAS. Olikos siellä hauskaa?
SOITTAJA. Oli. Koko soittokunta soitteli. Koko raati aterioitsi.
Onkos teilläkin, miekkosilla, asiaa palatsikreivin palvelusväelle?
JALOKIVIKAUPPIAS. Niin, minulla on saatavaa kahdesta
jalokivestä.
ROUVA STAABI. Ja minulla kolmestakymmenestä kyynärästä
broderieriä.
SOITTAJA. Minulla ei ole läheskään niin suuria saatavia kuin teillä.
Te, miekkoset, ansaitsette yhdessä tunnissa niin paljon kuin
meikäläiset ammattimiehet koko vuodessa.
JALOKIVIKAUPPIAS. Niin, monsieur, ei joka päivä lennä suuhun
tällaisia paisteja.
ROUVA STAABI. Se on varma se.
SOITTAJA. Mutta eikö vielä kukaan palvelijoista ole jalkeilla?
Minulle on aika rahaa ja pitäisi vielä tästä ehtiä erään suutarin
häihin. Siellä otetaan vieraat vastaan trumpeetin soitolla. Kas tuolta
tulee isäntä, kysykäämme häneltä!
Kohtaus 3.
Jalokivikauppias. Soittaja. Rouva Staabi. Isäntä.
SOITTAJA. Kuulkaas, isäntä, toimittakaa tänne joku palatsikreivin
palvelijoista.
ISÄNTÄ. En ole tänään nähnyt vielä yhtä ainoatakaan.
JALOKIVIKAUPPIAS. Kyllähän jo jonkun pitäisi olla hereillä.
ISÄNTÄ. Sitä minäkin. Menenpä lakeijain kamariin heitä
ravistelemaan, jolleivät vielä ole heränneet.
(Menee).
ROUVA STAABI. Sehän on kauheata, että palvelijat uskaltavat
maata niin kauvan.
SOITTAJA. Vasta myöhään yöllä pääsivät he levolle tosin hyvässä
hutikassa, mutta tällainen makaaminen menee jo liian pitkälle.
ISÄNTÄ (palaten). Mitä hemmettiä tämä on? Kamarissa ei ollut
ketään. Taitavat olla hovimestarin luona, mutta en ole tänään vielä
kuullut hänen oveansa avattavan.
JALOKIVIKAUPPIAS. Juoskaa heti katsomaan! Oh, minä pelkään,
että tässä on piru pelissä.
ISÄNTÄ (mennen ja tullen taas). Voi, taivaan talivasara, en
hovimestarinkaan huoneessa tavannut ketään!
JALOKIVIKAUPPIAS. Ah, sappeni jo kiehuu minussa!
ROUVA STAABI. Oih, oih, minä vapisen!
ISÄNTÄ. Lähden kurkistamaan palatsikreivin huoneeseen. Ha, ha,
ha, oikein sydämmeni keventyi. Minä näin, että hän makaa vielä.
JALOKIVIKAUPPIAS. Hyvä isäntä, minä vakuutan, että minä
pelkäsin äsken kuin jänis.
ISÄNTÄ. Kiihtyneessä mielentilassa ollessaan kuvittelee ihminen
itselleen mitä tahansa.
ROUVA STAABI. Synti ja häpeä, kun epäilin kunnon ihmisiä.
Kohtaus 4.
Pietari. Edelliset.
PIETARI (tulee). Voi, isäntä, mitä on tapahtunut! Kaikki kolme
hevosta on viety meidän tallista.
ISÄNTÄ. Viety? Mitäs sinä sanot? Oletkos hullu?
PIETARI. Isäntä, se on ihan totinen tosi.
ISÄNTÄ. No sun taivaan talivasara! Jollen itse olisi nähnyt
palatsikreivin nukkuvan, niin luulisin, että tässä on paha pelissä.
JALOKIVIKAUPPIAS. Herra isäntä, älkää siinä enää jaaritelko, vaan
menkää heti herättämään kreiviä ja sanokaa, että hänen palvelijansa
ovat poissa niinkuin teidän hevosennekin. Sillä me tahdomme tietää
koko asianlaadun, ottakoon kreivi sen vastaan suosiollisesti tai
epäsuosiollisesti.
ISÄNTÄ (aukaisee kamarinoven ja huutaa): Teidän armonne,
teidän armonne, teidän armonne!
Kohtaus 5.
Palatsikreivi puettuna yönuttuun ja tohveleihin. Edelliset.
PALATSIKREIVI (ojennellen jäseniään). Tahdotteko puhutella
minun?
ISÄNTÄ. Pyydän nöyrimmästi anteeksi, että olen tehnyt teidän
armonne levottomaksi, mutta…
PALATSIKREIVI. Menkää minun hovimestarini tykö.
JALOKIVIKAUPPIAS. Armollinen herra, me emme tiedä, missä hän
on. Hän käski minun tulla tänne tähän aikaan, mutta…
PALATSIKREIVI. Johan minä sanoin, menkää hovimestarini tykö!
JALOKIVIKAUPPIAS. Teidän armonne, hovimestaria ei ole missään.
PALATSIKREIVI. No käskekää tänne sitte kamaripalvelija!
ISÄNTÄ. Kamaripalvelija on poissa.
PALATSIKREIVI. No sitte mahtaa hän olla hovimestarin tykönä,
perhana soikoon, menkää kaikki tyyni hovimestarin tykö!
ISÄNTÄ. Hovimestari, kamaripalvelija, lakeijat, hevoset kaikki ovat
poissa.
PALATSIKREIVI. Minkäs minä sille mahdan!
JALOKIVIKAUPPIAS. Jos herra itse maksaa minun jalokiveni, niin
saavat muut olla vaikka kymmenen peninkulman päässä.
ROUVA STAABI. Ja minun kolmekymmentä kyynärää broderieriäni!
SOITTAJA. Ja minun musiikini!
ISÄNTÄ. Ja minulle ruuasta, juomasta, trahtamenteista ja muusta,
mitä olen menettänyt.
PALATSIKREIVI. Mitäs minua liikuttaa teidän pörinänne? Menkää
senkin hirtehiset hovimestarin tykö.
ISÄNTÄ. Missäs hovimestari sitten on?
PALATSIKREIVI. Mikä tuhma tolvana, nousinhan juuri äsken ja
vielä häntä kysyy missä hovimestari on!
JALOKIVIKAUPPIAS. Herra isäntä! Minä vaadin tämän henkilön
vangittavaksi, kunnes olen saanut maksuni.
ISÄNTÄ. Teidän armonne saa jäädä tänne kunnes kaikki olemme
saaneet maksumme. Minä huomaan, että kaikki teidän palvelijanne
ovat olleet yksissä juonissa ja ovat sitte pötkineet tiehensä;
kirjoittakaa nyt isällenne, että hän lähettää muutama tuhat riksiä
teidän lunnaiksenne.
PALATSIKREIVI. Isällenikö? Ei hänellä ole ristin ropoa millä
maksaisi tämän vuotisen maaveronsa.
ISÄNTÄ. Mitä? Ettekö ole palatsikreivi?
PALATSIKREIVI. Se voit itse olla. Minä olen Pekka Niilonpoika
Viikistä.
JALOKIVIKAUPPIAS. Ettekö ole palatsikreivi?
PALATSIKREIVI. Kälmi sanokoon minusta sellaista!
ISÄNTÄ. Mistäs sitte saitte kaiken palvelusväen, minkä eilen toitte
tänne?
PALATSIKREIVI. Kysykää hovimestarilta! Lempo soikoon, mistäs
minä sen tiedän. Näin hänet eilen ensi kerran tullessani kaupunkiin
tervaa ostamaan, silloin se kysyi minulta, että tahtoisinko minä
seurata häntä ja jos minä tekisin hänen käskynsä mukaan, niin
minulle laulaisi kunnian kukko, sanoi, saisin syötävää ja juotavaa,
sanoi. No mitäs minä siihen muuta, kuin että kiitos vaan
tarjouksesta. Sitte se riisui minun talonpoikaisvaatteeni ja kiskoi
niskaani sellaisen takintekeleen, painoi päähäni hiustötterön ja ripotti
siihen vehnäjauhoja, sitte sanoi se minua palatsikreiviksi ja samoin
tekivät muutkin. Kummallista, että täällä annetaan sellainen nimi.
ISÄNTÄ. Sen… senkin sakramenskattu lurjus, näin hävyttömästi
olet sinä petkuttanut meitä.
PALATSIKREIVI. Oleks hullu, olenkos minä sinua petkuttanut?
ISÄNTÄ. Etkö sinä meitä petkuttanut, koska sanoit olevasi
palatsikreivi vaikka oletkin vain moukka!
PALATSIKREIVI. Meidän kylässä on kuustoista talonpoikaa, jotka
ovat olleet kukkaskreivejä eikä meidän vouti koskaan ole heitä sen
takia rökittänyt, sitä paitsi on tämä tapahtunut vastoin minun
tahtoani, te itse olette tehnyt minut kukkaskreiviksi vastoin vanhoja,
maalaisia juhlatapoja, sillä siellä ei kestään tehdä kukkaskreiviä
muulloisin kuin kevätjuhlissa.
JALOKIVIKAUPPIAS. Toimita takaisin minun kalliit kiveni, senkin
penikka!
PALATSIKREIVI. Kadotitko kivesi? Sepäs oli kuittia sinulle, senkin
runtti!
ROUVA STAABI (itkien). Ja minun kolmekymmentä kyynärää
broderieriäni.
PALATSIKREIVI. Rotan vietäviä? Pidä sinä itse huolta rotan
vietävistäsi!
SOITTAJA. Minä tahdon maksun soitostani. Isäntä, minä
turvaudun teihin.
Genki Yagawa · Atsuya Oishi
Computational Mechanics with Deep Learning: An Introduction
Lecture Notes on Numerical Methods in Engineering and Sciences

Series Editor
Eugenio Oñate, Jordi Girona, 1, Edifici C1 - UPC, Universitat Politecnica de Catalunya, Barcelona, Spain

Editorial Board
Charbel Farhat, Department of Mechanical Engineering, Stanford University, Stanford, CA, USA
C. A. Felippa, Department of Aerospace Engineering Science, University of Colorado, College of Engineering & Applied Science, Boulder, CO, USA
Antonio Huerta, Universitat Politècnica de Catalunya, Barcelona, Spain
Thomas J. R. Hughes, Institute for Computational Engineering, University of Texas at Austin, Austin, TX, USA
Sergio Idelsohn, CIMNE - UPC, Barcelona, Spain
Pierre Ladevèze, Ecole Normale Supérieure de Cachan, Cachan Cedex, France
Wing Kam Liu, Evanston, IL, USA
Xavier Oliver, Campus Nord UPC, International Center of Numerical Methods, Barcelona, Spain
Manolis Papadrakakis, National Technical University of Athens, Athens, Greece
Jacques Périaux, CIMNE - UPC, Barcelona, Spain
Bernhard Schrefler, Mechanical Sciences, CISM - International Centre for Mechanical Sciences, Padua, Italy
Genki Yagawa, School of Engineering, University of Tokyo, Tokyo, Japan
Mingwu Yuan, Beijing, China
Francisco Chinesta, Ecole Centrale de Nantes, Nantes Cedex 3, France

This series publishes textbooks on topics of general interest in the field of computational engineering sciences. The books focus on subjects in which numerical methods play a fundamental role in solving problems in engineering and applied sciences. Advances in finite element, finite volume, finite difference, discrete and particle methods and their applications to classical single-discipline fields and new multidisciplinary domains are examples of the topics covered by the series. The main intended audience is the first-year graduate student. Some books define the current state of a field for a highly specialised readership; others are accessible to final-year undergraduates, but essentially the emphasis is on accessibility and clarity. The books will also be useful for practising engineers and scientists interested in state-of-the-art information on the theory and application of numerical methods.

Genki Yagawa, Professor Emeritus, University of Tokyo and Toyo University, Tokyo, Japan
Atsuya Oishi, Graduate School of Technology, Industrial and Social Sciences, Tokushima University, Tokushima, Japan

ISSN 1877-7341, ISSN 1877-735X (electronic)
Lecture Notes on Numerical Methods in Engineering and Sciences
ISBN 978-3-031-11846-3, ISBN 978-3-031-11847-0 (eBook)
https://doi.org/10.1007/978-3-031-11847-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
Preface

Computational Mechanics

It is well known that various physical, chemical, and mechanical phenomena in nature and the behaviors of artificially created structures and devices are described by partial differential equations. While partial differential equations are rarely solved by analytical methods except under idealized and special conditions, such numerical methods as the finite element method (FEM), the finite difference method (FDM), and the boundary element method (BEM) can approximately solve most partial differential equations using a grid or elements that spatially subdivide the object. Thus, the development of these numerical methods has been a central issue in computational mechanics.

In these methods, the fineness of the grid or elements is directly related to the accuracy of the solution. Therefore, many research resources have been devoted to solving larger simultaneous equations faster, and together with the remarkable advances in computers, it has become possible to solve very large problems that were considered unsolvable some decades ago. Nowadays, it has become possible to analyze a variety of complex phenomena, expanding the range of applications of computational mechanics.

Deep Learning

On the other hand, the advances of computers have brought about significant developments in machine learning, which aims at classifying and making decisions by finding inherent rules and trends in large amounts of data based on algorithms rather than human impressions and intuition.

Feedforward neural networks are one of the most popular machine learning algorithms. They have the ability to approximate arbitrary continuous functions and have been applied to various fields since the development of the error back propagation learning in 1986. Since the beginning of the 21st century, they have become able to use many hidden layers, called deep learning. Their areas of application have been further expanded due to the performance improvement obtained by using more hidden layers.

Computational Mechanics with Deep Learning

Although the development of computational mechanics including the FEM has made it possible to analyze various complex phenomena, there still remain many problems that are difficult to deal with. Specifically, numerical solution methods such as the FEM are solid solution methods based on mathematical equations (partial differential equations), so they are useful when finding solutions to partial differential equations based on given boundary and initial conditions. However, this is not the case when estimating boundary and initial conditions from the solutions. In fact, the latter is often encountered in the design phase of artifacts.

In addition, as deep learning and neural networks can discover mapping relations between data without explicit mathematical formulas, it is possible to find inverse mappings simply by swapping the input and output. For this reason, deep learning and neural networks have been accepted in the field of computational mechanics as an important method to deal with the weak points of conventional numerical methods such as the FEM. They were mainly applied to such limited areas as the estimation of constitutive laws of nonlinear materials and non-destructive evaluation, but with the recent development of deep learning, their applicability has expanded dramatically. In other words, a fusion has started between deep learning and computational mechanics beyond the conventional framework of computational mechanics.

Readership

The authors' previous book, Computational Mechanics with Neural Networks, published in 2021 by Springer, covers most of the applications of neural networks and deep learning in computational mechanics from its early days to the present, together with applications of other machine learning methods. Its concise descriptions of individual applications make it suitable for researchers and engineers to get an overview of this field.

On the other hand, the present book, Computational Mechanics with Deep Learning: An Introduction, is intended to select carefully some recent applications of deep learning and to discuss each application in detail, but in an easy-to-understand manner. Sample programs are included for the readers to try out in practice. This book is therefore useful not only for researchers and engineers, but also for a wide range of readers who are interested in this field.

Structure of This Book

The present book is written from the standpoint of integrating computational mechanics and deep learning, and consists of three parts: Part I (Chaps. 1-3) covers the basics, Part II (Chaps. 4-8) covers several applications of deep learning to computational mechanics with detailed descriptions of the fields of computational mechanics to which deep learning is applied, and Part III (Chaps. 9-10) describes programming, where the program codes for both computational mechanics and deep learning are discussed in detail. The authors have tried to make the programs not black boxes, but useful tools for readers to fully understand and handle the processing.

The contents of each chapter are summarized as follows:

Part I Fundamentals: In Chap. 1, the importance of deep learning in computational mechanics is given first, and then the development process of deep learning is reviewed. In addition, various new methods used in deep learning are introduced in an easy-to-understand manner.

Chapter 2 is devoted to the mathematical aspects of deep learning. It discusses the forward and backward propagations of typical network structures in deep learning, such as fully connected feedforward neural networks and convolutional neural networks, using mathematical formulas with examples, and also learning acceleration and regularization methods.

Chapter 3 discusses the current research trends in this field based on articles published in several journals. Many of these articles are compiled in the reference list, which may be useful for further study.

Part II Case Study: Chapter 4 presents an application of deep learning to the elemental integration process of the finite element method. It is shown that a general-purpose numerical integration method can be optimized for each integrand by deep learning to obtain better results.

Chapter 5 introduces a method for improving the accuracy of finite element solutions by deep learning, showing how deep learning can break the common knowledge that a fine mesh is essential to obtain an accurate solution.

Chapter 6 is devoted to an application of deep learning to the contact point search process in contact analysis. It deals with contact between smooth contact surfaces defined by NURBS and B-spline basis functions, showing how deep learning helps to accelerate and stabilize the contact analysis.

Chapter 7 presents an application of deep learning to fluid dynamics. A convolutional neural network is used to predict the flow field, showing its unparalleled calculation speed compared with that of conventional computational fluid dynamics (CFD).

Chapter 8 discusses further applications of deep learning to solid and fluid analysis.

Part III Computational Procedures: Chapter 9 describes some programs to be used for the application problems: Sect. 9.1 covers programs in the field of computational mechanics, such as the element stiffness matrix calculation program, and Sect. 9.2 those in the field of deep learning, such as the feedforward neural network, both of which are given with background mathematical formulas.

Chapter 10 presents programs for the application of deep learning to the elemental integration discussed in Chap. 4. With these programs and those presented in Chap. 9, the readers of the present book can easily try "Computational Mechanics with Deep Learning" by themselves.

Tokyo, Japan Genki Yagawa
Tokushima, Japan Atsuya Oishi
May 2022
Acknowledgements

We would like to express our gratitude to Y. Tamura, M. Masuda, and Y. Nakabayashi for providing the data for Chap. 7.

We also express our cordial thanks to all the colleagues and students who have collaborated with us over several decades in the field of computational mechanics with neural networks/deep learning: S. Yoshimura, M. Oshima, H. Okuda, T. Furukawa, N. Soneda, H. Kawai, R. Shioya, T. Horie, Y. Kanto, Y. Wada, T. Miyamura, G. W. Ye, T. Yamada, A. Yoshioka, M. Shirazaki, H. Matsubara, T. Fujisawa, H. Hishida, Y. Mochizuki, T. Kowalczyk, A. Matsuda, C. R. Pyo, J. S. Lee, and K. Yamada.

We are particularly grateful to Prof. E. Oñate (CIMNE/Technical Univ. of Catalonia, Spain) for his kind and important suggestions and encouragements during the publication process of this book.

Tokyo, Japan Genki Yagawa
Tokushima, Japan Atsuya Oishi
Contents

Part I Fundamentals

1 Overview
1.1 Deep Learning: New Way for Problems Unsolvable by Conventional Methods
1.2 Progress of Deep Learning: From McCulloch-Pitts Model to Deep Learning
1.3 New Techniques for Deep Learning
1.3.1 Numerical Precision
1.3.2 Adversarial Examples
1.3.3 Dataset Augmentation
1.3.4 Dropout
1.3.5 Batch Normalization
1.3.6 Generative Adversarial Networks
1.3.7 Variational Autoencoder
1.3.8 Automatic Differentiation
References

2 Mathematical Background for Deep Learning
2.1 Feedforward Neural Network
2.2 Convolutional Neural Network
2.3 Training Acceleration
2.3.1 Momentum Method
2.3.2 AdaGrad and RMSProp
2.3.3 Adam
2.4 Regularization
2.4.1 What Is Regularization?
2.4.2 Weight Decay
2.4.3 Physics-Informed Network
References

3 Computational Mechanics with Deep Learning
3.1 Overview
3.2 Recent Papers on Computational Mechanics with Deep Learning
References

Part II Case Study

4 Numerical Quadrature with Deep Learning
4.1 Summary of Numerical Quadrature
4.1.1 Legendre Polynomials
4.1.2 Lagrange Polynomials
4.1.3 Formulation of Gauss-Legendre Quadrature
4.1.4 Improvement of Gauss-Legendre Quadrature
4.2 Summary of Stiffness Matrix for Finite Element Method
4.3 Accuracy Dependency of Stiffness Matrix on Numerical Quadrature
4.4 Search for Optimal Quadrature Parameters
4.5 Search for Optimal Number of Quadrature Points
4.6 Deep Learning for Optimal Quadrature of Element Stiffness Matrix
4.6.1 Estimation of Optimal Quadrature Parameters by Deep Learning
4.6.2 Estimation of Optimal Number of Quadrature Points by Deep Learning
4.7 Numerical Example A
4.7.1 Data Preparation Phase
4.7.2 Training Phase
4.7.3 Application Phase
4.8 Numerical Example B
4.8.1 Data Preparation Phase
4.8.2 Training Phase
4.8.3 Application Phase
References

5 Improvement of Finite Element Solutions with Deep Learning
5.1 Accuracy Versus Element Size
5.2 Computation Time Versus Element Size
5.3 Error Estimation of Finite Element Solutions
5.3.1 Error Estimation Based on Smoothing of Stresses
5.3.2 Error Estimation Using Solutions Obtained by Various Meshes
5.4 Improvement of Finite Element Solutions Using Error Information and Deep Learning
5.5 Numerical Example
5.5.1 Data Preparation Phase
5.5.2 Training Phase
5.5.3 Application Phase
References

6 Contact Mechanics with Deep Learning
6.1 Basics of Contact Mechanics
6.2 NURBS Basis Functions
6.3 NURBS Objects Based on NURBS Basis Functions
6.4 Local Contact Search for Surface-to-Surface Contact
6.5 Local Contact Search with Deep Learning
6.6 Numerical Example
6.6.1 Data Preparation Phase
6.6.2 Training Phase
6.6.3 Application Phase
References

7 Flow Simulation with Deep Learning
7.1 Equations for Flow Simulation
7.2 Finite Difference Approximation
7.3 Flow Simulation of Incompressible Fluid with Finite Difference Method
7.3.1 Non-dimensional Navier-Stokes Equations
7.3.2 Solution Method
7.3.3 Example: 2D Flow Simulation of Incompressible Fluid Around a Circular Cylinder
7.4 Flow Simulation with Deep Learning
7.5 Neural Networks for Time-Dependent Data
7.5.1 Recurrent Neural Network
7.5.2 Long Short-Term Memory
7.6 Numerical Example
7.6.1 Data Preparation Phase
7.6.2 Training Phase
7.6.3 Application Phase
References

8 Further Applications with Deep Learning
8.1 Deep Learned Finite Elements
8.1.1 Two-Dimensional Quadratic Quadrilateral Element
8.1.2 Improvement of Accuracy of [B] Matrix Using Deep Learning
8.2 FEA-Net
8.2.1 Finite Element Analysis (FEA) With Convolution
8.2.2 FEA-Net Based on FEA-Convolution
8.2.3 Numerical Example
8.3 DiscretizationNet
8.3.1 DiscretizationNet Based on Conditional Variational Autoencoder
8.3.2 Numerical Example
8.4 Zooming Method for Finite Element Analysis
8.4.1 Zooming Method for FEA Using Neural Network
8.4.2 Numerical Example
8.5 Physics-Informed Neural Network
8.5.1 Application of Physics-Informed Neural Network to Solid Mechanics
8.5.2 Numerical Example
References

Part III Computational Procedures

9 Bases for Computer Programming
9.1 Computer Programming for Data Preparation Phase
9.1.1 Element Stiffness Matrix
9.1.2 Mesh Quality
9.1.3 B-Spline and NURBS
9.2 Computer Programming for Training Phase
9.2.1 Sample Code for Feedforward Neural Networks in C Language
9.2.2 Sample Code for Feedforward Neural Networks in C with OpenBLAS
9.2.3 Sample Code for Feedforward Neural Networks in Python Language
9.2.4 Sample Code for Convolutional Neural Networks in Python Language
References

10 Computer Programming for a Representative Problem
10.1 Problem Definition
10.2 Data Preparation Phase
10.2.1 Generation of Elements
10.2.2 Calculation of Shape Parameters
10.2.3 Calculation of Optimal Numbers of Quadrature Points
10.3 Training Phase
10.4 Application Phase
References

Index
Chapter 1
Overview

Abstract  In this chapter, we provide an overview of deep learning. Firstly, in Sect. 1.1, the differences between deep learning and conventional methods, and also the special role of deep learning, are explained. Secondly, Sect. 1.2 gives a historical view of the development of deep learning. Finally, Sect. 1.3 introduces various new techniques used in deep learning.

1.1 Deep Learning: New Way for Problems Unsolvable by Conventional Methods

Deep learning can be regarded as a further development of feedforward neural networks. Both have been adopted in various fields of computational mechanics since their emergence, because these techniques have the potential to compensate for the weaknesses of conventional computational mechanics methods. Let us consider a simple problem to clarify the role of deep learning in computational mechanics:

Problem 1  Assume a square plate, its bottom fixed at both ends, and loaded partially at the top (Fig. 1.1a). Find the displacements (u1, v1), ..., (u4, v4) at the four points at the top (Fig. 1.1b).

The first solution method, and the simplest, is to actually apply a load to the plate and measure the displacements. This may give the most reliable results if it is easy to set up the experimental conditions and measure the physical quantity of interest. This method can be called an experiment-based solution method.

The second method that comes to mind is to calculate the displacements by finite element analysis [32]. This problem can be solved by two-dimensional finite element stress analysis based on the following three kinds of equations.

Equations of balance of forces in the analysis region:
$$ \frac{\partial \sigma_x}{\partial x} + \frac{\partial \tau_{xy}}{\partial y} = 0, \qquad \frac{\partial \tau_{xy}}{\partial x} + \frac{\partial \sigma_y}{\partial y} = 0 \quad \text{in } \Omega \qquad (1.1.1) $$

Equations of equilibrium at the load boundary:

$$ \sigma_x n_x + \tau_{xy} n_y = \bar{T}_x, \qquad \tau_{xy} n_x + \sigma_y n_y = \bar{T}_y \quad \text{on } \Gamma_\sigma \qquad (1.1.2) $$

Equations at the displacement boundary:

$$ u = \bar{u}, \qquad v = \bar{v} \quad \text{on } \Gamma_u \qquad (1.1.3) $$

Equation (1.1.1) is solved under the conditions of Eqs. (1.1.2) and (1.1.3). Equation (1.1.2), which describes equilibrium at the load boundary, is called the Neumann boundary condition, and Eq. (1.1.3), which describes the fixed displacements, the Dirichlet boundary condition. Based on the finite element method, Eqs. (1.1.1), (1.1.2) and (1.1.3) are formulated as a set of linear equations as follows [72]:

$$ [K]\{U\} = \{F\} \qquad (1.1.4) $$

where [K] on the left-hand side is called the coefficient matrix or the global stiffness matrix, {U} the vector of nodal displacements, and {F} the right-hand side vector calculated from the nodal equivalent loads. The nodal displacements of all the nodes in the domain are obtained by solving the simultaneous linear equations, Eq. (1.1.4).

Fig. 1.1 Square plate under tensile force: (a) geometry and loading, (b) the four points at the top
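As a minimal illustration of Eq. (1.1.4), and not taken from the sample programs of Part III, the sketch below solves [K]{U} = {F} with NumPy; the 3 x 3 matrix is a hypothetical stand-in for an assembled and constrained global stiffness matrix.

```python
# Minimal sketch: once the global stiffness matrix [K] and the load vector {F}
# of Eq. (1.1.4) are assembled, the nodal displacements {U} follow from a
# linear solve. The matrix below is a toy, already-constrained example.
import numpy as np

K = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])   # hypothetical global stiffness matrix
F = np.array([0.0, 0.0, 1.0])        # hypothetical nodal load vector

U = np.linalg.solve(K, F)            # nodal displacements, Eq. (1.1.4)
print(U)
```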
For each of the four points specified in the problem, the displacements of the point can be obtained directly as the nodal displacements if the point is a node, or by interpolating the displacements of the surrounding nodes if it is not. This solution method, which is based on the numerical solution of partial differential equations, is called a computational method based on differential equations or, simply, an equation-based numerical solution method.

Then, we consider the following problem.

Problem 2  Assume the same square plate, its bottom fixed at both ends, and loaded at one side of the top as in Problem 1 (Fig. 1.1a). But, as shown in Fig. 1.2, there is a hole inside the plate. Find the displacements (u1, v1), ..., (u4, v4) at the four points at the top (Fig. 1.1b).

The experiment for this problem may be more difficult than for the previous case. In particular, if the domain is not a plate but a cube with an embedded void, it will be very time consuming to prepare the experiment. On the other hand, the equation-based numerical solution method can solve Problem 2 without any difficulty by using a mesh divided according to the given shape. This versatility of the equation-based numerical solution methods, such as the finite element method, is a great advantage over the experiment-based solution methods. Supported by this advantage, it has become possible for numerical methods to deal with almost all kinds of applied mechanics problems. Nowadays, these methods are taken as the first choice for solving various problems.

However, it is clear that even the equation-based numerical solution method is not a panacea if we consider the following problem.

Fig. 1.2 Square plates with embedded holes: (a) triangular hole, (b) round hole, (c) square hole

Problem 3  Assume a square plate, its bottom fixed at both ends, loaded at one side of the top (Fig. 1.1a), and the displacements at the four points at the top
known as (u1, v1), ..., (u4, v4). Find the shape and the position of an unknown hole in the plate (Fig. 1.3).

Fig. 1.3 Estimation of shape and location of an embedded hole

Apparently, neither the experiment-based nor the equation-based numerical solution methods can solve this problem. So, what is the difference between Problems 1 and 2 on the one hand and Problem 3 on the other?

In the equation-based numerical solution method for Problems 1 and 2, the governing equations for the displacements are solved under a given load condition (the Neumann condition), a fixation condition (the Dirichlet condition), and additional boundary conditions such as the shape and the position of the hole, and the displacements of all the nodes in the domain are obtained as the solution. This is equivalent to an approach that realizes a mapping relation as follows:

$$ g:\ \left\{ \begin{array}{l} \text{Dirichlet boundary condition} \\ \text{Neumann boundary condition} \\ \text{Hole parameters} \end{array} \right\} \;\to\; \left\{ \begin{array}{c} u_1 \\ v_1 \\ \vdots \\ u_4 \\ v_4 \end{array} \right\} \qquad (1.1.5) $$

To solve the direct problem with the equation-based numerical method is to find this kind of mapping. On the other hand, the mapping relation to solve Problem 3 is expressed as
$$ h:\ \left\{ \begin{array}{l} \text{Dirichlet boundary condition} \\ \text{Neumann boundary condition} \\ (u_1, v_1, \ldots, u_4, v_4)^T \end{array} \right\} \;\to\; \text{Hole parameters} \qquad (1.1.6) $$

Solving the inverse problem is equal to finding this kind of mapping. In this case, it is to find the mapping from the displacements, which would usually be results obtained by solving the governing equations, to the hole parameters (shape and position of the hole), which are conditions usually considered as input for solving the governing equations [41]. It is clear that the inverse problem is much more difficult to handle than the direct problem, where the solution can be achieved directly through the routine operation of solving equations. It is noted that an inverse problem such as Problem 3 is a type of problem that we often encounter when designing an artifact, asking "How can we satisfy this condition?" This means that solving inverse problems efficiently is one of the most important issues in applied mechanics.

Now, omitting the parameters used in Eq. (1.1.5), we have

$$ g:\ \text{HoleParams} \to (u_1, v_1, \ldots, u_4, v_4)^T \quad \text{or} \quad (u_1, v_1, \ldots, u_4, v_4)^T = g(\text{HoleParams}) \qquad (1.1.7) $$

Similarly, Eq. (1.1.6) can be written in concise form as follows:

$$ h:\ (u_1, v_1, \ldots, u_4, v_4)^T \to \text{HoleParams} \quad \text{or} \quad \text{HoleParams} = h(u_1, v_1, \ldots, u_4, v_4) \qquad (1.1.8) $$
By repeatedly employing the equation-based method to find the displacements (u1, v1), ..., (u4, v4) of the four points at the top of the plate for various hole parameters HoleParams(i) = (p1(i), p2(i), ..., pn(i)), we can collect a large number of data pairs of the hole parameters and the displacements calculated from them:

$$ \begin{array}{l} \{\text{HoleParams}(1),\ ((u_1(1), v_1(1)), \ldots, (u_4(1), v_4(1)))\} \\ \{\text{HoleParams}(2),\ ((u_1(2), v_1(2)), \ldots, (u_4(2), v_4(2)))\} \\ \qquad \vdots \\ \{\text{HoleParams}(N),\ ((u_1(N), v_1(N)), \ldots, (u_4(N), v_4(N)))\} \end{array} \qquad (1.1.9) $$

Now, let H() be an arbitrary function (mapping) with (u1(i), v1(i)), ..., (u4(i), v4(i)) as input and the approximate values of HoleParams(i) as output, written as

$$ \left( p_1^H(i), p_2^H(i), \ldots, p_n^H(i) \right) = H\left( u_1(i), v_1(i), \ldots, u_4(i), v_4(i) \right) \qquad (1.1.10) $$

Then, let us find, among all the admissible candidates, the H() that minimizes

$$ L = \sum_{i=1}^{N} \sum_{j=1}^{n} \left( p_j(i) - p_j^H(i) \right)^2 \qquad (1.1.11) $$

As the H() that minimizes L corresponds to the map h in Eq. (1.1.8), it is expected that, given the displacements (u1, v1), ..., (u4, v4) as input, a set of values of the hole parameters corresponding to the input data is estimated:

$$ \left( p_1^H, p_2^H, \ldots, p_n^H \right) = H(u_1, v_1, \ldots, u_4, v_4) \qquad (1.1.12) $$

This approach attempts to derive a solution by utilizing a large number of data, as shown in Eq. (1.1.9), and can be called a computational method based on data or, simply, a data-based solution method; it is one of the most powerful solution methods for inverse problems.

Here, we discuss how to find the mapping H() that minimizes L. It is known that feedforward neural networks [27] and deep learning [22], an extension of feedforward neural networks, are able to construct a mapping H() from data pairs. Specifically, H(), which corresponds to the mapping h in Eq. (1.1.8), can be constructed by the error back propagation learning, using the data in Eq. (1.1.9) as training data, with u1(i), v1(i), ..., u4(i), v4(i) as input data and p1(i), p2(i), ..., pn(i) as teacher signals (see Sect. 2.1).

As described above, feedforward neural networks and their advanced form, deep learning, are powerful data-based solution methods that can deal with inverse problems difficult for conventional computational mechanics methods.
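The following sketch illustrates the data-based construction of H() in Eqs. (1.1.10)-(1.1.12). scikit-learn's MLPRegressor stands in for the feedforward network of Chap. 2, and the "finite element" data are random placeholders, not results of an actual analysis.

```python
# Sketch of the data-based inverse mapping H() of Eqs. (1.1.10)-(1.1.12):
# a feedforward network is fitted to pairs (displacements -> hole parameters).
# The data below are random placeholders for illustration only.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
N = 200                                        # number of data pairs, Eq. (1.1.9)
hole_params = rng.uniform(0.1, 0.9, (N, 3))    # hypothetical (x, y, size) of the hole
displacements = rng.normal(size=(N, 8))        # stand-in for (u1, v1, ..., u4, v4)

H = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
H.fit(displacements, hole_params)              # reduces the squared error of Eq. (1.1.11)

estimated = H.predict(displacements[:1])       # Eq. (1.1.12): hole parameters from displacements
print(estimated)
```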
Finally, let us consider the following problem.

Problem 4  Assume a square plate with its bottom fixed at both ends, loaded at one side of the top (Fig. 1.1a), and the displacements at the four points at the top measured as (u1, v1), ..., (u4, v4). Then, find the shape and the position of a hole in the domain that minimize (v1 - v4)^2 + (v2 - v3)^2 (Fig. 1.4).

Fig. 1.4 Estimation of shape and location of an embedded hole by minimizing a function

The above problem, minimizing or maximizing some value, is one that we often encounter in designing artifacts, where we must seek an optimum. A possible way to solve Problem 4 is to repeatedly use the equation-based methods to calculate (u1(i), v1(i)), ..., (u4(i), v4(i)) and then the quantity (v1(i) - v4(i))^2 + (v2(i) - v3(i))^2 to be minimized, for all possible HoleParams(i); this is the so-called brute-force method. However, calculating the displacements with the equation-based solution method, i.e., the analysis process using the finite element method, is impractical for this purpose due to the enormous computational load.

On the other hand, evolutionary computation algorithms such as genetic algorithms [46] are often employed, as they can reduce the number of finite element analyses required, resulting in high efficiency in solving optimization problems. Specifically, the genetic algorithm is known to be efficient in finding an optimal HoleParams(i), as the search area can be narrowed.

In addition, the data-based solution methods such as feedforward neural networks can also be used to dramatically reduce the huge computational load. Based on the data in Eq. (1.1.9), let G() be an arbitrary function (mapping) with HoleParams(i) as input and (u1(i), v1(i)), ..., (u4(i), v4(i)) as output. Thus, we have

$$ \left( \left( u_1^G(i), v_1^G(i) \right), \ldots, \left( u_4^G(i), v_4^G(i) \right) \right) = G(\text{HoleParams}(i)) \qquad (1.1.13) $$

Then, among the broad range of admissible G()s, we find the G() that minimizes

$$ \sum_{i=1}^{N} \sum_{j=1}^{4} \left\{ \left( u_j(i) - u_j^G(i) \right)^2 + \left( v_j(i) - v_j^G(i) \right)^2 \right\} \to \min \qquad (1.1.14) $$
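A corresponding sketch of Eqs. (1.1.13)-(1.1.14): a surrogate G() is fitted to hypothetical hole-parameter/displacement pairs and then evaluated over many candidate hole parameters to minimize the objective of Problem 4. The training data are placeholders and the use of scikit-learn is an assumption, not the book's own code.

```python
# Sketch: fit a surrogate G() (Eq. (1.1.13)) to the forward map, then use it
# in a brute-force search for Problem 4. Training data are placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
hole_params = rng.uniform(0.1, 0.9, (200, 3))   # hypothetical hole parameters
displacements = rng.normal(size=(200, 8))       # stand-in FEM results (u1, v1, ..., u4, v4)

G = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
G.fit(hole_params, displacements)               # reduces the error of Eq. (1.1.14)

candidates = rng.uniform(0.1, 0.9, (10000, 3))  # candidate hole parameters
d = G.predict(candidates)                       # cheap surrogate evaluations
objective = (d[:, 1] - d[:, 7])**2 + (d[:, 3] - d[:, 5])**2   # (v1-v4)^2 + (v2-v3)^2
best = candidates[np.argmin(objective)]
print(best)
```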
Here, G() outputs the displacements (u1, v1), ..., (u4, v4) for the given HoleParams, which is almost equivalent to the finite element analysis. In other words, the finite element analysis, an equation-based solution method with a high computational load, can be replaced by a neural network with a low computational load, constructed by a data-based solution method. This kind of neural network is often called a surrogate model of the original finite element analysis, and it is an example of the application of the data-based solution method to direct problems.

As is well recognized, the equation-based numerical solution methods, such as the finite element analysis, have been the major tool of conventional computational mechanics, expanding their area of application to various problems. As a result, the majority of mechanical phenomena can now be solved by the equation-based numerical solution method, replacing the experiment-based solution method. However, these methods are still insufficient for solving inverse and optimization problems, both of which are important in many fields. In contrast, the data-based solution methods such as neural networks and deep learning can tackle these problems rather easily. In other words, the data-based solution method remedies the weakness of the equation-based numerical solution method, becoming a powerful way of tackling inverse and optimization problems.

1.2 Progress of Deep Learning: From McCulloch-Pitts Model to Deep Learning

In this section, the development of deep learning and its predecessor, feedforward neural networks, is studied.

First, let us review a feedforward neural network, the predecessor of deep learning, which is a network consisting of layers of units with connections between units in adjacent layers. A unit performs a multiple-input, single-output nonlinear transformation, similarly to a biological neuron (Fig. 1.5). In a feedforward neural network with n layers, the first layer is called the input layer, the second to (n - 1)th layers the intermediate or hidden layers, and the nth layer the output layer. Figure 1.6 shows the structure of a feedforward neural network. The signal input to the input layer is passed sequentially through the hidden layers and becomes the output signal at the output layer. Here, the input signal undergoes a nonlinear transformation in each layer. A feedforward neural network is considered "deep" if it has five or more nonlinear transformation layers [43].

A brief chronology of feedforward neural networks and deep learning is as follows:

1943 McCulloch-Pitts model [45]
1958 Perceptron [56]
1967 Stochastic gradient descent [1]
1969 Perceptrons [47]
1980 Neocognitron [18]
1986 Back propagation algorithm [58]
1989 Universal approximator [19, 29]
1989 Convolutional neural network [42]
2006 Pretraining with restricted Boltzmann machine [28]
2006 Pretraining with autoencoders [4]
2012 AlexNet [40]
2016 AlphaGo [61]
2017 AlphaGo Zero [62]

Fig. 1.5 Unit
Fig. 1.6 Feedforward neural network

The McCulloch-Pitts model was proposed as a mathematical model of biological neurons [45]. The inputs I1, ..., In are the outputs of other neurons; each input is multiplied by a weight w1, ..., wn and summed, and then a bias θ is added to form the input u of the activation function, as shown in Fig. 1.7. The neuron outputs a single value f(u) as the output value O as follows:
$$ O = f(u) = f\left( \sum_{i=1}^{n} w_i I_i + \theta \right) \qquad (1.2.1) $$

In this model, the output of the neuron is binary (0 or 1), and the Heaviside function is used as the activation function:

$$ O = f(u) = \begin{cases} 1 & (u \ge 0) \\ 0 & (u < 0) \end{cases} \qquad (1.2.2) $$

Fig. 1.7 Mathematical model of a neuron

Later, the perceptron was introduced in 1958, demonstrating the ability of supervised learning for pattern recognition [56]. Here, we discuss a two-class (C1, C2) classification problem for d-dimensional data x_i = (x_{i1}, ..., x_{id})^T using this model (Fig. 1.8). If we expand the dimension of the data by one and set x_i = (1, x_{i1}, ..., x_{id})^T, and set w = (w_0, w_1, ..., w_d)^T for the weights, then u in Eq. (1.2.1) can be written as

$$ u = \mathbf{w}^T \mathbf{x}_i = w_0 + w_1 x_{i1} + w_2 x_{i2} + \cdots + w_d x_{id} \qquad (1.2.3) $$

where w_0 corresponds to θ in Eq. (1.2.1).

Fig. 1.8 Perceptron
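A minimal sketch of the McCulloch-Pitts neuron of Eqs. (1.2.1) and (1.2.2); the weights and bias below are hypothetical values chosen so that the neuron realizes a logical AND.

```python
# Sketch of the McCulloch-Pitts neuron: weighted sum plus bias,
# passed through the Heaviside step function.
import numpy as np

def neuron(I, w, theta):
    u = np.dot(w, I) + theta          # u = sum_i w_i I_i + theta, Eq. (1.2.1)
    return 1.0 if u >= 0.0 else 0.0   # Heaviside activation, Eq. (1.2.2)

# With these (hypothetical) weights the neuron acts as a logical AND gate.
w = np.array([1.0, 1.0])
theta = -1.5
for I in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(I, neuron(np.array(I, dtype=float), w, theta))
```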
Then, the classification rule of the perceptron is written as

$$ \begin{cases} \mathbf{x}_i \in C_1 & \left( f(\mathbf{w}^T \mathbf{x}_i) \ge 0 \right) \\ \mathbf{x}_i \in C_2 & \left( f(\mathbf{w}^T \mathbf{x}_i) < 0 \right) \end{cases} \qquad (1.2.4) $$

As learning in the perceptron model can be regarded as the process of learning weights w that enable correct classification, the correct weights can be found automatically by iterative updates. When the weights in the kth step of the iterative updates are denoted w^(k), the learning rule of the perceptron leaves w^(k) unchanged if the classification using w^(k) is correct for an input data x_i, and updates w^(k) by x_i otherwise:

$$ \begin{cases} \mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} & \text{(correct classification)} \\ \mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} - \alpha \mathbf{x}_i & \left( f(\mathbf{w}^{(k)T}\mathbf{x}_i) \ge 0 \text{ for } \mathbf{x}_i \in C_2 \right) \\ \mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} + \alpha \mathbf{x}_i & \left( f(\mathbf{w}^{(k)T}\mathbf{x}_i) < 0 \text{ for } \mathbf{x}_i \in C_1 \right) \end{cases} \qquad (1.2.5) $$

where α is a positive constant. For x_i ∈ C_1, we have

$$ \begin{cases} \mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} & \left( f(\mathbf{w}^{(k)T}\mathbf{x}_i) \ge 0 \right) \\ \mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} + \alpha \mathbf{x}_i & \left( f(\mathbf{w}^{(k)T}\mathbf{x}_i) < 0 \right) \end{cases} \qquad (1.2.6) $$

If f(w^(k)T x_i) < 0 holds, we have

$$ f\left(\mathbf{w}^{(k+1)T}\mathbf{x}_i\right) = f\left(\left(\mathbf{w}^{(k)} + \alpha\mathbf{x}_i\right)^T\mathbf{x}_i\right) = f\left(\mathbf{w}^{(k)T}\mathbf{x}_i + \alpha|\mathbf{x}_i|^2\right) \qquad (1.2.7) $$

Equation (1.2.7) shows that the weights are updated so that the value f(w^(k+1)T x_i) moves toward the positive side. By iteratively applying this learning rule to all the input data, weights w that correctly classify all input data are determined. This learning rule was proven to converge in a finite number of learning iterations, which is called the perceptron convergence theorem [57]. The perceptron attracted a great deal of attention, and the first boom of neural networks occurred with it.

In 1969, however, the limitations of the perceptron were demonstrated theoretically [47]: it was proven that a simple single-layer perceptron can be applied only to linearly separable problems (Fig. 1.9), which cast doubt on its applicability to practical classification problems. The hope for the perceptron dropped drastically, and the first neural network boom calmed down. The weakness of the perceptron, that it is only effective for linearly separable problems, was later overcome by making it multilayered, but this created a new demand for a suitable learning algorithm.
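The learning rule of Eq. (1.2.5) can be sketched in a few lines; the data set below is a small, hypothetical, linearly separable example, and the loop stops once every sample is classified correctly, as the perceptron convergence theorem guarantees for such data.

```python
# Sketch of the perceptron learning rule of Eq. (1.2.5). The input x is
# augmented with a leading 1 so that w[0] plays the role of the bias w0
# (Eq. (1.2.3)); the labels mark class C1 (+1) and class C2 (-1).
import numpy as np

X = np.array([[1.0, 0.0, 0.0],   # augmented inputs (1, x1, x2)
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
labels = np.array([1, 1, 1, -1])  # hypothetical, linearly separable labels

w = np.zeros(3)
alpha = 0.5
for epoch in range(100):
    updated = False
    for x, t in zip(X, labels):
        if t == 1 and np.dot(w, x) < 0.0:      # C1 sample misclassified
            w += alpha * x                     # third case of Eq. (1.2.5)
            updated = True
        elif t == -1 and np.dot(w, x) >= 0.0:  # C2 sample misclassified
            w -= alpha * x                     # second case of Eq. (1.2.5)
            updated = True
    if not updated:                            # all samples classified correctly
        break

print(w)
```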
Fig. 1.9 Linearly separable and inseparable data: (a) linearly separable, (b) linearly inseparable

In 1986, the back propagation algorithm was introduced as a new learning algorithm for multilayer feedforward neural networks such as that shown in Fig. 1.6 [58]. It is an algorithm based on the steepest descent method that modifies the connection weights between units in the direction of decreasing the error, which is defined as the square of the difference between the output of the output layer units and the corresponding teacher data as follows:

$$ E = \frac{1}{2} \sum_{p=1}^{n_P} \sum_{j=1}^{n_L} \left( {}^{p}O_j^L - {}^{p}T_j \right)^2 \qquad (1.2.8) $$

where
{}^{p}O_j^L : the output of the jth unit in the Lth layer (output layer) for the pth training pattern
{}^{p}T_j : the teacher signal corresponding to the output of the jth unit in the output layer for the pth training pattern
n_P : the total number of training patterns
n_L : the total number of output units

Let w_{ji}^{(k)} be the connection weight between the ith unit of the kth layer and the jth unit of the (k + 1)th layer in Fig. 1.6; then the back propagation algorithm successively modifies w_{ji}^{(k)} as follows:

$$ w_{ji}^{(k)} \leftarrow w_{ji}^{(k)} - \alpha \frac{\partial E}{\partial w_{ji}^{(k)}} \qquad (1.2.9) $$

Here, α is a positive constant called the learning coefficient.
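A toy sketch of the back propagation update of Eqs. (1.2.8) and (1.2.9) for a network with one hidden layer of sigmoid units; the network size, learning coefficient, and the XOR training pattern are arbitrary illustrative choices, not the book's sample program of Part III.

```python
# Toy sketch: the squared error of Eq. (1.2.8) is reduced by the
# steepest-descent update of Eq. (1.2.9), with gradients obtained by the
# chain rule (back propagation).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # training inputs
T = np.array([[0.], [1.], [1.], [0.]])                   # teacher signals (XOR)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)           # input  -> hidden
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)           # hidden -> output
alpha = 0.5                                              # learning coefficient

for epoch in range(20000):
    # forward pass
    H = sigmoid(X @ W1 + b1)
    O = sigmoid(H @ W2 + b2)
    # backward pass: local error terms for output and hidden layers
    dO = (O - T) * O * (1.0 - O)
    dH = (dO @ W2.T) * H * (1.0 - H)
    # steepest-descent updates, Eq. (1.2.9)
    W2 -= alpha * H.T @ dO; b2 -= alpha * dO.sum(axis=0)
    W1 -= alpha * X.T @ dH; b1 -= alpha * dH.sum(axis=0)

print("final error E:", 0.5 * np.sum((O - T) ** 2))
```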
Since differentiation appears frequently in the back propagation, the sigmoid function, which is nonlinear and continuously differentiable everywhere, has been used as one of the most popular activation functions. It is given as

$$ f(x) = \frac{1}{1 + e^{-x}} \qquad (1.2.10) $$

Figure 1.10 shows the Heaviside function and the sigmoid function; the latter is a smoothed version of the former. (See Sect. 2.1 for details of the back propagation algorithm.)

Fig. 1.10 Sigmoid function (plotted together with the Heaviside function)

In 1989, it was shown that feedforward neural networks can approximate arbitrary continuous functions [19, 29]. However, this theoretical proof is a kind of existence theorem, and it provides little answer to important practical questions such as how large a neural network (number of layers, number of units in each layer, etc.) should be used, what training parameters should be used, and how many training cycles are required for convergence. Accordingly, such meta-parameters are usually determined by trial and error.

With the advent of the back propagation algorithm in 1986, multilayer feedforward neural networks were put to practical use, and the application range of neural networks was greatly expanded, resulting in the second neural network boom. It should be noted that, almost twenty years before the advent of the back propagation algorithm, a prototype of the algorithm had already been proposed [1], but its importance was not widely recognized at that time.
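A small numerical check of Eq. (1.2.10): the derivative of the sigmoid is f'(x) = f(x)(1 - f(x)), which never exceeds 0.25; this fact becomes relevant in the discussion of the vanishing gradient below.

```python
# The sigmoid of Eq. (1.2.10) and its derivative f'(x) = f(x) (1 - f(x)).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-4.0, 4.0, 9)
f = sigmoid(x)
print(np.round(f, 3))            # smoothed step between 0 and 1
print(np.round(f * (1 - f), 3))  # derivative, maximum 0.25 at x = 0
```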
This was due to the fact that, when the scale of a feedforward neural network was increased to improve its function and performance, the learning process became too slow or often did not proceed at all. There were two main reasons for this: one was the speed of computers and the other the vanishing gradient problem.

Fig. 1.11 Development of supercomputers: performance in GFLOPS versus year (Cray-1, Cray-2, SX-3, ASCI Red, Earth Simulator, Roadrunner, K computer, TaihuLight, Fugaku)

Let us first consider the speed of computers. Figure 1.11 shows the history of the fastest supercomputers, where the vertical axis is the computation speed, defined by the number of floating-point operations per second (FLOPS); the unit used here is GigaFLOPS (10^9 FLOPS). It is seen from the figure that in 1986, when the back propagation algorithm was introduced, the speed of supercomputers was about 2 GFLOPS; it was 220 GFLOPS in 1996, 280 TFLOPS (TeraFLOPS: 10^12 FLOPS) in 2006, and 415 PFLOPS (PetaFLOPS: 10^15 FLOPS) in 2021. A simple calculation suggests that the training of a feedforward neural network that takes only one minute on a current computer (415 PFLOPS) would have taken 1482 min (about one day) on a computer of 2006 (280 TFLOPS), 1,886,364 min (about three and a half years) on a computer of 1996 (220 GFLOPS), and 207,500,000 min (about 400 years) on a computer of 1986 (2 GFLOPS). In reality, this calculation is not strictly accurate because of the effects of parallel processing and other factors, but it still illustrates the pace of progress in computing speed, in other words the slowness of the computers of that time, and it suggests that it was necessary to wait for the progress of computers in order to apply the back propagation algorithm to relatively large neural networks.

As discussed above, the calculation speed of computers was a major issue for the back propagation algorithm. In addition, another barrier to the application of large-scale multilayer feedforward neural networks to practical problems is the vanishing gradient problem. This problem exists in multilayer neural networks,
where learning does not proceed in the layers far away from the output layer, preventing performance improvement by increasing the number of hidden layers. The cause of the vanishing gradient is that the amount of correction by the back propagation algorithm,

$$\Delta w_{ji}^{(k)} = -\alpha \frac{\partial E}{\partial w_{ji}^{(k)}} \quad (1.2.11)$$

becomes small in the deeper layers (layers close to the input layer) due to the small derivative of the sigmoid function, which was most commonly employed as the activation function. (See Sect. 2.1 for details.) Because of these issues, feedforward neural networks, while having the back propagation learning algorithm and the versatility of being able to simulate arbitrary nonlinear continuous functions, were "applied only to problems that a relatively small network could handle."

The serious situation described above changed in 2006, when methods to avoid the vanishing gradient problem by layer-by-layer pretraining [4, 28], and also methods for training multilayer feedforward neural networks without this issue, were proposed. Here, we discuss how the autoencoder is used to pretrain multilayer feedforward neural networks. The structure of the autoencoder is shown in Fig. 1.12: it is a feedforward neural network with one hidden layer, in which the number of units in the input layer is the same as the number of units in the output layer. The autoencoder is trained to output the same data as the input data by error back propagation learning, using the input data as the teacher data. After the training is completed, the autoencoder simply outputs the input data, which seems to be a meaningless operation, but in fact it corresponds to the conversion of the input data into a different representation format in the hidden layer. For example, if the number of hidden-layer units is less than the number of input-layer units, a compressed representation of the input data is obtained.

Fig. 1.12 Autoencoder
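As a minimal sketch of such an autoencoder, the following Keras-style definition assumes TensorFlow/Keras is available; the layer sizes, activations, and optimizer are illustrative choices only, not taken from the original text.

    from tensorflow.keras import layers, models

    n_in, n_hidden = 64, 16   # illustrative sizes; hidden layer smaller than the input layer
    autoencoder = models.Sequential([
        layers.Dense(n_hidden, activation="sigmoid", input_shape=(n_in,)),  # hidden layer (compressed code)
        layers.Dense(n_in, activation="sigmoid"),                           # output layer, same size as input
    ])
    autoencoder.compile(optimizer="adam", loss="mse")
    # Trained with the input data also used as the teacher data:
    # autoencoder.fit(X, X, epochs=50, batch_size=32)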
In pretraining with autoencoders, the connection weights of the multilayer feedforward neural network are initialized by autoencoders. For the case of the five-layer network shown in Fig. 1.13, three autoencoders are prepared: the autoencoder A with the first layer of the original network as its input layer and the second layer of the original network as its hidden layer, the autoencoder B with the second layer as its input layer and the third layer as its hidden layer, and the autoencoder C with the third layer as its input layer and the fourth layer as its hidden layer. Note that for each of these autoencoders, the output layer has as many units as the corresponding input layer. The pretraining using the autoencoders above is performed as follows:

(1) First, the autoencoder A is trained using the input data of the original five-layer neural network. After the training is completed, the connection weights between the input and hidden layers of the autoencoder A are set as the initial values of the connection weights between the first and second layers of the original five-layer neural network.

(2) Then, the autoencoder B is trained, where the output of the hidden layer of the autoencoder A is used as the input data for training. After the training is completed, the connection weights between the input and hidden layers of the autoencoder B are set as the initial values of the connection weights between the second and third layers of the original five-layer neural network.

Fig. 1.13 Pretraining using autoencoders
(3) Third, the autoencoder C is trained, where the output of the hidden layer of the autoencoder B is used as the input data for training. After the training is completed, the connection weights between the input and hidden layers of the autoencoder C are set as the initial values of the connection weights between the third and fourth layers of the original five-layer neural network.

(4) Finally, after initializing the connection weights between each pair of layers with the values obtained by the autoencoders in (1), (2) and (3) above, the error back propagation learning of the five-layer feedforward neural network is performed using the original input and teacher data.

Thus, one can solve the vanishing gradient problem by setting the initial values of the connection weights between layers, starting from those closest to the input layer, with autoencoders.

Another factor that made deeply multilayered feedforward neural networks possible is the significant improvement in computer performance, which means that learning can now be completed in a practical computing time. As a result, the restrictions on the construction of feedforward neural networks have been relaxed, and the scale of the neural network can be increased according to the complexity of the practical problem. In addition, in 2007, CUDA [39], a language for using graphics processing units (GPUs) for numerical computation, was introduced; GPUs have since become widely used as accelerators for training and inference of feedforward neural networks, further improving computer performance.

With the development of the pretraining method and the significant improvement of computer performance described above, the third neural network boom started with the emergence of multilayer large-scale neural networks, the so-called deep learning. The necessity of pretraining is, however, decreasing due to improvements in activation functions, training methods and computer performance.

The success of deep learning also owes much to the development of convolutional neural networks, in which the units of each layer are arranged in a two-dimensional grid. In a conventional feedforward neural network, all units in adjacent layers are connected to each other, whereas in a convolutional neural network, a unit in a layer is connected to only some units in the preceding layer. Figure 1.14 shows the structure and function of a convolutional neural network. The input to the $(k,l)$th unit in the $p$th layer, $U_{k,l}^{p}$, is given using the outputs of units in the $(p-1)$th layer, $O_{i,j}^{p-1}$, as follows:

$$U_{k,l}^{p} = \sum_{s=0}^{S-1} \sum_{t=0}^{T-1} h_{s,t}^{p-1} \cdot O_{k+s,l+t}^{p-1} + \theta_{k,l}^{p} \quad (1.2.12)$$

where $\theta_{k,l}^{p}$ is the bias of the $(k,l)$th unit in the $p$th layer and $h_{s,t}^{p-1}$ is the weight at $(s,t)$ in the $(p-1)$th layer, which, unlike the weights in a fully connected feedforward neural network, is shared among the units in the same layer. $S$ and $T$ define the range of contributions to the input; Fig. 1.14 shows the case of $S = T = 3$.
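The following is a minimal sketch of the convolution of Eq. (1.2.12) for a single channel in Python/NumPy; the function name, the restriction to output units that fit entirely inside the previous layer, and the scalar bias are illustrative simplifications.

    import numpy as np

    def conv_layer_input(O_prev, h, theta):
        """Input U of Eq. (1.2.12): U[k,l] = sum_{s,t} h[s,t] * O_prev[k+s, l+t] + theta."""
        S, T = h.shape                          # e.g. S = T = 3
        H, W = O_prev.shape
        U = np.zeros((H - S + 1, W - T + 1))    # units whose receptive field fits inside the previous layer
        for k in range(U.shape[0]):
            for l in range(U.shape[1]):
                U[k, l] = np.sum(h * O_prev[k:k + S, l:l + T]) + theta
        return U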
Fig. 1.14 Convolutional neural network
Fig. 1.15 Matrix of weights in convolutional neural network

The weights $h_{s,t}$ can be expressed in matrix form as in Fig. 1.15 for the case of $S = T = 3$. The operation of Eq. (1.2.12) with $h_{s,t}$ is the same as the filter operation in image processing [21]. Figure 1.16 shows examples of filters used in image processing: Fig. 1.16a is the Laplacian mask used for image sharpening, and Figs. 1.16b and c are both for edge detection, where the direction of the edge to be detected differs between them. Note that the convolution operation represented by Eq. (1.2.12) is similar to feature extraction in image processing, and when the input data is an image, it can be interpreted as an operation that extracts the features of the input image. For details on the calculation in the convolutional layer, see Sect. 2.2.

From a historical point of view, the introduction of locality such as convolutional layers into feedforward neural networks had already been realized in the Neocognitron [18], which was inspired by the hierarchical structure of visual information processing [31]. Figure 1.17 shows the structure of the Neocognitron. The prototype of the current convolutional layer was proposed in 1989 [42] and is known to be very useful when images are employed as input.
In addition to images, convolutional neural networks have become widely used for various multidimensional data such as voice or speech.

Fig. 1.16 Examples of filtering masks
Fig. 1.17 Neocognitron. Reprinted from [18] with permission from Springer

The ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [59] is an image recognition contest using ImageNet, a large image dataset of over 10 million images. In 2012, deep learning showed dominant performance in ILSVRC [40]. Since then, deep learning has been the best performer, and the winning systems of the contest since 2012 are as follows:

2012 AlexNet [40]
2013 ZFNet [69]
2014 GoogLeNet [66]
2015 ResNet [26]
2016 CUImage [70]
2017 SENet [30]
As performance improvement was known to be achieved by adding more layers, AlexNet in 2012 used five convolutional layers, GoogLeNet in 2014 more than 20 layers, and ResNet in 2015 more than 100 layers. The effect of adding more convolutional layers has been verified with VGG [63], which has a scalable structure and is considered a standard deep convolutional neural network; it has been shown that increasing the number of convolutional layers in VGG improves classification performance.

In addition to image recognition, deep learning has been applied to game programming. AlphaGO, a Go program using deep learning, defeated the world champion [61]. AlphaGO has continued to evolve since then. While the original AlphaGO used actual game records played by human players as training data, AlphaGO Zero [62], an improved version of AlphaGO, adopted reinforcement learning, learning by playing against other AlphaGO Zeros, and achieved performance superior to that of its predecessor AlphaGO.

In addition to the above areas, deep learning has been applied to a wide range of fields, including automatic driving such as traffic sign identification [10], natural language processing such as machine translation [65] and the language models GPT-3 [8] and BERT [16], and speech recognition [11, 54].

As deep learning has been applied to various fields, computational power has been enhanced to deal with deep learning-specific processing. Large-scale, high-performance computers based on CPUs and GPUs are used for the training process of deep learning, which has made it possible to construct large-scale deep neural networks. As a result, the amount of computation required for the inference process using trained neural networks has also increased rapidly. Usually, the inference process is performed under the user's control with computers much less powerful than those used for training; such inference on computers embedded in mobile devices such as smartphones, home appliances, and industrial machines is called edge computing. Since real-time performance is needed in this inference process, embedded GPUs such as Nvidia's Jetson are often used, and accelerators that specialize in accelerating the computation of deep learning inference are also being developed [6, 15]. Deep learning continues to develop, achieving remarkable results in various fields.

1.3 New Techniques for Deep Learning

In this section, some new and increasingly important techniques in deep learning are discussed.

1.3.1 Numerical Precision

First, let us study the numerical accuracy required for deep learning.
It is well known that the basic numbers employed in computers are binary, and there are several formats for floating-point real numbers, among which we choose one depending on the necessary precision level [67]. A floating-point real number is represented by a series of binary digits consisting of a sign part representing positive or negative, an exponent part for the order, and a mantissa part for the significant digits, with the total length (number of bits) varying according to the precision of the number. The IEEE 754 standard specifies three types of real numbers: a double-precision real number (FP64) with about 16 decimal significant digits, a single-precision real number (FP32) with about 7 decimal digits, and a half-precision real number (FP16) with about 3 decimal digits, which occupy 64, 32, and 16 bits of memory, respectively. In the field of computational mechanics, FP64 is usually used, and some problems even need quadruple-precision real numbers.

In contrast to the above, it has been shown that deep learning can achieve sufficient accuracy even when using real numbers with relatively low precision [12–14, 25]. Although training in deep learning usually requires higher numerical precision than inference with trained neural networks, due to the calculation of derivatives, it has been shown that FP32 and FP16 are sufficient even for training. For this reason, new low-precision floating-point formats are also being used for deep learning, including BFloat16 (BF16) and Tensor Float 32 (TF32). The former, proposed by Google, has more exponent bits than FP16, namely the same number of exponent bits as FP32. Since the number of bits in its mantissa part is reduced, the number of significant digits is also reduced, but the range of numbers that it can express is almost the same as that of FP32. The latter, proposed by Nvidia, has the same number of bits in the exponent part as FP32 and the same number of bits in the mantissa part as FP16. The TF32 format has 19 bits in total, a special format whose length is not a power of 2. The major floating-point number formats are summarized in Fig. 1.18.

Fig. 1.18 Floating-point real number formats (sign: 1 bit each; exponent/fraction bits: FP64 11/52, FP32 8/23, FP16 5/10, BF16 8/7, TF32 8/10)
Fig. 1.19 Pseudo-low-precision method

Since deep learning requires rather low numerical precision, it is possible to further improve the performance of deep learning by using integer [20] or fixed-point real number formats [25], or by implementing dedicated arithmetic hardware for deep learning on a field programmable gate array (FPGA).

In practice, it is difficult to know the numerical precision required for each problem. However, a rough estimate can be made using the pseudo-low-precision (PLP) method [50]. For example, a single-precision real number (FP32) has a mantissa part of 23 bits out of a total length of 32 bits; if we shift the entire 32 bits to the right by n bits and then to the left by n bits, the last n digits of the mantissa part are filled with zeros, and the number of digits in the mantissa part is reduced to (23 − n) bits. Figure 1.19 shows an example of PLP with 8-bit shifting, and List 1.3.1 shows the sample code to verify the operation of PLP, where the int type is assumed to be 32 bits. List 1.3.2 shows an example of the execution using gcc 4.8.5 on CentOS 7.9. Note that all the arithmetic operations are performed as single-precision real numbers (FP32) and the results are stored as single-precision real numbers (23-bit mantissa), so it is necessary to reduce the precision by PLP again immediately after each arithmetic operation.

List 1.3.1 PLP test code

#include <stdio.h>

/* Union that allows the bit pattern of a float to be manipulated as a 32-bit int. */
typedef union {
    float f;
    int i;
} u_fi32;

int main(void)
{
    int nsh;
    float f1, f2;
    u_fi32 d1, d2, d3;

    f1 = 2.718281828;
    f2 = 3.141592653;
    for (nsh = 1; nsh < 23; nsh++) {
        d1.f = f1;
        d1.i = d1.i >> nsh;    /* clear the lowest nsh bits of the mantissa */
        d1.i = d1.i << nsh;
        d2.f = f2;
        d2.i = d2.i >> nsh;
        d2.i = d2.i << nsh;
        d3.f = d1.f * d2.f;    /* the product is computed and stored in FP32 ...          */
        d3.i = d3.i >> nsh;    /* ... so its precision is reduced again immediately after */
        d3.i = d3.i << nsh;
        printf("%2d %f %f %f\n", nsh, d1.f, d2.f, d3.f);
    }
    return 0;
}

List 1.3.2 Results of the PLP test code (CentOS 7.9, gcc 4.8.5)

 1 2.718282 3.141593 8.539734
 2 2.718282 3.141592 8.539730
 3 2.718281 3.141592 8.539726
 4 2.718281 3.141590 8.539719
 5 2.718277 3.141586 8.539673
 6 2.718277 3.141586 8.539673
 7 2.718262 3.141571 8.539551
 8 2.718262 3.141541 8.539307
 9 2.718262 3.141479 8.539062
10 2.718262 3.141357 8.538086
11 2.718262 3.141113 8.537109
12 2.717773 3.140625 8.535156
13 2.716797 3.140625 8.531250
14 2.714844 3.140625 8.515625
15 2.710938 3.140625 8.500000
16 2.703125 3.140625 8.437500
17 2.687500 3.125000 8.375000
18 2.687500 3.125000 8.250000
19 2.625000 3.125000 8.000000
20 2.500000 3.000000 7.500000
21 2.500000 3.000000 7.000000
22 2.000000 3.000000 6.000000

1.3.2 Adversarial Examples

Deep learning has shown very good performance in image recognition and is said to surpass human ability of discrimination in some areas. However, it has been reported that deep learning can misidentify images that are easily identified by humans. Goodfellow et al. [23], employing an image that should be judged to be a panda on which a small noise is superimposed, showed that the superimposed image looks almost identical to the original image to the human eye and is easily identified as a panda, whereas the convolutional neural network GoogLeNet [66] judges it to be a gibbon. This kind of input data is called an adversarial example.
The mechanism by which an adversarial example occurs in a neural network can be explained as follows. Let the input to the neural network be $\boldsymbol{x} = (x_1, \ldots, x_n)^T$ and the weights $\boldsymbol{w}_j = (w_{j1}, \ldots, w_{jn})^T$; then the input to the $j$th unit of the next layer can be written as

$$u_j = \sum_i w_{ji}\, x_i = \boldsymbol{w}_j^T \boldsymbol{x} \quad (1.3.1)$$

When a small noise $\Delta\boldsymbol{x}$ is added to the input, the variation of $u_j$, $\Delta u_j$, is given by

$$\Delta u_j = \boldsymbol{w}_j^T \Delta\boldsymbol{x} \quad (1.3.2)$$

Equation (1.3.2) shows that $\Delta u_j$ is the inner product of $\boldsymbol{w}_j$ and $\Delta\boldsymbol{x}$, and therefore the variation $\Delta u_j$ takes its maximum value when $\Delta\boldsymbol{x} = k\boldsymbol{w}_j$. This shows that, among various noises of similar magnitude, the variation of the input to a unit, and hence of the output of the unit, becomes largest for a noise vector $\Delta\boldsymbol{x}$ with a specific direction, i.e., parallel to $\boldsymbol{w}_j$.

For a well-trained multilayer feedforward neural network, we can also make the output fluctuate greatly with small fluctuations of the input as follows. When the error function of the neural network is represented as $E = E(\boldsymbol{x})$, the input noise vector as $\Delta\boldsymbol{x} = (\Delta x_1, \ldots, \Delta x_n)^T$, and a small positive constant as $\varepsilon$, then by adding the noise vector $\Delta\boldsymbol{x}$ generated by

$$\Delta x_i = \varepsilon \cdot \mathrm{sgn}\left( \frac{\partial E}{\partial x_i} \right) \quad (1.3.3)$$

to the input vector, we can induce a significant difference between the output of the neural network and the teacher data. Here, $\mathrm{sgn}(x)$ is defined as follows:

$$\mathrm{sgn}(x) = \begin{cases} 1 & (x \ge 0) \\ -1 & (x < 0) \end{cases} \quad (1.3.4)$$

Given that adversarial examples can be created as above, how can we build a feedforward neural network that identifies adversarial examples as correctly as possible? One method is to use the adversarial example as a regularization term [27]. While the usual back propagation algorithm minimizes $E(\boldsymbol{x})$, this method adds $E\left(\boldsymbol{x} + \Delta\boldsymbol{x}_{adv}\right)$ as a regularization term and minimizes

$$\alpha \cdot E(\boldsymbol{x}) + (1 - \alpha) \cdot E\left(\boldsymbol{x} + \Delta\boldsymbol{x}_{adv}\right) \quad (1.3.5)$$

where $\alpha$ is a positive constant and $\Delta\boldsymbol{x}_{adv}$ is a noise vector generated by Eq. (1.3.3).
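A minimal sketch of the noise generation of Eqs. (1.3.3) and (1.3.4) in Python/NumPy is shown below; it assumes that the gradient of the error with respect to the input has already been computed (e.g., by automatic differentiation), and the function name is illustrative.

    import numpy as np

    def adversarial_noise(grad_E_x, eps=0.01):
        """Noise vector of Eq. (1.3.3): delta_x_i = eps * sgn(dE/dx_i)."""
        sgn = np.where(grad_E_x >= 0.0, 1.0, -1.0)   # sgn() as defined in Eq. (1.3.4)
        return eps * sgn

    # x_adv = x + adversarial_noise(grad_E_x); the regularized objective of Eq. (1.3.5) is then
    # alpha * E(x) + (1 - alpha) * E(x_adv).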
1.3.3 Dataset Augmentation

In neural networks and deep learning, the number of training patterns is one of the most important issues. If it is small, overtraining [27] is likely to occur. To avoid this, it is necessary to have as many training patterns as possible. In many situations, however, it is not easy to collect a sufficient number of training patterns, as seen in the case of medical imaging (X-ray, MRI, etc.). For this reason, the original training patterns (images) are processed to make new training patterns, which is called dataset augmentation. As an example, in the deep learning library Keras, the ImageDataGenerator function provides images that have been processed in various ways, such as rotation, translation, inversion, and shear deformation, to increase the number of training patterns. Table 1.1 shows the main parameters of ImageDataGenerator and their effects; a usage sketch is given below the table. Even when the input data are audio data, the data augmentation described above is performed and has proved to be effective [34, 53].

Image-style data augmentation is difficult to apply in some cases, such as a fully connected feedforward neural network. For such cases, another data augmentation method, the superimposition of noise, has been studied [60], which is based on

$$x_i^{input} = (1 + r_i)\, x_i^{original}, \quad r_i \in [-\varepsilon, \varepsilon] \quad (1.3.6)$$

where $\varepsilon$ is a small positive constant. During training, a small noise is superimposed on each component $x_i^{original}$ of the original input data $\boldsymbol{x}^{original}$ using a random number generated each time, and the result is used as the component $x_i^{input}$ of the input data $\boldsymbol{x}^{input}$. Note that superimposing noise on the input data is reported to be effective in preventing overtraining in various applications [17, 51, 52].

Table 1.1 Image data augmentation in Keras
Parameter of ImageDataGenerator | Functionality
rotation_range | Rotation
width_shift_range | Horizontal translation
height_shift_range | Vertical translation
shear_range | Shear transform
zoom_range | Zooming
horizontal_flip | Horizontal flip
vertical_flip | Vertical flip
fill_mode | Points outside of the generated image are filled according to the given mode
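The following sketch illustrates both augmentation approaches: the Keras ImageDataGenerator parameters of Table 1.1 and the noise superimposition of Eq. (1.3.6). It assumes TensorFlow/Keras and NumPy are available; the parameter values and the function name are illustrative only.

    import numpy as np
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Image augmentation with the parameters listed in Table 1.1 (values are examples only)
    datagen = ImageDataGenerator(
        rotation_range=15,        # rotation
        width_shift_range=0.1,    # horizontal translation
        height_shift_range=0.1,   # vertical translation
        shear_range=0.1,          # shear transform
        zoom_range=0.1,           # zooming
        horizontal_flip=True,     # horizontal flip
        fill_mode="nearest",      # how points outside the generated image are filled
    )
    # augmented_batches = datagen.flow(x_train, y_train, batch_size=32)

    # Noise superimposition of Eq. (1.3.6) for non-image input vectors
    def augment_with_noise(x_original, eps=0.05, rng=np.random.default_rng()):
        r = rng.uniform(-eps, eps, size=x_original.shape)   # r_i in [-eps, eps]
        return (1.0 + r) * x_original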
1.3.4 Dropout

A method to improve classification accuracy by constructing multiple neural networks and averaging them or taking a majority vote among them, called ensemble learning or model averaging, has been studied. Suppose a highly idealized situation in which we have $2N+1$ neural networks, each different from the others, and all of them have the same classification accuracy $p$ $\left(p > \frac{1}{2}\right)$. In this case, if the classification results of the individual neural networks are independent, then the following holds for the accuracy $P_{2N+1}$ of classification by majority vote of the $2N+1$ neural networks:

$$P_{2N+1} < P_{2(N+1)+1}, \quad \lim_{N \to \infty} P_{2N+1} = 1 \quad (1.3.7)$$

This is referred to as Condorcet's theorem. In practice, the assumption that the classification results of each classifier (e.g., neural network) are independent is unreasonable, and it is often the case that many classifiers make the same misclassification for a given input. To relax this, for example, in random forests [7, 49], where ensemble learning is introduced to decision trees and classification is done by majority voting of a large number of decision trees, the structure of the training patterns of the individual decision trees is changed, and the parameters for discrimination are also changed among trees. This, nevertheless, still does not result in the construction of a fully independent set of classifiers. Even so, the method of preparing multiple classifiers is reported to be effective in improving classification accuracy in many cases.

Dropout [64] is equivalent to averaging multiple neural networks while using a single neural network. Figure 1.20 shows the schematic diagram of dropout. Figure 1.20a shows the original four-layer feedforward neural network. During training, a uniform random number rnd in the range [0, 1] is generated for each unit in each epoch, and if rnd > r for a predetermined dropout rate r (0 < r < 1), the output of the unit is fixed to 0. This is equivalent to using a feedforward neural network with a different structure in each epoch (Fig. 1.20b). After the training is completed, the neural network with the original structure using all units is employed for inference, but the output of the units is multiplied by the dropout rate r (Fig. 1.20c). Dropout is thus almost equivalent to taking the average value of many neural networks with different structures, and it is considered to suppress overtraining and improve the accuracy of estimation. DropConnect [68] has also been proposed, which drops individual connections between units, whereas dropout drops individual units.

Fig. 1.20 Dropout
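A minimal sketch of this dropout behavior in Python/NumPy, following the convention used here (a unit is dropped when rnd > r, and the outputs are multiplied by r at inference); the function name is illustrative.

    import numpy as np

    def dropout_forward(y, r, training, rng=np.random.default_rng()):
        """Dropout as described in Sect. 1.3.4 for the outputs y of one layer."""
        if training:
            rnd = rng.uniform(0.0, 1.0, size=y.shape)   # one random number per unit per epoch
            mask = (rnd <= r).astype(y.dtype)           # output fixed to 0 where rnd > r
            return y * mask
        return y * r                                     # inference: all units used, scaled by r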
1.3.5 Batch Normalization

For data pairs that are to be used as input and teacher data for a neural network to learn a mapping, it is common to transform the data into a certain range of values, or to process the data to align the mean and variance. Batch normalization dynamically performs the same kind of transformation in the hidden layers as well.

As mentioned above, transformation operations on the input data of a neural network are usually performed. Let the number of input data be $n$ and the $p$th input data $\boldsymbol{x}^p = \left( x_1^p, x_2^p, \ldots, x_d^p \right)$. The maximum value, minimum value, mean, and standard deviation of each component are, respectively, calculated as follows:

$$x_i^{max} = \max_p \left\{ x_i^p \right\} \quad (1.3.8)$$

$$x_i^{min} = \min_p \left\{ x_i^p \right\} \quad (1.3.9)$$

$$\mu_i = \frac{1}{n} \sum_{p=1}^{n} x_i^p \quad (1.3.10)$$

$$\sigma_i = \sqrt{ \frac{1}{n} \sum_{p=1}^{n} \left( x_i^p - \mu_i \right)^2 } \quad (1.3.11)$$
The most commonly employed transformation operations are the 0–1 transformation and the standardization. Assuming that the input data to the neural network after transformation are $\tilde{\boldsymbol{x}}^p = \left( \tilde{x}_1^p, \tilde{x}_2^p, \ldots, \tilde{x}_d^p \right)$, the 0–1 transformation of the input data is given by

$$\tilde{x}_i^p = \frac{x_i^p - x_i^{min}}{x_i^{max} - x_i^{min}} \quad (1.3.12)$$

Similarly, the standardization of the input data is given as

$$\tilde{x}_i^p = \frac{x_i^p - \mu_i}{\sigma_i} \quad (1.3.13)$$

The above transformations can mitigate the negative effects of large differences in numerical range between individual parameters.

Batch normalization [33] performs the same operations on the inputs of each layer as on the input data. When the input value of the $i$th unit of the $l$th layer in the $p$th learning pattern is $x_{l,i}^p$ and its output is $y_{l,i}^p$, the input–output relationship is expressed by

$$y_{l,i}^p = f\left( x_{l,i}^p \right) = f\left( \sum_j w_{i,j}^l\, y_{l-1,j}^p + \theta_{l,i} \right) \quad (1.3.14)$$

where $\theta_{l,i}$ is the bias of the $i$th unit in the $l$th layer, and $w_{i,j}^l$ the connection weight between the $i$th unit in the $l$th layer and the $j$th unit in the $(l-1)$th layer. Batch normalization is used to standardize the input values of each unit within a mini-batch of size $m$. That is, if the input values of a unit for the training patterns in a mini-batch are $\left\{ x_{l,i}^{k+1}, x_{l,i}^{k+2}, \ldots, x_{l,i}^{k+m-1}, x_{l,i}^{k+m} \right\}$, then we employ as input values the transformations $\left\{ \tilde{x}_{l,i}^{k+1}, \tilde{x}_{l,i}^{k+2}, \ldots, \tilde{x}_{l,i}^{k+m-1}, \tilde{x}_{l,i}^{k+m} \right\}$ given as

$$\tilde{x}_{l,i}^p = \gamma\, \frac{x_{l,i}^p - \mu_{l,i}}{\sigma_{l,i}} + \beta, \quad (k+1 \le p \le k+m) \quad (1.3.15)$$

Here, both $\gamma$ and $\beta$ are parameters that are updated by learning, and $\mu_{l,i}$ and $\sigma_{l,i}$ are, respectively, calculated by

$$\mu_{l,i} = \frac{1}{m} \sum_{p=k+1}^{k+m} x_{l,i}^p \quad (1.3.16)$$

$$\sigma_{l,i} = \sqrt{ \frac{1}{m} \sum_{p=k+1}^{k+m} \left( x_{l,i}^p - \mu_{l,i} \right)^2 + \varepsilon } \quad (1.3.17)$$

where $\varepsilon$ is a small constant that prevents division by zero.
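A minimal sketch of Eqs. (1.3.15)–(1.3.17) for the inputs of a single unit over one mini-batch, in Python/NumPy; gamma and beta would in practice be updated by learning, and the function name is illustrative.

    import numpy as np

    def batch_normalize(x_batch, gamma, beta, eps=1e-5):
        """Batch normalization of one unit's inputs x_{l,i}^p over a mini-batch of size m."""
        mu = x_batch.mean()                                    # Eq. (1.3.16)
        sigma = np.sqrt(((x_batch - mu) ** 2).mean() + eps)    # Eq. (1.3.17)
        return gamma * (x_batch - mu) / sigma + beta           # Eq. (1.3.15)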
It is noted that batch normalization has been shown to be effective in improving learning speed in many cases and has become widely employed.

1.3.6 Generative Adversarial Networks

Generative adversarial networks (GANs) [24], one of the most innovative techniques developed for deep learning, consist of two neural networks: the generator and the discriminator. The former is a neural network that generates data satisfying certain conditions, and the latter is one that judges whether the input data are true data or not. The generator takes arbitrary data (for example, data generated by random numbers) as input and is trained to output data that satisfy certain conditions. The discriminator is trained to correctly discriminate between data output by the generator (called fake data) and data prepared in advance that truly satisfy certain conditions (called real data). In the early stages of learning, the fake data output by the generator can easily be detected as "fake" by the discriminator, but as the training of the generator progresses, it may output fake data that cannot be detected even by the discriminator. The goal of a GAN is to build a generator that outputs data satisfying the given conditions, in other words data that cannot be detected as "fake data" by the discriminator. The training process of a GAN can be summarized as follows:

(1) Two neural networks, the generator and the discriminator, are prepared as shown in Fig. 1.21. The number of output units of the generator should be the same as the number of input units of the discriminator, and the number of output units of the discriminator is 1, because the discriminator only determines whether the input data is real or fake.

Fig. 1.21 Generator and discriminator in GAN
(2) Prepare a large number of true data that satisfy certain conditions, called real data.

(3) The generator takes a large amount of arbitrary data generated by random numbers as input, and the corresponding outputs of the generator are collected as fake data (Fig. 1.22).

(4) Training of the discriminator is performed using the real data prepared in (2) and the fake data collected in (3) as input data. As shown in Fig. 1.23, the teacher data should be real (e.g., 1) for real data and fake (e.g., 0) for fake data. In this way, the discriminator is trained to correctly discriminate between real and fake data.

Fig. 1.22 Generation of fake data in GAN
Fig. 1.23 Training of discriminator in GAN
(5) After the training of the discriminator, the generator is trained by connecting the generator and the discriminator in series, as shown in Fig. 1.24. The input data to the connected network are the input data of the generator (noise), and the teacher data is "real." The back propagation algorithm is used to train the connected network, where all the parameters (e.g., connection weights) of the discriminator part are fixed, and only the parameters of the generator part are updated. In this way, the generator is trained to output data that are judged to be real by the discriminator.

(6) Return to (3) after the training of the generator is completed.

Fig. 1.24 Training of generator in GAN

By repeating the training of the discriminator and the generator alternately, the trained generator finally becomes able to output data that are indistinguishable from the real data by the discriminator. A GAN can use convolutional neural networks for the generator and the discriminator. The training process of a GAN is written as a min–max problem as follows [24]:

$$\min_G \max_D V(D, G) = \mathbb{E}_{\boldsymbol{x} \sim p_d(\boldsymbol{x})} \left[ \log D(\boldsymbol{x}) \right] + \mathbb{E}_{\boldsymbol{z} \sim p_z(\boldsymbol{z})} \left[ \log \left( 1 - D(G(\boldsymbol{z})) \right) \right] \quad (1.3.18)$$

where $V(D, G)$ is the objective function, $D$ and $G$ are the discriminator and the generator, respectively, $D(\boldsymbol{x})$ is the output of the discriminator for input data $\boldsymbol{x}$, $G(\boldsymbol{z})$ the output of the generator for input data $\boldsymbol{z}$, $p_d(\boldsymbol{x})$ the probability distribution of $\boldsymbol{x}$, and $p_z(\boldsymbol{z})$ the probability distribution of $\boldsymbol{z}$. The training process (4) above corresponds to the max operation on the left-hand side of Eq. (1.3.18), that is, the maximization of the right-hand side by updating the discriminator, while the training process (5) corresponds to the min operation on the left-hand side of Eq. (1.3.18), meaning the minimization of the second term on the right-hand side by updating the generator.
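The following is a minimal sketch of the alternating training of steps (3)–(6) using TensorFlow/Keras; the network architectures, sizes, optimizers, and the placeholder real data are illustrative assumptions of this sketch, not part of the original description.

    import numpy as np
    from tensorflow.keras import layers, models

    latent_dim, data_dim, batch = 16, 64, 32          # illustrative sizes

    # (1) generator and discriminator
    generator = models.Sequential([
        layers.Dense(32, activation="relu", input_shape=(latent_dim,)),
        layers.Dense(data_dim, activation="sigmoid"),
    ])
    discriminator = models.Sequential([
        layers.Dense(32, activation="relu", input_shape=(data_dim,)),
        layers.Dense(1, activation="sigmoid"),         # single output unit: real or fake
    ])
    discriminator.compile(optimizer="adam", loss="binary_crossentropy")

    # (5) connected network; in this standard Keras pattern the discriminator was compiled above,
    # so it is still trainable on its own, while through `gan` only the generator is updated.
    discriminator.trainable = False
    gan = models.Sequential([generator, discriminator])
    gan.compile(optimizer="adam", loss="binary_crossentropy")

    real_data = np.random.rand(1000, data_dim)         # placeholder for the real data of step (2)

    for step in range(1000):
        # (3) fake data from random noise
        noise = np.random.normal(size=(batch, latent_dim))
        fake = generator.predict(noise, verbose=0)
        # (4) train the discriminator: real data labeled 1, fake data labeled 0
        idx = np.random.randint(0, real_data.shape[0], batch)
        d_x = np.vstack([real_data[idx], fake])
        d_y = np.vstack([np.ones((batch, 1)), np.zeros((batch, 1))])
        discriminator.train_on_batch(d_x, d_y)
        # (5) train the generator through the connected network with teacher label "real"
        gan.train_on_batch(noise, np.ones((batch, 1)))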
GANs based on convolutional neural networks are used for various problems including the generation of images. For example, such a GAN can generate an image of a dog from a noisy image of random numbers given as input. However, it is known to be difficult to control what kind of dog image the generator creates. The conditional generative adversarial network (CGAN) [48] is a modified version of the GAN that can control the images generated by the generator, where both the generator and the discriminator accept, in addition to the same input data as in the GAN, data about the attributes of the data, called label data. The training process of a CGAN is summarized as follows:

(1) Two neural networks, the generator and the discriminator, are prepared as shown in Fig. 1.25. Both the generator and the discriminator use information about the attributes of the data (label data) as input data in addition to the standard GAN input data. Thus, the number of input units of the discriminator is the sum of two numbers: one is the number of output units of the generator and the other the number of label data. The number of output units of the discriminator is 1, because the discriminator only determines whether the input data is real or fake.

Fig. 1.25 Generator and discriminator in CGAN

(2) Prepare a large number of true data that satisfy certain conditions, called real data, which are accompanied by label data indicating their attributes.

(3) The generator takes a large amount of arbitrary data generated by random numbers, together with their attribute information (label data), as input, and the corresponding outputs are collected as fake data (Fig. 1.26).

(4) Training of the discriminator is performed using the real data prepared in (2) and the fake data collected in (3) as input data. The corresponding label data are also used as input. As shown in Fig. 1.27, the teacher data are set to real (e.g., 1) for real data and fake (e.g., 0) for fake data, respectively. In this way, the discriminator is trained to correctly discriminate between real and fake data.

(5) After the discriminator is trained, the training of the generator is performed by connecting the generator and the discriminator in series, as shown in Fig. 1.28.
The input data to the connected network are the input data of the generator and its label data, with the teacher data being "real." The back propagation algorithm is used to train the connected network, where all the parameters (e.g., connection weights) of the discriminator part are fixed, and only the parameters of the generator part are updated. In this way, the generator is trained to output data judged to be real by the discriminator.

(6) Return to (3) after the training of the generator is completed.

Fig. 1.26 Generation of fake data in CGAN
Fig. 1.27 Training of discriminator in CGAN
Fig. 1.28 Training of generator in CGAN

The learning process of a CGAN is also formulated as a min–max problem as follows [48]:

$$\min_G \max_D V(D, G) = \mathbb{E}_{\boldsymbol{x} \sim p_d(\boldsymbol{x})} \left[ \log D(\boldsymbol{x}|\boldsymbol{y}) \right] + \mathbb{E}_{\boldsymbol{z} \sim p_z(\boldsymbol{z})} \left[ \log \left( 1 - D(G(\boldsymbol{z}|\boldsymbol{y})) \right) \right] \quad (1.3.19)$$

where $\boldsymbol{y}$ is the attribute information (label data). Equation (1.3.19) can be regarded as the modified version of Eq. (1.3.18) conditioned with respect to $\boldsymbol{y}$.

GANs have attracted much attention, especially for their effectiveness in image generation and speech synthesis, and various improved GANs have been proposed. Research on them is still active: for example, DCGAN [55] using a convolutional neural network, InfoGAN [9] with an improved loss function, LSGAN [44] with a loss function based on the least square error, CycleGAN [71] with doubled generators and discriminators, WGAN [2] with a loss function based on the Wasserstein distance, ProgressiveGAN [35] with hierarchically high resolution, and StyleGAN [36] with an improved generator.

1.3.7 Variational Autoencoder

It is known that the variational autoencoder performs a function similar to that of the generative adversarial network (GAN) described in Sect. 1.3.6.
Figure 1.29 shows the basic schematic diagram of the autoencoder. Let the number of training data be $N$, the $k$th training data (input) $\boldsymbol{x}^k = \left( x_1^k, x_2^k, \ldots, x_n^k \right)^T$, the encoder output $\boldsymbol{y}^k = \left( y_1^k, y_2^k, \ldots, y_m^k \right)^T$ $(m < n)$, and the decoder output $\tilde{\boldsymbol{x}}^k = \left( \tilde{x}_1^k, \tilde{x}_2^k, \ldots, \tilde{x}_n^k \right)^T$. Then, the objective function $E$ to be minimized in the training process of the autoencoder is given by

$$E = \frac{1}{2} \sum_{k=1}^{N} \left\| \tilde{\boldsymbol{x}}^k - \boldsymbol{x}^k \right\|^2 \quad (1.3.20)$$

Here, it is assumed that the input data are used as the teacher data as well. As a result, the output $\boldsymbol{y}^k$ of the encoder can be considered a compressed representation of the input data $\boldsymbol{x}^k$.

Unlike conventional autoencoders, variational autoencoders [37, 38] learn probability distributions. Figure 1.30 shows a schematic diagram of the operation of a variational autoencoder. The encoder is assumed to represent the probability distribution of Eq. (1.3.21). Here, $\boldsymbol{z} = (z_1, z_2, \ldots, z_m)^T$ is called a latent variable and is usually of much lower dimension $(m \ll n)$ than the input $\boldsymbol{x} = (x_1, x_2, \ldots, x_n)^T$. Note that $N\left( \boldsymbol{z} \,|\, \boldsymbol{\mu}, \boldsymbol{\sigma}^2 \right)$ is a multidimensional normal distribution with mean $\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_m)^T$ and variance $\boldsymbol{\sigma}^2$ (standard deviation $\boldsymbol{\sigma} = (\sigma_1, \sigma_2, \ldots, \sigma_m)^T$).

$$q_{\phi}(\boldsymbol{z}|\boldsymbol{x}) = N\left( \boldsymbol{z} \,|\, \boldsymbol{\mu}, \boldsymbol{\sigma}^2 \right) \quad (1.3.21)$$

Fig. 1.29 Encoder and decoder in an autoencoder (input → encoder → code → decoder → output)
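As a small sketch of Eq. (1.3.21), the following Python/NumPy function draws a latent variable z from the encoder's normal distribution, given the mean and standard deviation vectors produced by the encoder; the function name is illustrative.

    import numpy as np

    def sample_latent(mu, sigma, rng=np.random.default_rng()):
        """Draw z from the encoder distribution of Eq. (1.3.21), N(z | mu, sigma^2),
        componentwise for the m-dimensional latent variable."""
        return mu + sigma * rng.standard_normal(size=mu.shape)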
  • 55. Random documents with unrelated content Scribd suggests to you:
  • 56. JALOKIVIKAUPPIAS. Serviteur, herrani! Rouva Staabi käski minun tulla tänne; olen jalokivikauppias. LEERBEUTEL. Näyttäkääpäs, mitä teillä on! Nämä miellyttävät minua enimmin. Kenties kreivi ostaa ne, mutta nyi ei hänellä ole aikaa niitä tarkastaa, sillä me odotamme tänne raatiherroja. Tahdotteko jättää ne tänne huomiseen saakka, niin saatte silloin vastauksen? JALOKIVIKAUPPIAS. Minä tuon tänne jalokivet huomisaamulla. LEERBEUTEL. Monsieur! Minä annan teille paremman neuvon. Pysykää vain poissa, koska niin kovin pelkäätte! Täällä kävi äskettäin toinenkin tarjoomassa meille tavaroitaan ja olisi jättänyt näytteeksi niin monta kuin vain olisimme halunneet. Mutta minä annoin hänen mennä koska kerran sama vaimo oli puhunut minulle teistä. JALOKIVIKAUPPIAS. Oi, herra hovimestari! En minä sitä tarkoittanut! LEERBEUTEL. Ei, monsieur. En tahdo johdattaa teitä enkä ketään muutakaan kiusaukseen. Ehkä nukkuisitte yönne levottomasti, jos uskoisitte pari jalokiveä pyhän roomalaisen valtakunnan palatsikreiville. JALOKIVIKAUPPIAS. Oi, älkää suuttuko, kunnianarvoisa herra hovimestari! LEERBEUTEL. En minä ollenkaan suutu, kehun vain varovaisuuttanne. JALOKIVIKAUPPIAS. Oi, teidän jalosukuisuutenne!
  • 57. LEERBEUTEL. Monsieur, minä sanon sen teille, en ole ollenkaan vihoissani, mutta te ette myöskään voi moittia minua siitä, että teen kauppoja kenen kanssa tahansa. JALOKIVIKAUPPIAS. Oi, jalosukuinen herra hovimestari! LEERBEUTEL. Tässä saatte jalokivenne. Toivon, etteivät ne minun käsissäni ole pilaantuneet. JALOKIVIKAUPPIAS. Kautta kunniani, minä en ota niitä takaisin. Pyydän nöyrimmästi, pitäkää ne huomiseen saakka. LEERBEUTEL. Minä en ota niitä! JALOKIVIKAUPPIAS. Pyydän ainoastaan, älköön kreivi olko minulle epäsuosiollinen. (Menee). LEERBEUTEL. Kuulkaas, monsieur, koska huomaan, ettette tehnyt tätä epäluulosta, niin saatte sitte jättää ne tänne! JALOKIVIKAUPPIAS. Kiitän teitä, hovimestari. LEERBEUTEL. Voitte tulla tänne huomisaamuna kello yhdeksän JALOKIVIKAUPPIAS. Kuten käskette! Kohtaus 6. 3 raatiherraa. Leerbeutel.
  • 58. 1 RAATIHERRA. Nöyrin palvelijanne! En tiedä, kuulutteko palatsikreivin seurueeseen. LEERBEUTEL. Minä olen hänen hovimestarinsa, teidän palvelijanne. Palatsikreivi on ottanut itselleen vapauden kutsua teidät illallisille, ettehän siitä pahastu. 2 RAATIHERRA. Me olemme palatsikreivin alamaisia palvelijoita ja me kiitämme meille osoitetusta kunniasta. LEERBEUTEL. Hänen isänsä, vanha palatsikreivi sanoo nuorena ulkomailla matkustellessaan aina kutsuneensa pitoihin kaupungin esivallan. Samoin toivoo hän rakkaan poikansa tekevän. Raatiherroihin tutustuminen on sekä opiksi että hyödyksi. Ei pidä matkustaa ulkomaille katsomaan vain taloja ja rakennuksia vaan keskustelemaan kunnon ihmisten kanssa. 1 RAATIHERRA. Toivomme jollakin lailla huvittavamme hänen armoansa. Mutta mitä hyötyä on meidän seurustelustamme niin ylhäiselle, oppineelle ja paljon matkustaneelle herralle? Herra hovimestarin kohteliaisuudesta, viisaudesta ja hienosta käytöstavasta päättäen voimme arvata millainen teidän korkea-arvoisa isäntänne on. LEERBEUTEL. Kiitän teitä nöyrimmästi siitä, että ajattelette minusta niin hyvää. Minun ansioni ovat hyvin pienet ja isäntäni… sen pahempi… Oi, hyvät herrat, minä en voi sitä sanoa. 1 RAATIHERRA. Miksi ette, herra hovimestari? Eihän herrallenne vain ole tapahtunut mitään onnettomuutta.
  • 59. LEERBEUTEL. Hyvät herrat! Luonto on hyvin oikullinen jakaessaan lahjojansa. Mitä ruumiiseen tulee niin herraltani ei puutu mitään, sillä siinä on kaikki kuten olla pitää, hänellä on hyvä terveys, hän on sitäpaitsi rikas ja hänellä on hyvä toimeentulo. Mutta, hyvät herrat… (Leerbeutel itkee katkerasti). Oi, minun sydämmeni on murtua sitä ajatellessani. 1 RAATIHERRA. Ehkä palatsikreivi on hiukan huikentelevainen, useilla nuorilla herroillahan on se vika. LEERBEUTEL (itkee taas). 1 RAATIHERRA. Mutta se katoaa ajan pitkään. LEERBEUTEL. Ei, herrani, oi, jospa hän olisikin hiukan raju ja huikentelevainen, sillä se on hyvä merkki nuorista ihmisistä. 1 RAATIHERRA. Ehkä hän on taipuvainen synkkämielisyyteen. LEERBEUTEL. Oi, jospa hän olisikin hiukan synkkämielinen, sillä siinä voi olla jotain hyvääkin. 1 RAATIHERRA. Ehkä hän on liian intohimoinen rakastelija ja antaa naisten vietellä itseänsä. LEERBEUTEL. Oi, jospa hän olisikin rakastelija, siliä rakastelemisesta johtuu paljon hyvää, jos pahaakin. 1 RAATIHERRA. Sitte emme voi tietää mikä häntä vaivaa. Ehkä hän kohtelee alamaisiaan ankarasti? LEERBEUTEL. Ei hän ole ankara. Jospa hän olisikin hiukan ankara, sillä se saattaa toisinaan olla sellaisille ihmisille hyödyksi.
  • 60. 1 RAATIHERRA. Sitte mahtaa hänen armollaan olla päähänpistoja. LEERBEUTEL. Ei, hyvät herrat. Ei suuremmassa eikä pienemmässäkään määrässä. Hänellä ei ole mitään päähänpistoja. Kun hänet näette, niin voitte päättää, millainen hän on, luulette hänet pikemmin talonpojaksi kuin palatsikreiviksi. Hän on kuin puupölkky, ei ole hänellä älyä eikä muistoa ja kaikki hänen herra isänsä puuhat ovat menneet myttyyn, (itkee taas). Oi, sinä jalo, vanha palatsikreivi, sydämmeni on murtua ajatellessani niitä suolaisia kyyneleitä, joita olet vuodattanut, niitä huokauksia, jotka ovat rinnastasi lähteneet! Tämä vanha kunnon herra on tehnyt lapsensa tähden kaiken voitavansa, hän on valinnut hänelle maan paraimmat kotiopettajat, paraimmat harjoitusmestarit, hän on antanut hänen vihdoin matkustaa ulkomaille mutta kaikki on ollut turhaa työtä. 1 RAATIHERRA. Kuinka vanha palatsikreivi nyt on? LEERBEUTEL. Yhdeksäntoista vuotta. 1 RAATIHERRA. Kyllä hänestä sitte, herra hovimestari, vielä voi paljon toivoa, sillä siitä on nähty paljon esimerkkejä. LEERBEUTEL. Ha, ha, jospa asian laita olisi niin! Mutta anteeksi, hyvät herrat, minä jätän teidät nyt hetkeksi! Lähden saattamaan herraa alas hänen huoneestaan. 1 RAATIHERRA. Nöyrin palvelijanne! Kohtaus 7. Raatiherrat kahdenkesken.
  • 61. 1 RAATIHERRA. Kovin herttainen mies tämä hovimestari, eikö niin, herra virkaveli! Ansioittensa puolesta tulisi hänen olla sitä, mitä hänen herransa on. 2 RAATIHERRA. Olen aivan samaa mieltä; sillä olenpa tosiaankin kovin ihastunut häneen. 3 RAATIHERRA. Olen hyvin utelias näkemään, onko nuori palatsikreivi todella niin tyhmänsekainen. 1 RAATIHERRA. Kyllä mahtaa vanhemmilla olla surua huomatessaan, ettei mikään kuri pysty lapseen. Mutta kas tuolta hän varmaan tuleekin! Kohtaus 8. Palatsikreivi. Leerbeutel. Raatiherrat. 1 RAATIHERRA. Alamaisimmat palvelijanne! Kiitämme teitä, palatsikreivi, armosta, jonka olette meille osoittanut kutsuessanne halvat palvelijanne tänne. 2 RAATIHERRA. Jos olisimme tienneet palatsikreivin tulosta, niin olisi kunniamme vaatinut jo aikoja sitten käymään alamaisimmalla kunniatervehdyksellä sekä onnittelemaan teidän armoanne tulonne johdosta. PALATSIKREIVI. Eikös teillä, reilut miehet, ole tupakanpurua, raukasee niin riivatusti. 1 RAATIHERRA (hiljaa). Voi taivas, onko tämä palatsikreivin tervehdys!
  • 62. LEERBEUTEL. Hyvät herrat. Palatsikreivi oli makuulla ja nukahti, maattuansa näin iltapäivällä on hän niin torruksissa, että tulee tajuihinsa vasta puolen tunnin kuluttua. Pyydän nöyrimmästi, olkaa hyvä ja istukaa, silloin myöskin herrani istuu! 1 RAATIHERRA. Kuinka tämä taitava mies osaakaan peitellä herransa vikoja! LEERBEUTEL. Hyvät herrat, suvaitkaa, istukaa, sillä hänen armonsa ei istu ennen! (Palatsikreivi istuutuu ensiksi, sitten muut, Leerbeutel jää seisomaan hänen tuolinsa viereen). 1 RAATIHERRA. Teidän armonne on joutunut hyvin epäterveelliselle paikkakunnalle. Parasta että teidän armonne alussa on hiukan varovainen. (Palatsikreivi röykäisee). LEERBEUTEL. Palatsikreivillä on hirveän huono vatsa ja hän pyytää nöyrimmästi anteeksi, että hän teidän läsnäollessanne käyttää hyväkseen mukavuuksiansa; sillä pitkiin aikoihin ei hän ole saanut ilmaa sisäänsä ja sentähden ottaa hän itsellensä oikeuden tehdä sellaista, jota hän ei koskaan tekisi, jollei äärimmäinen hätä häntä siihen pakoiltaisi. 1 RAATIHERRA. Teidän armonne käyttäköön vain hyväkseen vapauttansa, sillä terveys on kallein asia maailmassa. Onko teidän armoanne muuten jo kauvankin vaivannut tuollainen umpitauti? PALATSIKREIVI. Kysykää hovimestariltani!
  • 63. 1 RAATIHERRA (hovimestarille). Onko hänen armoansa jo kauvankin vaivannut tällainen tauti. LEERBEUTEL. Jo muutama vuosi. 1 RAATIHERRA. Minulla on vallan erinomaisia vatsatippoja, jos teidän armonne suvaitsee niitä koittaa, vakuutan, ettei mikään ole vatsalle terveellisempää. PALATSIKREIVI. Eihän kahdestatoista sellaisesta lääkepullosta tule edes korttelin pulloa täyteen. Ei sillä saa janoansa sammumaan. LEERBEUTEL. Hyvät herrat, palatsikreivi ei ole tottunut tippoihin. Hän käyttää vain dekoktia, sitä hän juo isot määrät. Hän luuli teidän tarkoittavan dekoktia. 1 RAATIHERRA. Teidän palatsikreivillinen armonne, en tarkoittanut dekoktia. Vain kymmenen tippaa kerrallaan. PALATSIKREIVI. Kysykää hovimestariltani! 1 RAATIHERRA. Tunteeko herra hovimestari näitä tippoja? LEERBEUTEL. Minä repostelen hiukan lääketieteenkin alalla. Niin, minä tunnen heti hajusta, mitä lajin lääkkeet ovat. Tämä on väkevätä liuvosta, sitä ei voi kerralla nauttia enempää kuin kymmenen tippaa. 1 RAATIHERRA. Mutta kuinka kaupunkimme muuten miellyttää teidän palatsikreivillistä armoanne? PALATSIKREIVI. Kysykää hovimestariltani!
  • 64. LEERBEUTEL. Palatsikreivi tietää, että minä jo hiukan kävin katselemassa kaupunkia, siksi on minun velvollisuuteni kertoa siitä, sillä hän itse ei ole vielä nähnyt mitään. Minusta täällä on monta kaunista ja kallista rakennusta. 1 RAATIHERRA. Niin, kaupunki on kyllä kaunis. Se on kasvanut muutamassa vuodessa. Jos palatsikreivi tahtoo käydä katsomassa kaupunkia, niin tarjoudumme alamaisesti oppaaksi kaikkialle. LEERBEUTEL. Kohteliaisuutenne on niin suuri, ettei hänen armonsa kiireessä löydä vastaukseksi kyllin voimakkaita sanoja. Niin, hänen vaitiolonsakin osoittaa kuinka syvästi hänen sydämmensä on liikutettu. 1 RAATIHERRA. Sitä ei voi ollenkaan sanoa kohteliaisuudeksi, sillä meidän velvollisuutemme on mikäli mahdollista huvittaa palatsikreiviä. LEERBEUTEL. Hyvät herrat! Palatsikreivi ei puhu montaa sanaa, mutta hän ajattelee sitä enemmän. Sen on hän perinyt isäherraltaan, sillä kun joku tekee hänen isälleen hyvänteon, ei hän kiitä sanoilla, vaan osoittaa kiitollisuuttansa teoilla. 1 RAATIHERRA. Onko teidän armonne isäherra vielä hyvissä voimissa? PALATSIKREIVI. Kysykää hovimestariltani! LEERBEUTEL. Hyvät herrat, saan teille ilmoittaa, ettei viime postissa saapunut vanhalta palatsikreiviltä kellekään muulle kirjettä paitsi minulle. Sentähden on hänen armonsa hiukan huonolla tuulella. Kiitän muuten hänen armonsa puolesta kohteliaasta
  • 65. kysymyksestä. Vanha palatsikreivi niinkuin palatsikreivitärkin ovat hyvissä voimissa. 1 RAATIHERRA. Vai niin, elääkö teidän armonne rouva äitikin? PALATSIKREIVI. Kysykää hovimestariltani! LEERBEUTEL. Ha, ha, hyvät herrat, huomaattehan, että herrani on huonolla tuulella sen johdosta, ettei hän viime postissa saanut yhtään kirjettä kotoaan. Ah, rauhoittukaa teidän armonne, ensi postissa saatte taas kirjeitä, enkä minä saa ollenkaan. ISÄNTÄ (tulee). Nyt on kaikki valmiina, herra hovimestari, ja ruoka on pöydällä jos suvaitsette astua huoneeseen! (Palatsikreivi tahtoo kulkea edellä, mutta Leerbeutel pidättää häntä takinliepeestä ja pakoittaa muut käymään edellä). LEERBEUTEL. Isäntä, ovatko jo tilaamani soittajat tulleet? ISÄNTÄ. Ovat. Saatte kuulla kaupungin parainta musiikkia. LEERBEUTEL. Antaa niiden soittaa muutamia kauniita kappaleita sillä aikaa kuin herrat ovat aterialla! (Isäntä pyytää soittajia soittamaan. Soitetaan valituita kappaleita. Tanssija tulee esittäen kauniin tanssin). Kolmas näytös. Kohtaus 1.
  • 66. Rouva Staabi. Isäntä. ROUVA STAABI. Täällä on tavattoman hiljaisia tänään. Eikä ihmisiä näy missään vaikka kello jo on yhdeksän. Enhän voi uskoa että kaikki vielä nukkuvat. Minun täytyy koputtaa isännän huoneen ovelle. ISÄNTÄ (tulee puettuna yönuttuun ja henkseleihin, hieroen silmiään). Huomenta! Tepäs olette tänään aikaisin liikkeellä. ROUVA STAABI. Onko nyt aikaista? Kello on yhdeksän. ISÄNTÄ (haukoitellen). Hyvänen aika, onko kello jo yhdeksän? Sitä en olisi uskonut. Joimme eilen jotenkin vahvasti, niin että pääni on kuin rikkirevitty. Palatsikreivi kestitsi meitä niin, että olimme kaikki hyvässä hutikassa. Ja sen sanon, että hänen hovimestarinsa on herra kiireestä kantapäähän ja niin armollinen, niin armollinen. Voi, rouva Staabi, en voi sitä kuvailla! Tämä kunnon mies täytti itse lasini ja kehoitti minua juomaan palatsikreivin maljan. Uskokaa minua, nuoresta palatsikreivistä tulee vielä hyvä herra! Hän joi toisen maljan toisensa jälkeen ja kilisteli vanhojen raatiherrojen kanssa ja sen sanon teille, rakas rouva Staabi, että hän lopulta kuitenkin joutui alakynteen. Palatsikreivi on vielä nuori mies eikä ole vielä kahdenkaankymmenen vuoden vanha, kuinka hän siis voisi kestää juomingeissa miesten seurassa, jotka ovat olleet raadissa jo monta vuota. ROUVA STAABI. Millainen herra hän muuten on? ISÄNTÄ. Kovin hiljainen. Pöydän ääressä en kuullut hänen sanovan luotua sanaa, mutta hovimestari johti sen sijaan yksin puhetta.
  • 67. ROUVA STAABI. Niinpä niin, hovimestari näyttää hyvin kohteliaalta mieheltä. ISÄNTÄ. En ole koskaan nähnyt sellaista miestä. Kiitollisuudella olen häntä aina muisteleva. Mutta anteeksi, minun täytyy mennä hiukan peseytymään ja kampaamaan tukkaani, sillä, käyttääkseni hienoa kieltä, nousinpa vasta äsken vuoteeltani. Kohtaus 2. Jalokivikauppias. Soittaja. Rouva Staabi. ROUVA STAABI. Palvelijattarenne, herra jalokivikauppias. Olette ehkä samoilla asioilla kuin minäkin. JALOKIVIKAUPPIAS. Niin, hovimestari pyysi minua tulemaan tänne tähän aikaan. Madame, kiitän teitä muuten siitä, että suosittelitte minua hovimestarille, sillä luulenpa hänen ostavan ne kaksi jalokiveä, jotka hänelle eilen toin. SOITTAJA. Minulla myöskin on häneltä saatavaa eilisestä soitosta. JALOKIVIKAUPPIAS. Olikos siellä hauskaa? SOITTAJA. Oli. Koko soittokunta soitteli. Koko raati aterioitsi. Onkos teilläkin, miekkosilla, asiaa palatsikreivin palvelusväelle? JALOKIVIKAUPPIAS. Niin, minulla on saatavaa kahdesta jalokivestä. ROUVA STAABI. Ja minulla kolmestakymmenestä kyynärästä broderieriä.
  • 68. SOITTAJA. Minulla ei ole läheskään niin suuria saatavia kuin teillä. Te, miekkoset, ansaitsette yhdessä tunnissa niin paljon kuin meikäläiset ammattimiehet koko vuodessa. JALOKIVIKAUPPIAS. Niin, monsieur, ei joka päivä lennä suuhun tällaisia paisteja. ROUVA STAABI. Se on varma se. SOITTAJA. Mutta eikö vielä kukaan palvelijoista ole jalkeilla? Minulle on aika rahaa ja pitäisi vielä tästä ehtiä erään suutarin häihin. Siellä otetaan vieraat vastaan trumpeetin soitolla. Kas tuolta tulee isäntä, kysykäämme häneltä! Kohtaus 3. Jalokivikauppias. Soittaja. Rouva Staabi. Isäntä. SOITTAJA. Kuulkaas, isäntä, toimittakaa tänne joku palatsikreivin palvelijoista. ISÄNTÄ. En ole tänään nähnyt vielä yhtä ainoatakaan. JALOKIVIKAUPPIAS. Kyllähän jo jonkun pitäisi olla hereillä. ISÄNTÄ. Sitä minäkin. Menenpä lakeijain kamariin heitä ravistelemaan, jolleivät vielä ole heränneet. (Menee). ROUVA STAABI. Sehän on kauheata, että palvelijat uskaltavat maata niin kauvan.
  • 69. SOITTAJA. Vasta myöhään yöllä pääsivät he levolle tosin hyvässä hutikassa, mutta tällainen makaaminen menee jo liian pitkälle. ISÄNTÄ (palaten). Mitä hemmettiä tämä on? Kamarissa ei ollut ketään. Taitavat olla hovimestarin luona, mutta en ole tänään vielä kuullut hänen oveansa avattavan. JALOKIVIKAUPPIAS. Juoskaa heti katsomaan! Oh, minä pelkään, että tässä on piru pelissä. ISÄNTÄ (mennen ja tullen taas). Voi, taivaan talivasara, en hovimestarinkaan huoneessa tavannut ketään! JALOKIVIKAUPPIAS. Ah, sappeni jo kiehuu minussa! ROUVA STAABI. Oih, oih, minä vapisen! ISÄNTÄ. Lähden kurkistamaan palatsikreivin huoneeseen. Ha, ha, ha, oikein sydämmeni keventyi. Minä näin, että hän makaa vielä. JALOKIVIKAUPPIAS. Hyvä isäntä, minä vakuutan, että minä pelkäsin äsken kuin jänis. ISÄNTÄ. Kiihtyneessä mielentilassa ollessaan kuvittelee ihminen itselleen mitä tahansa. ROUVA STAABI. Synti ja häpeä, kun epäilin kunnon ihmisiä. Kohtaus 4. Pietari. Edelliset.
  • 70. PIETARI (tulee). Voi, isäntä, mitä on tapahtunut! Kaikki kolme hevosta on viety meidän tallista. ISÄNTÄ. Viety? Mitäs sinä sanot? Oletkos hullu? PIETARI. Isäntä, se on ihan totinen tosi. ISÄNTÄ. No sun taivaan talivasara! Jollen itse olisi nähnyt palatsikreivin nukkuvan, niin luulisin, että tässä on paha pelissä. JALOKIVIKAUPPIAS. Herra isäntä, älkää siinä enää jaaritelko, vaan menkää heti herättämään kreiviä ja sanokaa, että hänen palvelijansa ovat poissa niinkuin teidän hevosennekin. Sillä me tahdomme tietää koko asianlaadun, ottakoon kreivi sen vastaan suosiollisesti tai epäsuosiollisesti. ISÄNTÄ (aukaisee kamarinoven ja huutaa): Teidän armonne, teidän armonne, teidän armonne! Kohtaus 5. Palatsikreivi puettuna yönuttuun ja tohveleihin. Edelliset. PALATSIKREIVI (ojennellen jäseniään). Tahdotteko puhutella minun? ISÄNTÄ. Pyydän nöyrimmästi anteeksi, että olen tehnyt teidän armonne levottomaksi, mutta… PALATSIKREIVI. Menkää minun hovimestarini tykö. JALOKIVIKAUPPIAS. Armollinen herra, me emme tiedä, missä hän on. Hän käski minun tulla tänne tähän aikaan, mutta…
  • 71. PALATSIKREIVI. Johan minä sanoin, menkää hovimestarini tykö! JALOKIVIKAUPPIAS. Teidän armonne, hovimestaria ei ole missään. PALATSIKREIVI. No käskekää tänne sitte kamaripalvelija! ISÄNTÄ. Kamaripalvelija on poissa. PALATSIKREIVI. No sitte mahtaa hän olla hovimestarin tykönä, perhana soikoon, menkää kaikki tyyni hovimestarin tykö! ISÄNTÄ. Hovimestari, kamaripalvelija, lakeijat, hevoset kaikki ovat poissa. PALATSIKREIVI. Minkäs minä sille mahdan! JALOKIVIKAUPPIAS. Jos herra itse maksaa minun jalokiveni, niin saavat muut olla vaikka kymmenen peninkulman päässä. ROUVA STAABI. Ja minun kolmekymmentä kyynärää broderieriäni! SOITTAJA. Ja minun musiikini! ISÄNTÄ. Ja minulle ruuasta, juomasta, trahtamenteista ja muusta, mitä olen menettänyt. PALATSIKREIVI. Mitäs minua liikuttaa teidän pörinänne? Menkää senkin hirtehiset hovimestarin tykö. ISÄNTÄ. Missäs hovimestari sitten on? PALATSIKREIVI. Mikä tuhma tolvana, nousinhan juuri äsken ja vielä häntä kysyy missä hovimestari on!
• 72.
JEWELER. Master innkeeper! I demand that this person be held until I have received my payment.
INNKEEPER. Your grace will stay here until we have all been paid. I see that all your servants were in league together and have since bolted; now write to your father and have him send a few thousand rixdollars for your ransom.
COUNT PALATINE. To my father? He has not a copper farthing to pay this year's land tax with.
INNKEEPER. What? Are you not the Count Palatine?
COUNT PALATINE. You may be that yourself. I am Pekka Niilonpoika from Viiki.
JEWELER. You are not the Count Palatine?
COUNT PALATINE. Let a rogue say such a thing of me!
INNKEEPER. Then where did you get all the servants you brought here yesterday?
COUNT PALATINE. Ask the steward! Deuce take it, how should I know? I saw him for the first time yesterday, when I came to town to buy tar; then he asked me whether I would follow him, and if I did as he ordered, good fortune would crow for me, he said, and I would get food and drink, he said. Well, what could I say to that but thank you kindly for the offer. Then he stripped off my peasant clothes, hauled some sort of coat contraption over my shoulders, pressed a peruke onto my head and sprinkled wheat flour on it; then he called me Count Palatine, and so did the others. Strange that such a name is handed out here.
• 73.
INNKEEPER. You... you confounded scoundrel, how shamelessly you have swindled us.
COUNT PALATINE. Are you mad? Have I swindled you?
INNKEEPER. Did you not swindle us, when you said you were a Count Palatine though you are nothing but a boor!
COUNT PALATINE. In our village there are sixteen peasants who have been flower-counts, and our bailiff has never thrashed them for it; besides, this happened against my will. You yourselves made me a flower-count, and contrary to the old country festival customs at that, for back home no one is made a flower-count at any time but the spring festival.
JEWELER. Hand back my precious stones, you whelp!
COUNT PALATINE. Did you lose your stones? Serves you right, you wretch!
MRS. STAABI (weeping). And my thirty ells of embroidery.
COUNT PALATINE. Rat-takings? Look after your own rat-takings yourself!
MUSICIAN. I want payment for my playing. Innkeeper, I turn to you.
• 74.
INNKEEPER. Must I now pay for the music into the bargain as well? Have I not been fleeced enough already?
COUNT PALATINE. A thrashing is what you needed, not money; you played like a wretch. Last spring, when I was flower-count, we got to hear music of quite another sort; then even the drums rattled. Here one could not hear so much as a decent Danish polska. I hardly know what hodgepodge it was. It sounded as if cats were being pulled by the tail: first one screeched, then another, and then all of them together in the same merriment. Innkeeper, if I were in your place, I would not pay a single shilling.
Scene 6. The Alderman. The characters of the previous scene.
ALDERMAN. Good morning, master innkeeper! Thank you for yesterday! The hospitality was splendid. The food and drink were simply excellent.
INNKEEPER. Yes, with the result that my purse now has a thousand holes in it.
ALDERMAN. I have come most humbly to pay my respects to the Count Palatine and to thank you for yesterday.
INNKEEPER. I, for my part, meant to come most humbly to call on you, begging most humbly that the Count Palatine be hanged before sundown.
ALDERMAN. How do matters stand?
• 75.
INNKEEPER. Matters stand thus: the Count Palatine shall hang from the gallows. The steward was an arch-scoundrel, the Count Palatine has turned into a miserable peasant, I have been fleeced clean, and these honest folk have been swindled; there he stands himself, now you may examine him.
ALDERMAN. Listen here! Why have you claimed to be a Count Palatine?
COUNT PALATINE. Ask my steward!
ALDERMAN. And where is he, then?
INNKEEPER. He has bolted, taken three of my horses and the baggage, and left this wretched peasant behind as a pledge.
ALDERMAN. Good heavens, what a business! You will yet dangle from the gallows; this is unheard of!
COUNT PALATINE. Confound it, I shall complain to the bailiff if you mean to hang me. He is man enough to hang the whole lot of you.
ALDERMAN. Take him to the town hall! The case is clear. (He is led away.)
Scene 7. The boy's father and mother, and the characters of the previous scene.
MOTHER. It was not my wish that we sent that simple boy of ours to town.
FATHER. Well, he had to get to the market town for once, to learn something.
• 76.
MOTHER. Nonsense, what would that stupid creature ever learn? I greatly fear he has been pressed into the army.
FATHER. If we do not find him, we shall have to pay Pastor Markus good money to have him cried from the pulpit. For if the pastor announces from the pulpit that our boy is being sought, the officer will surely hand him back, even if he has already been enlisted.
MOTHER. Tell that to the goat, not to me. Officers do not easily let soldiers out of their hands.
FATHER. It would be well if nothing worse had happened to him than being enlisted. I greatly fear, Kerttu, that the young rascal has got himself into a worse fix.
MOTHER. Oh, oh, he was our only son after all, and though he is quite a dolt, he is still a help with the work.
FATHER. If he is lost, Kerttu, we must resign ourselves to our fate.
MOTHER. I shall never resign myself to my fate. You must get the boy back, or another in his place.
FATHER. Then find someone else to help you with that, for I am already too old and feeble for such things. (The mother weeps.)
FATHER. Do not cry, my good wife. Let us look for him at the Karjatalo inn. Perhaps he has ended up there.
MOTHER. Rubbish, how would he have ended up there?
• 77.
FATHER. Let us go there all the same! But listen to that noise. In these market towns one sees nothing but sheer wickedness. There, too, some poor wretched sinner is being marched along. (To the alderman.) Excuse me, good sir, what wrong has this poor sinner done?
ALDERMAN. He is to be hanged.
BOY. Good gracious, unless my eyes are turned about in my head, there stand my parents. Oh, dear parents, you have come just in time to follow me to the gallows.
MOTHER. Husband, look, this is our son, Pekka Niilonpoika!
FATHER. So it seems to me too. Pekka Niilonpoika! What is this? What wrong have you done?
BOY. Oh, dear father! Do not be angry with me, I lost the four shillings I was supposed to buy tar with!
ALDERMAN. What simplicity! I feel sorry for him. Listen, my good man, is this your son?
FATHER (weeping). Yes, yes, gracious sir. But why is my son to be hanged?
ALDERMAN. He has claimed to be a high-born gentleman and has ruined the fortune and prosperity of these honest people standing here.
MOTHER. Oh, gracious sir, that is impossible. He is the stupidest creature the earth carries on its back. Is it true, what you are accused of?
• 78.
BOY. The devil take that whole steward! I shall give him a proper drubbing yet!
ALDERMAN. How did you come to meet this steward?
BOY. When I was standing in the market square yesterday, looking about me, he came up and said: follow me and do as I tell you, and better days will dawn for you than your masters ever had. I would have been a fool not to take up such an offer; I followed him, he dressed me in some sort of coat contraption, called me a Count Palatine, took me to this man's house, where they also called me Count Palatine, and gave me at one sitting so much food and drink that it would have lasted me a whole year. I go to bed and think nothing more of it. In the morning they tell me I am to be hanged because yesterday I was a Count Palatine. The devil may be your Count Palatine next time!
MOTHER (weeping). Gracious sir, you can see from his simplicity that he is not a man to play knavish tricks; other people have taken advantage of his stupidity and used him as a tool to swindle others. Have mercy on me, do not send me to an early grave.
ALDERMAN. Honest citizens, what say you who have suffered the loss? Are you content that this simple fellow should suffer innocently?
JEWELER. It would not profit us in the least. I pity him myself.
• 79.
BOY. I will gladly make good the damage. One of them has lost two stones; I will gladly get him ten in their place. And the other has lost thirty rat-takings; those, too, I can surely procure for her.
ALDERMAN. I find the boy entirely blameless, and deserving of pity rather than punishment. Under no law can we punish him as an innocent instrument; let us instead hand him over to his parents, with the warning that they are not to let him travel to town on his own again, lest he give occasion for more tragedies of this kind.
• 80.
*** END OF THE PROJECT GUTENBERG EBOOK TALONPOIKA SATIMESSA: KOLMINÄYTÖKSINEN HUVINÄYTELMÄ ***