Computational Epidemiology: Data-Driven Modeling of COVID-19
Ellen Kuhl
To my mom, with whom I had the most
scientific discussions about COVID-19.
Foreword
Nothing will ever be quite the same.
The Great COVID-19 Pandemic that started in December 2019 will be remembered
as the main event of the first part of the 21st Century. It affected all parts of
the world, it laid bare social inequalities, revealed the excesses of modern life, shook
all strata of society, and changed profoundly how individuals, groups, and nations
interact with each other.
Early in the crisis, it was realized that theoretical epidemiology and modeling are
essential tools to confront the problem. As Daniel Bernoulli stated in the first-ever
epidemiological studies of 1760: "I simply wish that, in a matter which so closely
concerns the wellbeing of the human race, no decision shall be made without all the
knowledge which a little analysis and calculation can provide". By January 2020, the
first ‘little analysis and calculation’ provided by epidemiological studies revealed the
seriousness of the situation and scientists around the world rang the alarm bell. As
the crisis escalated, the attention of the media, governments, and the public turned
towards modelers with burning questions: how bad is it? how will it evolve? how
many cases should we anticipate? will the hospitals be overwhelmed? how many
deaths? what can we do?
Like many people in academia, Ellen Kuhl found herself naturally interested in
the problem. As an extreme athlete, a marathoner, a triathlete and an iron-(wo)man,
Ellen is fearless. Early on, she realized that the methods she had been developing
to understand neurodegenerative diseases could be readily adapted to the evolving
crisis. As a scientist, Ellen combines a unique ability for modeling, great technical
skills, and a wonderful intuition for good problems. She quickly built elegant data-
driven models for how the disease spreads around different parts of the world through
the airline network that soon became landmark studies. As an educator, she also
realized that students at Stanford would gain from learning more about the science
behind the disease. Rapidly, she put together a new course that was taught in the
2020 winter term, as the second wave was gathering strength. In her course that I
attended and enjoyed tremendously, she taught both the basics of epidemiology and
her own research as well as many topics related to the wider social and political
impacts of the crisis. This book is the combined result of her course, state-of-the-art
research, and general reflections.
The first part of the book is a fast-paced self-contained review of the fundamental
ideas of epidemiology. It is full of wonderful stories, biographies, and anecdotes
that bring life to the topic. It introduces and defines important concepts such as
the reproduction number, herd immunity, and vaccine efficacy. On the mathematical
side, it is centered around the development of the famous SIR model that captures
the evolution of three homogeneous populations of people, Susceptible (S), Infected
(Is), and Recovered (R), going back to the work of Kermack and McKendrick in
1927. The model will tell you that for a reproductive number R larger than one, after
an initial exponential growth, there will be saturation when enough of the population
has been infected before the disease eventually runs its course.
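To see this behavior in numbers, here is a minimal sketch of the classical SIR model in population fractions, dS/dt = -βSI, dI/dt = βSI - γI, dR/dt = γI, with R0 = β/γ. The function name simulate_sir, the explicit Euler time stepping, and the parameter values (β = 0.5 per day, γ = 0.2 per day, so R0 = 2.5) are illustrative assumptions, not the book's own code.

```python
import numpy as np

def simulate_sir(beta, gamma, i0=1e-4, t_end=200.0, dt=0.1):
    """Integrate the SIR model (population fractions) with explicit Euler.

    dS/dt = -beta*S*I,  dI/dt = beta*S*I - gamma*I,  dR/dt = gamma*I
    Returns the arrays t, S, I, R.
    """
    n = int(round(t_end / dt)) + 1
    t = np.linspace(0.0, t_end, n)
    S, I, R = np.empty(n), np.empty(n), np.empty(n)
    S[0], I[0], R[0] = 1.0 - i0, i0, 0.0
    for k in range(n - 1):
        new_inf = beta * S[k] * I[k]   # new infections per unit time
        new_rec = gamma * I[k]         # recoveries per unit time
        S[k + 1] = S[k] - dt * new_inf
        I[k + 1] = I[k] + dt * (new_inf - new_rec)
        R[k + 1] = R[k] + dt * new_rec
    return t, S, I, R

# Illustrative parameters: beta = 0.5/day, gamma = 0.2/day, so R0 = 2.5
t, S, I, R = simulate_sir(beta=0.5, gamma=0.2)
k_peak = int(np.argmax(I))
print(f"peak I = {I[k_peak]:.3f} at day {t[k_peak]:.0f}")
print(f"final S = {S[-1]:.3f}, final R = {R[-1]:.3f}")
```

With R0 = 2.5 the infectious fraction grows exponentially, peaks, and decays, while a fraction of the population is never infected. Rerunning with β < γ (R0 < 1) makes I decrease from the very first step.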
This classic material is updated for our computational world in Part II, where
Ellen shows how to implement the classic theory efficiently and how it can be
generalized to problems that are directly relevant to the COVID crisis. In particular,
in Chapters 7 and 8, two extra populations, the exposed (E) and asymptomatic (Ia),
are introduced to capture the propagation of the disease through populations that do
not yet display signs of infection, a crucial feature of this disease. As Ellen shows,
this simple system of four or five differential equations is already enough to capture
many features of the disease and can be used to understand early disease outbreaks.
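A five-equation system of this kind can be sketched as follows. This is an assumed SEIIR form, not necessarily the book's exact equations: exposed individuals become infectious at rate α, a fraction ν of them asymptomatic (Ia) and the rest symptomatic (Is); both groups recover at rate γ and, for simplicity, transmit at the same rate β. The function name simulate_seiir and all parameter values are hypothetical.

```python
import numpy as np

def simulate_seiir(beta, alpha, gamma, nu, e0=1e-4, t_end=300.0, dt=0.1):
    """Explicit-Euler integration of an assumed SEIIR model (population fractions).

      dS/dt  = -beta*S*(Is + Ia)
      dE/dt  =  beta*S*(Is + Ia) - alpha*E
      dIs/dt = (1 - nu)*alpha*E - gamma*Is
      dIa/dt = nu*alpha*E - gamma*Ia
      dR/dt  =  gamma*(Is + Ia)
    """
    n = int(round(t_end / dt)) + 1
    y = np.zeros((n, 5))                     # columns: S, E, Is, Ia, R
    y[0] = [1.0 - e0, e0, 0.0, 0.0, 0.0]
    for k in range(n - 1):
        S, E, Is, Ia, R = y[k]
        infection = beta * S * (Is + Ia)     # flow S -> E
        onset = alpha * E                    # flow E -> Is or Ia
        rhs = np.array([
            -infection,
            infection - onset,
            (1.0 - nu) * onset - gamma * Is,
            nu * onset - gamma * Ia,
            gamma * (Is + Ia),
        ])
        y[k + 1] = y[k] + dt * rhs
    return np.linspace(0.0, t_end, n), y

# Hypothetical values: 5-day latency, 7-day infectious period,
# half of all infections asymptomatic, beta = 0.4/day
t, y = simulate_seiir(beta=0.4, alpha=0.2, gamma=1 / 7, nu=0.5)
S, E, Is, Ia, R = y.T
print(f"final susceptible fraction: {S[-1]:.3f}")
print(f"peak symptomatic fraction:  {Is.max():.3f}")
```

The asymptomatic group never appears in symptom-based case counts, yet it feeds the infection term βS(Is + Ia) exactly as the symptomatic group does in this sketch.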
The third part of the book brings the topic of epidemiology to current research
level. One of the key features of a pandemic is the propagation of the disease from
one region to another. In our hyper-connected world there are many ways diseases
can travel. At the global level, the airline network is a natural way for the disease to
spread especially through people who show no symptoms. Ellen shows how the SEIR
model can then be extended to a network and efficiently implemented. Remarkably,
this simple idea is sufficient to understand the early spread and different phases of
the disease.
Yet, a key question remains. How do we use daily data about the number of cases
to validate and infer parameters for the various models? Part IV of the book directly
addresses this question and brings the topic to the forefront of current data research
by presenting Bayesian inference methods and how they can be efficiently applied to
the COVID crisis. This data-driven approach, combined with the network models,
brings state-of-the-art techniques to the study of the disease dynamics and shows how
secondary data, such as mobility data from cell phones, can be used as a barometer
to predict the development of the disease.
Through the fog of war, governments have made many mistakes that have cost
countless lives. As Anne Frank wrote: “What is done cannot be undone, but at least
one can keep it from happening again". How do we keep it from happening? We
learn from our mistakes, sharpen our tools and models and confront the current and
next crises with the full power of science. Ellen’s wonderful book gives us the ideas,
methods, tools, knowledge, and concepts to understand the current pandemic and be
prepared for the next one.
Oxford, May 2021 Alain Goriely
Preface
The objective of this book is to understand the outbreak dynamics of the COVID-19
pandemic through the lens of computational modeling. Computational modeling
can provide valuable insights into the dynamics and control of a global pandemic to
guide public health decisions. So... why didn’t it? Why did dozens of computational
models make predictions that were orders of magnitude off? This book seeks to answer
this question by integrating innovative concepts of mathematical epidemiology,
computational modeling, physics-based simulation, and probabilistic programming.
We illustrate how we can infer critical disease parameters–in real time–from reported
case data to make informed predictions and guide political decision making. We critically
discuss questions that COVID-19 models can and cannot answer and showcase
controversial decisions around the early outbreak dynamics, outbreak control, and
gradual return to normal. As scientists, we have an ethical responsibility to educate the
public to ask the right questions and to communicate the limitations of our answers.
Throughout this book, we will create data-driven models for COVID-19 to do so.
Who is this book for? If you are a student, educator, basic scientist, or medical
researcher in the natural or social sciences, or someone passionate about big data
and human health: This book is for you! Don’t worry, this book is introductory and
doesn’t require deep knowledge of epidemiology. A fascination for numbers and a
general excitement for physics-based modeling, data science, and public health are a
lot more important. And this is why this book is both a textbook for undergraduates
and graduate students and a monograph for researchers and scientists.
As a textbook, this book is suitable for courses in the mathematical life sciences,
including applied mathematics, biomedical engineering, biostatistics, computer
science, data science, epidemiology, health sciences, machine learning, mathematical
biology, numerical methods, and probabilistic programming.
As a monograph, this book integrates the fundamentals of mathematical
and computational epidemiology with modern concepts of data-driven modeling and
probabilistic programming. It serves researchers in epidemiology and public health,
with timely examples of computational modeling, and scientists in data science
and machine learning, with applications to COVID-19 and human health.
What does this book cover? This book consists of four main parts that gradually
build from mathematical epidemiology via computational and network epidemiology
to data-driven epidemiology.
Part I. Mathematical Epidemiology introduces the basic concepts of epidemiology
in view of the COVID-19 pandemic with the objectives to understand the cause of an
infectious disease, predict its outbreak dynamics, and design strategies to control it.
We introduce the paradigm of compartment modeling and revisit analytical solutions
for the classical SIS, SIR, and SEIR models using outbreak data of the COVID-19
pandemic. Typical problems include estimating the herd immunity threshold, the
efficacy of different COVID-19 vaccines, and the growth rate, basic reproduction
number, and contact period of the COVID-19 outbreak at different locations.
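Two of these quantities reduce to short formulas. A minimal sketch: the herd immunity threshold H = 1 - 1/R0, and vaccine efficacy computed as the relative reduction in attack rate between vaccinated and placebo groups, VE = 1 - AR_vaccinated / AR_placebo. The trial counts below are hypothetical, and the required coverage H/VE is a standard approximation that assumes a vaccine with efficacy VE fully protects that fraction of recipients and that no other immunity exists in the population.

```python
def herd_immunity_threshold(R0):
    """Fraction that must be immune to stop sustained spread: H = 1 - 1/R0."""
    return 1.0 - 1.0 / R0

def vaccine_efficacy(cases_vax, n_vax, cases_placebo, n_placebo):
    """Relative risk reduction: VE = 1 - attack_rate_vaccinated / attack_rate_placebo."""
    return 1.0 - (cases_vax / n_vax) / (cases_placebo / n_placebo)

# Hypothetical trial counts, chosen only for illustration.
ve = vaccine_efficacy(cases_vax=8, n_vax=20_000, cases_placebo=160, n_placebo=20_000)

for R0 in (2.0, 2.5, 3.0):
    H = herd_immunity_threshold(R0)
    coverage = H / ve   # coverage needed if an imperfect vaccine is the only source of immunity
    print(f"R0 = {R0}: H = {H:.2f}, VE = {ve:.2f}, required coverage = {coverage:.2f}")
```

Because VE < 1, the required coverage always exceeds H, and for large R0 it can exceed 100%, which would mean vaccination alone cannot reach herd immunity under these assumptions.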
Part II. Computational Epidemiology introduces numerical methods for ordinary
differential equations and applies these methods to discretize, linearize, and solve
the governing equations of SIS, SIR, SEIR, and SEIIR models to interpret the
case data of COVID-19. Typical problems include simulating the first wave of the
COVID-19 outbreak using reported case data and understanding the effects of early
community spreading and outbreak control. We discuss why classical epidemiology
models have failed to predict the outbreak dynamics of COVID-19 and introduce new
concepts of data-driven dynamic contact rates and serology-informed asymptomatic
transmission to address these shortcomings.
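The three steps named here, discretize, linearize, and solve, can be sketched for the SIS model in population fractions, dI/dt = βI(1 - I) - γI. This is an assumed illustration rather than the book's implementation: time is discretized with the implicit (backward) Euler method, the resulting nonlinear equation is linearized with Newton's method, and the linearized update is solved in every step. The function name sis_backward_euler and the parameter values are hypothetical.

```python
import numpy as np

def sis_backward_euler(beta, gamma, i0, dt, n_steps, tol=1e-12, max_iter=20):
    """Implicit (backward) Euler for the SIS model dI/dt = beta*I*(1 - I) - gamma*I.

    Each step solves the nonlinear residual
        r(I) = I - I_old - dt*(beta*I*(1 - I) - gamma*I) = 0
    with Newton's method, i.e., r is linearized about the current iterate.
    """
    I = np.empty(n_steps + 1)
    I[0] = i0
    for n in range(n_steps):
        I_old = I[n]
        x = I_old                                          # Newton guess: previous value
        for _ in range(max_iter):
            r = x - I_old - dt * (beta * x * (1.0 - x) - gamma * x)
            dr = 1.0 - dt * (beta * (1.0 - 2.0 * x) - gamma)   # dr/dI (tangent)
            dx = -r / dr                                   # solve the linearized equation
            x += dx
            if abs(dx) < tol:
                break
        I[n + 1] = x
    return I

I = sis_backward_euler(beta=0.3, gamma=0.1, i0=0.01, dt=1.0, n_steps=300)
print(f"I after 300 days: {I[-1]:.4f}  (endemic state 1 - gamma/beta = {1 - 0.1 / 0.3:.4f})")
```

For β > γ the solution settles at the endemic state 1 - γ/β = 1 - 1/R0; for β < γ the infection dies out. The implicit scheme stays well behaved at a step of one day, which is the usual reason to accept the extra cost of the Newton iteration.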
Part III. Network Epidemiology discusses numerical methods for partial differential
equations and applies these methods to discretize, linearize, and solve the
spreading of infectious diseases using discrete mobility networks and finite element
models. Instead of solving the outbreak dynamics of COVID-19 locally for each
region, state, or country, we now allow individuals to travel and populations to
mix globally, informed by cell phone mobility data and air travel statistics. Typical
problems include understanding the early outbreak patterns, the effect of travel
restrictions, and the risk of reopening after lockdown.
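The idea of letting populations mix across regions can be sketched by coupling local SEIR dynamics through a mobility network. This is a minimal sketch under stated assumptions, not the book's formulation: regions have equal populations, travel is symmetric with hypothetical weights A, and mobility acts as diffusion through the weighted graph Laplacian L = D - A, scaled by a global mobility coefficient κ. The function name simulate_network_seir, the three-region chain, and all parameter values are hypothetical.

```python
import numpy as np

def simulate_network_seir(A, beta, alpha, gamma, kappa, seed_node,
                          e0=1e-3, t_end=300.0, dt=0.05, threshold=1e-3):
    """Explicit-Euler SEIR model on a mobility network (population fractions).

    A         : symmetric (n x n) matrix of hypothetical travel weights
    kappa     : global mobility coefficient (smaller = stricter travel restrictions)
    threshold : infectious fraction that counts as outbreak arrival in a region
    Every compartment x (an n-vector) obeys dx/dt = local_SEIR(x) - kappa * L @ x,
    where L = D - A is the weighted graph Laplacian.
    """
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A
    S, E, I, R = np.ones(n), np.zeros(n), np.zeros(n), np.zeros(n)
    S[seed_node] -= e0                      # seed exposed individuals in one region
    E[seed_node] += e0
    arrival = np.full(n, np.nan)            # first day with I > threshold, per region
    for k in range(int(round(t_end / dt))):
        inf = beta * S * I                  # local new infections
        dS = -inf - kappa * (L @ S)
        dE = inf - alpha * E - kappa * (L @ E)
        dI = alpha * E - gamma * I - kappa * (L @ I)
        dR = gamma * I - kappa * (L @ R)
        S, E, I, R = S + dt * dS, E + dt * dE, I + dt * dI, R + dt * dR
        newly = np.isnan(arrival) & (I > threshold)
        arrival[newly] = (k + 1) * dt
    return S, E, I, R, arrival

A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])             # hypothetical chain: region 0 - 1 - 2
for kappa in (0.01, 0.001):
    *_, arrival = simulate_network_seir(A, beta=0.5, alpha=0.2, gamma=0.2,
                                        kappa=kappa, seed_node=0)
    print(f"kappa = {kappa}: arrival day per region {np.round(arrival, 1)}")
```

Because L annihilates a constant vector, the total S + E + I + R stays at one in every region. Lowering κ, a crude stand-in for travel restrictions, delays arrival in the distant regions but does not prevent it in this sketch; only κ = 0 keeps the outbreak confined to the seed region.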
Part IV. Data-driven Epidemiology covers the most timely methods and applications
of this book. It focuses on probabilistic programming with the objectives to
understand, predict, and control the outbreak dynamics of the COVID-19 pandemic.
We integrate computational epidemiology and data-driven modeling to explore disease
data in view of different compartment models using a probabilistic approach
and quantify the uncertainties of our analysis. Typical problems include inferring the
reproduction dynamics, visualizing the effects of asymptomatic transmission, and
correlating case data and mobility.
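The core idea of inferring a disease parameter together with its uncertainty can be sketched without a probabilistic programming library. The example below is a stand-in, not the book's method: it estimates an exponential growth rate r from synthetic (not real) daily case counts, assuming a Poisson likelihood, a known initial count c0, and a flat prior on a grid of r values, and it reports the posterior mean and a 95% credible interval. Probabilistic programming tools replace the grid with sampling so that many parameters can be inferred jointly.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Synthetic daily case counts (NOT real data): exponential growth with
# c0 = 20 cases on day 0 and a true growth rate r = 0.15 per day.
days = np.arange(21)
true_r, c0 = 0.15, 20.0
cases = rng.poisson(c0 * np.exp(true_r * days))

# Grid over the unknown growth rate, flat prior on [0, 0.4].
r_grid = np.linspace(0.0, 0.4, 801)

def log_likelihood(r):
    """Poisson log-likelihood of the counts for growth rate r (c0 assumed known)."""
    lam = c0 * np.exp(r * days)
    return np.sum(cases * np.log(lam) - lam)   # constant log(cases!) term dropped

log_post = np.array([log_likelihood(r) for r in r_grid])   # flat prior adds a constant
post = np.exp(log_post - log_post.max())                  # subtract max for stability
post /= post.sum()                                        # normalize on the grid

mean_r = np.sum(r_grid * post)
cdf = np.cumsum(post)
lo = r_grid[np.searchsorted(cdf, 0.025)]
hi = r_grid[np.searchsorted(cdf, 0.975)]
print(f"posterior mean r = {mean_r:.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```

The interval, not just the point estimate, is the quantity that communicates how much the data actually constrain the growth rate.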
This book is by no means complete. It does not cover agent-based modeling, age-
dependent modeling, population mixing, purely statistical, stochastic, or probabilistic
modeling, forecasting, and many other key aspects of computational epidemiology.
Instead, it is a personal reflection on the role of data-driven modeling during the
COVID-19 pandemic, motivated by the curiosity to understand it. Because Science.
Stanford, May 2021 Ellen Kuhl
Acknowledgements
Now, that’s a wrap! I started writing this book on January 14, 2021, during the United
States peak of the COVID-19 pandemic and finished on May 31, 2021, at the lowest
incidence in 14 months. Dozens of people have contributed to this book, directly
or indirectly, through endless discussions around the pandemic. I was fortunate
to collaborate with an amazing international team, Alain Goriely from the UK,
Francisco Sahli Costabal from Chile, Henry van den Bedem from the Netherlands,
Kevin Linka from Germany, Mathias Peirlinck from Belgium, Paris Perdikaris from
Greece, and Proton Rahman from Canada. The daily discussions of our local outbreak
dynamics–while never meeting in person–were an essential part of my pandemic life
made possible by Zoom. Thank you for sharing this unique experience!
I enjoyed regular COVID-19 meetings with my scientific friends around the world,
Silvia Budday, Krishna Garikipati, Tom Hughes, Tinsley Oden, Paul Steinmann,
Tarek Zohdi, and the IMAG/MSM Working Group for Multiscale Modeling and Viral
Pandemics. But the true motivation for this book came from the students of the new
course Data-driven modeling of COVID-19 that we created at Stanford University
in the Fall of 2020 to make online learning a bit more tangible. The enrollment
ranged from undergraduates in biology, classics, computer science, engineering,
ethics, human biology, mechanical engineering, and Spanish, to master and PhD
students in aeronautics, astronautics, computer science, environment and resources,
management science and engineering, and mechanical engineering. Together, we
designed this course as the pandemic unfolded–in real time–and this book is a
collection of class notes and feedback from students, guest speakers, and lecturers.
We received support from the Stanford Bio-X Program and the School of Engineering
COVID-19 Research and Assistance Fund. Massive thanks to all students in the class,
to our amazing course assistants, Amelie Schäfer, Oguz Tikenogullari, Mathias
Peirlinck, and Kevin Linka, and to my favorite guest speaker, Alain Goriely.
Last but not least, I thank the true heroes of the pandemic, Jasper and Syb, and
all the kids who not only had to bear the uncertainties of a global pandemic, but
also endure their parents working from home. Thank you, Henry, for getting through
this together, with 354 miles of swimming, 3611 miles of biking, and 3279 miles of
running. This has been a truly memorable time. I’m glad it’s over–at least for now.
1 Introduction to mathematical epidemiology
1.1 A brief history of infectious diseases
In the year 2020, everybody–at home on their couch–became an infectious disease
expert. To calibrate all this newly gained knowledge, it seems a good idea to briefly
reiterate the basic concepts and nomenclature of infectious diseases. Infectious diseases
are caused by pathogens, for example bacteria or viruses, and are ever-present in society
[8]. Every once in a while, this may result in outbreaks that have a significant impact
on a local or global level. Depending on their spatial and temporal spread, we can
classify outbreaks as endemic, epidemic, or pandemic [28]. Endemic outbreaks, for
example chickenpox, are permanently present in a region or a population. Epidemic
outbreaks, for example the seasonal flu, affect a lot of people in a short period of
time, spread across several communities, and then disappear. Pandemic outbreaks,
for example the Spanish flu, are epidemic outbreaks that affect a lot of people in a
short period of time and spread across the entire world. Mathematical modeling of
infectious diseases is important to understand their outbreak dynamics and inform
political decision making to manage their spread [4].
Table 1.1 History of recent infectious disease outbreaks. Time period, type of disease, number
of deaths, and location.
period        disease         deaths         location
1346-1350     Black Death     100,000,000    1/3 of Europe
1665-1666     Great Plague    100,000        1/4 of London
1918-1920     Spanish flu     50,000,000     worldwide
1980          measles         2,600,000      worldwide / year
2003          SARS            774            worldwide
2009-2010     H1N1            18,500         worldwide
2011          tuberculosis    1,400,000      worldwide / year
              HIV/AIDS        1,200,000      worldwide / year
              malaria         627,000        worldwide / year
2011          measles         160,000        worldwide (-94%)
2012-2020     MERS            866            worldwide
              seasonal flu    35,000         United States / year
2014-2016     Ebola           11,000         Africa
2018          Ebola           2,280          Congo
2019-2021     COVID-19        3,300,000      worldwide
The COVID-19 pandemic. On March 11, 2020, Tedros Adhanom Ghebreyesus, the
Director-General of the World Health Organization, declared the COVID-19 outbreak
a global pandemic [49]. On that day, it had affected 126,702 people worldwide. What
nobody could have foreseen is that within the following year, by March 11, 2021,
this number had increased by three orders of magnitude, to 118.57 million [12].
Naturally, we often think of the COVID-19 pandemic as the deadliest and most
devastating infectious disease in modern history.
Endemic. The word endemic is derived from the Greek words en, meaning
in, and demos, meaning people. An infectious disease is endemic when it
is constantly maintained at a baseline level in a geographic region without
external inputs. For example, chickenpox is endemic in the United Kingdom.
A person-to-person transmitted disease is endemic if each infected person
passes the disease to one other person on average. The infection neither dies
out nor increases exponentially; it is in an endemic steady state. Infectious
disease experts ask:
• How many people are infectious at any given time? – What is I(t)?
• How fast do new infections arise? – What is dI/dt? – Stability analysis
• What are the effects of quarantine or vaccination? – What is R0?
• Can we eradicate the disease? – What is H = 1 − 1/R0? – Limit analysis
Epidemic. The word epidemic is derived from the Greek words epi, meaning
upon, and demos, meaning people. An infectious disease is epidemic when
it spreads rapidly across a large number of people in a short time period.
For example, the seasonal flu is epidemic. Epidemics often come in waves
with several recurring outbreaks, which can sometimes be seasonal. There is
usually no increase in susceptibles and the disease dies out as the number
of infectives decreases because a large enough fraction of the population has
become immune. Infectious disease experts ask:
• How severe will the epidemic be? – What is Imax?
• When will it reach its peak? – What is t(Imax)?
• How long will it last? – What are S∞ and R∞? What is t(S∞)?
• What are the effects of vaccination? – What are R0 and H = 1 − 1/R0 and I?
Pandemic. The word pandemic is derived from the Greek words pan, meaning
all, and demos, meaning people. An infectious disease is pandemic when it
spreads rapidly across a large region or worldwide in a short time period. A
pandemic is a global outbreak. For example, smallpox, tuberculosis, the Black
Death, and the Spanish flu were pandemics, and COVID-19 is one now. Infectious
disease experts ask:
• How severe is the pandemic, when will it peak? – What are Imax and t(Imax)?
• What are effective measures to manage the outbreak? – What is R(t)?
• What are the effects of quarantine or lockdown? – What is β(t)?
• What are the effects of travel restrictions? – What are κ and Lij?
• How do we prioritize vaccination? – What are R0 and H = 1 − 1/R0 and I?
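Several of the questions in these boxes have closed-form or near-closed-form answers for the classical SIR model. A minimal sketch, assuming population fractions, S(0) close to 1, and I(0) close to 0: the peak infectious fraction follows from the conserved quantity S + I - ln(S)/R0 evaluated where S = 1/R0, and the final susceptible fraction S∞ solves the final-size relation ln S∞ = R0 (S∞ - 1), here by bisection. The function names are hypothetical.

```python
import math

def sir_peak_fraction(R0, s0=1.0, i0=0.0):
    """Peak infectious fraction of the SIR model (requires R0*s0 > 1).

    S + I - ln(S)/R0 is conserved, and I peaks where S = 1/R0, so
    I_max = i0 + s0 - (1 + ln(R0*s0)) / R0.
    """
    return i0 + s0 - (1.0 + math.log(R0 * s0)) / R0

def sir_final_susceptible(R0, tol=1e-12):
    """Solve ln(S_inf) = R0*(S_inf - 1) for S_inf in (0, 1/R0) by bisection (R0 > 1)."""
    f = lambda s: math.log(s) - R0 * (s - 1.0)   # increasing on (0, 1/R0)
    lo, hi = 1e-300, 1.0 / R0                    # f(lo) < 0 < f(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid                             # root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

for R0 in (1.5, 2.5, 4.0):
    print(f"R0 = {R0}: Imax = {sir_peak_fraction(R0):.3f}, "
          f"S_inf = {sir_final_susceptible(R0):.3f}, H = {1 - 1 / R0:.3f}")
```

In this idealized model, S∞ is always smaller than 1/R0, so an unmitigated epidemic overshoots the herd immunity threshold H = 1 - 1/R0 before it dies out.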
Table 1.1 summarizes recent infectious disease outbreaks ranging from the Black
Death in the 14th century with an estimated 100 million deaths across Europe to the
current COVID-19 pandemic with 3.3 million deaths worldwide to date. Notably, in
1980, the annual death toll of measles, 2.6 million, was of comparable size. By
2011, after massive vaccination campaigns, this number had dropped by 94% to
160,000 [5].
Fig. 1.1 The Lessons of the Pandemic. In this famous Science publication, George A. Soper
reflects on the Spanish flu that resulted in 50 million deaths worldwide in 1918 and 1919.
Figure 1.1 shows the cover page of the Science publication The Lessons of the
Pandemic from May 1919, a reflection on the scientific understanding of the Spanish
flu [29]. While some of it is specific to the nature of the 1918 influenza and the time
during which it occurred, much of it still applies to the COVID-19 pandemic today:
The Spanish flu lasted from 1918 to 1920 and, similar to COVID-19, occurred in
multiple waves. Both the Spanish flu and COVID-19 are contagious respiratory
illnesses that spread from person to person, mainly by droplets, through coughing,
sneezing, or talking. Naturally, increased hygiene, mask wearing, and physical distancing
in addition to strict isolation and quarantine are successful strategies to manage both
conditions [10]. While COVID-19 is caused by a coronavirus, SARS-CoV-2, the
1918 influenza was caused by the H1N1 influenza A virus, a virus of avian origin.
Within 25 months, it infected 500 million people, one third of the world’s population,
and resulted in more than 50 million deaths [5]. The major differences between the
Spanish flu and COVID-19 are their high risk populations and the mechanisms of
death: In contrast to the Spanish flu, which affected mainly healthy adults between
25 and 40 years of age, COVID-19 mainly affects individuals aged 65 years and older
with comorbidities. Victims of the 1918 influenza mainly died from secondary
bacterial pneumonia, whereas victims of COVID-19 die from an overactive immune
response that results in organ failure [32]. Nevertheless, comparing the COVID-
19 outbreak to previous pandemics can provide insight and guidance to manage
the current COVID-19 pandemic. Important elements of this comparison are
mathematical and computational tools that have been designed to quantify and
explain the spreading mechanisms of infectious diseases. This is the objective of
mathematical and computational epidemiology [3].
1.2 Introduction to epidemiology
Epidemiology is the study of distributions, patterns, and determinants of health-
related events in human populations [3]. It is a cornerstone of public health and
shapes policy decisions by identifying risk factors for outbreaks and targets for
prevention [28].
Epidemiology literally means the study of what is upon the people. It is
derived from the Greek words epi, meaning upon, demos, meaning people, and
logos, meaning study, suggesting that it applies only to human populations. By
this definition, epidemiology is the scientific, systematic, data-driven study of
the distribution, i.e., where, who, and when, and determinants, i.e., causes and
risk factors, of health-related patterns and events in specified populations, i.e.,
the world, a country, state, county, city, school, neighborhood. Epidemiology
also includes the study of outbreak dynamics, outbreak control, and informing
political decision making.
Agent, host, and environment. A key premise in epidemiology is that health-related
events are not evenly distributed in a population; rather, they affect some individuals
more than others [3]. An important goal of epidemiology is to identify the causes that
put these individuals at a higher risk [2]. A simple but popular model to analyze and
explain disease causation is the epidemiologic triangle. The epidemiologic triangle
summarizes the interplay of the three components that contribute to the spread of an
infectious disease: an external agent, a susceptible host, and an environment in which
agent and host interact [9]. Descriptive epidemiologists characterize this interaction
as the seed, the soil, and the climate [42]. Effective public health measures assess
all three components and their interactions to control or prevent the spreading of a
disease. Figure 1.2 illustrates the epidemiologic triangle of COVID-19 with agent,
host, and environment. For COVID-19, the agent is the SARS-CoV-2 virus, the
hosts are people, and the environment consists of droplets. Interventions between any two
of these three components can help reduce the spread of the disease [20]. For example,
reduced exposure, vaccination, and antiviral treatment can modulate agent-host
interactions and reduce the number of new infections.
Fig. 1.2 Epidemiologic triangle of COVID-19 with agent, host, and environment. The
epidemiologic triangle illustrates the interplay of the three components that contribute to
the spread of a disease: an external agent, a susceptible host, and an environment in which
agent and host interact. For COVID-19, the agent is the SARS-CoV-2 virus, the hosts are
people, and the environment consists of droplets. Interventions between any of these three
components can reduce the spread of the disease.
Fig. 1.3 Daniel Bernoulli is considered the most famous
mathematician of the Bernoulli family, although he actually
studied medicine. He was born on February 8, 1700 in
Groningen, the Netherlands, and is best known for his
applications of mathematics to mechanics. To prove the
efficacy of vaccination against smallpox, he proposed the
first compartment model in 1760 [5]. It only had two
compartments, but demonstrated the potential of modeling
to understand the mechanisms of transmission, predict future
spread, and control the outbreak through vaccination. Daniel
Bernoulli died on March 27, 1782 in Basel, Switzerland.
From descriptive to mathematical epidemiology. Mathematical models of infectious
diseases date back to Daniel Bernoulli’s model for smallpox in 1760 [5], and
they have been developed and improved extensively since the 1920s [14]. In the
middle of the 19th century, the English physician John Snow conducted a famous
series of investigations during the cholera outbreak in London to discover the cause
of the disease and to prevent its recurrence [46]. Because his research illustrates the
classic sequence–from descriptive epidemiology and hypothesis generation, to
mathematical epidemiology and hypothesis testing–John Snow is considered the father
of modern epidemiology. Figure 1.4 summarizes the investigations of John Snow that
led to the removal of the handle of a public water pump, an action commonly credited
with ending the cholera outbreak in London. Today the field of epidemiology is
considered a quantitative discipline that uses rigorous mathematical tools of
probability and statistics to develop and test hypotheses to understand and explain
health-related events [3]. In simple terms, today’s epidemiologists count, scale, and
compare. They count the number of cases; scale this count by a characteristic
population to define fractions; and compare the evolution of these fractions over
time [9]. Mathematical epidemiology has advanced significantly throughout the past
decades [11], and models have become more sophisticated and complex [39]. As a
result, it is often no longer possible today to solve epidemiological models
analytically and in closed form [3].
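The count, scale, and compare procedure takes only a few lines. In this minimal sketch, the two regions, their populations, and their weekly case counts are hypothetical; the code scales each count to cases per 100,000 people and compares the last week with the first.

```python
# Hypothetical counts, for illustration only (not real surveillance data).
regions = {
    "region A": {"population": 8_400_000, "weekly_cases": [4_200, 6_300, 9_100]},
    "region B": {"population": 650_000, "weekly_cases": [900, 1_000, 980]},
}

def per_100k(count, population):
    """Scale a raw count by population to obtain cases per 100,000 people."""
    return 1e5 * count / population

for name, d in regions.items():
    rates = [per_100k(c, d["population"]) for c in d["weekly_cases"]]   # scale
    trend = rates[-1] / rates[0]                                       # compare over time
    print(f"{name}: " + ", ".join(f"{r:.1f}" for r in rates)
          + f" cases per 100k per week, last/first = {trend:.2f}")
```

Region A reports more cases in absolute terms, but region B has the higher per-capita rate, while region A's rate is growing faster. This is why the scaling step comes before any comparison.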
From mathematical to computational epidemiology. Computational epidemiology
is a multidisciplinary field that integrates mathematics, computer science, and
public health to better understand central questions in epidemiology such as the
spread of diseases or the effectiveness of public health interventions [22]. Real-time
epidemiology is a rapidly developing area within computational epidemiology that
seeks to support policy makers–in real time–as an outbreak is unfolding. An important
application of real-time epidemiology is disease surveillance, the data-driven
collection, analysis, and interpretation of large volumes of disease data from a variety
of sources. This information can help to evaluate the effectiveness of control
and preventative health measures [12]. Central to computational epidemiology is
accurate knowledge of how many people are affected by an infectious disease at any
given point in time. During the COVID-19 pandemic, for the first time in history,
this information has been collected conscientiously, shared publicly, updated daily,
and made freely available in real time [13].
Fig. 1.4 The English physician John Snow is considered one
of the founders of modern epidemiology, in part because of
his research during the cholera outbreak in London in 1854
[46]. John Snow lived from March 15, 1813 to June 16, 1858.
He questioned the widely accepted paradigm that cholera
spreads through polluted, bad air. He postulated that water
was the cause of the cholera outbreak in Soho, London
in 1854. By talking to local residents, he identified the source
of the outbreak as the public water pump on Broad Street.
Although his chemical and microscopic observations were
inconclusive, the spreading patterns of the cholera outbreak he documented
were convincing enough to persuade the local council to
disable the pump on Broad Street by removing its handle.
There is a common belief that this action marked the end
of the outbreak.
Sensitivity and specificity. A diagnostic test, for example, the nasal swab
test, can be inaccurate in two ways: a false positive result erroneously labels
a healthy person as infected, resulting in unnecessary quarantine and contact
tracing; a false negative result overlooks an infected person, resulting in the
risk of infecting others.
sensitivity = true positives / all sick people
Sensitivity measures the proportion of positives, diseased people, that are
correctly identified and not overlooked.
specificity = true negatives / all healthy people
Specificity measures the proportion of negatives, healthy people, that are
correctly identified and not classified as diseased. A perfect test has 100%
sensitivity and 100% specificity.
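The two definitions translate directly into code. A minimal sketch: the function names and the validation-study counts (200 infected and 800 healthy people) are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def sensitivity(tp, fn):
    """Fraction of sick people the test correctly flags: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of healthy people the test correctly clears: TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical validation study: 200 infected and 800 healthy people tested.
tp, fn = 180, 20   # infected: 180 test positive, 20 are missed (false negatives)
tn, fp = 792, 8    # healthy: 792 test negative, 8 are wrongly flagged (false positives)
print(f"sensitivity = {sensitivity(tp, fn):.1%}, specificity = {specificity(tn, fp):.1%}")
```

For these counts the test reaches 90% sensitivity and 99% specificity; the denominators differ, which is why the two measures must be estimated from both sick and healthy people.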
1.3 Testing, testing, testing
During a global pandemic, knowing how many people currently have the disease
or have previously had it provides crucial information for healthcare providers and
policy makers, both on the individual and population levels. Unfortunately, there is
no method to provide this information with absolute accuracy. We usually use two
measures to characterize the degree of accuracy of a test: sensitivity, the fraction
of correctly identified positive, diseased individuals; and specificity, the fraction of
correctly identified negative, healthy individuals. Understanding these limitations is
important when modeling, simulating, and predicting the outbreak dynamics of a
pandemic, especially in view of disease management and political decision making.
Table 1.2 Testing for COVID-19. Summary of the three most common tests for SARS-CoV-2.
Diagnostic tests, including molecular and antigen tests, provide information about acute infection,
whereas antibody or serology tests provide information about previous infection. Publicly available
case data are based on diagnostic testing; seroprevalence reports are based on antibody testing.
| diagnostic testing: molecular | diagnostic testing: antigen | antibody testing: serology |
|---|---|---|
| provides information about acute COVID-19 infection | provides information about acute COVID-19 infection | does not provide information about acute infection |
| does not provide information about past infection | does not provide information about past infection | tests for antibody presence, does not guarantee immunity |
| can take hours to days | takes minutes to hours | takes minutes to hours |
| relatively accurate early in the infection | faster and cheaper than molecular tests | quick results |
| accuracy drops later in infection | less accurate than molecular tests | tests can vary in accuracy |
| point of care testing; collect swabs at home and mail in | point of care testing; collect swabs at home and mail in | blood test that can be done at doctor’s office or at home |
Testing for COVID-19. Since its outbreak in late 2019, the COVID-19 pandemic has
generated an exponentially growing demand for testing, and diagnostic assays that
enable mass screening have been developed at an unprecedented pace. To success-
fully use these tests to inform public health strategies, it is critical to understand
their individual strengths and limitations. There are two different types of assays for
SARS-CoV-2, the virus that causes COVID-19: diagnostic tests and antibody tests.
A diagnostic test can show if you have an active COVID-19 infection and need to
take steps to quarantine or isolate yourself from others. Molecular tests and antigen
tests fall under this category. A molecular test is a diagnostic test that detects genetic
material from the virus using, for example, reverse transcription polymerase chain
reaction or nucleic acid amplification. An antigen test is a diagnostic test that detects
specific proteins made by the virus. Samples for diagnostic tests are typically col-
lected with a nasal or throat swab or with saliva from spitting into a tube. Diagnostic
testing is critical to provide early treatment, quarantine individuals sooner, and trace
and isolate their contacts to reduce the spread of the virus.
Fig. 1.5 Testing for COVID-19. The objective of COVID-19
testing is to probe whether an individual is currently or
has previously been infected with SARS-CoV-2. A nasal or
throat swab test is a diagnostic test that provides information
about an acute infection. Diagnostic testing is critical to
provide early treatment, quarantine individuals, and trace
and isolate their contacts to reduce the spread of the virus. A
blood test is an antibody test that provides information about
a previous infection. Antibody tests are critical to estimate
the overall dimension of an outbreak. Throughout this book,
we use COVID-19 case data from public dashboards based
on confirmed positive diagnostic tests.
An antibody test can detect antibodies that are made by your immune system in
response to a previous infection with SARS-CoV-2. In contrast to genetic material
from the virus or specific proteins made by the virus, antibodies can take several
days or even weeks to develop, but they can remain present for several weeks or
months after recovery. A serology test is an antibody test that looks for antibodies in
blood samples. These venous blood samples are typically collected at the doctor’s
office or in the clinic. Antibody tests should not be used to diagnose an active infec-
tion. Instead, antibody testing is important to understand the kinetics of the immune
response to infection, clarify whether an infection protects from future infection,
characterize how long immunity will last, and estimate the overall dimension of an
outbreak.
Table 1.2 contrasts the three most common types of tests for COVID-19. Throughout
this book, we use COVID-19 case data from public dashboards based on confirmed
positive diagnostic tests. Only in Chapters 8 and 13, where we explore the effects of
asymptomatic transmission, we use specialized models that combine case data from
diagnostic tests with seroprevalence data from antibody tests.
1.4 The basic reproduction number
The basic reproduction number is a powerful but simple concept to explain the
contagiousness and transmissibility of an infectious disease [6]. For decades, epi-
demiologists have successfully used the basic reproduction number to quantify how
many new infections a single infectious individual creates in an otherwise com-
pletely susceptible population [6]. During the COVID-19 pandemic, the public me-
dia, scientists, and political decision makers across the globe have adopted the basic
reproduction number as an illustrative metric to explain and justify the need for dif-
ferent outbreak control strategies [10]: An outbreak will continue for reproduction
numbers larger than one, R0 > 1, and come to an end for reproduction numbers
smaller than one, R0 < 1 [8]. However, especially in the midst of a global pandemic,
it is difficult, if not impossible, to measure R0 directly [48].
Table 1.3 Basic reproduction numbers and herd immunity thresholds for common infectious
diseases. Herd immunity is the indirect protection from an infectious disease that occurs when a
large fraction of the population has become immune. The herd immunity threshold, H = 1 − 1/R0,
beyond which this protection occurs is a function of the basic reproduction number R0.
| disease | R0 | H | disease | R0 | H |
|---|---|---|---|---|---|
| measles | 12 - 18 | 92 - 95% | mumps | 4.0 - 7.0 | 75 - 86% |
| pertussis | 12 - 17 | 92 - 94% | COVID-19 | 2.0 - 6.0 | 50 - 83% |
| rubella | 6 - 7 | 83 - 86% | SARS | 2.0 - 5.0 | 50 - 80% |
| smallpox | 6 - 7 | 83 - 86% | ebola | 1.5 - 2.5 | 33 - 60% |
| polio | 5 - 7 | 80 - 86% | influenza | 1.5 - 1.8 | 33 - 44% |
Table 1.3 summarizes the basic reproduction numbers for common infectious diseases.
They range from R0 = 1.5-1.8 for less contagious diseases like influenza to
R0 = 12-18 for measles and pertussis. Knowing the precise value of R0 is
important, but challenging, because of limited testing, inconsistent reporting, and
incomplete data [6]. Throughout this book, instead of measuring the basic repro-
duction number directly, we estimate it using mathematical modeling and reported
case data [17]. Mathematical models interpret the basic reproduction number R0
as the ratio between the infectious period C, the period during which an infectious
individual can infect others, and the contact period B, the average time it takes to
come into contact with another individual [8],
R0 = C/B . (1.1)
The longer a person is infectious, and the more contacts the person has during this
time, the larger the reproduction number [6]. While we cannot control the infectious
period C, we can change our behavior to increase the contact period B [17]. This is
precisely what community mitigation strategies and political interventions aim
to achieve.
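Eq. (1.1) is simple enough to explore directly. The following Python sketch assumes a hypothetical infectious period of C = 7 days and a few hypothetical contact periods B; it shows how lengthening B, which is what mitigation strategies attempt, lowers R0 toward and below the threshold of one.

```python
def basic_reproduction_number(infectious_period: float, contact_period: float) -> float:
    """Eq. (1.1): R0 = C / B, with both periods measured in days."""
    return infectious_period / contact_period


def outbreak_status(r0: float) -> str:
    """An outbreak continues for R0 > 1 and comes to an end for R0 < 1."""
    if r0 > 1.0:
        return "outbreak grows"
    if r0 < 1.0:
        return "outbreak declines"
    return "borderline"


C = 7.0  # hypothetical infectious period in days
for B in (2.0, 3.5, 7.0, 10.0):  # hypothetical contact periods in days
    r0 = basic_reproduction_number(C, B)
    print(f"B = {B:4.1f} days -> R0 = {r0:.2f} ({outbreak_status(r0)})")
```

In this toy setting, R0 drops below one only once the average time between contacts exceeds the infectious period, B > C.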
The reproduction number of COVID-19. Since the beginning of the coronavirus
pandemic, no other number has been discussed more controversially than the re-
production number of COVID-19 [22]. The earliest COVID-19 study that followed
the first 425 cases of the Wuhan outbreak via direct contact tracing reported a basic
reproduction number of R0 = 2.2 [15]. However, especially during the early stages
of the outbreak, information was limited because of insufficient testing, changes
in case definitions, and overwhelmed healthcare systems. While the concept of R0
seems fairly simple, the reported basic reproduction numbers for COVID-19 vary
widely with country, culture, calculation, and time [17]. Most basic reproduction
numbers of COVID-19 we see in the public media today are estimates of mathemat-
ical models. These estimates depend critically on the choice of the model, its initial
conditions, and many other modeling assumptions [6]. Not surprisingly, the
mathematically predicted basic reproduction numbers cover a wide range, from R0 = 2-4
for exponential growth models to R0 = 4-7 for more sophisticated compartment
models [22].
Throughout this book, we identify or infer the reproduction number of COVID-19,
and the uncertainty associated with it, using computational epidemiology [22],
Bayesian analysis [10], and reported case data [12]: In the first example for the very
early COVID-19 outbreak in China in Section 7.8, we identify a basic reproduction
number of R0 = 12.58±3.17 across 30 Chinese provinces. In the second example for
the early outbreak in the United States in Section 10.2, we find a basic reproduction
number of R0 = 5.30 ± 0.95 across all 54 states and territories. This value suggests
that COVID-19 is less contagious than the measles and pertussis with R0 = 12 − 18,
as infectious as rubella, smallpox, polio, and mumps with R0 = 4 − 7, slightly more
infectious than SARS with R0 = 2 − 5, and more infectious than influenza with
R0 = 1.5 − 1.8. Our basic reproduction number for the United States is significantly
lower than our basic reproduction number for China, which could be caused by
an increased awareness of COVID-19 transmission a few weeks into the global
pandemic. In our third example for the early outbreak in Europe in Section 12.2, the
basic reproduction number takes similar values of R0 = 4.62 ± 1.32 across all 27
countries.
During these early stages of exponential growth, with new case numbers doubling
within two or three days, the most urgent question amongst health care providers
and political decision makers was: Can we reduce the reproduction number? For the
broad public, this question became famously and illustratively rephrased as: Can we
flatten the curve [15]? For the modeling community, the quest for a lower reproduc-
tion number all of a sudden meant that traditional epidemiology models were no
longer suitable because of changes in the disease dynamics [12]. While traditional
models with static parameters were well-suited to model the outbreak dynamics
of unconstrained, freely evolving infectious diseases with fixed basic reproduction
numbers in the early 20th century, they fail to capture how behavioral changes and
political interventions can modulate the reproduction number to manage the COVID-19
pandemic in the 21st century [17]. In fact, static reproduction numbers are probably
the single most common cause of model failure in COVID-19 modeling [12].
Fortunately, several months into the pandemic, most countries have successfully
managed to flatten the new-case-number curves and the reproduction numbers have
dropped to values closer to or below one. To model these changes in disease dynamics
and reproduction, in Section 7.7, we introduce a dynamic reproduction number,
R(t), that accounts for time-varying contact periods. In our fourth example for the
European Union in Section 12.2, we show that the initial basic reproduction number
of R0 = 4.22±1.69 dropped to an effective reproduction number of R(t) = 0.67±0.18
by mid-May 2020. Using machine learning, we correlate mobility and reproduction
and identify a response time of ∆t = 17.24 ± 2.00 days between the drop in air
traffic, driving, walking, and transit mobility and the drop in reproduction. In
the final examples in Sections 13.2 and 13.3 of nine locations across the world,
we systematically infer the dynamic reproduction number R(t) throughout a time
window of 100 days while accounting for both symptomatic and asymptomatic
transmission.
From the failure of traditional static epidemiology models [15], we have now
learned that we need to introduce dynamic time-varying model parameters if we
Fig. 1.6 Basic reproduction numbers and herd immunity thresholds for common infectious
diseases. Herd immunity is the indirect protection from an infectious disease that occurs when a
large fraction of the population has become immune, either through previous infection or through
vaccination. The black line highlights the herd immunity threshold, H = 1 − 1/R0, beyond which
this protection occurs, as a function of the basic reproduction number R0. The herd immunity
threshold varies from 33-44% for influenza to 92-95% for the measles.
want to correctly model behavioral and political changes and reproduce the reported
case numbers [17]. This naturally introduces a lot of freedom, a large number of
unknowns, and a high level of uncertainty. However, in stark contrast to the epidemic
outbreaks in the early 20th century, we now have thoroughly-reported case data and
the appropriate tools [23] to address this challenge. The massive amount of COVID-
19 case data, well documented and freely available, has induced a clear paradigm
shift from traditional mathematical epidemiology towards data-driven, physics-based
modeling of infectious disease [1]. This new technology naturally learns the most
probable model parameters–in real time–from the continuously emerging case data,
allows us to make projections into the future, and quantifies the uncertainty on the
estimated parameters and predictions [26].
1.5 Concept of herd immunity
An important consequence of the basic reproduction number R0 is the condition
for herd immunity [9]. Herd immunity describes the indirect protection from an
infectious disease that occurs when a large fraction of the population has become
immune, either through previous infection or through vaccination [3]. The critical
threshold at which the disease reaches this endemic steady state is called the herd
immunity threshold,
H = 1 − 1/R0 . (1.2)
The larger the basic reproduction number R0, the higher the herd immunity threshold
H. Table 1.3 and Figure 1.6 summarize the basic reproduction numbers R0 for
several common infectious diseases along with the estimates of their herd immunity
thresholds H. The herd immunity threshold varies from 33-44% for influenza to 92-
95% for the measles. For the reported basic reproduction numbers of R0 = 2.0 − 6.0
of COVID-19, the estimated herd immunity threshold would range from 50-83%.
Recent studies that account for the new, more infectious B.1.351 and
B.1.1.7 variants of SARS-CoV-2 estimate these values at 75-95% [20].
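Eq. (1.2) can be evaluated directly for the R0 ranges of Table 1.3. The Python sketch below does this for a few of the listed diseases. The choice to return zero for R0 ≤ 1, where no herd immunity threshold is needed because the outbreak already dies out, is an assumption of this sketch rather than part of Eq. (1.2).

```python
def herd_immunity_threshold(r0: float) -> float:
    """Eq. (1.2): H = 1 - 1/R0; taken as 0 when R0 <= 1 (assumption of this sketch)."""
    if r0 <= 1.0:
        return 0.0
    return 1.0 - 1.0 / r0


# R0 ranges as listed in Table 1.3
r0_ranges = {
    "COVID-19": (2.0, 6.0),
    "SARS": (2.0, 5.0),
    "ebola": (1.5, 2.5),
    "influenza": (1.5, 1.8),
}
for disease, (r0_low, r0_high) in r0_ranges.items():
    h_low = herd_immunity_threshold(r0_low)
    h_high = herd_immunity_threshold(r0_high)
    print(f"{disease:10s} R0 = {r0_low:g}-{r0_high:g} -> H = {h_low:.0%}-{h_high:.0%}")
```

Because H grows like 1 − 1/R0, each additional unit of R0 raises the threshold less than the previous one; most of the increase happens between R0 = 1 and R0 ≈ 5.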
1.6 Concept of immunization
To prevent or reverse an epidemic outbreak, we need to ensure that, on average,
every infectious individual infects fewer than one new individual. The concept of herd
immunity describes the natural path towards ending an outbreak. However, if the
basic reproduction number R0 is large, the herd immunity threshold H = 1 − 1/R0 is
high, and waiting for herd immunity through infection alone can be quite devastating.
Vaccination is a powerful strategy to accelerate the path towards herd immunity [1].
Fig. 1.7 Vaccination against COVID-19. The objective of
COVID-19 vaccination is to provide immunity against severe
acute respiratory syndrome coronavirus 2 or SARS-CoV-2,
the virus that causes COVID-19 [30]. By May, 2021,
thirteen vaccines were authorized for public use: two RNA
vaccines (Pfizer–BioNTech and Moderna), five conventional
inactivated vaccines (BBIBP-CorV, CoronaVac, Covaxin,
WIBP-CorV and CoviVac), four viral vector vaccines
(Sputnik V, Oxford–AstraZeneca, Convidecia, and Johnson
& Johnson), and two protein subunit vaccines (EpiVacCorona
and RBD-Dimer); and more than a billion doses of COVID-
19 vaccines were administered worldwide.
Immunization threshold. Vaccination effectively reduces the susceptible population
S. Successfully immunizing a fraction I of the population reduces the susceptible
population from $S_0$ to $S_0^* = [\,1 - I\,]\,S_0$ and, with it, the reproduction number
from $R_0$ to $R_0^* = [\,1 - I\,]\,R_0$. The critical immunization threshold, above which
the effective basic reproduction number drops below one, $R_0^* = [\,1 - I\,]\,R_0 < 1$,
defines the fraction of the population that needs to be immunized to prevent or reverse
the outbreak of an epidemic,
$$ I > 1 - \frac{1}{R_0} . \qquad (1.3) $$
From Table 1.3, we conclude that the required immunization fraction varies signif-
icantly, from I > 92 − 95% for the measles with a basic reproduction number of
R0 = 12 − 18 to I > 33 − 44% for the common influenza with a basic reproduction
number of R0 = 1.5−1.8. This explains, at least in part, why some infectious diseases
are a lot more difficult to control through immunization than others.
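The effect of immunization on the reproduction number, R0* = [1 − I] R0, can be checked numerically. The Python sketch below uses R0 = 15, a hypothetical value taken from within the measles range of Table 1.3, and a few hypothetical coverage levels.

```python
def effective_r0(r0: float, immunized_fraction: float) -> float:
    """R0* = [1 - I] R0 after immunizing a fraction I of the population."""
    return (1.0 - immunized_fraction) * r0


def critical_immunization(r0: float) -> float:
    """Eq. (1.3): the outbreak is prevented if I > 1 - 1/R0."""
    return 1.0 - 1.0 / r0


r0_measles = 15.0  # hypothetical value within the measles range of Table 1.3
print(f"critical immunization fraction: {critical_immunization(r0_measles):.1%}")
for coverage in (0.80, 0.90, 0.95):
    r_star = effective_r0(r0_measles, coverage)
    print(f"I = {coverage:.0%}: R0* = {r_star:.2f}, outbreak prevented: {r_star < 1.0}")
```

Even 90% coverage leaves R0* = 1.5 in this example, which illustrates why highly contagious diseases require near-universal immunization.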
Eradication through vaccination. Once enough individuals are immunized, either
through infection or through vaccination, the outbreak stops. A disease that stops
circulating in a specific region is considered eliminated in that region. Polio, for
example, was eliminated in the United States by 1979 after widespread vaccination
efforts. A disease that is eliminated worldwide is considered eradicated. Eradicating
a disease through vaccination is a desirable but elusive goal [1]. Malaria has been
a candidate for eradication, but although its incidence has been drastically reduced
through vaccination, completely eradicating it remains challenging because infection
does not result in lifelong immunity. Polio has been eliminated in most countries
through massive vaccination efforts, but still remains present in some regions be-
cause its early symptoms often remain unnoticed and infected individuals continue to
infect others. Measles has been the target of widespread vaccination, but although
the disease is highly recognizable through its characteristic rash, a long latent pe-
riod from exposure to the first onset of symptoms complicates outbreak control. To
this day, smallpox is the only human infectious disease that has been successfully
eradicated through vaccination. The eradication of smallpox is the result of focused
surveillance, rapid identification, and ring vaccination [8]. In a massive vaccina-
tion campaign launched in 1967, anyone who could have possibly been exposed to
smallpox was quickly identified and vaccinated to prevent its further spread. The
last known case of smallpox occurred in Somalia in 1977. In 1980, the World Health
Organization declared smallpox eradicated. The eradication of smallpox remains one
of the most notable and profound public health successes in history.
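Efficacy and risk ratio. The efficacy e of a vaccine is the relative reduction in the
disease attack rate between the unvaccinated placebo group and the vaccinated group,
expressed here through the numbers of cases npla and nvac in each group. For a
randomized trial with an equal allocation, meaning equally sized placebo and vaccinated
groups, the ratio of the attack rates equals the ratio of the case numbers, and the
efficacy is
$$ e = \frac{n_{\mathrm{pla}} - n_{\mathrm{vac}}}{n_{\mathrm{pla}}} \cdot 100\% = \left[ 1 - \frac{n_{\mathrm{vac}}}{n_{\mathrm{pla}}} \right] \cdot 100\% = 1 - r . \qquad (1.4) $$
The risk ratio r is the ratio between the attack rates of the vaccinated group and the
placebo group, which, for equal allocation, reduces to the ratio of the case numbers,
$$ r = \frac{n_{\mathrm{vac}}}{n_{\mathrm{pla}}} \cdot 100\% = 1 - e . \qquad (1.5) $$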
The efficacy is an important measure to characterize the success of a vaccine and
define critical thresholds below which a vaccination trial should stop.
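Eqs. (1.4) and (1.5) translate directly into code. The Python sketch below assumes equal allocation, so that case counts can stand in for attack rates; the counts nvac = 8 and npla = 86 are the values estimated for the Pfizer and BioNTech trial in the example that follows, and the function names are hypothetical.

```python
def risk_ratio(n_vac: int, n_pla: int) -> float:
    """Eq. (1.5): r = n_vac / n_pla, valid for equally sized trial groups."""
    return n_vac / n_pla


def efficacy(n_vac: int, n_pla: int) -> float:
    """Eq. (1.4): e = 1 - n_vac / n_pla = 1 - r."""
    return 1.0 - risk_ratio(n_vac, n_pla)


n_vac, n_pla = 8, 86  # case counts estimated in the Pfizer-BioNTech example
print(f"r = {risk_ratio(n_vac, n_pla):.1%}, e = {efficacy(n_vac, n_pla):.1%}")  # r = 9.3%, e = 90.7%
```

If the groups were not equally sized, the counts would first have to be divided by the group sizes to obtain attack rates.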
Table 1.4 Contingency table to quantify the significance of a vaccine. The table compares the
total number of vaccinated and placebo individuals that developed and did not develop the disease.
positive negative total
vaccine a b a+b
placebo c d c+d
total a+c b+d n
Example: Efficacy of the first COVID-19 vaccine. On November 9, 2020, the
Pfizer and BioNTech trial reported a total number of COVID-19 cases of
ncovid = npla + nvac = 94 and an efficacy of e ≥ emin with emin = 90%. The efficacy
e of a vaccine is the relative reduction in the disease attack rate between the
unvaccinated placebo group npla and the vaccinated group nvac,
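$$ e = \left[ 1 - \frac{n_{\mathrm{vac}}}{n_{\mathrm{pla}}} \right] \cdot 100\% = 1 - r \quad \text{with} \quad r = \frac{n_{\mathrm{vac}}}{n_{\mathrm{pla}}} \cdot 100\% = 1 - e , $$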
where r is the risk ratio. The Pfizer and BioNTech trial was a randomized trial
with an equal allocation [30]. Although the trial did not report detailed numbers,
we can estimate the numbers of placebo and vaccinated cases npla and nvac and the
risk ratio r from the reported efficacy emin = 90% with nvac = ncovid − npla,
$$ e = \left[ 1 - \frac{n_{\mathrm{covid}} - n_{\mathrm{pla}}}{n_{\mathrm{pla}}} \right] \cdot 100\% = \left[ 2 - \frac{n_{\mathrm{covid}}}{n_{\mathrm{pla}}} \right] \cdot 100\% \ge e_{\min} . $$
Solving for the number of placebo cases yields the general equation,
$$ n_{\mathrm{pla}} \ge \frac{n_{\mathrm{covid}}}{2 - e_{\min}} , $$
and, for the Pfizer and BioNTech case, npla ≥ ncovid/1.1 = 85.45. This implies
that for a total of ncovid = 94 COVID-19 positive cases, with npla = 86 placebo
cases and nvac = 8 vaccinated cases, the efficacy is larger than 90%.
Back-calculating the efficacy for these populations,
$$ e = \left[ 1 - \frac{n_{\mathrm{vac}}}{n_{\mathrm{pla}}} \right] \cdot 100\% = \left[ 1 - \frac{8}{86} \right] \cdot 100\% = 90.7\% \ge 90\% = e_{\min} , $$
confirms the estimate. According to the protocol, Pfizer and BioNTech
planned to look at the data at five stages with ncovid = 32, 64, 92, 120, 164
reported positive cases and to continue the trial only if the efficacy satisfied
e ≥ ecrit with ecrit = 62.7%. From this information, we can calculate the
critical numbers of placebo and vaccinated cases npla and nvac at which the trial
would have stopped at any of the five stages. Solving for the number of placebo
cases yields the general equation,
$$ n_{\mathrm{pla}} \ge \frac{n_{\mathrm{covid}}}{2 - e_{\mathrm{crit}}} , $$
and, for the Pfizer and BioNTech case, npla ≥ ncovid/1.373. This implies that for
ncovid = 32, 64, 92, 120, 164, the minimum number of placebo cases to continue
the trial was npla ≥ 24, 47, 68, 88, 120, for which the resulting efficacies of
e = 66.7%, 63.8%, 64.7%, 63.6%, 63.3% would all have been slightly above
the critical threshold of ecrit = 62.7%.
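The stage-by-stage thresholds above follow from rounding ncovid/(2 − e) up to the next integer. The short Python sketch below reproduces them. It assumes, as the example does, an equal allocation so that case counts stand in for attack rates; the helper name is hypothetical.

```python
import math


def min_placebo_cases(n_covid: int, e_threshold: float) -> int:
    """Smallest integer n_pla with n_pla >= n_covid / (2 - e_threshold)."""
    return math.ceil(n_covid / (2.0 - e_threshold))


E_CRIT = 0.627  # critical efficacy quoted from the trial protocol
for n_covid in (32, 64, 92, 120, 164):
    n_pla = min_placebo_cases(n_covid, E_CRIT)
    n_vac = n_covid - n_pla
    e = 1.0 - n_vac / n_pla  # Eq. (1.4) with equal allocation
    print(f"n_covid = {n_covid:3d}: n_pla >= {n_pla:3d}, n_vac <= {n_vac:2d}, e = {e:.1%}")

# the reported efficacy of at least 90% for 94 cases
print(min_placebo_cases(94, 0.90))  # 86
```

Rounding up, rather than to the nearest integer, guarantees that the resulting efficacy stays at or above the threshold.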
Fisher’s exact test and contingency tables. Fisher’s exact test is a statistical
significance test to analyze contingency tables that quantify the effects of a vaccine
compared to a placebo. Table 1.4 illustrates a generic contingency table. From it,
we can calculate the case rate across the entire trial as the ratio between the total
number of positive cases and the total number of enrolled individuals,
$$ \nu_{\mathrm{disease}} = \frac{a + c}{a + b + c + d} \quad \text{with} \quad a + b + c + d = n_{\mathrm{tot}} . \qquad (1.6) $$
The case rates across the vaccinated and placebo groups are
$$ \nu_{\mathrm{vac}} = \frac{a}{a + b} \quad \text{with} \quad a = n_{\mathrm{vac}} \qquad \text{and} \qquad \nu_{\mathrm{pla}} = \frac{c}{c + d} \quad \text{with} \quad c = n_{\mathrm{pla}} . \qquad (1.7) $$
Most vaccination trials are designed as a randomized trial, meaning they assign
participants at random to a vaccinated or placebo group, with equal allocation,
meaning they target an equal enrollment into both groups, a + b ≈ c + d ≈ ntot/2.
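Eqs. (1.6) and (1.7) amount to three divisions over the entries of Table 1.4. In the minimal Python sketch below, the function name is hypothetical; the usage line plugs in the Pfizer and BioNTech counts from the example that follows. Note that the trial-wide rate counts the positive column, a + c, over all participants.

```python
def case_rates(a: int, b: int, c: int, d: int) -> tuple:
    """Eqs. (1.6)-(1.7) for the 2x2 contingency table of Table 1.4.

    Rows: vaccine (a positive, b negative) and placebo (c positive, d negative).
    """
    n_tot = a + b + c + d
    nu_disease = (a + c) / n_tot  # all positive cases over all enrolled participants
    nu_vac = a / (a + b)          # positive fraction of the vaccinated group
    nu_pla = c / (c + d)          # positive fraction of the placebo group
    return nu_disease, nu_vac, nu_pla


# Pfizer-BioNTech counts from the example that follows (1:1 allocation)
nu_disease, nu_vac, nu_pla = case_rates(8, 21_761, 86, 21_683)
print(f"trial: {nu_disease:.3%}, vaccinated: {nu_vac:.3%}, placebo: {nu_pla:.3%}")
```

With equal allocation, the trial-wide case rate is simply the mean of the two group rates.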
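We can use the contingency table to calculate the statistical significance p of the
deviation from a null hypothesis,
$$ p = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}} = \frac{\binom{a+b}{b}\binom{c+d}{d}}{\binom{n}{b+d}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!} , \qquad (1.8) $$
in terms of binomial coefficients or factorials using a hypergeometric
distribution. Strictly speaking, Eq. (1.8) is the probability of observing this particular
table under the null hypothesis; the p-value of Fisher’s exact test sums this probability
over all tables with the same row and column totals that are at least as extreme as the
observed one. For a vaccination trial, the lower the p-value, the stronger the evidence
that the vaccine has an effect on the vaccinated group compared to the placebo group.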
Example: Statistical significance of the first COVID-19 vaccine. On November
9, 2020, the Pfizer and BioNTech trial reported ntot = 43,538 enrolled participants.
From this number, we can calculate the case rates across the entire trial, in the
vaccinated group, and in the placebo group, assuming a 1:1 randomization between
the vaccinated and placebo groups. The case rate of the entire trial is the ratio
between the total number of COVID-19 positive cases and the total number of
enrolled individuals,
$$ \nu_{\mathrm{covid}} = \frac{n_{\mathrm{covid}}}{n_{\mathrm{tot}}} = \frac{94}{43{,}538} = 0.216\% . $$
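The case rates of the two groups are the ratios between the vaccinated and
placebo COVID-19 positive cases and half of the enrolled participants,
$$ \nu_{\mathrm{vac}} = \frac{n_{\mathrm{vac}}}{n_{\mathrm{tot}}/2} = \frac{8}{43{,}538/2} = 0.037\% \qquad \nu_{\mathrm{pla}} = \frac{n_{\mathrm{pla}}}{n_{\mathrm{tot}}/2} = \frac{86}{43{,}538/2} = 0.395\% . $$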
Using Fisher’s exact test, we can test the null hypothesis that vaccinated and
placebo participants are equally likely to contract COVID-19.
positive negative total
vaccine 8 21,761 21,769
placebo 86 21,683 21,769
total 94 43,444 43,538
We can use the contingency table to assess the statistical significance of the
efficacy of the Pfizer-BioNTech vaccine. Fisher’s exact test calculates the
significance of the deviation from a null hypothesis,
$$ p = \frac{21769!\; 21769!\; 94!\; 43444!}{8!\; 21761!\; 86!\; 21683!\; 43538!} , $$
and confirms, for the Pfizer and BioNTech case with p < 0.00001, that individuals
in the vaccinated and placebo groups are not equally likely to contract
COVID-19.
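The factorials in Eq. (1.8) are far too large for floating-point arithmetic, but Python's exact integers handle them directly. The sketch below evaluates Eq. (1.8) with `math.comb` (Python 3.8 or later) and forms a two-sided p-value by summing the probabilities of all tables with the same margins that are no more likely than the observed one. This is one common convention, also the one documented for SciPy's `scipy.stats.fisher_exact`, and it is not necessarily the exact procedure behind the reported value. The function names are hypothetical.

```python
from math import comb


def table_probability(a: int, b: int, c: int, d: int) -> float:
    """Eq. (1.8): hypergeometric probability of a 2x2 table with fixed margins."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Sum the probabilities of all tables with the observed row and column
    totals that are no more likely than the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    p_obs = table_probability(a, b, c, d)
    p_value = 0.0
    # x is the top-left entry; the margins fix the other three entries
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, row2 - col1 + x)
        if p <= p_obs * (1.0 + 1e-7):  # small tolerance for rounding
            p_value += p
    return p_value


# Pfizer-BioNTech contingency table from the example above
a, b, c, d = 8, 21_761, 86, 21_683
print(f"point probability, Eq. (1.8): {table_probability(a, b, c, d):.2e}")
print(f"two-sided p-value:            {fisher_exact_two_sided(a, b, c, d):.2e}")
```

Integer division of the exact binomial coefficients is rounded only once, at the end, so the result is accurate even though the intermediate numbers have hundreds of digits.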
1.7 Mathematical modeling in epidemiology
During the early onset of the COVID-19 pandemic, all eyes were on mathematical
modeling with the general expectation that mathematical models could precisely
predict the trajectory of the pandemic. Mathematical modeling rapidly became front
and center to understanding the exponential increase of infections, the shortage of
ventilators, and the limited capacity of hospital beds; too rapidly as we now know.
Bold and catastrophic predictions triggered not only massive press coverage, but
also broad anxiety in the general population [15]. However, within only a few
weeks, the vastly different predictions and conflicting conclusions began to create
the impression that all mathematical models are generally unreliable and inherently
wrong [12]. While the failure of COVID-19 modeling, often by an order of magnitude
or more, was devastating for policymakers and public health practitioners,
initial mistakes are not new to the modeling community, where an iterative cycle of
prediction, failure, and redesign is common and considered best practice [26]. However,
using mathematical models successfully requires setting the right expectations [45].
Understanding what models can and cannot predict is critical to the Art of Mod-
eling. Epidemiologists distinguish two kinds of models to understand the outbreak
dynamics of an infectious disease: statistical models and mechanistic models [12].
Depending on the degree of complexity, the most popular mechanistic models are
compartment models and agent-based models.
Statistical models, or more precisely, purely statistical models, use machine learn-
ing or regression to analyze massive amounts of data and project the number of
infections into the future. The essential idea is to select a function D(t), use statisti-
cal tools to fit its coefficients to reported case data D̂(t), and make projections into
the future. The function can be quadratic, cubic, logistic, power-law, or exponential,
Table 1.5 Mathematical modeling of COVID-19. Summary of the three most common models.
Statistical or forecasting models fit nonlinear functions to case data over time, whereas mechanistic
models, including compartment and agent-based models, simulate outbreak and contact dynamics.
Throughout this book, we use compartment models to simulate the outbreak dynamics of infectious
diseases including COVID-19.
| forecasting models (statistical) | compartment models (mechanistic) | agent-based models (mechanistic) |
|---|---|---|
| use machine learning, statistics, regression, or method of least squares | use physics-based modeling based on nonlinear reaction-diffusion equations | use rule-based approaches to study the interaction of autonomous systems |
| model case numbers through a nonlinear function | model population through compartments | model every individual as an independent agent |
| formulate number of cases as a function of time | formulate rules by which individuals pass through the compartments | formulate simple rules by which individual agents interact |
| fit coefficients that are purely phenomenological | infer parameters that have a mechanistic interpretation | identify parameters that summarize human behavior |
| predicts case numbers from fitting a function | predict outbreak dynamics, characterize sensitivities, quantify uncertainties | predict outbreak dynamics as emergent collective behavior of individual agents |
| no feedback mechanisms | nonlinear feedback | discrete contact networks |
| predictions are inexpensive, but very unreliable | predictions are reliable only for a small time window | predictions are detailed, but computationally expensive |
for example, $D(\vartheta,t) = \exp(c_0 + c_1 t + c_2 t^2 + c_3 t^3)$, where $D(\vartheta,t)$ are the modeled
cumulative cases per day, $\vartheta = \{c_0, c_1, c_2, c_3\}$ are the model parameters, and t is
the time. One of the simplest statistical tools to compare the model D(ϑ,t) to the
data D̂(t) and identify values for its parameters ϑ is the method of least squares.
It is important to understand that these parameters are purely phenomenological:
they are derived solely by fitting a curve and typically do not have a mechanistic
interpretation. Early in an outbreak, when little is known about disease transmission,
epidemiologists often use statistical models because they do not rely on any prior
knowledge of the disease. An example of a statistical model of COVID-19 is the
initial IHME model [23]. Early in the pandemic, the IHME model used case data from
China and Italy to create similar curves, forecast case numbers in the United States,
and inform the White House’s response to the pandemic. Carefully constructed
statistical frameworks can be used for short-term forecasting using machine learning or
regression. This could potentially be useful to understand how to allocate resources
or make rapid short-term recommendations. However, purely statistical models can
capture neither the dynamics of disease transmission nor the effects of mitigation
strategies. This explains, at least in part, why the early COVID-19 predictions based
on purely statistical models were off by an order of magnitude or more. To address
these serious limitations, several COVID-19 models have now been adjusted to com-
bine both statistical modeling and mechanistic modeling.
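As an illustration of the purely statistical approach, the Python sketch below fits the exponential-of-cubic function D(ϑ,t) = exp(c0 + c1 t + c2 t² + c3 t³) by the method of least squares. One simplification is an assumption of this sketch: it minimizes the squared error of log D rather than of D itself, which turns the fit into a linear least-squares problem solved through the 4×4 normal equations. The "reported" data are synthetic and noise-free, generated from hypothetical coefficients, so the fit should recover them.

```python
import math


def fit_log_cubic(t, d_hat):
    """Least-squares fit of log D = c0 + c1 t + c2 t^2 + c3 t^3 to positive data d_hat."""
    y = [math.log(v) for v in d_hat]
    m = 4
    # normal equations A c = b with A_jk = sum t^(j+k) and b_j = sum t^j log(d_hat)
    A = [[float(sum(ti ** (j + k) for ti in t)) for k in range(m)] for j in range(m)]
    b = [sum(ti ** j * yi for ti, yi in zip(t, y)) for j in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for k in range(col, m):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    # back substitution
    c = [0.0] * m
    for r in reversed(range(m)):
        c[r] = (b[r] - sum(A[r][k] * c[k] for k in range(r + 1, m))) / A[r][r]
    return c


def model(c, t):
    """D(theta, t) = exp(c0 + c1 t + c2 t^2 + c3 t^3)."""
    return math.exp(sum(cj * t ** j for j, cj in enumerate(c)))


# synthetic, noise-free case data from hypothetical coefficients
true_c = [2.0, 0.35, -0.01, 0.0002]
days = list(range(15))
d_hat = [model(true_c, ti) for ti in days]

c_fit = fit_log_cubic(days, d_hat)
print([round(cj, 4) for cj in c_fit])
print(f"projection for day 20: {model(c_fit, 20):.0f} cases")
```

The fitted coefficients reproduce the data but carry no epidemiological meaning, and the projection beyond day 14 simply extrapolates the polynomial in the exponent, which is exactly the limitation described above.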
Mechanistic models simulate the outbreak through interacting disease mechanisms
by using local nonlinear population dynamics and global mixing of populations
[4]. The underlying idea is to identify fundamental mechanisms that drive disease
dynamics, for example the duration of the infectious period or the number of con-
tacts an infectious individual has during this time. Unlike purely statistical models,
mechanistic models include important nonlinear feedback: The more people be-
come infected, the faster the disease spreads. By their very nature, the parameters
of mechanistic models are not just fitting parameters, they usually have a clear epi-
demiological interpretation. This makes mechanistic modeling a powerful strategy
to explore different outbreak scenarios or study how an outbreak would change
under various assumptions and political interventions [12]. Another advantage of
mechanistic models is that we can adjust and improve them dynamically as more
information becomes available. Throughout this book, we gradually improve a class
of mechanistic models by adding new information. For example, we introduce time-
varying dynamic contact rates that change with different lockdown levels and add the
effect of asymptomatic transmission [26]. Even if we do not precisely know the
dimension of asymptomatic disease spread, we can use mechanistic models to study
what-if scenarios: What would the disease landscape look like if two thirds of all
infectious individuals were asymptomatic? Mechanistic modeling naturally extends into
sensitivity analysis and uncertainty quantification. As such, it not only provides valuable
information about the robustness of the model, but also about the most effective
parameters to modulate a disease outbreak [19]. Rather than studying a single one
disease trajectory, we could explore a range of trajectories around the mean and
characterize the best- and worst-case scenarios. The two most popular mechanistic
models in epidemiology are compartment models and agent-based models, and both
have been used to understand the outbreak dynamics of COVID-19. When choosing
between compartment models and agent-based models, it is important to understand the
major strengths and weaknesses of each model.
Compartment models are the most common approach to model the epidemiology
of an infectious disease [14]. Compartment models simulate the collective behavior
of subgroups of the population through a number of compartments with labels, for
example, SEIR for susceptible, exposed, infectious, and recovered. Individuals move
between compartments, and the order of the labels indicates the sequence of transitions,
for example, SEIS means susceptible, exposed, infectious, then susceptible again [4].
The underlying principle is to model the time evolution of these groups through a set
of coupled ordinary differential equations, identify rate constants that characterize
their interaction using reported case data, and vary these rate constants to probe
different outbreak scenarios.
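To make this principle concrete, here is a minimal sketch of the SEIR model mentioned above, written for the population fractions S, E, I, and R as four coupled ordinary differential equations, dS/dt = −β S I, dE/dt = β S I − α E, dI/dt = α E − γ I, and dR/dt = γ I. We assume mass-action incidence and constant-rate transitions; the latent and infectious periods, 1/α = 3 days and 1/γ = 10 days, mirror Figure 1.8, whereas the contact rate β = 0.3/day is a hypothetical value. The fixed-step fourth-order Runge-Kutta integrator and the function names are our choices for illustration, not necessarily the scheme used in later chapters.

```python
def seir_rhs(state, beta, alpha, gamma):
    """Right-hand side of the SEIR equations for population fractions (S, E, I, R)."""
    s, e, i, _ = state
    exposure = beta * s * i   # mass-action incidence, S -> E
    onset = alpha * e         # end of the latent period, E -> I
    recovery = gamma * i      # end of the infectious period, I -> R
    return (-exposure, exposure - onset, onset - recovery, recovery)


def rk4_step(state, dt, beta, alpha, gamma):
    """Advance the state by one step of the classical fourth-order Runge-Kutta method."""
    def shifted(slope, h):
        return tuple(x + h * k for x, k in zip(state, slope))

    k1 = seir_rhs(state, beta, alpha, gamma)
    k2 = seir_rhs(shifted(k1, 0.5 * dt), beta, alpha, gamma)
    k3 = seir_rhs(shifted(k2, 0.5 * dt), beta, alpha, gamma)
    k4 = seir_rhs(shifted(k3, dt), beta, alpha, gamma)
    return tuple(x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate_seir(state0, beta, alpha, gamma, days, dt=0.1):
    """Return the list of states from t = 0 to t = days in steps of dt."""
    trajectory = [tuple(state0)]
    for _ in range(int(round(days / dt))):
        trajectory.append(rk4_step(trajectory[-1], dt, beta, alpha, gamma))
    return trajectory


if __name__ == "__main__":
    # Latent and infectious periods as in Figure 1.8; beta = 0.3/day is hypothetical (R0 = 3).
    dt = 0.1
    traj = simulate_seir((0.999, 0.0, 0.001, 0.0), beta=0.3, alpha=1 / 3,
                         gamma=1 / 10, days=365, dt=dt)
    peak_step = max(range(len(traj)), key=lambda n: traj[n][2])
    print(f"peak infectious fraction {traj[peak_step][2]:.3f} on day {peak_step * dt:.0f}")
    print(f"recovered fraction after one year {traj[-1][3]:.3f}")
```

Because every transition removes individuals from one compartment and adds them to the next, S + E + I + R stays equal to one throughout the simulation, a simple sanity check for any compartment model implementation.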
Fig. 1.8 Characteristic timeline of COVID-19. On day 0, a fraction of all susceptible individuals
is exposed to the virus. After a latent period of A = 1/α = 3 days, the exposed individuals become
infectious. After an infectious period of C = 1/γ = 10 days, a fraction (1 − νh) transitions to
the recovered group, whereas a fraction νh develops severe symptoms and is hospitalized. Of the
hospitalized individuals, a fraction (1 − νd) recovers, whereas a fraction νd dies. We can
simulate this behavior through an SEIRHD model with six compartments.
Figure 1.8 illustrates a compartment model that represents the characteristic timeline
of COVID-19 through six compartments, the susceptible, exposed, infectious,
recovered, hospitalized, and dead groups. This SEIRHD model is defined through a
set of six ordinary differential equations that simulate how many individuals reside
in each compartment throughout the duration of the outbreak. The model parameters
of a compartment model define the transition rates between the individual compartments
and, for more complex models, the fraction of individuals that transition into
a particular path of the disease. For this example, the parameters ϑ = {β,α,γ,νh,νd}
are the contact rate β, latent rate α, and infectious rate γ, the hospitalized fraction νh,
and the fraction νd of hospitalized individuals who die, so that a fraction νh · νd of all
infectious individuals dies. From reported case numbers, hospitalizations, and
deaths, we can identify or infer the set of model parameters ϑ, the rate constants
and fractions, for which the model output best explains the data, using statistical tools. Importantly,
in contrast to purely statistical models, compartment models are based on model
parameters that have a clear physical interpretation. The most important parameters
of any compartment model are the contact rate β and the infectious rate γ. Together,
they define an important nonlinear feedback that is not present in purely statistical
models: The more people become infected, the faster the spread of the disease. The
basic reproduction number R0 = β/γ, the ratio of the contact rate β to the infectious
rate γ, characterizes the magnitude of this feedback. It is an easy-to-understand
disease metric that explains how quickly susceptible individuals become infected
and how fast a disease spreads across a population [6]. Some epidemiologists ar-
gue that, because of their mechanistic nature, compartment models are better suited
for long-term predictions than purely statistical models. While this might be true
for infectious diseases that develop freely, without any political intervention, the
COVID-19 pandemic has taught us that long-term predictions of outbreak dynamics
are challenging, even with the most sophisticated mechanistic models [15]. Under-
standing the potential and the limitations of compartment modeling is one of the
main objectives of this book.
In Chapters 2 and 3, we introduce two simple classical compartment models be-
fore we introduce the most common compartment model for COVID-19 in Chapter
4. We show how we can use these models to estimate the reproduction number
from reported COVID-19 case data. Knowing the precise reproduction number has
important consequences for estimating the herd immunity threshold and the level of
immunization required to reach it. Compartment models capture the fundamental dynamics of disease
transmission and the effects of public health interventions; however, classical com-
partment models ignore the dynamics of the contact rate and its variation across
a population. In Section 7.7 we introduce a compartment model that explicitly ac-
counts for a time-varying dynamic contact rate, β(t), and captures a varying contact
behavior in different subgroups of the population. The dynamic nature of the contact
rate naturally introduces a dynamic reproduction number, R(t) = β(t)/γ and allows
us to quantify the effectiveness of policy measures as we discuss in Section 12.2.
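As a minimal sketch of this idea, the snippet below represents β(t) as a piecewise-constant function that switches at hypothetical lockdown and reopening days and evaluates R(t) = β(t)/γ. The switching days, the contact rates, and the infectious period C = 1/γ = 7 days are illustrative assumptions rather than values inferred from data, and the helper names make_piecewise_beta and reproduction_number are ours.

```python
from bisect import bisect_right


def make_piecewise_beta(change_days, betas):
    """Return a hypothetical contact rate beta(t) [1/days] that switches value at given days.

    betas[0] applies before change_days[0]; betas[k] applies from change_days[k - 1] onward.
    """
    if len(betas) != len(change_days) + 1:
        raise ValueError("need exactly one more contact rate than change days")

    def beta(t):
        return betas[bisect_right(change_days, t)]

    return beta


def reproduction_number(beta, gamma, t):
    """Dynamic reproduction number R(t) = beta(t) / gamma."""
    return beta(t) / gamma


if __name__ == "__main__":
    gamma = 1 / 7  # assumed infectious period C = 1/gamma = 7 days
    # Hypothetical scenario: no restrictions, lockdown from day 20, partial reopening from day 60.
    beta = make_piecewise_beta([20, 60], [0.5, 0.1, 0.2])
    for day in (0, 30, 90):
        print(f"day {day}: R(t) = {reproduction_number(beta, gamma, day):.1f}")
```

With these assumed values, R(t) drops from 3.5 before the lockdown to 0.7 during the lockdown and rises to 1.4 after the partial reopening. In a data-driven setting, we would infer β(t) from reported case data rather than prescribe it.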
Agent-based models simulate individuals or agents interacting in various social
settings and estimate the spread of a disease as these agents come into contact with
one another. The underlying idea is to represent each agent individually, formulate
relatively simple rules by which individual agents interact, and interpret the collec-
tive behavior across all agents as the emergent dynamics of an outbreak. A strength
of agent-based models is that they simulate human behavior at a very fine granularity: They
can assign different parameters or behavior patterns to each individual agent instead
of simulating the collective behavior of entire populations. As such, agent-based
models offer a lot more freedom than compartment models, but also require a lot
more detail. For example, to formulate rules of interaction, agent-based models draw
on social connectivity networks derived from activity surveys, cell phone locations, public
transportation, or airline statistics. By their very nature, agent-based approaches are
computationally expensive, especially for large populations. For small populations,
agent-based modeling is a powerful strategy to predict how individual behavior, for
example the violation of quarantine, leads to a collective behavior and modulates
disease spread. For large populations, agent-based modeling can help rationalize
collective model parameters, for example contact rates or reproduction numbers,
that feed into more abstract, population-level models. Above a certain population
size, agent-based models simply become computationally infeasible, and most
epidemiologists would turn to a more macroscopic approach that represents groups of
individuals collectively as subgroups of the population. Since our objective is to
design and discuss data-driven models for the COVID-19 pandemic, throughout the
remainder of this book, we focus exclusively on compartment models.
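Although we do not pursue agent-based models further, a minimal sketch helps contrast them with compartment models. The code below assumes random mixing, a hypothetical per-contact transmission probability of 0.05, a daily recovery probability of 1/7, and a per-agent number of daily contacts drawn at random between 2 and 12; these rules and values are illustrative assumptions, not a model calibrated to COVID-19. Each agent carries its own state and its own contact number, and the population-level S, I, and R counts emerge from the individual interactions.

```python
import random

SUSCEPTIBLE, INFECTIOUS, RECOVERED = "S", "I", "R"


def run_agent_model(n_agents=2000, n_seed=10, p_transmit=0.05,
                    p_recover=1 / 7, days=150, rng_seed=1):
    """Simulate a toy agent-based SIR outbreak and return daily (S, I, R) counts."""
    rng = random.Random(rng_seed)
    # Hypothetical individual heterogeneity: each agent has its own number of
    # daily contacts between 2 and 12, something a compartment model averages out.
    contacts = [rng.randint(2, 12) for _ in range(n_agents)]
    state = [SUSCEPTIBLE] * n_agents
    for agent in rng.sample(range(n_agents), n_seed):
        state[agent] = INFECTIOUS

    history = []
    for _ in range(days):
        infectious = [a for a in range(n_agents) if state[a] == INFECTIOUS]
        newly_infected = set()
        for a in infectious:
            # Rule 1: meet randomly chosen agents; each contact with a
            # susceptible agent transmits with probability p_transmit.
            for b in rng.sample(range(n_agents), contacts[a]):
                if state[b] == SUSCEPTIBLE and rng.random() < p_transmit:
                    newly_infected.add(b)
        # Rule 2: each infectious agent recovers with probability p_recover per day.
        for a in infectious:
            if rng.random() < p_recover:
                state[a] = RECOVERED
        # Apply new infections only after all contacts of the day (synchronous update).
        for b in newly_infected:
            state[b] = INFECTIOUS
        history.append((state.count(SUSCEPTIBLE), state.count(INFECTIOUS),
                        state.count(RECOVERED)))
    return history


if __name__ == "__main__":
    history = run_agent_model()
    peak_day = max(range(len(history)), key=lambda d: history[d][1])
    print("peak infectious:", history[peak_day][1], "on day", peak_day + 1)
    print("final (S, I, R):", history[-1])
```

Even this toy version loops over every infectious agent and every one of its contacts each day, which hints at why agent-based models become expensive for large populations.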
When choosing between purely statistical, compartment, and agent-based models,
it is important to know upfront which questions the model should address [45]. Ta-
ble 1.5 compares the three models and summarizes their strengths and weaknesses.
Throughout this book, we focus on mechanistic compartment modeling.
1.8 Data-driven modeling in epidemiology
One year after the World Health Organization declared the COVID-19 outbreak a
global pandemic, SARS-CoV-2 has resulted in more than 118 million reported cases
across more than 180 countries and over 2.6 million deaths worldwide. Unlike any
other disease in history, the COVID-19 pandemic has generated an unprecedented
volume of data, well documented, continuously updated, and broadly available to
the general public. There is a critical need for time- and cost-efficient strategies
to analyze and interpret these data to systematically manage the pandemic on a
global level. Yet, the precise role of physics-based modeling and machine learning
in providing quantitative insight into the dynamics of COVID-19 remains a topic of
ongoing debate.
Physics-based modeling is a successful strategy to integrate multiscale, multi-
physics data and uncover mechanisms that explain the dynamics of specific outbreak
characteristics. However, physics-based modeling alone often fails to efficiently
combine large data sets from different sources and different levels of resolution [26].
Machine learning is a powerful technique to integrate multimodal, multifidelity
data, from cities, counties, states, and countries across the world, and reveal cor-
relations between different disease phenomena. However, machine learning alone
ignores the fundamental laws of physics and can result in ill-posed problems or non-
physical solutions [1]. Throughout this book, we illustrate how data-driven modeling
can integrate classical physics-based modeling and machine learning to infer critical
disease parameters–in real time–from reported case data to make informed predictions
and guide political decision making. As a valuable by-product, this approach
naturally lends itself to sensitivity analysis and uncertainty quantification. From the
COVID-19 pandemic, we have learnt that even small inaccuracies in the model can
trigger large changes in the number of cases. To understand the vulnerability of the
model to these small changes, especially in view of the varying reporting practices
of the COVID-19 case data, sensitivity analysis and quantifying uncertainty have
become critical elements of robust predictive modeling.
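A back-of-the-envelope sketch illustrates how small inaccuracies can grow. Assuming the exponential growth model D(t) = D0 exp(G t) that describes the early outbreak, a relative error ε in the growth rate G changes the predicted case number at time t by a factor exp(ε G t), independent of D0. The values G = 0.1/day, ε = 10%, and t = 60 days are hypothetical, and the function name relative_change is ours.

```python
import math


def relative_change(growth_rate, relative_error, t):
    """Relative change in D(t) = D0 exp(G t) if G is off by a relative error epsilon.

    The ratio of perturbed to unperturbed prediction is exp(epsilon * G * t).
    """
    return math.exp(relative_error * growth_rate * t) - 1.0


# Hypothetical values: G = 0.1 per day, a 10% error in G, projection horizon of 60 days.
print(f"{relative_change(0.1, 0.10, 60):.0%}")  # 82%
```

Under these assumptions, a 10% error in the growth rate translates into an 82% error in the projected cumulative cases after two months, and the discrepancy keeps growing with the projection horizon.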
Epidemiology data. In data-driven modeling, we need data to fit or infer our model
parameters. Unlike earlier pandemics, for which case data were often sparse, irregular,
or incomplete, the COVID-19 pandemic is amazingly well documented [13]. On
hundreds of public COVID-19 dashboards, we can find and download a vast variety
of case numbers: daily new cases, active cases, recovered cases, seriously critical
cases, cumulative cases, and deaths, at the level of cities, counties, states, countries,
or the entire world [9, 12, 25]. Local dashboards often also share the number of
hospitalizations and intensive care unit patients, which were of great concern especially at the
early onset of the pandemic. More recently, these dashboards have also included the
number of tests and vaccines. Seroprevalence data with information about the history
of the disease, both asymptomatic and symptomatic, are rare and often only avail-
able from scientific publications rather than governmental databases. These data are
typically updated on a daily basis and contain notable weekday-weekend variations.
When using the data to infer model parameters and learn about the outbreak behavior,
we usually smooth these variations using seven-day moving averages. Finally,
to compare data from different locations, we typically scale the reported case data
by the population. A common metric for comparison is the seven-day-per-100,000
incidence, the number of new cases per 100,000 individuals across a seven-day win-
dow [38]. Policy makers across the globe use this incidence value to characterize the
severity of the outbreak and justify the need for political interventions.
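Both operations, smoothing with a seven-day moving average and scaling to a seven-day-per-100,000 incidence, take only a few lines. This sketch assumes a trailing window that ends on the reporting day; some dashboards use a centered window instead, which shifts the curve by three days. The weekly case pattern in the example is hypothetical, and the function names are ours.

```python
def moving_average_7(daily_cases):
    """Trailing seven-day moving average; the first six days lack a full window (None)."""
    return [
        None if k < 6 else sum(daily_cases[k - 6:k + 1]) / 7
        for k in range(len(daily_cases))
    ]


def incidence_7day_per_100k(daily_cases, population):
    """New cases over the trailing seven days per 100,000 inhabitants (None before day 7)."""
    return [
        None if k < 6 else sum(daily_cases[k - 6:k + 1]) * 100_000 / population
        for k in range(len(daily_cases))
    ]


if __name__ == "__main__":
    # Hypothetical daily counts with a weekend reporting dip, repeated for three weeks.
    week = [120, 130, 125, 140, 135, 60, 50]
    cases = week * 3
    print(moving_average_7(cases)[6:])                   # flat: weekly pattern removed
    print(incidence_7day_per_100k(cases, 1_000_000)[6])  # 76.0
```

Because the hypothetical pattern repeats every seven days, the moving average is perfectly flat: it removes the weekday-weekend variations entirely and keeps only the underlying trend.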
Fig. 1.9 Daily new cases of COVID-19 and their seven-day moving average. Daily new COVID-19
cases are reported on public dashboards worldwide. The light orange curve represents the raw
data, the dark orange curve is the seven-day moving average, both shown throughout the year
after the WHO declared COVID-19 a global pandemic on March 11, 2020.
Figure 1.9 illustrates a typical data set for the COVID-19 outbreak that we use to
infer our epidemiological model parameters. The orange lines summarize the outbreak
dynamics worldwide throughout the year after the World Health Organization
declared COVID-19 a global pandemic, from March 11, 2020 to March 11,
2021. The light orange lines represent the reported daily new cases, which display
notable weekday-weekend fluctuations associated with testing and reporting
irregularities. The dark orange lines are the associated seven-day moving average Î(t),
which smoothens the fluctuations and displays a much clearer trend in the disease
dynamics. Within this one-year window, the absolute daily case numbers peaked
on January 8, 2021 with 841,304 new cases. The moving seven-day average peaked
on January 11, 2021 with ∆Imax = 745,404 cases. If we assume an infectious pe-
riod of C = 1/γ = 7 days, this would result in a maximum infectious population
of Imax = 7 · ∆Imax = 7 · 745,404 = 5,217,828, meaning that in mid-January 2021,
more than 5 million people were infectious with COVID-19. For a total population of
N = 7.8 billion people, this corresponds to a peak seven-day-per-100,000 incidence
of Imax · 100,000/N = 67. There are many different ways to use the reported case
data. The simplest way is to select a function, use statistical tools to fit its coefficients
to the reported case data, and make projections into the future. However, not only the
daily new cases, but also the seven-day average in Figure 1.9 display substantial fluc-
tuation and it seems difficult to find a function that could explain the orange curves.
For data-driven modeling, it is often easier to use the total cumulative case numbers,
which always increase monotonically and tend to be smoother in general.
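The peak estimate above is a short calculation. The sketch below reproduces it, assuming, as in the text, that the number of currently infectious individuals is approximately the infectious period C times the daily new cases, Imax ≈ C · ∆Imax; the function name peak_incidence is ours.

```python
def peak_incidence(delta_i_max, infectious_period_days, population):
    """Return (I_max, seven-day-per-100,000 incidence) from the peak seven-day average."""
    # Assumption from the text: currently infectious ~ infectious period x daily new cases.
    i_max = infectious_period_days * delta_i_max
    incidence = i_max * 100_000 / population
    return i_max, incidence


# Worldwide peak of the seven-day average on January 11, 2021, with C = 7 days and N = 7.8 billion.
i_max, incidence = peak_incidence(745_404, 7, 7.8e9)
print(i_max, round(incidence))  # 5217828 67
```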
Fig. 1.10 Total cumulative cases of COVID-19 and a statistical model to fit the data. Cumulative
COVID-19 cases, the sum of all daily new cases to that date, are reported on public dashboards
worldwide. The solid orange curve represents the raw data, the dashed orange curve is an example
of a statistical model, both shown throughout the year after the WHO declared COVID-19 a
global pandemic on March 11, 2020.
Figure 1.10 shows the total cumulative cases of COVID-19 D̂(t) and a simple
statistical model D(t) to fit the data throughout the first year of the pandemic.
During the very early stages of an outbreak, an exponential growth model,
D(t) = D0 exp(G t), with a growth rate G often provides a good approximation of the total
number of cases D̂(t) and is easy to fit. Indeed, many early approaches used this or
similar exponential models. While the initial phase of the outbreak is
well represented by exponential growth models, they soon tend to overestimate the
outbreak. This is why, during the early COVID-19 pandemic, there was a broad
overestimate of the number of cases and of the number of ventilators and hospital
beds needed to treat diseased individuals [15]. In this book, instead of using purely
statistical models, we use mechanistic models like the compartment model in Figure 1.8.
For this SEIRHD compartment model, the number of total cases is the sum
of all infectious, hospitalized, recovered and dead individuals, I(t) + R(t) + H(t) +
D(t). Throughout this book, we use different types of compartment models, infer
their model parameters using reported COVID-19 case data from public databases
[9, 12, 25], seroprevalence data from scientific publications [26], and mobility data
[2, 10, 10], and vary the parameters to probe different outbreak scenarios.
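The exponential growth model D(t) = D0 exp(G t) is easy to fit because taking the logarithm turns it into a straight line, ln D(t) = ln D0 + G t. The sketch below assumes ordinary least squares on the log-transformed cumulative cases, which weights relative rather than absolute errors; this is one common choice among several, the function name fit_exponential is ours, and the early-outbreak data in the example are synthetic.

```python
import math


def fit_exponential(days, cumulative_cases):
    """Least-squares fit of ln D(t) = ln D0 + G t; returns (D0, G).

    Requires strictly positive case counts and at least two distinct days.
    """
    n = len(days)
    logs = [math.log(d) for d in cumulative_cases]
    t_mean = sum(days) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(days, logs))
    sxx = sum((t - t_mean) ** 2 for t in days)
    if sxx == 0:
        raise ValueError("need at least two distinct days")
    growth_rate = sxy / sxx                        # slope G of the log-linear fit
    d0 = math.exp(y_mean - growth_rate * t_mean)   # intercept transformed back to D0
    return d0, growth_rate


if __name__ == "__main__":
    # Synthetic early-outbreak data: 50 initial cases with growth rate G = 0.2 per day.
    days = list(range(10))
    cases = [round(50 * math.exp(0.2 * t)) for t in days]
    d0, g = fit_exponential(days, cases)
    print(f"D0 = {d0:.1f}, G = {g:.3f} per day, doubling time ln 2 / G = {math.log(2) / g:.1f} days")
```

The fitted growth rate also yields the doubling time ln 2 / G, a convenient way to communicate how fast the early outbreak grows.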
Problems
1.1 Testing, testing, testing. On June 6, 2020, Donald J. Trump, the President of
the United States, famously said “Remember this, when you test more, you have
more cases.” What is wrong with this statement? What did he really mean to say?
Discuss the implications of the testing frequency on reported case numbers, and,
ultimately, on policy making.
1.2 Herd immunity. On September 15, 2020, Donald J. Trump, the President of the
United States, claimed that COVID-19 would go away, without a vaccine. “You’ll
develop herd. Like a herd mentality.” What did he really mean to say? Discuss the
effects of vaccination on herd immunity.
1.3 Herd immunity. Assume the basic reproduction number of the swine flu, caused
by the H1N1 virus, was on the order of R0 = 1.4 to 1.6. Calculate its herd immunity
threshold H and compare it against the herd immunity thresholds of other infectious
diseases in Table 1.3 and Figure 1.6. Comment on whether this makes the swine flu
a good candidate for eradication through vaccination.
1.4 COVID-19 variants B.1.351 and B.1.1.7. Assume the basic reproduction num-
ber of COVID-19 in December 2020 was R0 = 2.0. Calculate the herd immunity
threshold H. Now, assume that the variant B.1.351 with a 50% increased basic re-
production number has been introduced into the population [20]. How does the herd
immunity threshold change? How would the variant B.1.1.7 with a 56% increased
basic reproduction number change the herd immunity threshold? Comment on your
results.
Fig. 1.11 The Lessons of the Pandemic. Twelve condensed rules for the avoidance of unnecessary
personal risks and for the promotion of better personal health by George A. Soper in reflection
of the Spanish flu in 1919.
1.5 The Lessons of the Pandemic. In his Science publication The Lessons of the
Pandemic, George A. Soper lists twelve public health measures to manage the Spanish
flu in 1919. Read the twelve recommendations in Figure 1.11 and comment on which
of them are still valid for the COVID-19 pandemic today.
1.6 Efficacy of the AstraZeneca COVID-19 vaccine. The randomized, equal allo-
cation AstraZeneca trial enrolled 11636 participants in its interim efficacy analysis,
5807 in the vaccinated group and 5829 in the unvaccinated placebo group. Of the
vaccinated group, 30 developed COVID-19, and of the placebo group 101. Calculate
the efficacy and risk ratio of the AstraZeneca vaccine.
1.7 Case rates during the AstraZeneca COVID-19 trial. The AstraZeneca trial en-
rolled 11636 participants in its interim efficacy analysis, 5807 in the vaccinated
group and 5829 in the unvaccinated placebo group. Of the vaccinated group, 30
developed COVID-19, and of the placebo group 101. Create a contingency table for
the interim efficacy analysis. Calculate the overall case rate of the trial and the case
rates in the vaccinated and placebo groups.
1.8 Efficacy of the Janssen COVID-19 vaccine. The randomized, equal allocation
Janssen COVID-19 trial enrolled 39,321 participants; 19,630 received the vaccine
and 19,691 received placebo. At least 14 days after vaccination, the trial recorded 116
COVID-19 cases in the vaccine group and 348 cases in the placebo group. Calculate
the efficacy of the Janssen COVID-19 vaccine and compare it to the efficacy of the
Pfizer BioNTech and AstraZeneca vaccines. How would you expect the efficacy to
change after another 14 days?
1.9 Efficacy of the Janssen COVID-19 vaccine. The randomized, equal allocation
Janssen COVID-19 trial enrolled 39,321 participants; 19,630 received the vaccine
and 19,691 received placebo. At least 28 days after vaccination, the trial recorded 66
COVID-19 cases in the vaccine group and 193 cases in the placebo group. Calculate
the efficacy of the Janssen COVID-19 vaccine after 28 days and compare it to the
efficacy after 14 days. Comment on why you would or would not have expected this
result.
1.10 Statistical models. Find and download the total cumulative COVID-19 case
data D̂(t) from your own city, county, state, country, or the world. Plot the case data
similar to Figure 1.10. Try to fit a linear function D(t) = D0 + c1 t and a quadratic
function D(t) = D0 + c1 t + c2 t² to the very early stages of the outbreak. Which
function is easier to fit? What does that tell you about the early outbreak?
1.11 Statistical models. Find and download the total cumulative COVID-19 case
data D̂(t) from your own city, county, state, country, or the world. Plot the case data
similar to Figure 1.10. Fit an exponential function D(t) = D0 exp(G t) to the early
stages of the outbreak. What is your growth rate G? When does the exponential
function fail to describe the case data D̂(t)?
1.12 Epidemiology data. Find and download the daily new COVID-19 case data
from your own city, county, state, country, or the world. Plot the raw data similar to
Figure 1.9. Explain the local fluctuations in the reported case data that occur on the
order of days.
1.13 Epidemiology data. Find and download the daily new COVID-19 case data
from your own city, county, state, country, or the world. Calculate and plot the seven-
day moving average similar to Figure 1.9. How many waves can you identify? Explain
the global fluctuations that occur on the order of weeks or months. Interpret the
growth or decay in case numbers in view of specific events or political interventions.
1.14 Epidemiology data. Find the COVID-19 case data from your own state or
country and your furthest away vacation destination. Compare the different outbreak
dynamics. Do you think the reporting between both locations is consistent? Identify
at least four potential sources of error in reporting daily COVID-19 case data.
1.15 Incidence. Find and download the daily new COVID-19 case data from your
own city, county, state, country, or the world. Calculate the seven-day moving average.
Identify the peak seven-day moving average ∆Imax within your simulation window.
Assume an infectious period of C = 1/γ = 7 days. Calculate the maximum infectious
population Imax = 7·∆Imax. Find the total population N of your location and calculate
its peak seven-day-per-100,000 incidence, Imax · 100,000/N.
1.16 Incidence and the effect of scale. Studying an outbreak at a more global
scale tends to smoothen fluctuations and local peaks. Find and download the daily
new COVID-19 case data from the next smaller or larger scale compared to the
previous problem. If you have studied your state, now study your city, county, or
country. Identify the peak seven-day moving average ∆Imax within your simulation
window and calculate the maximum infectious population Imax = 7 · ∆Imax. Find
the total population N of your location and calculate its peak seven-day-per-100,000
incidence, Imax · 100,000/N. Compare your results against the seven-day-per-100,000
incidence at the smaller or larger scale. Interpret your results.
1.17 Epidemiology data and compartment models. The compartment model in
Figure 1.8 introduces three possible disease paths from infection: direct recovery
at a fraction (1 − νh), hospitalization and recovery at a fraction νh (1 − νd), and
hospitalization and death at a fraction νh · νd. The fraction νh · νd is called the case
fatality rate and is about 2% worldwide for COVID-19. Find extreme values for the
case fatality rate. Discuss which factors influence the case fatality rate both globally
and locally.
1.18 Epidemiology data and compartment models. Assume you want to learn
parameters for your compartment model, for example the one in Figure 1.8, from
reported case data, hospitalizations, and deaths. Early in the pandemic, when testing
was slow and reporting was delayed, epidemiologists suggested using deaths rather
than daily new cases for model calibration. This initiated a controversial and still
ongoing discussion about how to count COVID-19 deaths. Discuss the difference between
death from and death with COVID-19 in terms of absolute numbers, case fatality
ratios, and model parameters.
References
1. Alber M, Buganza Tepole A, Cannon W, De S, Dura-Bernal S, Garikipati K, Karniadakis
G, Lytton WW, Perdikaris P, Petzold L, Kuhl E (2019) Integrating machine learning and
multiscale modeling: Perspectives, challenges, and opportunities in the biological, biomedical,
and behavioral sciences. npj Digital Medicine 2:115.
2. Anderson RM, May RM (1982) Directly transmitted infectious diseases: control by vaccination.
Science 215:1053-1060.
3. Anderson RM, May RM (1991) Infectious Diseases of Humans. Oxford University Press,
Oxford.
4. Apple Mobility Trends. https://www.apple.com/covid19/mobility accessed: June 1,
2021.
5. Bernoulli D (1760) Essay d’une nouvelle analyse de la mortalite causee par la petite verole et
des avantages de l’inoculation pour la prevenir. Mémoires de Mathématiques et de Physique,
Académie Royale des Sciences, Paris 1-45.
6. Brauer F, Castillo-Chavez C (2001) Mathematical Models in Population Biology and Epidemi-
ology. Springer-Verlag New York.
7. Brauer F, van den Driessche P, Wu J (2008) Mathematical Epidemiology. Springer-Verlag
Berlin Heidelberg.
8. Brauer F (2017) Mathematical epidemiology: Past, present and future. Infectious Disease
Modelling 2:113-127.
9. Brauer F, Castillo-Chavez C, Feng Z (2019) Mathematical Models in Epidemiology. Springer-
Verlag New York.
10. Delamater PL, Street EJ, Leslie TF, Yang YT, Jacobsen KH (2019) Complexity of the basic
reproduction number (R0). Emerging Infectious Diseases 25:1-4.
11. Diekmann O, Heesterbeek JAP (2000) Mathematical Epidemiology of Infectious Diseases:
Model Building, Analysis and Interpretation. Wiley.
12. Dietz K (1993) The estimation of the basic reproduction number for infectious diseases.
Statistical Methods in Medical Research 2:23-41.
13. Dong E, Gardner L (2020) An interactive web-based dashboard to track COVID-19 in real
time. The Lancet Infectious Diseases 20:533-534.
14. European Centre for Disease Prevention and Control. Situation update worldwide.
https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases
accessed: June 1, 2021.
15. Eurostat. Your key to European statistics. Air transport of passengers.
https://ec.europa.eu/eurostat accessed: June 1, 2021.
16. Evans AS (1976) Viral Infections of Humans. Epidemiology and Control. Plenum Medical
Book Company, New York and London.
17. Fauci AS, Lane HC, Redfield RR (2020) Covid-19–Navigating the uncharted. New England
Journal of Medicine 382:1268-1269.
18. Fine PEM (1993) Herd immunity: history, theory, practice. Epidemiologic Reviews 15:265-
302.
19. Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2013) Bayesian Data
Analysis. Chapman and Hall/CRC, 3rd edition.
20. Gorbalenya AE, Baker SC, Baric RS, de Groot RJ, Drosten C, Gulyaeva AA, Haagmans BL,
Lauber C, Leontovich AM, Neuman BW, Penzar D, Perlman S, Poon LLM, Samborskiy D,
Sidorov IA, Sola I, Ziebuhr J (2020) Severe acute respiratory syndrome-related coronavirus:
the species and its viruses-a statement of the coronavirus study group. Nature Microbiology
5:536-544.
21. Hethcote HW (2000) The mathematics of infectious diseases. SIAM Review 42:599-653.
22. Holmdahl I, Buckee C (2020) Wrong but useful–What Covid-19 epidemiologic models can and
cannot tell us. New England Journal of Medicine 383:303-305.
23. Institute for Health Metrics and Evaluation (IHME). COVID-19 Projections.
https://covid19.healthdata.org accessed: July 27, 2020.
24. International Air Transport Association (2020) https://www.iata.org accessed: July 9,
2020.
25. Ioannidis JPA, Cripps S, Tanner MA (2021) Forecasting for COVID-19 has failed. International
Journal of Forecasting, in press.
26. Johns Hopkins University (2021) Coronavirus COVID-19 Global Cases by the Center for
Systems Science and Engineering. https://coronavirus.jhu.edu/map.html,
https://github.com/CSSEGISandData/covid-19 accessed: June 1, 2021.
27. Kermack WO, McKendrick AG (1927) Contributions to the mathematical theory of epidemics,
Part I. Proceedings of the Royal Society London Series A 115:700-721.
28. Krämer A, Kretzschmar M, Krickeberg K (2010) Modern Infectious Disease Epidemiology.
Springer-Verlag New York.
29. Kuhl E (2020) Data-driven modeling of COVID-19 – Lessons learned. Extreme Mechanics
Letters 40:100921.
30. Kyriakidis NC, Lopez-Cortes A, Vasconez Gonzalez E, Barreto Grimaldos A, Ortiz Prado E (2021)
SARS-CoV-2 vaccines strategies: a comprehensive review of phase 3 candidates. npj Vaccines
6:28.
31. Li Q, Guan X, Wu P, Wang X, ... Feng Z (2020) Early transmission dynamics in Wuhan, China,
of novel coronavirus-infected pneumonia. New England Journal of Medicine 382:1199-1207.
32. Liang ST, Liang LT, Rosen JM (2021) COVID-19: a comparison to the 1918 influenza and
how we can defeat it. BMJ Postgraduate Medical Journal 97:273-274.
33. Linka K, Peirlinck M, Kuhl E (2020) The reproduction number of COVID-19 and its correlation
with public health interventions. Computational Mechanics 66:1035-1050.
34. Linka K, Goriely A, Kuhl E (2021) Global and local mobility as a barometer for COVID-19
dynamics. Biomechanics and Modeling in Mechanobiology 20:651–669.
35. Linka K, Peirlinck M, Schafer A, Ziya Tikenogullari O, Goriely A, Kuhl E (2021) Effects
of B.1.1.7 and B.1.351 on COVID-19 dynamics. A campus reopening study. Archives of
Computational Methods in Engineering. doi:10.1007/s11831-021-09638-y.
36. Liu J, Shang X (2020) Computational Epidemiology. Springer International Publishing.
37. Liu Y, Gayle AA, Wilder-Smith A, Rocklöv J (2020) The reproductive number of COVID-19
is higher compared to SARS coronavirus. Journal of Travel Medicine 27:taaa021.
38. Lu H, Weintz C, Pace J, Indana D, Linka K, Kuhl E (2021) Are college campuses superspread-
ers? A data-driven modeling study. Computer Methods in Biomechanics and Biomedical
Engineering doi:10.1080/10255842.2020.1869221.
39. Martcheva M (2015) An Introduction to Mathematical Epidemiology. Springer Science +
Business Media New York.
40. New York Times (2020) Coronavirus COVID-19 Data in the United States.
https://github.com/nytimes/covid-19-data/blob/master/us-states.csv accessed: June 1, 2021.
41. Osvaldo M (2018) Bayesian Analysis with Python: Introduction to Statistical Modeling and
Probabilistic Programming Using PyMC3 and ArviZ. Packt Publishing, 2nd edition.
42. Paul JR (1966) Clinical Epidemiology. University of Chicago Press, Chicago.
43. Peirlinck M, Linka K, Sahli Costabal F, Bendavid E, Bhattacharya J, Ioannidis J, Kuhl E (2020)
Visualizing the invisible: The effect of asymptomatic transmission on the outbreak dynamics
of COVID-19. Computer Methods in Applied Mechanics and Engineering 372:113410.
44. Peng GCY, Alber M, Buganza Tepole A, Cannon W, De S, Dura-Bernal S, Garikipati K,
Karniadakis G, Lytton WW, Perdikaris P, Petzold L, Kuhl E (2021) Multiscale modeling meets
machine learning: What can we learn? Archives of Computational Methods in Engineering
28:1017-1037.
45. Siegenfeld AF, Taleb NN, Bar-Yam Y (2020) What models can and cannot tell us about
COVID-19. Proceedings of the National Academy of Sciences 117:16092-16095.
46. Snow J (1855) On the Mode of Communication of Cholera (2nd edition). London, John
Churchill.
47. Soper GA (1919) The lessons of the pandemic. Science XLIX:501-506.
48. Viceconte G, Petrosillo N (2020) COVID-19 R0: Magic number or conundrum? Infectious
Disease Reports 12:8516.
49. World Health Organization. WHO Virtual Press Conference on COVID-19, March 11, 2020.
https://www.who.int/docs/default-source/coronaviruse/transcripts/who-audio-emergencies-coronavirus-press-conference-full-and-final-11mar2020.pdf?sfvrsn=cb432bb32
accessed: June 1, 2021.
2 The classical SIS model
Fig. 2.1 Classical SIS model. The classical SIS model contains two compartments for the sus-
ceptible and infectious populations, S and I. The transition rates between the compartments, the
contact and infectious rates, β and γ, are inverses of the contact and infectious periods, B = 1/β
and C = 1/γ.
of compartment modeling [8]. The SIS model is often used to illustrate the basic
features of compartment models because we can solve the dynamics of its two
populations, S(t) and I(t), analytically at any point t in time [7]. The SIS model also has
simple analytical solutions for the converged final sizes, S∞ and I∞, as t → ∞ [13].
Fig. 2.2 The transition from the susceptible to the infectious group is based on the assumption
of mass action incidence, for which the rate of new infections is proportional to the size of the
susceptible and infectious groups S and I, weighted by the contact rate β, $\dot{I} = \beta S I$.
The transition from the infectious to the susceptible group is based on the assumption of constant
rate recovery, for which the rate of recovered infections is proportional to the size of the
infectious group I, weighted by the infectious rate γ, $\dot{I} = -\gamma I$.

For the transition from the susceptible to the infectious group, we assume a mass
action incidence [4], which implies that the rate of new infections is proportional
to the size of the susceptible and infectious groups S and I, weighted by the contact
rate β, $\dot{I} = \beta S I$. For the transition from the infectious to the susceptible group, we
assume a constant rate recovery, which implies that the rate of recovered infections
is proportional to the size of the infectious group I, weighted by the infectious rate
γ, $\dot{I} = -\gamma I$. Figure 2.2 illustrates the dynamics of these two assumptions, which result in
the following system of two coupled ordinary differential equations,
$$
\begin{aligned}
\dot{S} &= -\,\beta\,S\,I + \gamma\,I\\
\dot{I} &= +\,\beta\,S\,I - \gamma\,I.
\end{aligned}
\tag{2.1}
$$
The transition rates between both compartments are the contact rate β and the
infectious rate γ in units [1/days], which are the inverses of the contact period
B = 1/β and the infectious period C = 1/γ in units [days]. The ratio between the
contact and infectious rates, or equivalently, between the infectious and contact periods,
defines the basic reproduction number R0,
$$
R_0 = \frac{\beta}{\gamma} = \frac{C}{B}. \tag{2.2}
$$
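As a quick arithmetic check of (2.2), the short Python sketch below computes R0 once from the rates and once from the periods. The function names are hypothetical, and the contact period B = 10 days is an illustrative assumption chosen so that, with C = 20 days, R0 = 2.0 as in Figs. 2.3 and 2.5.

```python
def r0_from_rates(beta: float, gamma: float) -> float:
    """Basic reproduction number R0 = beta / gamma, rates in [1/days]."""
    return beta / gamma


def r0_from_periods(B: float, C: float) -> float:
    """Basic reproduction number R0 = C / B, periods in [days]."""
    return C / B


# Illustrative values: contact period B = 10 days, infectious period C = 20 days.
B, C = 10.0, 20.0
beta, gamma = 1.0 / B, 1.0 / C  # beta = 0.1/day, gamma = 0.05/day

print(r0_from_rates(beta, gamma))  # 2.0
print(r0_from_periods(B, C))       # 2.0
```

Both routes agree because β = 1/B and γ = 1/C, so β/γ = C/B.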
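In this simple format, the SIS model (2.1) neglects all vital dynamics: the two rates of change
cancel, $\dot{S} + \dot{I} = 0$, so that the total population remains constant, $S + I = \text{const.} = 1$.
The model does not account for births, natural deaths, or deaths from the
disease.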
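The conservation property can be checked numerically. The sketch below integrates (2.1) with an explicit Euler scheme; the step size dt = 0.1 days, the helper names, and the parameters β = 0.1/day and γ = 0.05/day (R0 = 2.0, C = 20 days) are illustrative assumptions, not the book's own implementation.

```python
def sis_rhs(S: float, I: float, beta: float, gamma: float):
    """Right-hand side of the SIS system (2.1); returns (dS/dt, dI/dt)."""
    new_infections = beta * S * I  # mass action incidence
    recoveries = gamma * I         # constant rate recovery
    return -new_infections + recoveries, new_infections - recoveries


def integrate_sis(S0: float, I0: float, beta: float, gamma: float,
                  t_end: float, dt: float = 0.1):
    """Explicit (forward) Euler integration of (2.1); returns lists t, S, I."""
    n_steps = int(round(t_end / dt))
    t, S, I = [0.0], [S0], [I0]
    for k in range(n_steps):
        dS, dI = sis_rhs(S[-1], I[-1], beta, gamma)
        S.append(S[-1] + dt * dS)
        I.append(I[-1] + dt * dI)
        t.append((k + 1) * dt)
    return t, S, I


# Illustrative parameters: beta = 0.1/day, gamma = 0.05/day, so R0 = 2.0 and C = 20 days.
t, S, I = integrate_sis(S0=0.99, I0=0.01, beta=0.1, gamma=0.05, t_end=365.0)

# dS/dt and dI/dt cancel term by term, so S + I stays at 1 up to round-off.
max_drift = max(abs(s + i - 1.0) for s, i in zip(S, I))
print(f"max |S + I - 1| = {max_drift:.1e}")
print(f"I(365 days) = {I[-1]:.4f}")  # approaches the endemic value 1 - 1/R0 = 0.5
```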
2.2 Analytical solution of the SIS model
From the condition of non-vital dynamics, S = 1 − I, we can rephrase the system of
equations of the SIS model (2.1) in terms of a single unknown, the size
of the infectious group I, governed by the following nonlinear ordinary differential
equation,
$$
\dot{I} = \beta\,[\,1 - I\,]\,I - \gamma\,I = [\,\beta - \gamma\,]\,I\left[\,1 - \frac{I}{1 - \gamma/\beta}\,\right]. \tag{2.3}
$$
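The factoring step behind (2.3) is written out below; it assumes β ≠ γ so that the division by β − γ is defined.

```latex
\begin{aligned}
\dot{I} &= \beta\,[\,1 - I\,]\,I - \gamma\,I
         = [\,\beta - \gamma\,]\,I - \beta\,I^2 \\
        &= [\,\beta - \gamma\,]\,I\left[\,1 - \frac{\beta}{\beta - \gamma}\,I\,\right]
         = [\,\beta - \gamma\,]\,I\left[\,1 - \frac{I}{1 - \gamma/\beta}\,\right]
\end{aligned}
```

The last step uses β/[β − γ] = 1/[1 − γ/β], which identifies the growth rate r = β − γ and the carrying capacity K = 1 − γ/β.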
Equation (2.3) is a logistic differential equation of the form,
$$
\dot{I} = r\,I\,[\,1 - I/K\,] \quad \text{with} \quad r = \beta - \gamma \quad \text{and} \quad K = 1 - \gamma/\beta, \tag{2.4}
$$
where K is the carrying capacity. This type of equation has an explicit analytical
solution,
$$
I(t) = \frac{K\,I_0}{I_0 + [\,K - I_0\,]\,\exp(-r\,t)}, \tag{2.5}
$$
where I0 = I(0) is the initial infectious population [10]. It proves convenient to
reparameterize the equation for the infectious population (2.5) in terms of the basic
reproduction number R0 = β/γ and the infectious period C = 1/γ with r = [R0 − 1]/C and
K = 1 − 1/R0. This provides the analytical solution for the SIS model in
terms of the infectious period C, the basic reproduction number R0, and the initial
infectious population I0,
$$
\begin{aligned}
S(t) &= 1 - \frac{[\,1 - 1/R_0\,]\,I_0}{I_0 + [\,1 - 1/R_0 - I_0\,]\,\exp([\,1 - R_0\,]\,t/C)}\\
I(t) &= \frac{[\,1 - 1/R_0\,]\,I_0}{I_0 + [\,1 - 1/R_0 - I_0\,]\,\exp([\,1 - R_0\,]\,t/C)}.
\end{aligned}
\tag{2.6}
$$
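A direct transcription of (2.6) into Python is sketched below, assuming R0 ≠ 1 (for R0 = 1 the carrying capacity K vanishes and the expression becomes 0/0). The function name `sis_analytical` is hypothetical; the parameters follow the baseline of Figs. 2.3 to 2.5.

```python
import math


def sis_analytical(t: float, R0: float, C: float, I0: float):
    """Analytical SIS solution (2.6); returns (S(t), I(t)).

    t: time [days], R0: basic reproduction number (assumed != 1),
    C: infectious period [days], I0: initial infectious fraction.
    """
    K = 1.0 - 1.0 / R0                       # carrying capacity K = 1 - 1/R0
    exp_term = math.exp((1.0 - R0) * t / C)  # exp(-r t) with r = (R0 - 1)/C
    I = K * I0 / (I0 + (K - I0) * exp_term)
    return 1.0 - I, I                        # S = 1 - I (no vital dynamics)


# Baseline parameters of Figs. 2.3 to 2.5: R0 = 2.0, C = 20 days, I0 = 0.01.
for day in (0, 50, 100, 200, 365):
    S, I = sis_analytical(day, R0=2.0, C=20.0, I0=0.01)
    print(f"t = {day:3d} days   S = {S:.4f}   I = {I:.4f}")
```

For R0 < 1 the same expression decays towards zero, because the exponential term then grows without bound.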
Figures 2.3, 2.4, and 2.5 highlight the outbreak dynamics of the SIS model, solved
analytically using equations (2.6), over a time period of one year. The three figures
demonstrate the sensitivity of the SIS model to the infectious period C, the basic
reproduction number R0, and the initial infectious population I0. Increasing the
infectious period C delays convergence to the endemic equilibrium, but the
final sizes S∞ and I∞ remain unchanged. Increasing the basic reproduction number
R0 accelerates convergence to the endemic equilibrium, decreases S∞, and increases
I∞. Increasing the initial infectious population I0 accelerates the onset of the
outbreak, but the final sizes S∞ and I∞ remain unchanged. Interestingly, increasing the
initial infectious population I0 by an order of magnitude shifts the population dynamics
by a nearly constant time increment, for the parameters of Fig. 2.5 by approximately
50 days. This highlights the exponential nature of the early outbreak: each tenfold
increase of the initial infectious population advances the outbreak by roughly the same
time increment, while the overall outbreak dynamics remain the same.
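The time shift can be made explicit from (2.5). Taking, as a hypothetical marker of the outbreak onset, the time t½ at which I(t) reaches half its endemic value K, setting I(t½) = K/2 in (2.5) gives t½ = ln([K − I0]/I0)/r. The sketch below evaluates this for the parameters of Fig. 2.5. Under this assumption, the step from I0 = 10⁻¹ to 10⁻² is about 50 days, and for smaller I0 the shift per tenfold increase approaches ln(10)/r = ln(10) C/[R0 − 1], about 46 days here.

```python
import math


def time_to_half_endemic(R0: float, C: float, I0: float) -> float:
    """Time [days] at which I(t) from (2.5) reaches K/2.

    Hypothetical onset marker; valid for R0 > 1 and 0 < I0 < K/2.
    """
    K = 1.0 - 1.0 / R0  # endemic infectious fraction, K = 1 - 1/R0
    r = (R0 - 1.0) / C  # growth rate, r = beta - gamma
    return math.log((K - I0) / I0) / r


R0, C = 2.0, 20.0  # parameters of Fig. 2.5
previous = None
for exponent in range(1, 7):
    I0 = 10.0 ** (-exponent)
    t_half = time_to_half_endemic(R0, C, I0)
    line = f"I0 = 1e-{exponent}   t_half = {t_half:6.1f} days"
    if previous is not None:
        line += f"   shift = {t_half - previous:5.1f} days"
    print(line)
    previous = t_half

print(f"limit ln(10)/r = {math.log(10.0) * C / (R0 - 1.0):.1f} days")
```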
Fig. 2.3 Classical SIS model. Sensitivity with respect to the infectious period C. Increasing
the infectious period C delays convergence to the endemic equilibrium, but the final sizes S∞ and
I∞ remain unchanged. Basic reproduction number R0 = 2.0, initial infectious population I0 = 0.01,
and infectious period C = 5, 10, 15, 20, 25, 30 days.
Fig. 2.4 Classical SIS model. Sensitivity with respect to the basic reproduction number R0.
Increasing the basic reproduction number R0 accelerates convergence to the endemic equilibrium,
decreases S∞, and increases I∞. Infectious period C = 20 days, initial infectious population
I0 = 0.01, and basic reproduction number R0 = 1.5, 1.7, 2.0, 2.4, 3.0, 5.0, 10.0.
2.3 Final size relation of the SIS model
Fig. 2.5 Classical SIS model. Sensitivity with respect to the initial infectious population I0.
Increasing the initial infectious population I0 accelerates the onset of the outbreak, but the final
sizes S∞ and I∞ remain unchanged. Infectious period C = 20 days, basic reproduction number
R0 = 2.0, and initial infectious population I0 = 10⁻¹, 10⁻², 10⁻³, 10⁻⁴, 10⁻⁵, 10⁻⁶.

For practical purposes, it is interesting to estimate the final susceptible and infectious
populations S∞ and I∞ [15]. From the analytical solution (2.5),
$$
I_\infty = \frac{K\,I_0}{I_0 + [\,K - I_0\,]\,\exp(-r\,t_\infty)} \quad \text{with} \quad r = \beta - \gamma \quad \text{and} \quad K = 1 - \gamma/\beta, \tag{2.7}
$$
we can distinguish two cases. If there is a nonzero initial infectious population, I0 > 0,
the infectious population converges either to zero, for β < γ, r < 0, and
exp(−r t∞) → ∞, or to K, for β > γ, r > 0, and exp(−r t∞) → 0 [2, 9],
thus
$$
I_\infty =
\begin{cases}
0 & \text{if } \beta < \gamma\\
K & \text{if } \beta > \gamma.
\end{cases}
\tag{2.8}
$$
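To see the two limits of (2.7) numerically, the sketch below evaluates the logistic solution (2.5) at a large time that stands in for t∞, once for β < γ and once for β > γ. The values γ = 0.05/day, β = 0.04/day and 0.10/day, I0 = 0.01, and t = 2000 days are illustrative assumptions.

```python
import math


def sis_infectious(t: float, beta: float, gamma: float, I0: float) -> float:
    """I(t) from the logistic solution (2.5), r = beta - gamma, K = 1 - gamma/beta."""
    r = beta - gamma
    K = 1.0 - gamma / beta
    return K * I0 / (I0 + (K - I0) * math.exp(-r * t))


gamma, I0 = 0.05, 0.01     # illustrative: C = 20 days
t_large = 2000.0           # stand-in for t_inf, long compared with 1/|r| = 20 to 100 days
for beta in (0.04, 0.10):  # beta < gamma, then beta > gamma
    K = 1.0 - gamma / beta  # negative for beta < gamma, yet I(t) still decays to zero
    I_end = sis_infectious(t_large, beta, gamma, I0)
    print(f"beta = {beta:.2f}/day: I = {I_end:.3e}, K = {K:.2f}")
```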
We can rephrase these limit conditions (2.8) in terms of the basic reproduction
number R0. In classical epidemiology, the two converged states are known as the
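disease-free equilibrium and the endemic equilibrium [3]. The equations that define
the final converged populations of the endemic equilibrium state are known as the final
size relation,
$$
\begin{aligned}
R_0 < 1: &\quad \text{disease-free equilibrium:} \quad S_\infty = 1 \quad \text{and} \quad I_\infty = 0\\
R_0 > 1: &\quad \text{endemic equilibrium:} \quad S_\infty = 1/R_0 \quad \text{and} \quad I_\infty = 1 - 1/R_0.
\end{aligned}
\tag{2.9}
$$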
Figure 2.6 illustrates the final size relation and emphasizes the role of the basic
reproduction number R0 as the distinguishing outbreak characteristic between the
disease-free and endemic equilibria. For reproduction numbers smaller than one,
R0 < 1, the SIS model converges to a disease-free equilibrium with S∞ = 1 and
I∞ = 0. For reproduction numbers larger than one, R0 > 1, the SIS model converges
to an endemic equilibrium with S∞ = 1/R0 and I∞ = 1 − 1/R0. The larger the
reproduction number R0, the smaller the final susceptible population S∞ and the
larger the final infectious population I∞ [3].
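The final size relation (2.9) reduces to a two-branch function of R0. The sketch below is a hypothetical helper that evaluates it for the reproduction numbers of Fig. 2.4 plus one sub-threshold value, R0 = 0.8, added for illustration. At R0 = 1 both branches coincide, S∞ = 1 and I∞ = 0, so the choice of branch there does not matter.

```python
def final_size(R0: float):
    """Final size relation (2.9) of the SIS model; returns (S_inf, I_inf)."""
    if R0 < 1.0:
        return 1.0, 0.0               # disease-free equilibrium
    return 1.0 / R0, 1.0 - 1.0 / R0   # endemic equilibrium (also covers R0 = 1)


# R0 values of Fig. 2.4 plus the illustrative sub-threshold case R0 = 0.8.
for R0 in (0.8, 1.5, 1.7, 2.0, 2.4, 3.0, 5.0, 10.0):
    S_inf, I_inf = final_size(R0)
    print(f"R0 = {R0:4.1f}   S_inf = {S_inf:.3f}   I_inf = {I_inf:.3f}")
```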
Fig. 2.6 Classical SIS model. Final size relation as a function of the basic reproduction
number R0. For R0 < 1, the SIS model converges to a disease-free equilibrium with S∞ = 1 and
I∞ = 0. For R0 > 1, the SIS model converges to an endemic equilibrium with S∞ = 1/R0 and
I∞ = 1 − 1/R0. Increasing R0 reduces the susceptible population S∞ and increases the infectious
population I∞.
Problems
2.1 Basic reproduction number. Throughout the winter of 1980, a Scottish
boarding school reported an influenza that, on any day, affected on average 408 of its 1632
students. Estimate the basic reproduction number.
2.2 Basic reproduction number. Throughout the winter of 1980, a Scottish
boarding school reported an influenza with an infectious period of C = 4 days and a contact
period of B = 2.8 days. How many of its 1632 students are infectious at endemic
equilibrium?
2.3 Basic reproduction number. Assume the common flu has an infectious period
of C = 7 days and a contact rate of β = 0.2/day. Determine the basic reproduction
number R0 and the final sizes of the susceptible and infectious populations S∞ and
I∞.
2.4 Contact rate. Assume the common flu has an infectious period of C = 7 days.
Determine the contact rate β for which the infectious population I never increases
beyond 20% of the population. What is the basic reproduction number under these
conditions?
2.5 Contact rate. Assume the common flu has an infectious period of C = 7 days.
Determine the contact rate β for which the infectious population I never increases
beyond 30% of the population. Comment on how the basic reproduction number for
a maximum infectious population of 30% differs from the basic reproduction number
for a maximum infectious population of 20%.
the cords used in tying the blankets into bales, which same cross
lines appear as cords in l, Fig. 33. Mr. Coronel also possesses small
figures of Mexicans, of various conditions of life, costumes, trades,
and professions, one of which, a painted statuette, is a
representation of a Mexican lying down flat upon an outspread
serape, similar in color and form to the black and white bands shown
in the upper figure of d, Fig. 32, and a, b, of Fig. 33, and instantly
suggesting the explanation of those figures. Upon the latter the
continuity of the black and white bands is broken, as the human
figures are probably intended to be in front, or on top, of the
drawings of the blankets.
Fig. 33.—Petroglyph in Santa Barbara county, California.
The small statuette above mentioned is that of a Mexican trader, and
if the circles in the petroglyphs are considered to represent bales of
blankets, the character in Fig. 32, d, is still more interesting, from
the union of one of these circles with a character representing the
trader, i. e., the man possessing the bales. Bales, or what appear to
be bales, are represented to the top and right of the circle in d, in
that figure. In Fig. 33, l, a bale is upon the back of what appears to
be a horse, led in an upward direction by an Indian whose
headdress and ends of the breechcloth are visible. To the right of the
bale are three short lines, evidently showing the knot or ends of the
cords used in tying a bale of blankets without colors, therefore of
less importance, or of other goods. Other human forms appear in
the attitude of making gestures, one also in j, Fig. 33, probably
carrying a bale of goods. In the same figure u represents a
centipede, an insect found occasionally south of the mountains, but
reported as extremely rare in the immediate northern regions. For
remarks upon x in the same figure see Chapter xx, Section 2, under
the heading The Cross.
Mr. Coronel stated that when he first settled in Los Angeles, in 1843,
the Indians living north of the San Fernando mountains
manufactured blankets of the fur and hair of animals, showing
transverse bands of black and white similar to those depicted, which
were sold to the inhabitants of the valley of Los Angeles and to
Indians who transported them to other tribes.
It is probable that the pictographs are intended to represent the
salient features of a trading expedition from the north. The ceiling of
the cavity found between the paintings represented in the two
figures has disappeared, owing to disintegration, thus leaving a
blank about 4 feet long, and 6 feet from the top to the bottom
between the paintings as now presented.
COLORADO.
Petroglyphs are reported by Mr. Cyrus F. Newcomb as found upon
cliffs on Rock creek, 15 miles from Rio Del Norte, Colorado. Three
small photographs, submitted with this statement, indicate the
characters to have been pecked; they consist of men on horseback,
cross-shaped human figures, animals, and other designs greatly
resembling those found in the country of the Shoshonean tribes,
examples of which are given infra.
Another notice of the same general locality is made by Capt. E. L.
Berthoud (a) as follows:
The place is 20 miles southeast of Rio Del Norte, at the
entrance of the canyon of the Piedra Pintada (Painted
rock) creek. The carvings are found on the right of the
canyon or valley and upon volcanic rocks. They bear the
marks of age and are cut in, not painted, as is still done
by the Utes everywhere. They are found for a quarter of a
mile along the north wall of the canyon, on the ranches of
W. M. Maguire and F. T. Hudson, and consist of all manner
of pictures, symbols, and hieroglyphics done by artists
whose memory even tradition does not now preserve. The
fact that these are carvings done upon such hard rock
invests them with additional interest, as they are quite
distinct from the carvings I saw in New Mexico and
Arizona on soft sandstone. Though some of them are
evidently of much greater antiquity than others, yet all are
ancient, the Utes admitting them to have been old when
their fathers conquered the country.
Mr. Charles D. Wright, of Durango, Colorado, in a communication
dated February 20, 1885, gives an account of some “hieroglyphs” on
rocks and upon the walls of cliff houses near the boundary line
between Colorado and New Mexico. He says:
The following were painted in red and black paints on the
wall (apparently the natural rock wall) of a cliff house: At
the head was a chief on his horse, armed with spear and
lance and wearing a pointed hat and robe; behind this
character were some twenty characters representing
people on horses lassoing horses, etc. In fact the whole
scene represented breaking camp and leaving in a hurry.
The whole painting measured about 12 by 16 feet.
Mr. Wright further reports characters on rocks near the San Juan
river. Four characters represent men as if in the act of taking an
obligation, hands extended, and wearing a “kind of monogram on
breast, and at their right are some hieroglyphics written in black
paint covering a space 3 by 4 feet.”
The best discussed and probably the most interesting of the
petroglyphs in the region are described and illustrated by Mr. W. H.
Holmes (a), of the Bureau of Ethnology. The illustrations are here
reproduced in Figs. 34 to 37, and the remarks of Mr. Holmes, slightly
condensed, are as follows:
The forms reproduced in Fig. 34 occur on the Rio Mancos,
near the group of cliff houses. They are chipped into the
rock evidently by some very hard implement and rudely
represent the human figure. They are certainly not
attempts to represent nature, but have the appearance
rather of arbitrary forms, designed to symbolize some
imaginary being.
Fig. 34.—Petroglyphs on the Rio Mancos, Colorado.
The forms shown in Fig. 35 were found in the same
locality, not engraved, but painted in red and white clay
upon the smooth rocks. These were certainly done by the
cliff-builders, and probably while the houses were in
process of construction, since the material used is
identical with the plaster of the houses. The sketches and
notes were made by Mr. Brandegee. The reproduction is
approximately one-twelfth the size of the original.
Fig. 35.—Petroglyphs on the Rio Mancos, Colorado.
The examples shown in Fig. 36 occur on the Rio San Juan
about 10 miles below the mouth of the Rio La Plata and
are actually in New Mexico. A low line of bluffs, composed
of light-colored massive sandstones that break down in
great smooth-faced blocks, rises from the river level and
sweeps around toward the north. Each of these great
blocks has offered a very tempting tablet to the graver of
the primitive artist, and many of them contain curious and
interesting inscriptions. Drawings were made of such of
these as the limited time at my disposal would permit.
They are all engraved or cut into the face of the rock, and
the whole body of each figure has generally been chipped
out, frequently to the depth of one-fourth or one-half of
an inch.
Fig. 36.—Petroglyphs on the Rio San Juan, New Mexico.
The work on some of the larger groups has been one of
immense labor, and must owe its completion to strong and
enduring motives. With a very few exceptions the
engraving bears undoubted evidence of age. Such new
figures as occur are quite easily distinguished both by the
freshness of the chipped surfaces and by the designs
themselves. The curious designs given in the final group
have a very perceptible resemblance to many of the
figures used in the embellishment of pottery.
The most striking group observed is given in Fig. 37 a,
same locality. It consists of a great procession of men,
birds, beasts, and fanciful figures. The whole picture as
placed upon a rock is highly spirited and the idea of a
general movement toward the right, skillfully portrayed. A
pair of winged figures hover about the train as if to watch,
or direct its movements; behind these are a number of
odd figures, followed by an antlered animal resembling a
deer, which seems to be drawing a notched sledge
containing two figures of men. The figures forming the
main body of the procession appear to be tied together in
a continuous line, and in form resemble one living
creature about as little as another. Many of the smaller
figures above and below are certainly intended to
represent dogs, while a number of men are stationed
about here and there as if to keep the procession in order.
Fig. 37.—Petroglyphs on the Rio San Juan, New Mexico.
As to the importance of the event recorded in this picture,
no conclusions can be drawn; it may represent the
migration of a tribe or family or the trophies of a victory. A
number of figures are wanting in the drawing at the left,
while some of those at the right may not belong properly
to the main group. The reduction is, approximately, to
one-twelfth.
Designs B and C of the same figure represent only the
more distinct portions of two other groups. The
complication of figures is so great that a number of hours
would have been necessary for their delineation, and an
attempt to analyze them here would be fruitless.
It will be noticed that the last two petroglyphs are in New Mexico,
but they are so near the border of Colorado and so connected with
the series in that state that they are presented under the same
heading.
CONNECTICUT.
The following account is extracted from Rafn’s Antiquitates
Americanæ (a):
In the year 1789 Doctor Ezra Stiles, D. D., visited a rock
situated in the Township of Kent in the State of
Connecticut, at a place called Scaticook, by the Indians.
He thus describes it: “Over against Scaticook and about
one hundred rods East of Housatonic River, is an
eminence or elevation which is called Cobble Hill. On the
top of this stands the rock charged with antique unknown
characters. This rock is by itself and not a portion of the
Mountains; it is of White Flint; ranges North and South; is
from twelve to fourteen feet long; and from eight to ten
wide at base and top; and of an uneven surface. On the
top I did not perceive any characters; but the sides all
around are irregularly charged with unknown characters,
made not indeed with the incision of a chisel, yet most
certainly with an iron tool, and that by pecks or picking,
after the manner of the Dighton Rock. The Lacunae or
excavations are from a quarter to an inch wide; and from
one tenth to two tenths of an inch deep. The engraving
did not appear to be recent or new, but very old.”
GEORGIA.
Charles C. Jones, jr., (a) describes a petroglyph in Georgia as
follows:
In Forsyth county, Georgia, is a carved or incised bowlder
of fine grained granite, about 9 feet long, 4 feet 6 inches
high, and 3 feet broad at its widest point. The figures are
cut in the bowlder from one-half to three-fourths of an
inch deep. It is generally believed that they are the work
of the Cherokees.
The illustration given by him is here reproduced in Fig. 38. It will be
noted that the characters in it are chiefly circles, including plain,
nucleated, and concentric, sometimes two or more being joined by
straight lines, forming what is now known as the “spectacle shaped”
figure. The illustrations should be compared with the many others
presented in this paper under the heading of Cup Sculptures, see
Chapter v, infra.
Fig. 38.—Petroglyphs in Georgia.
Dr. M. F. Stephenson (a) mentions sculptures of human feet, various
animals, bear tracks, etc., in Enchanted mountain, Union county,
Georgia. The whole number of sculptures is reported as one hundred
and forty-six.
Mr. Jones (b) gives a different résumé of the objects depicted, as
follows:
Upon the Enchanted mountain, in Union county, cut in
plutonic rock, are the tracks of men, women, children,
deer, bears, bisons, turkeys, and terrapins, and the
outlines of a snake, of two deer, and of a human hand.
These sculptures—so far as they have been ascertained
and counted—number one hundred and thirty-six. The
most extravagant among them is that known as the
footprint of the “Great Warrior.” It measures 18 inches in
length and has six toes. The other human tracks and
those of the animals are delineated with commendable
fidelity.
IDAHO.
Mr. G. K. Gilbert, of the U. S. Geological Survey, has furnished a
small collection of drawings of Shoshonean petroglyphs from Oneida,
Idaho, shown in Fig. 39. Some of them appear to be totemic
characters, and possibly were made to record the names of visitors
to the locality.
Fig. 39.—Petroglyphs in Idaho (Shoshonean).
Mr. Willard D. Johnson, of the U. S. Geological Survey, reports
pictographic remains observed by him near Oneida, Idaho, in 1879.
The figures represent human beings and were on a rock of basalt.
A copy of another petroglyph found in Idaho appears in Fig. 1092,
infra.
ILLINOIS.
Petroglyphs are reported by Mr. John Criley as occurring near Ava,
Jackson county, Illinois. The outlines of the characters observed by
him were drawn from memory and submitted to Mr. Charles S.
Mason, of Toledo, Ohio, through whom they were furnished to the
Bureau of Ethnology. Little reliance can be placed upon the accuracy
of such drawing, but from the general appearance of the sketches
the originals of which they are copies were probably made by one of
the middle Algonquian tribes of Indians.
The “Piasa” rock, as it is generally designated, was referred to by the
missionary explorer Marquette in 1675. Its situation was immediately
above the city of Alton, Illinois.
Marquette’s remarks are translated by Dr. Francis Parkman (a) as
follows:
On the flat face of a high rock were painted, in red, black,
and green, a pair of monsters, each “as large as a calf,
with horns like a deer, red eyes, a beard like a tiger, and a
frightful expression of countenance. The face is something
like that of a man, the body covered with scales; and the
tail so long that it passes entirely round the body, over the
head, and between the legs, ending like that of a fish.”
Another version, by Davidson and Struvé (a), of the discovery of the
petroglyph is as follows:
Again they (Joliet and Marquette) were floating on the
broad bosom of the unknown stream. Passing the mouth
of the Illinois, they soon fell into the shadow of a tall
promontory, and with great astonishment beheld the
representation of two monsters painted on its lofty
limestone front. According to Marquette, each of these
frightful figures had the face of a man, the horns of a
deer, the beard of a tiger, and the tail of a fish so long that
it passed around the body, over the head, and between
the legs. It was an object of Indian worship and greatly
impressed the mind of the pious missionary with the
necessity of substituting for this monstrous idolatry the
worship of the true God.
A footnote connected with the foregoing quotation gives the
following description of the same rock:
Near the mouth of the Piasa creek, on the bluff, there is a
smooth rock in a cavernous cleft, under an overhanging
cliff, on whose face, 50 feet from the base, are painted
some ancient pictures or hieroglyphics, of great interest to
the curious. They are placed in a horizontal line from east
to west, representing men, plants, and animals. The
paintings, though protected from dampness and storms,
are in great part destroyed, marred by portions of the rock
becoming detached and falling down.
Mr. McAdams (a), of Alton, Illinois, says “The name Piasa is Indian
and signifies, in the Illini, ‘The bird which devours men.’” He
furnishes a spirited pen-and-ink sketch, 12 by 15 inches in size and
purporting to represent the ancient painting described by Marquette.
On the picture is inscribed the following in ink: “Made by Wm.
Dennis, April 3d, 1825.” The date is in both letters and figures. On
the top of the picture in large letters are the two words, “FLYING
DRAGON.” This picture, which has been kept in the old Gilham family
of Madison county and bears the evidence of its age, is reproduced
as Fig. 40.
Fig. 40.—The Piasa petroglyph.
He also publishes another representation (Fig. 41) with the following
remarks:
One of the most satisfactory pictures of the Piasa we have
ever seen is in an old German publication entitled “The
Valley of the Mississippi Illustrated. Eighty illustrations
from nature, by H. Lewis, from the Falls of St. Anthony to
the Gulf of Mexico,” published about the year 1839 by
Arenz & Co., Düsseldorf, Germany. One of the large full-page
plates in this work gives a fine view of the bluff at
Alton, with the figure of the Piasa on the face of the rock.
It is represented to have been taken on the spot by artists
from Germany. We reproduce that part of the bluff (the
whole picture being too large for this work) which shows
the pictographs. In the German picture there is shown just
behind the rather dim outlines of the second face a
ragged crevice, as though of a fracture. Part of the bluff’s
face might have fallen and thus nearly destroyed one of
the monsters, for in later years writers speak of but one
figure. The whole face of the bluff was quarried away in
1846-’47.
Fig. 41.—The Piasa petroglyph.
Under Myths and Mythic Animals, Chapter xiv, Section 2, are
illustrations and descriptions which should be compared with these
accounts, and Chapter xxii gives other examples of errors and
discrepancies in the description and copying of petroglyphs.
Mr. A. D. Jones (a) says of the same petroglyph:
After the distribution of firearms among the Indians,
bullets were substituted for arrows, and even to this day
no savage presumes to pass the spot without discharging
his rifle and raising his shout of triumph. I visited the spot
in June (1838) and examined the image and the ten
thousand bullet marks on the cliff seemed to corroborate
the tradition related to me in the neighborhood.
Mr. McAdams, loc. cit., also reports regarding Fig. 42:
Fig. 42.—Petroglyph on the Illinois river.
Some twenty-five or thirty miles above the mouth of the
Illinois river, on the west bank of that stream, high up on
the smooth face of an overhanging cliff, is another
interesting pictograph sculptured deeply in the hard rock.
It remains to-day probably in nearly the same condition it
was when the French voyagers first descended the river
and got their first view of the Mississippi. The animal-like
body, with the human head, is carved in the rock in
outline. The huge eyes are depressions like saucers, an
inch or more in depth, and the outline of the body has
been scooped out in the same way; also the mouth.
The figure of the archer with the drawn bow, however, is
painted, or rather stained with a reddish brown pigment,
over the sculptured outline of the monster’s face.
Mr. McAdams suggests that the painted figure of the human form
with the bow and arrows was made later than the sculpture.
The same author (b) says, describing Fig. 43:
Fig. 43.—Petroglyph near Alton, Illinois.
Some 3 or 4 miles above Alton, high up beneath the
overhanging cliff, which forms a sort of cave shelter on
the smooth face of a thick ledge of rock, is a series of
paintings, twelve in number. They are painted or rather
stained in the rock with a reddish brown pigment that
seems to defy the tooth of time. It may be said, however,
that their position is so sheltered that they remain almost
perfectly dry. We made sketches of them some thirty
years ago and on a recent visit could see that they had
changed but little, although their appearance denotes
great age.
These pictographs are situated on the cliff more than a
hundred feet above the river. A protruding ledge, which is
easily reached from a hollow in the bluff, leads to the
cavernous place in the rock.
Mr. James D. Middleton, formerly of the Bureau of Ethnology,
mentions the occurrence of petroglyphs on the bluffs of the
Mississippi river, in Jackson county, about 12 miles below Rockwood.
Also of others about 4 or 5 miles from Prairie du Rocher, near the
Mississippi river.
IOWA.
Mr. P. W. Norris, of the Bureau of Ethnology, found numerous caves
on the banks of the Mississippi river, in northeastern Iowa, 4 miles
south of New Albion, containing incised petroglyphs. Fifteen miles
south of this locality paintings occur on the cliffs. He also discovered
painted characters upon the cliffs on the Mississippi river, 19 miles
below New Albion.
KANSAS.
Mr. Edward Miller reports in Proceedings of the American
Philosophical Society, vol. x, 1869, p. 383, the discovery of a
petroglyph near the line of the Union Pacific railroad, 15 miles
southeast of Fort Harker, formerly known as Fort Ellsworth, Kansas.
The petroglyph is upon a formation belonging to No. 1, Lower
Cretaceous group, according to the classification of Meek and
Hayden.
The parts of the two plates vii and viii of the work cited, which bear
the inscriptions, are now presented as Fig. 44, being from two views
of the same rock.
Fig. 44.—Petroglyphs in Kansas.
KENTUCKY.
Mr. James D. Middleton, formerly of the Bureau of Ethnology, in a
letter dated August 14, 1886, reports that at a point in Union county,
Kentucky, nearly opposite Shawneetown, Illinois, petroglyphs are
found, and from the description given by him they appear to
resemble those in Jackson county, Illinois, mentioned above.
Mr. W. E. Barton, of Wellington, Ohio, in a communication dated
October 4, 1890, writes as follows:
At Clover Bottom, Kentucky, on a spur of the Big Hill, in
Jackson county, about 13 miles from Berea, is a large rock
which old settlers say was covered with soil and
vegetation within their memory. Upon it are
representations of human tracks, with what appear to be
those of a bear, a horse, and a dog. These are all in the
same direction, as though a man leading a horse, followed
the dog upon the bear’s track. Crossing these is a series of
tracks of another and larger sort which I can not attempt
to identify. The stone is a sandstone in the
subcarboniferous. As I remember, the strata are nearly
horizontal, but erosion has made the surface a slope of
about 20°. The tracks ascending the slope cross the
strata. I have not seen them for some years.
The crossing of the strata shows that the tracks are the
work of human hands, if indeed it were not preposterous
to think of anything else in rocks of that period. Still the
tracks are so well made that one is tempted to ask if they
can be real. They alternate right and left, though the
erosion and travel have worn out some of the left tracks.
A wagon road passes over the rock and was the cause of
the present exposure of the stone. It can be readily found
a fourth of a mile or less from the Pine Grove
schoolhouse.
MAINE.
A number of inscribed rocks have been found in Maine and
information of others has been obtained. The most interesting of
them and the largest group series yet discovered in New England is
shown in Pl. xii.
BUREAU OF ETHNOLOGY TENTH ANNUAL REPORT PL. XII
PETROGLYPHS IN MAINE.
The rock upon which the glyphs appear is in the town of
Machiasport, Maine, at Clarks point, on the northwestern side of
Machias bay, 2 miles below the mouth of Machias river. The rock or
ledge is about 50 feet long from east to west and about 15 feet
wide. It is nearly horizontal for two-thirds of its length, from the bank
or western end at high water, and thence inclines at an angle of 15° to
low-water mark. Its southern face is inclined about 40°. The
formation is schistose slate, having a transverse vein of trap dike
extending nearly across its section. Nearly the entire ledge is of
blue-black color, very dense and hard except at the upper or western
end, where the periodical formation of ice has scaled off thin layers
of surface and destroyed many figures which are remembered by
persons now living. The ebb and flow of tides, the abrasion of
moving beach stones or pebble wash and of ice-worn bowlders,
have also effaced many figures along the southern side, until now
but one or two indentations are discernible. Visitors, in seeking to
remove some portion of the rock as a curiosity or in striving to
perpetuate their initials, have obscured several of the most
interesting, and until recently the best defined figures. It was also
evident to the present writer, who carefully examined the rock in
1888, that it lay much deeper in the water than once had been the
case. At the lowest tides there were markings seen still lower, which
could not readily have been made if that part of the surface had not
been continuously exposed. The depression of a rock of such great
size, which was so gradual that it had not been observed by the
inhabitants of the neighboring settlement, is an evidence of the
antiquity of the peckings.
The intaglio carving of all the figures was apparently made by
repeated blows of a pointed instrument, doubtless of hard stone,
which was not held as a chisel but worked by repeated hammerings or
peckings. The deepest now seen is about three-eighths of an inch.
The amount of patient labor bestowed upon these figures must have
been great, considering the hardness of the rock and the rude
implement with which they were wrought.
There is no extrinsic evidence of their age. The place was known to
traders early in the seventeenth century, and much earlier was
visited by Basque fishermen, and perhaps by the unfortunate
Cortereals in 1500 and 1503. The descendants of the Mechises
Indians, a tribal branch of the Abnaki, who once occupied the
territory between the St. Croix and Narraguagus rivers, when
questioned many years ago, would reply in substance that “all their
old men knew of them,” either by having seen them or by traditions
handed down through many generations.
Several years ago Mr. H. R. Taylor, of Machias, who made the original
sketch in 1868 and kindly furnished it to the Bureau of Ethnology,
applied to a resident Indian there (Peter Benoit, then nearly 80 years
old) for assistance in deciphering the characters. He gave little
information, but pointed out that the figures must not all be read
“from one side only.” Thus the figure near the center of the sketch,
which seen from the south was without significance, became, seen from
the opposite point, a squaw with sea fowl on her head, denoting, as
he said, “that squaw had smashed canoe, saved beaver-skin, walked
one-half moon all alone toward east, just same as heron wading
alongshore.” He also said that the three lines below the figure mentioned,
which together resemble a bird track or a trident, represent the
three rivers of Machias, the East, West, and Middle rivers, which join
not far above the locality. The mark having a rough resemblance to a
feather, next on the right of this river-sign, is a fissure in the rock.
Most of the figures of human beings and other animals are easily
recognizable.
Peckings of a character similar to those on the Picture rock at Clarks
point, above described, were found and copied 600 feet south of it
at high-water mark on a rock near Birch point. Others were
discovered and traced on a rock on Hog island, in Holmes bay, a part
of Machias bay. All these petroglyphs were without doubt of Abnaki
origin, either of the Penobscot or the Passamaquoddy divisions of
that body of Indians. The rocks lay on the common line of water
communication between those divisions and were convenient as
halting places.
MARYLAND.
In the Susquehanna river, about half a mile south of the state line, is
a group of rocks, several of the most conspicuous being designated
as the “Bald Friars.” Near by are several mound-shaped bowlders of
the so-called “nigger-head” rock, which is reported as a dark-
greenish chlorite schist. Upon the several bowlders are deep
sculpturings, apparently finished by rubbing the depression with
stone, or wood and sand, thus leaving sharp and distinct edges to
the outlines. Some of these figures are an inch in depth, though the
greater number are becoming more and more eroded by the
frequent freshets, and by the running ice during the breaking up in
early spring of the frozen river.
The following account is given by Prof. P. Frazer (a):
Passing the Pennsylvania state line one reaches the
southern barren serpentine rocks, which are in general
tolerably level for a considerable distance.
About 700 yards, or 640 meters, south of the line, on the
river shore, are rocks which have been named the Bald
Friars. French’s tavern is here, at the mouth of a small
stream which empties into the Susquehanna. About 874
yards (800 meters) south of this tavern are a number of
islands which have local names, but which are curious as
containing inscriptions of the aborigines.
The material of which most of these islands are composed
is chlorite schist, but as this rock is almost always
distinguished by the quartz veins which intersect it, so in
this case some of the islands are composed of this
material almost exclusively, which gives them a very
striking white appearance.
One of these, containing the principal inscriptions, is called
Miles island.
The figures, which covered every part of the rocks that
were exposed, were apparently of historical or at least
narrative purport, since they seemed to be connected.
Doubtless the larger portion of the inscription has been
carried away by the successive vicissitudes which have
broken up and defaced, and in some instances obliterated,
parts of which we find evidence of the previous existence
on the islands.
Every large bowlder seems to contain some traces of
previous inscription, and in many instances the pictured
side of the bowlder is on its under side, showing that it
has been detached from its original place. The natural
agencies are quite sufficient to account for any amount of
this kind of displacement, for the rocks in their present
condition are not refractory and offer no great resistance
to the wear of weather and ice; but in addition to this
must be added human agencies.