Modeling with Data
INTRO TO COMPUTING FOR COMPLEX SYSTEMS
(Session XVI)

Jon Zelner
University of Michigan

8/11/2010
Data in the Modeling Process

[Diagram labels: Agents, Environment, Model, Simulated Behavior, Observed Behavior]
Pattern Oriented Modeling (POM)
 Term coined by Grimm et al. in 2005 Science paper

 Modeling process should be guided by patterns of
  interest
   Can use patterns @ multiple levels:
     Individual agents
     Environment
     Aggregate agent behavior

 Patterns should be used both to guide model
  development and to calibrate and validate models.
Types of data for modelers
 Counts/Proportions:
   Infections
   Occupied patches
 Distributions:
   Age
   Lifespan
   Duration of infection
 Rates:
   Birthrates
   Transmission rate
 Time Series:
   Evolution of outbreak in time
   Timeline of conflict
   Number of firms over time
 Qualitative:
   ‘Norovirus-like’ outbreaks
   Size and shape of forest patches
   Clusters of settlements
Pattern Oriented Modeling
Pattern Oriented Modeling: Kayenta
Anasazi (Axtell et al. 2001)
 Trying to understand population growth and collapse among the Kayenta Anasazi in the U.S. Southwest

 Many factors in this:
   Weather
   Farming
   Kinship

 Optimize models by explaining multiple patterns @ one time.
Anasazi (cont’d)
Inference for POM
 Bayesian/Qualitative
   Use some kind of quality function to score goodness of runs
    and optimize by minimizing distance between model output
    and optimum quality and/or data.
     Number of occupied patches
     Size of elephant herds

 Frequentist/Likelihood-based
   Define a likelihood function for Data | Model
   Simulate runs from the model and evaluate likelihood of
    data as (# runs == Data) / # runs
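The simulation-based likelihood on this slide can be sketched as follows. This is only an illustration under stated assumptions: the toy model (`simulate_outbreak_size`, in which each susceptible is infected independently) and all parameter values are hypothetical and are not taken from the talk.

```python
import random


def simulate_outbreak_size(n_susceptible, p_infect, rng):
    """Toy model (hypothetical): each susceptible is infected independently."""
    return sum(rng.random() < p_infect for _ in range(n_susceptible))


def simulated_likelihood(observed, n_susceptible, p_infect, n_runs=20000, seed=1):
    """Estimate P(Data | Model) as (# runs == Data) / # runs."""
    rng = random.Random(seed)
    hits = sum(
        simulate_outbreak_size(n_susceptible, p_infect, rng) == observed
        for _ in range(n_runs)
    )
    return hits / n_runs


if __name__ == "__main__":
    # Compare candidate parameter values against an observed count of 2 cases.
    for p in (0.1, 0.3, 0.5):
        print(p, simulated_likelihood(observed=2, n_susceptible=5, p_infect=p))
```

Exact matching works here because the data are a single small count; for richer data an exact match becomes rare and the estimate gets noisy.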
How Infections Propagate
    After Point-Source Events
An Analysis of Secondary Norovirus Transmission

Jon Zelner (a,b), Aaron A. King (a,c), Christine Moe (e) & Joseph N.S. Eisenberg (a,d)

University of Michigan: (a) Center for the Study of Complex Systems, (b) Sociology & Public Policy, (c) Ecology & Evolutionary Biology, (d) School of Public Health

Emory University: (e) Rollins School of Public Health
Norovirus (NoV) Epidemiology
 Most common cause of non-bacterial
  gastroenteritis in the US and worldwide.
   Est. 90 million cases in 2007
   Explosive diarrhea & projectile vomiting
    in symptomatic cases.

 Single-stranded, non-enveloped RNA
  virus
   Member of family Caliciviridae

 Often transmitted via food
   Salad greens
   Shellfish

 Most person-to-person transmission is
  via the environment and fomites.
Why model transmission after
point-source events?
 Typical analysis of point-source events focuses on primary, one-to-many risk:
   How many cases are created by an infectious food handler?
   How many people are infected after a water treatment failure?

[Diagram labels: S, H, IS, IA]

 However, the actual size of point-source events is underestimated without including secondary transmission risk.

 Within-household transmission is an important bridge between point-source events.
   So, even if within-household R0 < 1, household cases have important dynamic consequences at the community level.
NoV Transmission Dynamics

 Norovirus transmission dynamics tend to be
  locally unstable but globally persistent.
   E.g., small, explosive outbreaks in Mercer
     County, but no local NoV epidemic
   Multiple reported NoV outbreaks
     throughout New Jersey every week.
   Stochasticity operates at multiple levels.
      Disease/Contact
NoV Transmission Dynamics


[Figure panels: Exponential Growth, Global Invasion (e.g., Pandemic Flu); Short, Explosive & Limited (Typical of NoV outbreaks)]
Outbreak Data
 Götz et al. (2001) observed 500+ households exposed to NoV after a point-source outbreak in a network of daycare centers in Stockholm, Sweden.
   Traceable to salad prepared by a food handler who was shedding post-symptoms.

 Followed 153 of these households
   Eliminating those with only one person.
   49 had secondary cases
   104 had no secondary cases
Deterministic SEIR model
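 Infinite population
 Mass-action mixing
 Frequency-dependent transmission
 When I > 0, a fraction of the susceptible population is infected at every instant
   Constant average rate of recovery
   Doesn’t matter who is infected
   ‘Nano-fox’ problem

\[
\begin{aligned}
\frac{dS}{dt} &= -\beta S I \\
\frac{dE}{dt} &= \beta S I - \sigma E \\
\frac{dI}{dt} &= \sigma E - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{aligned}
\]

Here β is the transmission rate, σ (symbol assumed) the rate of progression from E to I, and γ the recovery rate.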
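A minimal sketch of integrating a deterministic SEIR model with a fixed-step Euler scheme, assuming state variables are population fractions. The name `sigma` for the E to I rate, the step size, and all numeric values are assumptions made for illustration.

```python
def seir_euler(beta, sigma, gamma, s0, e0, i0, r0, dt=0.01, t_end=30.0):
    """Integrate dS/dt=-bSI, dE/dt=bSI-sE, dI/dt=sE-gI, dR/dt=gI by Euler steps.

    Returns a list of (t, S, E, I, R) tuples.
    """
    s, e, i, r = s0, e0, i0, r0
    trajectory = [(0.0, s, e, i, r)]
    n_steps = int(round(t_end / dt))
    for step in range(1, n_steps + 1):
        new_exposed = beta * s * i * dt      # flow S -> E
        new_infectious = sigma * e * dt      # flow E -> I
        new_recovered = gamma * i * dt       # flow I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        trajectory.append((step * dt, s, e, i, r))
    return trajectory


if __name__ == "__main__":
    # Illustrative values only: 1.5-day mean incubation and infectious periods.
    traj = seir_euler(beta=2.0, sigma=1 / 1.5, gamma=1 / 1.5,
                      s0=0.999, e0=0.0, i0=0.001, r0=0.0)
    print(traj[-1])
```

In this sketch the infectious fraction never reaches exactly zero, only ever-smaller fractions of an individual, which is one way to read the ‘nano-fox’ remark on the slide.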
Why use a stochastic model?
 Deterministic models work well when assumptions are plausible, but are less useful when:
   Populations are small:
     e.g., Household outbreak

   Global contact patterns deviate from homogeneous mixing:
     Social networks
     Realistic behavior

   Disease natural history is not memoryless:
     Recovery period is not exponentially or gamma distributed
     Lots of variability in individual infectiousness

[Figure panels: Exponential RV; Lognormal RV]
Progression of NoV Infection
 Short incubation period (~1.5 days)

 Typical symptom duration around 1.5 days.
   Exceptional cases lasting up to a year have been reported.

 Most people shed asymptomatically after symptoms resolve:
   Typically for several days
   Shedding for more than a month, or even a year or more, is not uncommon
   15-50% of all infections may be totally asymptomatic
Basic NoV Transmission Model for
  Household Outbreaks
 SEIR Transmission Model
   Individuals may be in one of four states:



     Susceptible
     Exposed/Incubating
     Infectious
     Recovered/Immune

 Multiple boxes in E & I states correspond to shape parameter of gamma distributed
  waiting times.

 Background infection parameter, α. (Fixed to 0.001/day)

 Although NoV immunity tends to be partial and short-lived, this model is adequate
  for analyzing short-lived outbreaks.
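The link between multiple boxes and the gamma shape parameter can be checked numerically. In the sketch below (function names and values are illustrative assumptions), passing through k sequential boxes, each with an exponential dwell time, gives a total waiting time with the same mean but variance mean²/k.

```python
import random
import statistics


def boxed_waiting_time(mean_duration, n_boxes, rng):
    """Total time to pass through n_boxes sequential compartments.

    Each box has an exponential dwell time with mean mean_duration / n_boxes,
    so the total is gamma (Erlang) distributed with shape n_boxes.
    """
    rate_per_box = n_boxes / mean_duration
    return sum(rng.expovariate(rate_per_box) for _ in range(n_boxes))


def summarize(mean_duration, n_boxes, n_draws=20000, seed=7):
    """Return the sample mean and variance of the total waiting time."""
    rng = random.Random(seed)
    draws = [boxed_waiting_time(mean_duration, n_boxes, rng) for _ in range(n_draws)]
    return statistics.mean(draws), statistics.variance(draws)


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, summarize(mean_duration=1.5, n_boxes=k))
```

With one box the waiting time is exponential (memoryless); adding boxes concentrates durations around the mean.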
Analysis Objectives
 Estimate daily person-to-person rate of infection (β).
   0.14 infections per day

 Estimate average effective duration of infection (1/γ) and shape parameter of gamma-distributed infectiousness duration.
   1.2 days; γs = 1

 Effect of missing household sizes on results.
   Minimal

 Effect of asymptomatic infection.
   0.035 increase in β for each 10% increase in proportion of individuals who are asymptomatically infectious
What makes these data
challenging to work with?
 We want to understand:
     Daily person-to-person rate of infection (β).
     Average effective duration of infection (1/γ).
     Variability in 1/γ.
     Generation of asymptomatic infections.

 But household data are noisy and only partially observed:
   We know time of symptom onset but are missing:
         Time of infection
         Time of recovery
         Firm estimate of asymptomatic ratio & infectiousness
         Household Sizes (!)

 A strength of these data is that we can treat each household as an independent trial of a random infection process.
Likelihood Function for Fully
Observed Household Outbreaks
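 Force of infection @ t:

\[ \lambda(S_{ij}, I_{ij}, \beta, \alpha) = S_{ij}\,(\beta I_{ij} + \alpha) \]

 Likelihood of no infections over all infection-free intervals:

\[ L_{i,a} = \prod_{j=0}^{N_Q - 1} \exp\!\big(-\lambda(S_{ij}, I_{ij}, \beta, \alpha)\,(t_{j+1} - t_j)\big) \]

 Probability of all infections:

\[ L_{i,b} = \prod_{k=1}^{N_K} \lambda(S_{ik}, I_{ik}, \beta, \alpha) \]

[Figure: household timeline; x = infection, other markers = symptom onset and recovery]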
Likelihood Function for Fully
Observed Household Outbreaks



[Figure: household timeline; x = infection, other markers = symptom onset and recovery]

 Likelihood of a household observation:

\[ L_i = L_{i,a} \times L_{i,b} \]

 Likelihood of all household observations:

\[ L_O = \prod_{i \in H} L_i \]
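A sketch of how the household likelihood might be evaluated in code, working on the log scale so that the products become sums. The data layout (a list of infection-free intervals and a list of states at each infection) is an assumption made for illustration, as is the toy household at the bottom.

```python
import math


def force_of_infection(s, i, beta, alpha):
    """lambda(S, I, beta, alpha) = S * (beta * I + alpha)."""
    return s * (beta * i + alpha)


def household_log_likelihood(intervals, infections, beta, alpha):
    """Log-likelihood of one fully observed household.

    intervals: (S, I, t_start, t_end) for each infection-free interval.
    infections: (S, I) in effect at the moment of each infection.
    """
    log_no_infection = -sum(
        force_of_infection(s, i, beta, alpha) * (t_end - t_start)
        for s, i, t_start, t_end in intervals
    )
    log_infections = sum(
        math.log(force_of_infection(s, i, beta, alpha)) for s, i in infections
    )
    return log_no_infection + log_infections


def total_log_likelihood(households, beta, alpha):
    """Sum over independent households (the product over i in H, in logs)."""
    return sum(
        household_log_likelihood(intervals, infections, beta, alpha)
        for intervals, infections in households
    )


if __name__ == "__main__":
    # Hypothetical household: 2 susceptibles and 1 infectious for one day,
    # ending with a single infection.
    toy = [([(2, 1, 0.0, 1.0)], [(2, 1)])]
    print(total_log_likelihood(toy, beta=0.14, alpha=0.001))
```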
Unobserved Infection States




                 + 104 Households w/
                  No Secondary Cases
Unobserved Infection States
 Use data augmentation to generate
  complete observations.

   For each symptom onset event (q):

     Draw incubation time, k, from distribution
       Infection time, a = q – k
       If you draw any a < 0, whole sample has
        likelihood = 0.

     Draw recovery time, r, from symptom
      duration distribution.
       If r > observation period, w:
         r=w
         For right-censoring in data.

   Repeat for many (1K+) samples
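The augmentation step might look like the sketch below. It assumes exponential incubation and symptom-duration distributions and measures time from the start of the observation period; these choices, the mean values, and the function name are illustrative assumptions rather than the distributions used in the analysis.

```python
import random


def augment_household(onset_times, window, incubation_mean, symptom_mean, rng):
    """Complete one household's record from its symptom onset times.

    Returns a list of (infection, onset, recovery) tuples, or None when any
    drawn infection time is negative (the whole sample has likelihood 0).
    """
    completed = []
    for q in onset_times:
        k = rng.expovariate(1.0 / incubation_mean)   # incubation time (assumed exponential)
        a = q - k                                    # infection time
        if a < 0:
            return None
        r = q + rng.expovariate(1.0 / symptom_mean)  # recovery time (assumed exponential)
        r = min(r, window)                           # right-censor at end of observation
        completed.append((a, q, r))
    return completed


if __name__ == "__main__":
    rng = random.Random(3)
    samples = [augment_household([2.0, 3.5], 10.0, 1.5, 1.5, rng) for _ in range(1000)]
    usable = [s for s in samples if s is not None]
    print(len(usable), "of", len(samples), "samples are usable")
```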
Unobserved Infection States




[Figure: household timeline; x = infection, other markers = symptom onset and recovery]

   Evaluate likelihood w/respect to β and α for each sample.
   The mean likelihood across samples, E(L), is the estimated likelihood of the data.
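One way to compute E(L) without numerical underflow is to average on the log scale. The sketch below assumes each augmented sample has already been scored with a log-likelihood and that rejected samples (likelihood 0) are stored as `None`; both conventions are assumptions for illustration.

```python
import math


def log_mean_likelihood(log_likelihoods):
    """Return log(E(L)), where E(L) is the mean of exp(ll) over all samples.

    Entries equal to None stand for samples with likelihood 0; they add
    nothing to the sum but still count in the denominator.
    """
    n = len(log_likelihoods)
    finite = [ll for ll in log_likelihoods if ll is not None]
    if not finite:
        return float("-inf")
    peak = max(finite)  # factor out the largest term to avoid underflow
    total = sum(math.exp(ll - peak) for ll in finite)
    return peak + math.log(total / n)


if __name__ == "__main__":
    print(log_mean_likelihood([math.log(0.2), math.log(0.4), None, None]))
```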
Unobserved Household Sizes
 Sizes of households in Stockholm outbreak are
  unknown.

 Expected number of cases is:
     S(βI + α)Δt
   Missing S!

 Solution:
   Assume exposed households are sampled at random from the whole population.
     For each augmented household time series, sample household size from Swedish census distribution.
     Save samples by setting a lower bound:
       The likelihood of an outbreak in a household with fewer individuals than observed infections is 0, so don’t sample these.
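Sampling household sizes with a lower bound can be sketched as below. The size weights are made-up placeholders, not the Swedish census distribution, and the function name is hypothetical.

```python
import random

# Placeholder weights for household sizes 2 to 6 (NOT real census figures).
HYPOTHETICAL_SIZE_WEIGHTS = {2: 0.45, 3: 0.20, 4: 0.22, 5: 0.09, 6: 0.04}


def sample_household_size(n_observed_cases, rng, weights=HYPOTHETICAL_SIZE_WEIGHTS):
    """Draw a household size no smaller than the number of observed cases.

    Sizes below the bound would give likelihood 0, so they are excluded
    up front and the remaining weights are renormalized by rng.choices.
    """
    sizes = [size for size in weights if size >= n_observed_cases]
    if not sizes:
        raise ValueError("no household size can hold the observed cases")
    return rng.choices(sizes, weights=[weights[size] for size in sizes], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_household_size(3, rng) for _ in range(10)])
```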
Results: MLE Parameter Values
and 95% Confidence Intervals




 1/γ limited to values >= 1 day; infectiousness duration < 1 day not plausible
Results: Likelihood Surface




 Contour plot shows likelihood for combinations of β and 1/γ for γs = 1.

 Triangle is location of MLE; dashed oval shows 95% confidence bounds

 Parameter space isn’t very large, so optimize using brute force.
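A brute-force search over a two-parameter grid can be sketched as follows. The `toy_log_likelihood` surface is a hypothetical stand-in with an arbitrary peak; in practice it would be replaced by the estimated likelihood of the data, and the grid bounds here are illustrative.

```python
def grid_search(log_likelihood, beta_values, duration_values):
    """Evaluate every (beta, duration) pair and return the best as (ll, beta, duration)."""
    best = None
    for beta in beta_values:
        for duration in duration_values:
            ll = log_likelihood(beta, duration)
            if best is None or ll > best[0]:
                best = (ll, beta, duration)
    return best


def toy_log_likelihood(beta, duration):
    """Stand-in surface (hypothetical), peaked at beta = 0.14, duration = 1.2."""
    return -((beta - 0.14) / 0.05) ** 2 - ((duration - 1.2) / 0.5) ** 2


if __name__ == "__main__":
    betas = [round(0.01 * k, 2) for k in range(1, 41)]        # 0.01 to 0.40 per day
    durations = [round(1.0 + 0.1 * k, 1) for k in range(21)]  # 1.0 to 3.0 days
    print(grid_search(toy_log_likelihood, betas, durations))
```

Keeping every grid value, not only the maximum, would also give the points needed for a contour plot of the surface.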
Goodness of fit
 Simulate from SEIR model using fitted parameters and same demographics as outbreak.

        If [condition not recoverable]:
                λ = βSI
                Draw number of new infections, x, from Binomial(S, λ)

                S = S - x
                E = E + x

                Draw symptom onset times from [distribution not recoverable] for all new infections.
                t = t + dt

                At end of step:
                Transition from E → I those who have infectiousness onset time <= t.
                Transition from I → R those who have recovery time <= t

        Else:
                STOP
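The simulation loop could be sketched in Python as below. Several pieces are assumptions, since the slide does not pin them down: the loop runs while anyone is incubating or infectious (up to `t_max`), the per-person infection probability in a step is taken as 1 - exp(-(βI + α)·dt), incubation and infectious periods are exponential, and symptom onset coincides with onset of infectiousness.

```python
import math
import random


def simulate_household(size, beta, alpha, incubation_mean, infectious_mean,
                       rng, dt=0.1, t_max=30.0):
    """Discrete-time stochastic SEIR in one household with one index case.

    Returns the total number ever infected, including the index case.
    """
    s = size - 1
    onset_times = [rng.expovariate(1.0 / incubation_mean)]  # pending E -> I times
    recovery_times = []                                     # pending I -> R times
    i = 0
    total = 1
    t = 0.0
    while (onset_times or i > 0) and t < t_max:
        p = 1.0 - math.exp(-(beta * i + alpha) * dt)        # assumed per-person risk
        x = sum(rng.random() < p for _ in range(s))         # Binomial(s, p) draw
        s -= x
        total += x
        t += dt
        for _ in range(x):
            onset_times.append(t + rng.expovariate(1.0 / incubation_mean))
        # E -> I for everyone whose onset time has passed
        starting = [u for u in onset_times if u <= t]
        onset_times = [u for u in onset_times if u > t]
        for u in starting:
            recovery_times.append(u + rng.expovariate(1.0 / infectious_mean))
        i += len(starting)
        # I -> R for everyone whose recovery time has passed
        recovered = sum(1 for u in recovery_times if u <= t)
        recovery_times = [u for u in recovery_times if u > t]
        i -= recovered
    return total


if __name__ == "__main__":
    rng = random.Random(11)
    print([simulate_household(4, 0.14, 0.001, 1.5, 1.2, rng) for _ in range(10)])
```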
Goodness of fit
 Simulate from SEIR model using fitted parameters and same demographics as outbreak.

 Quantify model performance based on closeness to outbreak characteristics
   Average number of infections in households with secondary cases.
     Simulated: 1.9, SD = 0.2
     Stockholm: 1.6
   Average number of households with no secondary cases.
     Simulated: 110.5, SD = 5.5
     Stockholm: 104

[Figure panels: # of infections in households w/ 2-ary transmission; # of households with zero secondary cases]
Sensitivity Analysis:
Household Sizes
 Want to understand the extent to which using sampled
  household sizes biases results.

 Simulate outbreaks with household sizes drawn from
  Swedish census distribution.
   Estimate parameters using:
     Sampled household sizes
     Known sizes from simulation
   Compare results.
Results: Sensitivity Analysis
 Estimate parameters for outbreak with β = 0.14/day
  and 1/γ = 1.2 days




 Dashed lines show fit when household sizes are
  known, solid are unknown.
 Results almost exactly the same.
Asymptomatic Infections
 Problem: Only observed symptomatic infections
 Asymptomatics likely don’t contribute much to outbreaks in
  households with symptomatic cases, but can be infected during
  these outbreaks.
   Are very important for seeding new outbreaks:
      Stockholm outbreak started by post-symptomatic food-handler
      Afternoon Delight outbreak in Ann Arbor
      Subway outbreaks in Kent County, MI

 Full analysis of asymptomatic infections requires active surveillance
   e.g., Stool and environmental samples.

 Solution: Estimate parameters for outbreaks with varying levels of
  asymptomatic infection using simulated data.
Modeling asymptomatic infection

 π is proportion of new infections that are asymptomatic.
   Assume asymptomatic infections are non-infectious during household
    outbreak.


 Sample 20 outbreaks each for combinations of:
   β = {0.075, 0.085, …, 0.2}
   π = {0, 0.1, …, 0.5}
Modeling asymptomatic infection
     If [condition not recoverable]:
             λ = βSI
             Draw number of new infections, x, from Binomial(S, λ)
             Draw number never symptomatic, a, from Binomial(x, π)
             S = S - x
             E = E + (x-a)
             R = R + a

             Draw symptom onset times from [distribution not recoverable] for all new infections.
             t = t + dt

             At end of step:
             Transition from E → I those who have infectiousness onset time <= t.
             Transition from I → R those who have recovery time <= t

     Else:
              STOP
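The asymptomatic split within one time step can be sketched as follows. Here all x newly infected individuals leave S, with the a never-symptomatic ones moving straight to R, so household size is conserved; the per-person infection probability `p_infect` is passed in as an assumed input and the function name is hypothetical.

```python
import random


def infection_step(s, e, r, p_infect, pi_asymptomatic, rng):
    """One infection step with a never-symptomatic fraction.

    x new infections are drawn from Binomial(s, p_infect); a of them are
    drawn from Binomial(x, pi_asymptomatic) and go directly to R, treated
    as non-infectious during the household outbreak.
    """
    x = sum(rng.random() < p_infect for _ in range(s))
    a = sum(rng.random() < pi_asymptomatic for _ in range(x))
    s -= x        # every new infection leaves the susceptible pool
    e += x - a    # symptomatic infections start incubating
    r += a        # never-symptomatic infections are removed
    return s, e, r, x, a


if __name__ == "__main__":
    rng = random.Random(5)
    print(infection_step(50, 0, 0, p_infect=0.3, pi_asymptomatic=0.4, rng=rng))
```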
Modeling asymptomatic infection

 Estimate parameters using data augmentation method.
   Assume π = 0, as when fitting Stockholm data.

 Find expected value of β for each π when estimated β = 0.14.
Modeling asymptomatic infection
Norovirus outbreaks in realistic
communities
 Norovirus has interesting qualitative outbreak dynamics in the
  community.
   Outbreaks are explosive but typically limited.
   Multiple levels of transmission:
      Can embed findings about household transmission.
      Community rate of transmission is unknown.

 Data on community and region-level Norovirus outbreaks are rare.

 Take a pattern-oriented approach to building community-level
  models of NoV transmission.

 Build a model based on observed patterns and data that can
  recreate outbreaks with NoV-like characteristics.
Detailed Transmission Model
[Diagram: compartments S, E, IS, IA1, IA2, R; rate label (βIS*IS) + (βIA*IA)]

NoV transmission is marked by heterogeneous asymptomatic infectious periods.

~5% of the population will shed for 100+ days.

Existing theory predicts that increasing variability in individual infectiousness
makes outbreaks less predictable, but smaller on average.

Want to understand how this heterogeneity impacts outbreak dynamics in the
context of heterogeneous contact structure.
Contact structure
 Household sizes:
   Assume a representative community, i.e., household sizes are a
    random sample from the census distribution of household sizes.

 Contacts in the community:
   Individuals separated into compartments:
      School, work, etc
   Social network:
      How do we choose a network topology that is useful and informative?

 Food handlers:
   About 1% of U.S. adults are food handlers
   Average norovirus point-source outbreak size is about 40
Empirical contact networks

 Many empirical community contact networks have an exponentially distributed degree.
   Moderate heterogeneity in contact
Outbreak Realizations




[Figure legend: Household Transmission; Community Transmission; Point Source Event]

ICPSR 2011 - Bonus Content - Modeling with Data

  • 1. Modeling with Data INTRO TO COMPUTING FOR COMPLEX SYSTEMS (Session XVI) Jon Zelner University of Michigan 8/11/2010
  • 2. Data in the Modeling Process Agents Observed Behavior Environment Model Simulated Behavior
  • 3. Pattern Oriented Modeling (POM)  Term coined by Grimm et al. in 2005 Science paper  Modeling process should be guided by patterns of interest  Can use patterns @ multiple levels:  Individual agents  Environment  Aggregate agent behavior  Patterns should be used both to guide model development and to calibrate and validate models.
  • 4. Types of data for modelers  Counts/Proportions:  Time Series:  Infections  Evolution of outbreak in  Occupied patches time  Timeline of conflict  Distributions  Number of firms over  Age time  Lifespan  Qualitative:  Duration of infection  ‘Norovirus-like’  Rates outbreaks  Birthrates  Size and shape of forest patches  Transmission rate  Clusters of settlements
  • 6. Pattern Oriented Modeling: Kayenta Anasazi (Axtell et al. 2001)  Trying to understand population growth and collapse among the Kayenta Anasazi in U.S. Southwest  Many factors in this:  Weather  Farming  Kinship  Optimize models by explaining multiple patterns @ one time.
  • 8. Inference for POM  Bayesian/Qualitative  Use some kind of quality function to score goodness of runs and optimize by minimizing distance between model output and optimum quality and/or data.  Number of occupied patches  Size of elephant herds  Frequentist/Likelihood-based  Define a likelihood function for Data | Model  Simulate runs from the model and evaluate likelihood of data as (# runs == Data) / # runs
  • 9. How Infections Propagate After Point-Source Events An Analysis of Secondary Norovirus Transmission Jon Zelnera,b, Aaron A. Kinga,c, Christine Moee & Joseph N.S. Eisenberga,d University of Michigan a Center for the Study of Complex Systems, b Sociology & Public Policy, c Ecology & Evolutionary Biology, d School of Public Health Emory University e Rollins School of Public Health
  • 10. Norovirus (NoV) Epidemiology  Most common cause of non-bacterial gastroenteritis in the US and worldwide.  Est. 90 million cases in 2007  Explosive diarrhea & projectile vomiting in symptomatic cases.  Single-stranded, non-enveloped RNA virus  Member of family Caliciviridae  Often transmitted via food  Salad greens  Shellfish  Most person-to-person transmission is via the environment and fomites.
  • 11. Why model transmission after point-source events? IA  Typical analysis of point-source events focuses on primary, one-to-many risk:  How many cases are created by an infectious food handler? IS  How many people infected after water H treatment failure? S  However, actual size of point source events is underestimated without including secondary transmission risk.  Within-household transmission is an important bridge between point-source events.  So, even if within-household Ro < 1, household cases have important dynamic consequences at the community level.
  • 12. NoV Transmission Dynamics  Norovirus transmission dynamics tend to be locally unstable but globally persistent.  E.g., small, explosive outbreaks in Mercer County, but no local NoV epidemic  Multiple reported NoV outbreaks throughout New Jersey every week.  Stochasticity operates at multiple levels.  Disease/Contact
  • 13. NoV Transmission Dynamics Exponential Growth, Global Invasion (e.g.,Pandemic Flu) Short, Explosive & Limited (Typical of NoV outbreaks)
  • 14. Outbreak Data  Gotz et al. (2001) observed 500+ households exposed to NoV after a point-source outbreak in a network of daycare centers in Stockholm, Sweden.  Traceable to salad prepared by a food handler who was shedding post-symptoms.  Followed 153 of these households  Eliminating those with only one person.  49 had secondary cases  104 have no secondary cases
  • 15. Deterministic SEIR model  Infinite population dS  Mass-action mixing = "#SI dt  Frequency-dependent dE transmission = #SI " $E dt  When I > 0, a fraction of the dI susceptible population is infected = $E " %I at every instant dt  Constant average rate of recovery dR  Doesn’t matter who is infected = %I dt  ‘Nano-fox’ problem !
  • 16. Why use a stochastic model?  Deterministic models work well when assumptions are plausible, but are less useful when:  Populations are small:  e.g.,Household outbreak  Global contact patterns deviate from homogeneous mixing:  Social networks Exponential RV  Realistic behavior  Disease natural history is not memoryless:  Recovery period is not exponentially or gamma distributed  Lots of variability in individual infectiousness Lognormal RV
  • 17. Progression of NoV Infection  Short incubation period (~1.5 days)  Typical symptom duration around 1.5 days.  Exceptional cases up to a year have been reported.  Most people shed asymptomatically after recovery of symptoms:  Typically for several days  Not uncommon for shedding to last > 1 month, year or more  15-50% of all infections may be totally asymptomatic
  • 18. Basic NoV Transmission Model for Household Outbreaks  SEIR Transmission Model  Individuals may be in one of four states:  Susceptible  Exposed/Incubating  Infectious  Recovered/Immune  Multiple boxes in E & I states correspond to shape parameter of gamma distributed waiting times.  Background infection parameter, α. (Fixed to 0.001/day)  Although NoV immunity tends to be partial and short-lived, this model is adequate for analyzing short-lived outbreaks.
  • 19. Analysis Objectives  Estimate daily person-to-person rate of infection (β).  0.14/infections per day  Estimate average effective duration of infection (1/γ) and shape parameter of gamma-distributed infectiousness duration.  1.2 days; γs = 1  Effect of missing household sizes on results.  Minimal  Effect of asymptomatic infection.  .035 increase in β for each 10% increase in proportion of individuals who are asymptomatically infectious
  • 20. What makes these data challenging to work with?  We want to understand:  Daily person-to-person rate of infection (β).  Average effective duration of infection (1/γ).  Variability in 1/γ.  Generation of asymptomatic infections.  But household data are noisy and only partially observed:  We know time of symptom onset but are missing:  Time of infection  Time of recovery  Firm estimate of asymptomatic ratio & infectiousness  Household Sizes (!)  Strength of these data are that we can treat each household as an independent trial of a random infection process.
  • 21. Likelihood Function for Fully Observed Household Outbreaks  Force of infection at time t: $\lambda(S_{ij}, I_{ij}, \beta, \alpha) = S_{ij}(\beta I_{ij} + \alpha)$  Likelihood of no infections over all infection-free intervals: $\mathcal{L}_{i,a} = \prod_{j=0}^{N_Q - 1} \exp\left(-\lambda(S_{ij}, I_{ij}, \beta, \alpha)\,(t_{j+1} - t_j)\right)$  Probability of all infections: $\mathcal{L}_{i,b} = \prod_{k=1}^{N_K} \lambda(S_{ik}, I_{ik}, \beta, \alpha)$  [Timeline figure legend: x = infection; other markers = symptom onset and recovery]
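  • 22. Likelihood Function for Fully Observed Household Outbreaks  Likelihood of a household observation: $\mathcal{L}_i = \mathcal{L}_{i,a} \times \mathcal{L}_{i,b}$  Likelihood of all household observations: $\mathcal{L}_O = \prod_{i \in H} \mathcal{L}_i$  [Timeline figure legend: x = infection; other markers = symptom onset and recovery]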
  • 23. Likelihood Function for Fully Observed Household Outbreaks  Force of infection at time t: $\lambda(S_{ij}, I_{ij}, \beta, \alpha) = S_{ij}(\beta I_{ij} + \alpha)$  Likelihood of no infections over all infection-free intervals: $\mathcal{L}_{i,a} = \prod_{j=0}^{N_Q - 1} \exp\left(-\lambda(S_{ij}, I_{ij}, \beta, \alpha)\,(t_{j+1} - t_j)\right)$  Probability of all infections: $\mathcal{L}_{i,b} = \prod_{k=1}^{N_K} \lambda(S_{ik}, I_{ik}, \beta, \alpha)$  Likelihood of a household observation: $\mathcal{L}_i = \mathcal{L}_{i,a} \times \mathcal{L}_{i,b}$  Likelihood of all household observations: $\mathcal{L}_O = \prod_{i \in H} \mathcal{L}_i$
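The likelihood on these slides can be sketched on the log scale, where the products become sums. This is a minimal illustration under assumptions that are not in the slides: each household is represented as a list of infection-free intervals `(S, I, duration)` plus a list of `(S, I)` pairs recorded just before each infection, and the index case is taken as given. All function and variable names are hypothetical.

```python
import math

def force_of_infection(s, i, beta, alpha):
    """lambda(S, I, beta, alpha) = S * (beta * I + alpha)."""
    return s * (beta * i + alpha)

def household_log_likelihood(intervals, infections, beta, alpha):
    """Log of L_i = L_{i,a} * L_{i,b} for one fully observed household.

    intervals:  (S, I, duration) for each infection-free interval
    infections: (S, I) just before each observed infection
    """
    # log L_{i,a}: no infection during any infection-free interval
    log_no_infection = -sum(
        force_of_infection(s, i, beta, alpha) * dt for s, i, dt in intervals
    )
    # log L_{i,b}: force of infection at each infection event
    # (math.log raises ValueError if the force of infection is zero)
    log_infections = sum(
        math.log(force_of_infection(s, i, beta, alpha)) for s, i in infections
    )
    return log_no_infection + log_infections

def outbreak_log_likelihood(households, beta, alpha):
    """Log of L_O: sum of household log-likelihoods (households independent)."""
    return sum(
        household_log_likelihood(intervals, infections, beta, alpha)
        for intervals, infections in households
    )

# Made-up household of 3: the index case is infectious from t = 0, one
# secondary infection occurs at t = 1, and observation ends at t = 3.
example_household = (
    [(2, 1, 1.0), (1, 1, 2.0)],  # (S, I, duration)
    [(2, 1)],                    # (S, I) at the secondary infection
)
log_lik = outbreak_log_likelihood([example_household], beta=0.14, alpha=0.001)
print(log_lik)
```

Working with log-likelihoods avoids numerical underflow when many households are multiplied together.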
  • 24. Unobserved Infection States + 104 Households w/ No Secondary Cases
  • 25. Unobserved Infection States  Use data augmentation to generate complete observations.  For each symptom onset event (q):  Draw incubation time, k, from the incubation period distribution  Infection time, a = q – k  If any a < 0, the whole sample has likelihood = 0.  Draw recovery time, r, from the symptom duration distribution.  If r > observation period, w:  Set r = w  This accounts for right-censoring in the data.  Repeat for many (1K+) samples
  • 26. Unobserved Infection States  Evaluate the likelihood with respect to β and α for each sample.  E(L), the mean likelihood across samples, is the estimated likelihood of the data.  [Timeline figure legend: x = infection; other markers = symptom onset and recovery]
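The augmentation procedure can be sketched as follows. This is a hedged illustration, not the original implementation. It assumes onset times are measured from the start of observation (time 0), that recovery is onset plus a drawn symptom duration, and it uses a hypothetical gamma distribution with a 1.5-day mean for both draws. The `likelihood` argument stands in for whatever complete-data likelihood is being evaluated.

```python
import random

def augment_household(onset_times, window, draw_incubation, draw_duration, rng):
    """Return one augmented sample [(infection, onset, recovery), ...].

    Returns None when any infection time falls before 0, i.e. the sample
    has likelihood 0.
    """
    sample = []
    for q in onset_times:
        a = q - draw_incubation(rng)             # infection time a = q - k
        if a < 0:
            return None
        r = min(q + draw_duration(rng), window)  # right-censor at w
        sample.append((a, q, r))
    return sample

def estimated_likelihood(onset_times, window, likelihood,
                         draw_incubation, draw_duration,
                         n_samples=1000, seed=0):
    """Monte Carlo estimate of E(L): mean likelihood over augmented samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        sample = augment_household(
            onset_times, window, draw_incubation, draw_duration, rng
        )
        if sample is not None:                   # impossible samples add 0
            total += likelihood(sample)
    return total / n_samples

def gamma_1p5_days(rng):
    """Hypothetical gamma draw: shape 2, scale 0.75, mean 1.5 days."""
    return rng.gammavariate(2.0, 0.75)
```

Passing `likelihood=lambda sample: 1.0` returns the fraction of augmented samples that are feasible, which is a quick sanity check before plugging in a real likelihood.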
  • 27. Unobserved Household Sizes  Sizes of households in Stockholm outbreak are unknown.  Expected number of cases is:  S(βI + α)Δt  Missing S!  Solution:  Assume exposed households are sampled at random from the whole population.  For each augmented household time series, sample household size from Swedish census distribution.  Save samples by setting a lower bound:  The likelihood of a household with fewer individuals than observed infections is 0, so don’t sample these sizes.
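The lower-bound trick amounts to sampling from a truncated size distribution. The sketch below is an assumption-laden illustration: `toy_size_probs` is a made-up distribution, not the Swedish census, and the function name is hypothetical. It also returns the probability mass of the feasible sizes, since the average likelihood over truncated draws needs to be multiplied by that mass to match the average over the untruncated distribution.

```python
import random

def sample_household_size(size_probs, n_cases, rng):
    """Draw a household size >= n_cases from a truncated size distribution.

    Returns (size, feasible_mass), where feasible_mass = P(size >= n_cases)
    under the original, untruncated distribution.
    """
    feasible = {size: p for size, p in size_probs.items() if size >= n_cases}
    if not feasible:
        raise ValueError("no household size can hold the observed cases")
    sizes = list(feasible)
    weights = [feasible[size] for size in sizes]
    size = rng.choices(sizes, weights=weights, k=1)[0]
    return size, sum(weights)

# Made-up household size distribution, for illustration only.
toy_size_probs = {1: 0.40, 2: 0.30, 3: 0.12, 4: 0.12, 5: 0.06}

rng = random.Random(42)
draws = [sample_household_size(toy_size_probs, 3, rng) for _ in range(5)]
print(draws)
```

Because the feasible mass depends only on the number of observed cases and not on β or α, it rescales the likelihood without moving the location of its maximum.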
  • 28. Results: MLE Parameter Values and 95% Confidence Intervals 1/γ limited to values >= 1 day; infectiousness duration < 1 day not plausible
  • 29. Results: Likelihood Surface  Contour plot shows likelihood for combinations of β and 1/γ for γs = 1.  Triangle is the location of the MLE; dashed oval shows the 95% confidence bounds.  The parameter space isn’t very large, so we optimize using brute force.
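Brute-force optimization over a small parameter space can be sketched as a grid search. Everything specific here is an assumption for illustration: `toy_log_lik` is a made-up stand-in for the real log-likelihood, the grid limits are arbitrary apart from the 1-day lower bound on 1/γ, and the 95% region uses the likelihood-ratio cutoff (half the 0.95 chi-square quantile with 2 degrees of freedom, 5.991 / 2), which may not be how the slide's oval was drawn.

```python
def grid_search(log_lik, betas, durations):
    """Evaluate log_lik on every (beta, duration) pair and return the best."""
    surface = {(b, d): log_lik(b, d) for b in betas for d in durations}
    best = max(surface, key=surface.get)
    return best, surface

def confidence_region(surface, cutoff=5.991 / 2):
    """Grid points whose log-likelihood is within `cutoff` of the maximum."""
    peak = max(surface.values())
    return [point for point, value in surface.items() if peak - value <= cutoff]

def toy_log_lik(beta, duration):
    """Made-up log-likelihood surface, used only to exercise the search."""
    return -((beta - 0.14) / 0.02) ** 2 - ((duration - 1.2) / 0.3) ** 2

betas = [0.05 + 0.01 * i for i in range(26)]    # 0.05 to 0.30 per day
durations = [1.0 + 0.1 * j for j in range(21)]  # 1.0 to 3.0 days (>= 1 day)

best, surface = grid_search(toy_log_lik, betas, durations)
region = confidence_region(surface)
print(best, len(region))
```

The cost grows with the product of the grid sizes, so this approach is practical for two parameters but not for many.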
  • 30. Goodness of fit  Simulate from SEIR model using fitted parameters and same demographics as outbreak.  If […]:  λ = βSI  Draw number of new infections, x, from Binomial(S, λ)  S = S – x  E = E + x  Draw symptom onset times from […] for all new infections.  t = t + dt  At end of step:  Transition E → I those who have infectiousness onset time <= t.  Transition I → R those who have recovery time <= t  Else:  STOP
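The simulation loop on this slide can be sketched for a single household. Several choices below are assumptions rather than the author's method: the per-susceptible infection probability in a step is taken as 1 - exp(-(βI + α)·dt) instead of the slide's shorthand, symptom onset is treated as the start of infectiousness, the loop stops when nobody is exposed or infectious, and the waiting-time distributions are hypothetical.

```python
import math
import random

def binomial(n, p, rng):
    """Binomial(n, p) draw as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))

def draw_incubation(rng):
    """Hypothetical incubation time: gamma, mean 1.5 days."""
    return rng.gammavariate(2.0, 0.75)

def draw_infectious(rng):
    """Hypothetical infectious duration: exponential, mean 1.2 days."""
    return rng.expovariate(1.0 / 1.2)

def simulate_household(size, beta, alpha, dt, t_max, rng):
    """Simulate one household outbreak; return total infected incl. index."""
    s = size - 1                          # one index case, infectious at t = 0
    exposed = []                          # (onset_time, recovery_time) pairs
    infectious = [draw_infectious(rng)]   # recovery times of infectious people
    t = 0.0
    while (exposed or infectious) and t < t_max:
        p = 1.0 - math.exp(-(beta * len(infectious) + alpha) * dt)
        x = binomial(s, p, rng)           # new infections this step
        s -= x
        for _ in range(x):
            onset = t + draw_incubation(rng)
            exposed.append((onset, onset + draw_infectious(rng)))
        t += dt
        # E -> I for those whose onset time has passed
        infectious += [rec for onset, rec in exposed if onset <= t]
        exposed = [(onset, rec) for onset, rec in exposed if onset > t]
        # I -> R for those whose recovery time has passed
        infectious = [rec for rec in infectious if rec > t]
    return size - s

rng = random.Random(3)
totals = [simulate_household(4, 0.14, 0.001, 0.1, 30.0, rng) for _ in range(200)]
print(sum(totals) / len(totals))
```

Repeating the simulation many times gives the distribution of outbreak sizes that goodness-of-fit summaries can be compared against.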
  • 31. Goodness of fit  Simulate from SEIR model using fitted parameters and same demographics as outbreak.  Quantify model performance based on closeness to outbreak characteristics:  Average number of infections in households with secondary cases.  Simulated: 1.9, SD = 0.2  Stockholm: 1.6  Average number of households with no secondary cases.  Simulated: 110.5, SD = 5.5  Stockholm: 104  [Figure panels: # of infections in households w/ 2-ary transmission; # of households with zero secondary cases]
  • 32. Sensitivity Analysis: Household Sizes  Want to understand the extent to which using sampled household sizes biases results.  Simulate outbreaks with household sizes drawn from Swedish census distribution.  Estimate parameters using:  Sampled household sizes  Known sizes from simulation  Compare results.
  • 33. Results: Sensitivity Analysis  Estimate parameters for outbreak with β = 0.14/day and 1/γ = 1.2 days  Dashed lines show the fit when household sizes are known; solid lines show the fit when they are unknown.  The results are almost exactly the same.
  • 34. Asymptomatic Infections  Problem: Only observed symptomatic infections  Asymptomatics likely don’t contribute much to outbreaks in households with symptomatic cases, but can be infected during these outbreaks.  Are very important for seeding new outbreaks:  Stockholm outbreak started by post-symptomatic food-handler  Afternoon Delight outbreak in Ann Arbor  Subway outbreaks in Kent County, MI  Full analysis of asymptomatic infections requires active surveillance  e.g., Stool and environmental samples.  Solution: Estimate parameters for outbreaks with varying levels of asymptomatic infection using simulated data.
  • 35. Modeling asymptomatic infection  π is the proportion of new infections that are asymptomatic.  Assume asymptomatic infections are non-infectious during household outbreak.  Sample 20 outbreaks each for combinations of:  β = {0.075, 0.085, …, 0.2}  π = {0, 0.1, …, 0.5}
  • 36. Modeling asymptomatic infection  If […]:  λ = βSI  Draw number of new infections, x, from Binomial(S, λ)  Draw number never symptomatic, a, from Binomial(x, π)  S = S – x  E = E + (x – a)  R = R + a  Draw symptom onset times from […] for all new infections.  t = t + dt  At end of step:  Transition E → I those who have infectiousness onset time <= t.  Transition I → R those who have recovery time <= t  Else:  STOP
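The asymptomatic split in this loop can be isolated as a single update step. In this hedged sketch, all x new infections leave S, the x - a symptomatic ones enter E, and the a never-symptomatic ones go straight to R, so that household size is conserved. The per-susceptible infection probability `p_infect` is passed in directly, and the function names are hypothetical.

```python
import random

def binomial(n, p, rng):
    """Binomial(n, p) draw as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))

def infection_step(s, e, r, p_infect, pi, rng):
    """One infection update with a proportion pi of asymptomatic infections.

    Asymptomatic infections are assumed non-infectious and move directly
    from S to R; symptomatic infections move from S to E.
    """
    x = binomial(s, p_infect, rng)   # new infections
    a = binomial(x, pi, rng)         # never symptomatic
    return s - x, e + (x - a), r + a

rng = random.Random(7)
state = (50, 0, 0)                   # (S, E, R) in a made-up population of 50
for _ in range(10):
    state = infection_step(*state, p_infect=0.1, pi=0.3, rng=rng)
print(state)
```

Checking that S + E + R stays constant across steps is a cheap guard against bookkeeping errors in the compartment updates.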
  • 37. Modeling asymptomatic infection  π is the proportion of new infections that are asymptomatic.  Assume asymptomatic infections are non-infectious during household outbreak.  Sample 20 outbreaks each for combinations of:  β = {0.075, 0.085, …, 0.2}  π = {0, 0.1, …, 0.5}  Estimate parameters using the data augmentation method.  Assume π = 0, as when fitting the Stockholm data.  Find the expected value of the true β for each π when the estimated β = 0.14.
  • 39. Norovirus outbreaks in realistic communities  Norovirus has interesting qualitative outbreak dynamics in the community.  Outbreaks are explosive but typically limited.  Multiple levels of transmission:  Can embed findings about household transmission.  Community rate of transmission is unknown.  Data on community and region-level Norovirus outbreaks are rare.  Take a pattern-oriented approach to building community-level models of NoV transmission.  Build a model based on observed patterns and data that can recreate outbreaks with NoV-like characteristics.
  • 40. Detailed Transmission Model  [Diagram: compartments S, E, IS, IA1, IA2, R; force of infection (βIS*IS) + (βIA*IA)]  NoV transmission is marked by heterogeneous asymptomatic infectious periods.  ~5% of the population will shed for 100+ days.  Existing theory predicts that increasing variability in individual infectiousness makes outbreaks less predictable, but smaller on average.  Want to understand how this heterogeneity impacts outbreak dynamics in the context of heterogeneous contact structure.
  • 41. Contact structure  Household sizes:  Assume a representative community, i.e., household sizes are a random sample from the census distribution of household sizes.  Contacts in the community:  Individuals separated into compartments:  School, work, etc  Social network:  How do we choose a network topology that is useful and informative?  Food handlers:  About 1% of U.S. adults are food handlers  Average norovirus point-source outbreak size is about 40
  • 42. Empirical contact networks  Many empirical community contact networks have an exponentially distributed degree.  Moderate heterogeneity in contact
  • 43. Outbreak Realizations  [Figure legend: household transmission; community transmission; point source event]