Data Collection and Analysis
Methodologies
Road Safety
A.A. 2023-2024
Module 3
Stephen Kome
Mod. 3: Data Analysis Methodologies
• 3-0 Road Safety Data Collection
• 3-1 Basic statistical concepts
• 3-2 Contingency Tables
• 3-3 Safety Performance Functions
Slide 2
Mod. 3
3-0 - ROAD SAFETY DATA
COLLECTION
Slide 3
• Reliable and harmonised road accident data are crucial
for defining evidence-based road safety policies,
monitoring performance and assessing results
• Data on infrastructure, traffic (exposure), accident
costs and Safety Performance Indicators (SPIs) are also needed
• Road users must also be involved in the information
collection and planning process
• The European Union has invested substantial resources in improving
the quality and availability of accident data, mainly through
dedicated research projects
• Observatories at different levels (continental, national,
regional, urban) are fundamental tools
Page 4
The problem of accident data
Why collect data?
• Define problems
• Identify risk factors and priorities
• Formulate strategy
• Set targets
• Monitor performance
Slide 5
Which data to collect?
Only 22% of
countries are able to
provide information
on road traffic
fatalities, non-fatal
injuries, economic
impact and selected
SPIs (WHO)
Slide 6
SOURCE AND TYPES OF
DATA
Slide 7
Who are the main sources?
• Police
• Health authorities
• Transport bodies
• Other stakeholders may include:
– national statistics office,
– the insurance industry,
– non-governmental organizations working for road
safety,
– academic institutions…
Slide 8
Sources and types of data (1)
WHO, 2004
Slide 9
Mod. 2
Sources and types of data (2)
WHO, 2004
Slide 10
Mod. 2
QUALITY OF ACCIDENT DATA
Slide 11
What affects crash data quality?
1. Definitions
2. Reporting/under-reporting of crashes or
injuries
3. Missing data
4. Errors
Slide 12
The definition of road accident fatality
• The classification of the severity of injuries
and crashes varies among countries.
• The range of injury severity categories that
may be used by health professionals or police
officers includes slight/minor, moderate,
serious/severe, and fatal
• The recommended definition of a road traffic
fatality is (WHO 2009):
– “any person killed immediately or dying within 30 days
as a result of a road traffic injury accident, excluding
suicides”
Slide 13
How many countries use ‘died within
30 days’ definition?
Less than half of the 178 countries monitored by WHO use the
recommended definition of a road traffic fatality (WHO, 2009)
Slide 14
When a road fatality is not defined in this way,
the reported number of fatalities can be made
more accurate by multiplying it by an appropriate
adjustment factor. The factor depends on the
definition in use, as recommended by the European
Conference of Ministers of Transport.
Under-reporting of road accidents
• Not all crashes and injuries that occur are
documented in the data system (particularly
slight injuries and property-damage-only (PDO) crashes).
• Reasons for under-reporting are:
– Police may not be informed when a crash occurs
– Police do not always go to the scene
– Police may go to the crash scene, but not formally
register the crash
– Some data are missing
– Data are not transmitted to Statistical Offices
– Health consequences follow-up not monitored
– Deaths occur more than 30 days after the crash
Slide 15
Methods for assessing under-reporting
• Compare the number of police reports filed on
certain events to those captured in the database.
• Compare the number of road traffic fatalities
and/or injuries counted by one data source,
usually the police database, to those counted in a
survey.
• Compare the number of road traffic fatalities
and/or injuries counted in the police database to
the number counted in other databases.
• Use linkage or capture-recapture methods to
match records from different databases
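The capture-recapture idea in the last bullet can be sketched with the classical two-source (Lincoln-Petersen) estimator. This is only an illustration: the counts below are hypothetical, the function name is an assumption, and the estimator presumes that the two sources record casualties independently and that records can be matched without error.

```python
def lincoln_petersen(n_police: int, n_hospital: int, n_matched: int) -> float:
    """Estimate the true number of casualties from two overlapping sources.

    n_police   : casualties found in the police database
    n_hospital : casualties found in the hospital database
    n_matched  : casualties found in both (linked records)
    """
    if n_matched <= 0:
        raise ValueError("at least one matched record is needed")
    return n_police * n_hospital / n_matched


# Hypothetical counts, for illustration only.
n_police, n_hospital, n_matched = 400, 500, 250

estimated_total = lincoln_petersen(n_police, n_hospital, n_matched)
police_reporting_level = n_police / estimated_total

print(f"Estimated true number of casualties: {estimated_total:.0f}")
print(f"Estimated police reporting level: {police_reporting_level:.0%}")
```

The ratio of the police count to the estimated total gives a reporting level that plays the same role as the percentages shown for different countries in the deck.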
Slide 16
Mean level of accident reporting by
injury severity
Elvik & Mysen, 1999
Slide 17
Reporting of road accident injuries in
different countries
[Bar chart: official crash statistics as a percentage of hospital data, axis 0 to 100]
Countries shown: Australia, Canada, Denmark, Germany, Netherlands, Norway, USA
Bar labels: 64, 88, 21, 39, 43, 37, 49
Elvik, 2004
Slide 18
Page 19
In African Countries 1
Source: WHO, SAFERAFRICA Project
Page 20
In African Countries 1
Country | Reported fatality rate per million population (2013) | Estimated fatality rate per million population (2013)
Congo, Dem. Rep. | 7 | 332
Central African Republic | 12 | 324
Gabon | 27 | 229
Congo, Rep. | 47 | 264
Cameroon | 48 | 276
Chad | 116 | 241
Sao Tome and Principe | 181 | 311
Angola | 238 | 269
Central Africa | 85 | 281
Source: WHO, SAFERAFRICA Project
Actions On Data Loss
Slide 21
Lost information | Direct/indirect actions
Accidents not detected by the police | Collect health and insurance statistics
Accidents involving only property damage | Collect data on all accidents by police authorities
Accidents not reported | Local control of reporting procedures
Data not collected in the field | Use of innovative IT tools; redefine collected data
Health consequences monitoring not controlled | Local control of information exchange procedures with AUSL (local health services)
Deaths occurring more than 30 days after the accident | Collect health statistics data
Field collection errors, transcription errors on ISTAT forms | Use of innovative IT tools
Data difficult to collect | Use of innovative IT tools
ITALY CASE STUDY
Slide 22
Example: Accident data collection in
Italy
• Road accidents are recorded by 3 police
bodies:
– National Traffic Police
– Local Police
– Carabinieri
• The data on injury accidents
collected by each body are sent to ISTAT
• During the process, some data are lost
Slide 23
Police bodies in Italy
Accident type | Urban area | Outside urban area
Property Damage Only (PDO) accidents | Local Police (on request: Traffic Police and Carabinieri) | Traffic Police, Carabinieri (exception: Local Police)
Injury accidents | Local Police, Traffic Police, Carabinieri | Traffic Police, Carabinieri
Slide 24
Accident data collection in Italy
Slide 25
[Diagram: flow of accident records between the National Traffic Police, the Headquarters Police Data Center, the Carabinieri, the Carabinieri Headquarters, the Local Police, the Local Statistics Office, the Provincial Monitoring Centers, the ISTAT regional offices and the National Institute of Statistics (ISTAT)]
The ISTAT Ctt/Inc accident form
Up to 200 «variables» per accident, covering:
• Location
• Vehicles involved
• Crash conditions
• Injuries
• Drivers
• Occupants
Slide 26
ITS for Data Collection
Page 27
• Creation and implementation of traffic
accident databases and of an information
system for road safety at national level
• Creation of the National Centre for Analysis of
Traffic Accidents
Coordinated by CTL
Main partners: IBSR,
IT, SWOV
Page 28
Example of good practices: Cameroon
Page 29
The actors involved
Page 30
The network architecture
Page 31
The modules
Sfinge ©
Statistical
analysis
Collection and management data
module
Documentation
module
Authentication, roles and
security module
Plan
Module
Online help
Integration and
validation data
Hospital services
Module
Geographic
Images module
Page 32
Screenshots
Page 33
Automatic location
Page 34
Mapping
Page 35
Charts
Page 36
Hospital Data Collection screenshot
Slide 37
Road Safety Databases
World level: https://extranet.who.int/roadsafety/death-on-the-roads/#deaths//all
European level: https://ec.europa.eu/transport/road_safety/specialist/statistics/map-viewer/
Italian level: http://istat.maps.arcgis.com/apps/MapSeries/index.html?appid=b34ba84168da4147b810f0d04f59881d
3-1 BASIC STATISTICAL
CONCEPTS
Slide 38
Mod. 3
Main contents
• Safety, units, traits and populations
• Recorded and Expected number of accidents
• Random and Systematic variation in accident
counts
• Regression-to-the-mean
Slide 39
Mod. 3
What is SAFETY?
[Chart: recorded number of accidents per year at an intersection in Perugia, 2001 to 2009; vertical axis 0 to 4]
Here is a count of injury accidents for an
intersection in Perugia.
What is its SAFETY?
Slide 40
Mod. 3
What is a UNIT?
• … “what is its safety?” implies that SAFETY
is a property of UNITS
• A Unit can be:
– a road segment
– an intersection
– Mr. Mario Rossi
– a car
– etc.
Slide 41
Mod. 3
What is the safety of a Unit?
• …The number of accidents that has been reported at
a certain location during a certain period?
Slide 42
[Chart: recorded number of accidents per year at an intersection in Perugia, 2001 to 2009; vertical axis 0 to 4]
• The recorded count at the
intersection differs from
year to year and fluctuates
• If we use the recorded
number of accidents,
that would mean that
safety improved from
2002 to 2003,
deteriorated from 2003 to
2004 etc.
• The probability that the
intersection is chosen for
interventions depends on
the year taken for
reference
Mod. 3
Observed values: the recorded number
of accidents
• The number of accidents that has been
reported at a certain location during a certain
period
• The recorded number ≠ the “true” number
Slide 43
Mod. 3
[Chart: recorded number of accidents per year at an intersection in Perugia, 2001 to 2009, with the annual mean; vertical axis 0 to 4]
Slide 44
There are 3 elements in the graph:
1. Observed values ●
2. The invisible (unknown) safety property μ
3. Our estimate of the unknown property ○
Number of years | Average value
1 | 3.0
2 | 2.5
3 | 2.7
4 | 2.3
5 | 2.0
6 | 2.0
What if we calculate the average value?
Variation in short-term accidents
frequency
Slide 45
Mod. 3
The Recorded Number of accidents is
“not useful” for safety management…
• … because it changes even when there is no
change in safety-relevant traits (exposure,
traffic control, physical features, user
demography, etc.).
• Accidents are (thankfully!) rare events and
their pattern exhibits random fluctuations
• We need a definition of the safety of a unit
such that, as long as the ‘safety-relevant’
traits of the unit do not change, its ‘safety’
does not change.
Slide 46
Mod. 3
What is the safety of a Unit?
The safety property of a unit is “the number of
accidents by type and severity, expected to
occur on it in a specified period of time.”
(Hauer, 1997)
It will always be denoted by μ and its estimate
by μ̂
Slide 47
Mod. 3
The ‘safety’ of a unit depends on its
‘traits’
• Mass
• Height
• Engine capacity
• Stiffness
• Colour
• …
Slide 48
Mod. 3
The ‘safety’ of a unit depends on its
‘traits’
Slide 49
• Number of
approaches
• Type of traffic
control
• AADT
• Number of lanes
• Visibility
• Roadside
conditions
• Road surface
condition
• …..
Mod. 3
What is the link between safety and
traits?
• A trait is ‘safety-related’ (s-r) if μ changes
when the trait changes.
• Consequence: units with the same s-r traits
have the same μ (and, of course, units that
differ in some s-r traits differ in their μ).
Slide 50
Mod. 3
Populations
• Units that share some traits form a
population of units.
• Example: (1) rural, (2) two-lane road
segments in (3) flat terrain
• Because only some traits are common, the
units differ in many safety-related traits and
therefore differ in their μ
Slide 51
Mod. 3
Parameters of populations
We will describe the safety of a
population by:
Mean of μ’s, E{μ} and
Standard deviation of μ’s, σ{μ}
Slide 52
Mod. 3
Notational conventions to remember
μ - the expected number of accidents for a
unit
μ̂ - estimate of μ. A caret above always
means: estimate of ...
E{μ} - mean of the μ’s in a population of units.
σ{μ} - standard deviation of the μ’s in a
population of units.
Slide 53
Mod. 3
Variation in accident counts
Random variation
=
variation in the recorded
number of accidents
around a given expected
number of accidents
Systematic variation
=
variation in the expected
number of accidents in
time or space between
given units of observation
(drivers, road sections,
modes of travel, etc)
Slide 54
Mod. 3
Why variation is important
Variation must be considered at two critical
points in safety analyses:
1. Identifying the best entities for investment;
2. Evaluating effectiveness of the action.
Slide 55
Mod. 3
RANDOM VARIATION
Slide 56
Mod. 3
Modeling accidents with the Binomial
distribution
Slide 57
Parameters:
n ∈ {0, 1, 2, …}: number of trials (number of
opportunities for an accident, i.e. exposure)
p ∈ [0, 1]: success probability for each trial
(i.e. probability of an accident, or accident risk)
The probability of observing k accidents:
$P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}$
“n choose k”, $\binom{n}{k}$, is the number
of ways of selecting k items
from a set of n distinct items
From the Binomial distribution to the
Poisson distribution
Consider a set of binomial
trials:
1. Each trial has 2 possible
outcomes: success or
failure (accident or
no accident)
2. The probability of
success (or failure) is the
same at each trial
3. The outcome of each trial
is independent of the
outcome of other trials
• When the probability of
success (risk of accident)
goes toward zero, and
• When the number of trials
(exposure) goes toward
infinity, then
• The binomial distribution
will approach the Poisson
distribution
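The limit described above can be checked numerically. The sketch below holds the expected count n·p fixed at a hypothetical value of 2 and compares the binomial and Poisson probabilities as exposure n grows and risk p shrinks. The function names and the chosen values are assumptions made for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for n independent trials with accident probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with mean lam."""
    return lam**k * exp(-lam) / factorial(k)


lam = 2.0  # hypothetical expected number of accidents, kept fixed
gaps = []
for n in (10, 100, 10_000):
    p = lam / n  # risk per trial shrinks as exposure grows
    # Largest difference between the two distributions over k = 0..7
    gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(8))
    gaps.append(gap)
    print(f"n = {n:>6}, p = {p:.4f}, largest difference = {gap:.6f}")
```

The largest difference shrinks as n increases, which is why the Poisson model is a convenient stand-in when exposure is large and the risk per opportunity is small.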
Slide 58
Mod. 3
Pure random variation: The Poisson
probability model
• The variance of the accident counts equals
the mean (E)
Var(x) = λ = E
x = accident count
$p(X = x; \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}$
Slide 59
Mod. 3
Exercise
• A city’s traffic department reports that in a
particular busy intersection, accidents occur
at an average rate of 2 per week. Let’s
assume the number of accidents follows a
Poisson distribution.
– What is the probability that in a given week, there
will be exactly 3 accidents?
– What is the probability that in a given week, there
will be 2 or fewer accidents?
– What is the probability that in a given week, there
will be more than 4 accidents?
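One way to work through the exercise is to evaluate the Poisson probabilities directly with λ = 2 accidents per week. The sketch below uses only the Python standard library; the function and variable names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x accidents when the mean is lam."""
    return lam**x * exp(-lam) / factorial(x)


lam = 2.0  # average number of accidents per week, from the exercise

p_exactly_3 = poisson_pmf(3, lam)
p_at_most_2 = sum(poisson_pmf(x, lam) for x in range(3))          # x = 0, 1, 2
p_more_than_4 = 1.0 - sum(poisson_pmf(x, lam) for x in range(5))  # 1 - P(X <= 4)

print(f"P(X = 3)  = {p_exactly_3:.4f}")
print(f"P(X <= 2) = {p_at_most_2:.4f}")
print(f"P(X > 4)  = {p_more_than_4:.4f}")
```

The third probability is obtained through the complement, since summing the upper tail directly would require an infinite sum.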
Mod. 2 Slide 60
San Francisco Data (1974-1975)
Number of intersections | Number of accidents per intersection in 1974 | Average number of accidents per intersection in 1975
553 | 0 | 0.54
296 | 1 | 0.97
144 | 2 | 1.53
65 | 3 | 1.97
31 | 4 | 2.10
21 | 5 | 3.24
9 | 6 | 5.67
13 | 7 | 4.69
5 | 8 | 3.80
2 | 9 | 6.50
All 1,142 intersections | | 1.09 (average)
(2 intersections had 13 accidents, one had 16)
Accidents counted at 1,142 four-leg, stop-sign-controlled
intersections in San Francisco
Slide 61
Mod. 3
San Francisco Data (1975-1976)
Source: Hauer, E., 1986
Number of intersections | Number of accidents per intersection in 1975 | Average number of accidents per intersection in 1976
559 | 0 | 0.55
286 | 1 | 0.98
144 | 2 | 1.41
73 | 3 | 1.82
35 | 4 | 1.97
18 | 5 | 2.50
11 | 6 | 3.91
9 | 7 | 4.22
3 | 8 | 2.00
1 | 9 | 3.00
2 | 10 | 2.50
1 | 11 | 5.00
Slide 62
Mod. 3
San Francisco Data (1976-1977)
Source: Hauer, E., 1986
Number of intersections | Number of accidents per intersection in 1976 | Average number of accidents per intersection in 1977
562 | 0 | 0.53
287 | 1 | 0.94
155 | 2 | 1.37
74 | 3 | 1.72
33 | 4 | 2.61
13 | 5 | 3.00
11 | 6 | 2.64
4 | 7 | 2.25
1 | 8 | 1.00
2 | 9 | 3.50
Slide 63
Mod. 3
The evolution of the first groups
Slide 64
Mod. 3
Regression-to-the-mean (RTM)
• If, in part or in whole as a result of random
variation, an abnormally high or low number
of accidents has been recorded in a specific
period, the number of accidents in the next
period will return to (regress towards) the
long-term expected value
• High numbers go down, low numbers go up
Slide 65
Mod. 3
Regression-to-the-mean (RTM) and
RTM Bias
Slide 66
Mod. 3
Another example
• We have 100 intersections in the same city
with the same characteristics (traffic control,
traffic flow, geometry)
• The expected (true) number of accidents is 3
per year at each intersection
• In practice, the counts fluctuate randomly,
and a Poisson distribution can be assumed:
$p(X = x; \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}$
Slide 67
Expected numbers
Number of accidents X | Probability that an intersection has X accidents | Expected number of intersections with X accidents
0 | 0.0498 | 5
1 | 0.1494 | 15
2 | 0.2240 | 22
3 | 0.2240 | 22
4 | 0.1680 | 17
5 | 0.1008 | 10
6 | 0.0504 | 5
7 | 0.0216 | 2
8 | 0.0081 | 1
9 | 0.0040 | 1
Slide 68
What about treating some of the
intersections?
Number of accidents X | Probability that an intersection has X accidents | Expected number of intersections with X accidents
0 | 0.0498 | 5
1 | 0.1494 | 15
2 | 0.2240 | 22
3 | 0.2240 | 22
4 | 0.1680 | 17
5 | 0.1008 | 10
6 | 0.0504 | 5
7 | 0.0216 | 2
8 | 0.0081 | 1
9 | 0.0040 | 1
Slide 69
Treating some of the intersections...
• We install traffic signals in place of
STOP-sign control at the intersections that
recorded 5 or more accidents (19 cases)
• Suppose that traffic signals
reduce accidents by 10%.
• How many accidents will be avoided if traffic
signals are installed at the intersections
with x ≥ 5?
Slide 70
Results of the intervention (1)
• For the 19 signalised intersections, the
expected mean for the following year (after
the intervention) is 2.7 accidents per year
• The total number of accidents in the
first year at the 19 signalised
intersections was 111
• The total number expected at the same 19
intersections after the traffic signals are
introduced is 2.7 x 19 = 51 accidents
Slide 71
• The reduction appears to have been 54% (from
111 to 51), whereas it is actually
10%.
• Why?
• → If the treatment had no effect and
nothing else changed, how many accidents
are expected in the after-treatment
period?
Slide 72
Results of the intervention (2)
• Before: 5*10+6*5+7*2+8*1+9*1 = 111
accidents.
• If the treatment is ineffective, expected number
of accidents after the intervention = 19*3 = 57 accidents.
• Expected reduction = 19*3*(1-0.9) = 5.7
accidents.
• 111-57 = 54: regression to the mean!
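The arithmetic of this example can be reproduced in a few lines. The sketch below takes the expected numbers of intersections from the table, selects the sites with 5 or more accidents, and separates the apparent reduction into the part caused by regression to the mean and the part caused by the treatment. Variable names are illustrative.

```python
# Expected number of intersections (out of 100) by recorded accident count,
# as listed in the table of expected numbers.
sites_by_count = {0: 5, 1: 15, 2: 22, 3: 22, 4: 17, 5: 10, 6: 5, 7: 2, 8: 1, 9: 1}

true_mean = 3.0     # expected accidents per year at every intersection
true_effect = 0.10  # assumed real reduction produced by traffic signals
threshold = 5       # sites with at least this many accidents are treated

treated = {x: n for x, n in sites_by_count.items() if x >= threshold}
n_treated = sum(treated.values())
before = sum(x * n for x, n in treated.items())

# With no treatment the treated sites would fall back to their true mean.
after_no_effect = n_treated * true_mean
after_with_effect = after_no_effect * (1 - true_effect)

apparent_reduction = (before - after_with_effect) / before
rtm_reduction = before - after_no_effect
treatment_reduction = after_no_effect - after_with_effect

print(f"Treated sites: {n_treated}, accidents before: {before}")
print(f"Expected after, no effect: {after_no_effect:.1f}")
print(f"Expected after, with effect: {after_with_effect:.1f}")
print(f"Apparent reduction: {apparent_reduction:.0%}")
print(f"Due to regression to the mean: {rtm_reduction:.1f} accidents")
print(f"Due to the treatment: {treatment_reduction:.1f} accidents")
```

A naive before-after comparison at sites selected for their high counts therefore credits the treatment with an effect that is mostly a selection artefact.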
Slide 73
Results of the intervention
Statistical modelling in accidents
• PURE RANDOM VARIATION is usually
modelled by the Poisson probability law.
• SYSTEMATIC VARIATION is modelled by
Multivariate statistical models (also known
as Safety Performance Functions) used to
analyse factors that explain systematic
variation of the number of accidents
Slide 74
Mod. 3
3-2 CONTINGENCY TABLES
Slide 75
Mod. 3
Investigating accident causation
• Case-by-case approach: Accident causes
identified through expert judgement based on
accident reconstruction and causation
analysis
• Statistical approach: the causal relation
between a risk factor and accident
occurrence is not investigated directly, but
inferred from the association between these
two
Slide 76
Mod. 3
Measures of association
• Chi-square
• Risk ratio or Relative risk (RR)
• Odds Ratio (OR)
Slide 77
Mod. 3
Contingency Tables
• They allow for:
• Analysis of accident frequencies (deaths and
injuries) in relation to two or more variables
• Assessment of the association between the
variables examined
• They can include both categorical variables and
quantitative discrete or continuous variables
(divided into classes)
Slide 78
Mod. 3
Example of Contingency Tables
Slide 79
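Use of seatbelt | Accident consequence: Fatal | Accident consequence: Non fatal | Total
No | 1,600 | 162,500 | 164,100
Yes | 500 | 412,000 | 412,500
Total | 2,100 | 574,500 | 576,600
The cell values are absolute frequencies; the row and column totals are marginal frequencies.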
Mod. 3
Association between the variables
• Contingency tables make it possible to assess
whether there is an association between the variables
considered
• E.g., in the previous case: is the use of
seatbelt associated with the accident
consequence?
Slide 80
Mod. 3
Conditions for association (1)
Slide 81
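Consider a generic contingency table of A (rows) by B (columns):
A \ B | B1 | … | Bj | … | Bc | Total
A1 | n11 | … | n1j | … | n1c | n10
… | … | … | … | … | … | …
Ai | ni1 | … | nij | … | nic | ni0
… | … | … | … | … | … | …
Ar | nr1 | … | nrj | … | nrc | nr0
Total | n01 | … | n0j | … | n0c | n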
Mod. 3
Conditions for association (2)
• B is not associated with A if, for each fixed j,
the ratio nij / ni0 does not vary with i
• Thus, in symbols, for every column j = 1, …, c:
$\frac{n_{1j}}{n_{10}} = \dots = \frac{n_{ij}}{n_{i0}} = \dots = \frac{n_{rj}}{n_{r0}}$
Slide 82
Mod. 3
In the previous example
Slide 83
Use of
seatbelt
Accident consequence
Total
Fatal Non fatal
No 1,600 162,500 164,100
Yes 500 412,000 412,500
Total 2,100 574,500 576,600
We should compare the same accident
consequence (fatal accident) with the two possible
conditions of seatbelt use (No / Yes)
Mod. 3
The conditions are
• The variable B (Accident consequence) is
associated with the variable A (Use of
seatbelt) if the possible conditions of A (Yes
or No) provide information on the possible
conditions of B (Fatal or Non fatal)
• In numbers:
Slide 84
1,600 / 164,100 = 0.0097 ≠ 500 / 412,500 = 0.0012
Mod. 3
Degree of association
• It can be measured through statistical tests
based on the difference between
the observed frequencies and the expected
(theoretical) frequencies:
• Observed frequencies: the actual number of
observations (accidents)
• Theoretical frequencies: the number of
observations (accidents) expected under the
hypothesis of complete independence between
the variables
Slide 85
Mod. 3
Theoretical frequencies
• If the variables are not associated, the following
relationship would hold for the given marginal
frequencies, where $\hat{n}_{ij}$ is the theoretical frequency:
$\hat{n}_{ij} = \frac{n_{i0} \cdot n_{0j}}{n}$
• This formula is used to estimate the theoretical
frequencies
• Expected = (row total × column total) / grand total
Slide 86
Mod. 3
In the previous example
Slide 87
Use of
seatbelt
Accident consequence
Total
Fatal Non fatal
No 1,600 162,500 164,100
Yes 500 412,000 412,500
Total 2,100 574,500 576,600
For the cell (No, Non fatal):
$\hat{n}_{ij} = \frac{n_{i0} \cdot n_{0j}}{n} = \frac{164{,}100 \cdot 574{,}500}{576{,}600} = 163{,}502$
Mod. 3
The Pearson χ²
• This statistic is based on the differences
between observed and theoretical
frequencies
• The higher the χ², the stronger the evidence of
association between the variables
Slide 88
χ² ranges from 0 (variables independent) towards +∞ (variables associated)
Mod. 3
How to calculate χ²
• $n_{ij}$ = observed frequency for the cell (i,j)
• $\hat{n}_{ij}$ = theoretical frequency for the cell (i,j)
• r = number of rows
• c = number of columns
Slide 89
$\chi^{2} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - \hat{n}_{ij})^{2}}{\hat{n}_{ij}}$
Mod. 3
In the previous example
Slide 90
Use of
seatbelt
Accident consequence
Total
Fatal Non fatal
No 1,600 162,500 164,100
Yes 500 412,000 412,500
Total 2,100 574,500 576,600
Exercise: calculate the Chi-square
Result:
Chi-Square ≈ 2,358 (the comma is a thousands separator)
Mod. 3
How to interpret χ²
• To assess the degree of association, the calculated
Chi-square has to be compared with a critical Chi-
square taken from tables, based on the «degrees of freedom» (df)
and on the significance level (p = 0.001 to 0.05):
df = (r-1)*(c-1)
• r = number of rows
• c = number of columns
In the previous example: df = (2-1)*(2-1) = 1
There is an association if the calculated Chi-square
is greater than the critical value
Slide 91
Mod. 3
Example
Table 2*2 ➔ df = 1
χ² critical = 10.83 (p = 0.001)
χ² calculated = 2,358
The calculated value exceeds the critical
value, thus the variables are associated
Slide 92
Mod. 3
This suggests that not using a
seatbelt is associated with a higher
likelihood of fatal injuries.
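The chi-square calculation for the seatbelt table can be sketched in plain Python as follows. The code computes the theoretical frequencies as (row total × column total) / grand total and then sums the squared differences; the critical value 10.83 is the one quoted in the deck for df = 1. Variable names are illustrative.

```python
# Observed frequencies from the seatbelt example.
observed = [
    [1600, 162500],  # no seatbelt:  fatal, non fatal
    [500, 412000],   # seatbelt:     fatal, non fatal
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Theoretical frequencies under independence.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

critical_value = 10.83  # tabulated value quoted in the deck for df = 1
associated = chi2 > critical_value

print(f"Expected frequency (No, Non fatal): {expected[0][1]:.0f}")
print(f"Chi-square = {chi2:.1f}, df = {df}, associated: {associated}")
```

The same loop works for tables with more rows or columns, such as the three-row pavement table in Exercise 2, provided the critical value is looked up for the corresponding degrees of freedom.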
Risk
• At the road user level, accident involvement
risk is the ratio of two counts, namely, the
number N* of accident-involved road users
and the total number N of all road users
exposed to accident involvement risk during
the study period of one year:
Slide 93
$R = N^{*} / N$
Mod. 3
The Relative Risk (RR)
• RR compares the risk of an event among those
exposed to one or more causal factors (e.g. not
using a seatbelt) with the risk among those not exposed
Slide 94
RR scale: from 0 to 1, negative association; RR = 1, no association; above 1 (towards +∞), positive association
Mod. 3
The Relative Risk formula
 | Accident | No accident | Total
Exposed | a | b | r1
Not exposed | c | d | r2
Total | c1 | c2 | T
Slide 95
If more than two groups are distinguished (risk factor measured at
several levels), one group (e.g. group 1) may be considered as the
reference group (also termed base group) and the analyst may relate
the risk of the other groups to that of the reference group.
$RR = \frac{a / r_1}{c / r_2}$
Mod. 3
Let’s calculate the relative risk!
18/12/2024
Mod. 2 Slide 96
Use of
seatbelt
Accident consequence
Total
Fatal Non fatal
No 1,600 162,500 164,100
Yes 500 412,000 412,500
Total 2,100 574,500 576,600
In the example…
• For a driver not wearing the seatbelt, the risk
of death is approximately 8 times higher than
the risk of death for a driver wearing the
seatbelt.
Slide 97
$RR = \frac{1{,}600 / 164{,}100}{500 / 412{,}500} = 8.04$
Mod. 3
Odds
• At the road user level, the odds are the ratio
between the number N* of accident-involved
road users and the number of road users not
involved in accidents:
Slide 98
$Odds = N^{*} / (N - N^{*})$
Mod. 3
The Odds Ratio (OR)
• OR represents the odds that an outcome will
occur given a particular exposure, compared
to the odds of the outcome occurring in the
absence of that exposure
Slide 99
OR scale: from 0 to 1, negative association; OR = 1, no association; above 1 (towards +∞), positive association
Mod. 3
The Odds Ratio formula
 | Accident | No accident | Total
Exposed | a | b | r1
Not exposed | c | d | r2
Total | c1 | c2 | T
Slide 100
$OR = \frac{a / b}{c / d}$
Mod. 3
Let’s calculate the Odds Ratio!
18/12/2024
Mod. 2 Slide 101
Use of
seatbelt
Accident consequence
Total
Fatal Non fatal
No 1,600 162,500 164,100
Yes 500 412,000 412,500
Total 2,100 574,500 576,600
Odds Ratio…
• For a driver not wearing the seatbelt, the
odds of death are approximately 8 times
higher than the odds of death for a driver
wearing the seatbelt.
Slide 102
$OR = \frac{1{,}600 / 162{,}500}{500 / 412{,}000} = 8.11$
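Both measures can be computed from the four cells of the 2×2 table. The sketch below follows the notation of the formula slides (a, b for the exposed row, c, d for the non-exposed row), taking "not wearing a seatbelt" as the exposure; the function names are illustrative.

```python
def relative_risk(a: float, b: float, c: float, d: float) -> float:
    """RR = (a / r1) / (c / r2), with r1 = a + b and r2 = c + d."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """OR = (a / b) / (c / d)."""
    return (a / b) / (c / d)


# Seatbelt example: exposed = no seatbelt, event = fatal outcome.
a, b = 1600, 162500  # no seatbelt: fatal, non fatal
c, d = 500, 412000   # seatbelt:    fatal, non fatal

rr = relative_risk(a, b, c, d)
odds = odds_ratio(a, b, c, d)

print(f"Relative risk = {rr:.2f}")
print(f"Odds ratio    = {odds:.2f}")
```

The two values are close here because fatal outcomes are rare in both groups, so the odds a/b are almost equal to the risks a/r1.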
Mod. 3
Levels of association
Slide 103
Mod. 3
Exercise 1
18/12/2024
Exercise Slide 104
Road geometry | Fatal injury | Non-fatal injury | Total
Straight | 8 | 20 | 28
Curved | 18 | 12 | 30
Total | 26 | 32 | 58
1. Calculate the value of Chi-square (χ²)
and determine the association between
road geometry and crash severity
2. Calculate the risk ratio and the odds ratio
Exercise 2
18/12/2024
Exercise Slide 105
Pavement condition | Fatal injury | Non-fatal injury | Total
Wet | 15 | 25 | 40
Dry | 5 | 45 | 50
Frozen | 10 | 10 | 20
Total | 30 | 80 | 110
1. Calculate the value of Chi-square (χ²) and
determine the association between road pavement
conditions and crash severity
2. Calculate the odds of wet vs dry pavement
3-3 SAFETY PERFORMANCE
FUNCTIONS
Slide 106
Mod. 3
Summary
• Distribution of accidents
– Poisson distribution
• Statistical modelling of systematic variation
– Estimating the E{μ} and the σ{μ} of a population
– Accident prediction models, Safety Performance
Functions (SPF)
• Statistical modelling of random variation
– The Empirical Bayes method
Slide 107
Mod. 3
Variation in accident counts
• Random variation =
variation in the
recorded number of
accidents around a
given expected number
of accidents
• Systematic variation =
variation in the
expected number of
accidents in time or
space between given
units of observation
(drivers, road sections,
modes of travel, etc)
Slide 108
Mod. 3
Statistical modelling of systematic
variation
Slide 109
Mod. 3
Statistical road safety modelling
(SRSM)
• It is the fitting of a statistical model to data:
Accidents prediction models
• Data are about past accidents and traits for a
set of road elements
• Two uses of SRSM:
– To estimate the expected number of accidents over a
given time period on an infrastructure based on its
traits
– To estimate the change in the expected number of
accidents over a given time period on an
infrastructure caused by a change in its traits
Slide 110
Mod. 3
What is a Safety Performance Function
An SPF is “a device which for a multitude of
populations provides estimates of two elements:
1. E{μ}, the mean of the μ’s in populations;
2. σ{μ}, the standard deviation of the μ’s in these
populations.”
Hauer, 2014
Slide 111
Mod. 3
Accident prediction models
Regression models
• Use historic accident data collected at sites
with similar roadway characteristics
• Answer ‘What Is the Relationship Between
the Variables?’
• Equation Used
– 1 Numerical Dependent (Response) Variable
– 1 or More Numerical or Categorical Independent
(Explanatory) Variables
Slide 112
Mod. 3
Model equations
$Y = f(X_1, X_2, X_3, \dots, X_n, \beta_1, \beta_2, \beta_3, \dots, \beta_n)$
where Y is the dependent (response) variable, X1 … Xn are the independent
(explanatory) variables and β1 … βn are the parameters
• SRSM is essentially curve fitting
Slide 113
Mod. 3
Variables
• Numerical (or continuous)
• Categorical (or discrete)
– Count data (e.g. 0, 1, 2, 3, …)
– Binomial data (e.g. 0 or 1)
– Censored data (e.g. >0)
• The type of dependent variable largely
determines the type of model
Slide 114
Mod. 3
Explanatory variables ( Xi )
• Variables commonly included in accident models:
– A measure of traffic volume, usually AADT
– Variables describing cross-section (lane width, number of
lanes)
– Variables describing traffic control (speed limit is most
common)
– Variables describing type of land use (rural, urban)
• Variables often missing from accident models:
– Traffic volume of pedestrians and cyclists
– Variables describing road user behaviour (speeding,
etc.)
– Variables describing safety measures on the road
Slide 115
Mod. 3
Dependent variables ( Y )
• Total number of accidents (mixing all levels of severity)
• Groups of accidents formed according to (for example):
– Accident severity (property damage, injury, fatal)
– Type of accident (pedestrian, cyclist, motor vehicle only)
– Type and severity combined
• Accident rate (accidents per million vehicle kilometres)
Accident rates are rarely used as dependent variable in
recent models, as any rate relies on an assumption of
linearity that may not be correct
Slide 116
Mod. 3
Example of Variables
Slide 117
Mod. 3
Inputs: explanatory variables
• Traffic data, flow (data for night and day):
– AADT
– Hourly and daily traffic
– Pedestrian flows along and across
– Vehicle kilometers travelled
– Traffic flows for all road users
• Traffic data, speed (data for night and day):
– Free flow speed (headway > 5 s)
– Average speed for each road user
• Infrastructure data, key parameters:
– Segment length
– Median type and median width
– Number of lanes and lane width
– Shoulder width (right shoulder and left shoulder)
– Vertical grades (%) / degree of hilliness
– Access density
– Junction density
– Degree of horizontal and vertical curvature
– Bend density
– Land use type
– Bus stops
• Infrastructure data, road quality parameters (presence of):
– Safety barriers
– Roadside hazards
– Road markings and signs
– Pedestrian crossing facilities
– Sidewalks
• Infrastructure data, pavement:
– International roughness coefficient
– Surface type or pavement type
– Surface condition or pavement condition index
• Environment data:
– Rainfall data: average rainy days per year; monthly rainfall events
– Wind data: average number of windy days per year (if available)
– A proxy for post-crash care advancement would be included
Response variable: crash data
– Total crashes
– Number of fatal crashes
– Number of injury crashes
– Number of casualties (fatal, serious injury) and type (road user)
– Injury casualties
– Vehicle type for fatalities
– Collision type
– Accident location
Variables types example
• Numerical, e.g. speed, volume, length
• Categorical (or discrete)
– Count data, e.g. number of intersections, number of
parking areas
– Binomial data, e.g. presence of junctions, bus stops,
safety barriers
– Censored data, e.g. a proxy for post-crash care
advancement
Slide 118
Mod. 3
Common measures of exposure
• AADT
• Entering vehicles (major road), entering vehicles (minor road)
• Annual kilometres of driving
• Often mixes very different types of road users and may not
include all of them (pedestrians and cyclists are rarely
counted)
• Averages over conditions representing different levels of risk
• Relationship to the number of accidents is often highly non-
linear
• Different composite measures of exposure can be developed
Mod. 3
How to build an SPF
Period: 1994-1998; Segment length: 0.5 to
1.0 miles; N = 2,228 segments.
AADT bins | No. of I&F accidents | No. of 0.5-1.5 mile segments | Ê{μ}
0-1,000 | 376 | 975 | 0.39
1,000-2,000 | 445 | 466 | 0.95
... | ... | ... | ...
9,000-10,000 | 102 | 19 | 5.37
10,000-11,000 | 81 | 18 | 4.50
... | ...
Data, bins and computations
Hauer, 2014
Slide 120
An average segment in the 9,000-10,000 bin had
102/19 = 5.37 I&F crashes in 5 years.
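The binning computation amounts to dividing the accidents recorded in each AADT bin by the number of segments in that bin. A minimal sketch, using only the four rows that are legible in the table (the other bins are elided in the source and are not filled in here):

```python
# (AADT bin, I&F accidents in 1994-1998, number of segments), from the table.
bins = [
    ("0-1,000", 376, 975),
    ("1,000-2,000", 445, 466),
    ("9,000-10,000", 102, 19),
    ("10,000-11,000", 81, 18),
]

# Estimate of E{mu}: average number of I&F accidents per segment in each bin.
estimates = {label: accidents / segments for label, accidents, segments in bins}

for label, value in estimates.items():
    print(f"{label:>14}: {value:.2f} accidents per segment in 5 years")
```

Plotting these bin averages against AADT gives the empirical points through which a model equation is later fitted.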
Mod. 3
AADT bins | Ê{μ}
0-1,000 | 0.39
1,000-2,000 | 0.95
... | ...
9,000-10,000 | 5.37
10,000-11,000 | 4.50
[Chart: Ê{μ} plotted against AADT bin. The ordinate, Ê{μ}, is the estimate of the average number of crashes per segment in the bin]
Slide 121
Mod. 3
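SPF development
• Model equation selection
• Data for selected variables
• Parameter estimation
Slide 122
General form: $Y = f(X_1, X_2, X_3, \dots, X_n, \beta_1, \beta_2, \beta_3, \dots, \beta_n)$
Example forms:
$N = L \cdot (\beta_1 X_1 + \beta_2 X_2)$
$N = L \cdot \beta_0 X_1^{\beta_1}$
$N = L \cdot e^{\beta_0} e^{\beta_1 X_1}$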
Mod. 3
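The estimate of σ{μ}
AADT bin | I&F acc. | Segments | Ê{μ} | ... | s² | σ̂{μ}
... | ... | ... | ... | ...
9K-10K | 102 | 19 | 5.37 | ... | 35.18 | ±5.46
... | ... | ...
$\hat{\sigma}\{\mu\} = \sqrt{s^2 - \hat{E}\{\mu\}}$, where $s^2$ is the sample variance of accident counts and $\hat{E}\{\mu\}$ is the sample mean of accident counts.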
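The 9K-10K row can be checked numerically. If counts vary around each segment's μ as Poisson, the variance of the counts is E{μ} plus the variance of the μ's, so subtracting the sample mean from the sample variance leaves an estimate of the variance of μ. A minimal sketch follows; the function name and the floor at zero for a negative difference are assumptions added for illustration.

```python
import math


def sigma_mu_estimate(sample_mean, sample_variance):
    """Estimate of sigma{mu}: sqrt(sample variance - sample mean).

    Returning 0.0 when the difference is not positive is an assumption
    made here so the function never takes the root of a negative number.
    """
    excess = sample_variance - sample_mean
    return math.sqrt(excess) if excess > 0 else 0.0


# Values from the 9K-10K row of the slide.
sigma_hat = sigma_mu_estimate(sample_mean=5.37, sample_variance=35.18)
print(round(sigma_hat, 2))
```

With the slide's values the result rounds to 5.46, matching the last column of the row.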
Slide 123
Mod. 3
Safety Performance Function and
AADT
• The most common formulation is
$N = a \cdot \mathrm{AADT}^{b}$
– The dependent variable, N, is the predicted crash
frequency over a given time period
– The explanatory variable, AADT, is the annual
average daily traffic volume
– a and b are regression coefficients
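To make the roles of a and b concrete, the sketch below recovers them from data by ordinary least squares on the log-log form log N = log a + b log AADT. This is only an illustration: the data are synthetic and noise-free, the function name is hypothetical, and in practice count-regression models such as Poisson or negative binomial are commonly used to estimate SPFs instead of least squares on logarithms.

```python
import math


def fit_power_spf(aadt, crashes):
    """Least-squares fit of log(N) = log(a) + b * log(AADT); returns (a, b)."""
    xs = [math.log(q) for q in aadt]
    ys = [math.log(n) for n in crashes]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope = elasticity b
    a = math.exp(y_bar - b * x_bar)    # intercept back-transformed to a
    return a, b


# Synthetic illustration data generated from a = 0.002 and b = 0.8 (not real observations).
aadt = [500.0, 1500.0, 5000.0, 9500.0, 15000.0]
crashes = [0.002 * q ** 0.8 for q in aadt]

a_hat, b_hat = fit_power_spf(aadt, crashes)
print(f"a = {a_hat:.4f}, b = {b_hat:.3f}")
```

Because the synthetic data contain no noise, the fit returns the generating coefficients; with real crash counts the estimates would carry sampling error and zero counts would need separate handling.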
Slide 124
Mod. 3
Examples of functions
Effect of flow | Elasticity b
Accidents with injuries | 0.911
Car occupants injured | 0.962
Motorcyclists injured | 0.749
Cyclists injured | 1.079
Pedestrians injured | 1.109
Multi-vehicle injury accidents | 1.032
Single-vehicle accidents | 0.804
Fridstrom, 1999 - Norway
Page 125
Mod. 3
The contribution of traffic volume to
explaining systematic variation of the
number of accidents
Page 126
Mod. 3
Graphically
[Figure: relative number of accidents (axis 0-100) against relative traffic volume (axis 1-99) for $N = \mathrm{AADT}^{b}$, with curves for injury accidents and fatal accidents and data labels 79.4 and 25.9. Elvik, 2004]
Slide 127
Mod. 3
Schematically ...
• By varying traffic volume we move along the curve
• Varying other factors will change the slope and / or
the shape of the curve
[Figure: accidents plotted against traffic volume]
Slide 128
Mod. 3
It is useful to note that:
In general, a significant increase in traffic volume
corresponds to an increase in the number of accidents but a decrease
in the accident rate (the slope, or angular coefficient, in the figure).
[Figure: accidents plotted against traffic volume]
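A small numerical check of this statement, assuming the power form N = a · AADT^b with an elasticity below 1. The value b = 0.911 is the injury-accident elasticity from the Fridstrom examples in this module; the doubling of volume and the function name are hypothetical.

```python
def relative_change(volume_ratio, b):
    """Return (accidents ratio, accident-rate ratio) when volume is multiplied by volume_ratio.

    Assumes N = a * AADT**b, so the constant a cancels out of both ratios.
    """
    accidents_ratio = volume_ratio ** b
    rate_ratio = accidents_ratio / volume_ratio  # rate = accidents per unit of volume
    return accidents_ratio, rate_ratio


acc, rate = relative_change(volume_ratio=2.0, b=0.911)
print(f"accidents x{acc:.3f}, accident rate x{rate:.3f}")
```

Doubling the volume multiplies accidents by about 1.88 while the rate falls to about 0.94 of its previous value; with b above 1 (as for pedestrians or cyclists in the same table) the rate would rise instead.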
Slide 129
Mod. 3
Page 130
Let us test our understanding
Slide 131
Mod. 3
$A = 3.281 \times 10^{-7} \times Q^{0.727} \times L^{1.000} \times V^{2.479} \times G_i$
where A = accident frequency, Q = AADT (traffic volume), L = length of road segment in kilometres, V = mean speed of traffic (miles per hour), and $G_i$ = road group factor, equal to 1.000 for group 1, 0.539 for group 2, 0.364 for group 3 and 0.253 for group 4.
This equation was derived by Taylor et al. (2002) for European roads to estimate the effects of speed and other variables on accidents.
1. What is the general name given to this type of equation?
Slide 132
Mod. 3
Ans: Safety performance function; Crash/accident prediction model
Slide 133
Mod. 3
2. What is the general name given to the variables Q, L, V or G?
a) Dependent variables
b) Explanatory variables
c) Response variable
d) Categorical variables
Slide 134
Mod. 3
Ans: b) Explanatory variables (or predictor variables)
Slide 135
Mod. 3
3. The variable $G_i$ is best described as a:
a) Numerical variable
b) Categorical variable
c) Continuous variable
Slide 136
Mod. 3
Ans: b) Categorical variable ($G_i$ takes one fixed value for each road group)
Slide 137
Mod. 3
4. Which of these variables has the greatest effect on accidents?
a) Volume (Q)
b) Speed (V)
c) Length (L)
Slide 138
Mod. 3
Ans: b) Speed (V), which has the largest exponent (2.479)
Slide 139
Mod. 3
5. Keeping all other variables constant, a 10% change in one of these variables leads to the same 10% change in crashes. Which variable is this?
a) Volume (Q)
b) Speed (V)
c) Length (L)
Slide 140
Mod. 3
Ans: c) Length (L)
a) Volume (Q): a 10% increase leads to a change of about 7.2%
b) Speed (V): a 10% increase leads to a change of about 26.7%
c) Length (L): a 10% increase leads to a 10% change
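The percentages in the last answer can be reproduced by evaluating the quoted equation before and after a 10% increase in one variable. The sketch below is illustrative only: the base values for Q, L and V are hypothetical, the function and dictionary names are not from Taylor et al. (2002), and the label "group 4" for the factor 0.253 is inferred because the slide text is cut off after "group".

```python
# Road group factors G_i as listed on the slide (key 4 is an inferred label).
GROUP_FACTOR = {1: 1.000, 2: 0.539, 3: 0.364, 4: 0.253}


def accident_frequency(q, length_km, speed_mph, group):
    """A = 3.281e-7 * Q^0.727 * L^1.000 * V^2.479 * G_i, as quoted on the slide."""
    return (3.281e-7 * q ** 0.727 * length_km ** 1.000
            * speed_mph ** 2.479 * GROUP_FACTOR[group])


def percent_change(base, name, factor=1.10):
    """Percent change in A when one input is multiplied by factor, others held constant."""
    changed = dict(base, **{name: base[name] * factor})
    return 100.0 * (accident_frequency(**changed) / accident_frequency(**base) - 1.0)


# Hypothetical base case; the percentage effects do not depend on these values.
base = {"q": 8000.0, "length_km": 2.0, "speed_mph": 50.0, "group": 1}
effects = {name: percent_change(base, name) for name in ("q", "length_km", "speed_mph")}

for name, value in effects.items():
    print(f"{name}: {value:.1f}%")
```

Because the model is multiplicative, each effect equals 1.1 raised to the variable's exponent, minus 1, which is why only length (exponent 1.000) gives exactly 10%.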


2023 RS_3 - Data Analysis Methodologies.pdf

  • 1. Data Collection and Analysis Methodologies Road Safety A.A. 2023-2024 Module 3 Stephen Kome
  • 2. Mod. 3: Data Analysis Methodologies • 3-0 Road Safety Data Collection • 3-1 Basic statistical concepts • 3-2 Contingency Tables • 3-3 Safety Performance Functions Slide 2 Mod. 3
  • 3. 3-0 - ROAD SAFETY DATA COLLECTION Slide 3
  • 4. • Reliable and harmonised road accident data are crucial for defining evidence-based road safety policies and to monitor performances and assess results • Data on infrastructures, traffic (exposure), accident costs, Safety Performance Indicators are also needed • Also road users must be involved in the information collection and planning process • European Union invested a lot of resources in improving quality and availability of accident data, mainly through dedicated research projects • Observatories of different levels (Continental, National, Regional, Urban) are fundamental tools Page 4 The problem of accident data
  • 5. Why collect data? Define problems Identify risk factors, priorities Formulate strategy Set targets Monitor performance Slide 5
  • 6. Which data to collect? Only 22% of countries are able to provide information on road traffic fatalities, non-fatal injuries, economic impact and selected SPIs (WHO) Slide 6
  • 7. SOURCE AND TYPES OF DATA Slide 7
  • 8. Who are the main sources? • Police • Health authorities • Transport bodies • Other stakeholders may include: – national statistics office, – the insurance industry, – non-governmental organizations working for road safety, – academic institutions… Slide 8
  • 9. Sources and types of data (1) WHO, 2004 Slide 9 Mod. 2
  • 10. Sources and types of data (2) WHO, 2004 Slide 10 Mod. 2
  • 11. QUALITY OF ACCIDENT DATA Slide 11
  • 12. What affects crash data quality? 1. Definitions 2. Reporting/under-reporting of crashes or injuries 3. Missing data 4. Errors Slide 12
  • 13. The definition of road accident fatality • The classification of the severity of injuries and crashes varies among countries. • The range of injury severity categories that may be used by health professionals or police officers includes slight/minor, moderate, serious/severe, and fatal • The recommended definition of a road traffic fatality is (WHO 2009): – “any person killed immediately or dying within 30 days as a result of a road traffic injury accident, excluding suicides” Slide 13
  • 14. How many countries use the ‘died within 30 days’ definition? Fewer than half of the 178 countries monitored by WHO use the recommended definition of a road traffic fatality (WHO, 2009) Slide 14 When a road fatality is not defined in this way, the reported number of fatalities can be made more accurate by multiplying it by an appropriate adjustment factor, which depends on the definition used (as recommended by the European Conference of Ministers of Transport).
  • 15. Under-reporting of road accidents • Not all crashes and injuries that occur are documented in the data system (particularly for slight injury and PDO). • Reasons for under-reporting are: – Police may not be informed when a crash occurs – Police do not always go to the scene – Police may go to the crash scene, but not formally register the crash – Some data are missing – Data are not transmitted to Statistical Offices – Health consequences follow-up not monitored – Died after 30 days Slide 15
  • 16. Methods for assessing under-reporting • Compare the number of police reports filed on certain events to those captured in the database. • Compare the number of road traffic fatalities and/or injuries counted by one data source, usually the police database, to those counted in a survey. • Compare the number of road traffic fatalities and/or injuries counted in the police database to the number counted in other databases. • Use linkage or capture-recapture methods to match records from different databases Slide 16
  • 17. Mean level of accident reporting by injury severity Elvik & Mysen, 1999 Slide 17
  • 18. Reporting of road accident injuries in different countries (official crash statistics / hospital data, scale 0-100): Australia 64; Canada 88; Denmark 21; Germany 39; Netherlands 43; Norway 37; USA 49. Elvik, 2004 Slide 18
  • 19. Page 19 In African Countries 1 Source: WHO, SAFERAFRICA Project
  • 20. Page 20 In African Countries 1 [Bar chart, axis 0-400: reported fatality rates per million population (2013) and estimated fatality rates per million population (2013) for Congo, Dem. Rep.; Central African Republic; Gabon; Congo, Rep.; Cameroon; Chad; Sao Tome and Principe; Angola; Central Africa. Bar labels in extraction order: 7, 12, 27, 47, 48, 116, 181, 238, 85 and 332, 324, 229, 264, 276, 241, 311, 269, 281] Source: WHO, SAFERAFRICA Project
  • 21. Actions On Data Loss Slide 21 Lesson 3 18/12/2024
    Lost information | Direct/indirect actions
    Accidents not detected by the police | Collect health and insurance statistics
    Accidents involving only property damage | Collect data on all accidents by police authorities
    Accidents not reported | Local control of reporting procedures
    Data not collected in the field | Use of innovative IT tools; redefine collected data
    Health consequences monitoring not controlled | Local control of information exchange procedures with AUSL (local health services)
    Deaths occurring 30 days after the accident | Collect health statistics data
    Field collection errors; transcription errors on ISTAT forms | Use of innovative IT tools
    Data difficult to collect | Use of innovative IT tools
  • 23. Example: Accident data collection in Italy • Road accidents are recorded by 3 police bodies: – National Traffic Police – Local Police – Carabinieri • The data on injury accidents collected by each body are sent to ISTAT • During the process, some data are lost Slide 23
  • 24. Police bodies in Italy
    Accident type | Urban area | Outside urban area
    Property Damage Only (PDO) accidents | Local Police; on request: Traffic Police and Carabinieri | Traffic Police, Carabinieri; exception: Local Police
    Injury accidents | Local Police, Traffic Police, Carabinieri | Traffic Police, Carabinieri
    Slide 24
  • 25. Accident data collection in Italy Slide 25 [Flow diagram labels: National Traffic Police; Carabinieri; Carabinieri Headquarters; Local Police; Local Statistics Office; National Institute of Statistics - ISTAT; Provincial Monitoring Centers; ISTAT Regional offices; Headquarters Police Data Center]
  • 26. The ISTAT Ctt/Inc accident form: Location; Vehicles involved; Crash conditions; Injuries; Drivers; Occupants. Up to 200 «variables» per accident Slide 26
  • 27. ITS for Data Collection Page 27
  • 28. • Creation and implementation of traffic accident databases and of an information system for road safety at national level • Creation of the National Centre for Analysis of Traffic Accidents Coordinated by CTL Main partners: IBSR, IT, SWOV Page 28 Example of good practices: Cameroon
  • 29. Page 29 The actors involved
  • 30. Page 30 The network architecture
  • 31. Page 31 The modules Sfinge © Statistical analysis Collection and management data module Documentation module Authentication, roles and security module Plan Module Online help Integration and validation data Hospital services Module Geographic Images module
  • 36. Page 36 Hospital Data Collection screenshot
  • 37. Slide 37 Road Safety Databases http://istat.maps.arcgis.com/apps/MapSeries/index.html?app id=b34ba84168da4147b810f0d04f59881d https://ec.europa.eu/transport/road_safety/specialist/statistics /map-viewer/ https://extranet.who.int/roadsafety/death-on-the- roads/#deaths//all World level European level Italian level
  • 39. Main contents • Safety, units, traits and populations • Recorded and Expected number of accidents • Random and Systematic variation in accident counts • Regression-to-the-mean Slide 39 Mod. 3
  • 40. What is SAFETY? [Chart: recorded number of accidents (axis 0-4) at an intersection in Perugia, 2001-2009] Here is a count of injury accidents for an intersection in Perugia. What is its SAFETY? Slide 40 Mod. 3
  • 41. What is a UNIT? • … “what is its safety?” implies that SAFETY is a property of UNITS • A Unit can be: – a road segment – an intersection – Mr. Mario Rossi – a car – etc. Slide 41 Mod. 3
  • 42. What is the safety of a Unit? • …The number of accidents that has been reported at a certain location during a certain period? Slide 42 [Chart: recorded number of accidents (axis 0-4) at an intersection in Perugia, 2001-2009] • The intersection shows a different number of accidents from year to year, with some fluctuations • If we use the recorded number of accidents, that would mean that safety improved from 2002 to 2003, deteriorated from 2003 to 2004 etc. • The probability that the intersection is chosen for interventions depends on the year taken for reference Mod. 3
  • 43. Observed values: the recorded number of accidents • The number of accidents that has been reported at a certain location during a certain period • The recorded number ≠ the “true” number Slide 43 Mod. 3
  • 44. [Chart: recorded number of accidents and annual mean (axis 0-4) at an intersection in Perugia, 2001-2009] 18/12/2024 Mod. 2 Slide 44 There are 3 elements in the graph: 1. Observed values ● 2. The invisible (unknown) safety property μ 3. Our estimate of the unknown property ○ What if we calculate the average value? Number of years: average value = 1: 3.0; 2: 2.5; 3: 2.7; 4: 2.3; 5: 2.0; 6: 2.0
  • 45. Variation in short-term accidents frequency Slide 45 Mod. 3
  • 46. The Recorded Number of accidents is “not useful” for safety management… • … because the recorded number changes even if there is no change in safety-relevant traits (exposure, traffic control, physical features, user demography, etc.). • Accidents are (thankfully!) rare events and their pattern exhibits random fluctuations • We need a definition of the safety of a unit such that, as long as the ‘safety-relevant’ traits of the unit do not change, its ‘safety’ does not change. Slide 46 Mod. 3
  • 47. What is the safety of a Unit? The safety property of a unit is “the number of accidents by type and severity, expected to occur on it in a specified period of time.” (Hauer, 1997) It will always be denoted by μ and its estimate by μ̂ Slide 47 Mod. 3
  • 48. The ‘safety’ of a unit depends on its ‘traits’ • Mass • Height • Engine capacity • Stiffness • Colour • … Slide 48 Mod. 3
  • 49. The ‘safety’ of a unit depends on its ‘traits’ Slide 49 • N°of approaches • Type of traffic control • AADT • Number of lanes • Visibility • Roadside conditions • Road surface condition • ….. Mod. 3
  • 50. What is the link between safety and traits? • A trait is ‘safety-related’ (s-r) if when it changes, μ changes. • Consequence: Units with the same s-r traits have the same μ (and of course, units that differ in some s-r traits differ in μ‘s). Slide 50 Mod. 3
  • 51. Populations • Units that share some traits form a population of units. • Example: (1) rural, (2) two-lane road segments in (3) flat terrain • Because only some traits are common, the units differ in many safety-related traits and therefore differ in their μ Slide 51 Mod. 3
  • 52. Parameters of populations We will describe the safety of a population by: Mean of μ’s, E{μ} and Standard deviation of μ’s, σ{μ} Slide 52 Mod. 3
  • 53. Notational conventions to remember: μ - the expected number of accidents for a unit; μ̂ - estimate of μ (a caret above always means: estimate of ...); E{μ} - mean of μ’s in a population of units; σ{μ} - standard deviation of μ’s in a population of units. Slide 53 Mod. 3
  • 54. Variation in accident counts Random variation = variation in the recorded number of accidents around a given expected number of accidents Systematic variation = variation in the expected number of accidents in time or space between given units of observation (drivers, road sections, modes of travel, etc) Slide 54 Mod. 3
  • 55. Why variation is important Variation must be considered at two critical points in safety analyses: 1. Identifying the best entities for investment; 2. Evaluating effectiveness of the action. Slide 55 Mod. 3
  • 57. Modeling accidents with the Binomial distribution 18/12/2024 Mod. 2 Slide 57 Parameters: n ∈ {0,1,2,…} - number of trials (number of opportunities for an accident, i.e. exposure); p ∈ [0,1] - success probability for each trial (i.e. probability of an accident, i.e. accident risk). The probability of observing k accidents: $P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}$, where “n choose k”, $\binom{n}{k}$, is the number of combinations of k items selected from a set of n distinct items
  • 58. From the Binomial distribution to the Poisson distribution Consider a set of binomial trials: 1. Each trial has 2 possible outcomes: success or failure (not accident or accident) 2. The probability of success (or failure) is the same at each trial 3. The outcome of each trial is independent of the outcome of other trials • When the probability of success (risk of accident) goes toward zero, and • When the number of trials (exposure) goes toward infinity, then • The binomial distribution will approach the Poisson distribution Slide 58 Mod. 3
  • 59. Pure random variation: The Poisson probability model $p(X = x; \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}$, where x = accident counts • The variance of the accident counts equals the mean (E): Var(x) = λ = E Slide 59 Mod. 3
  • 60. Exercise • A city’s traffic department reports that in a particular busy intersection, accidents occur at an average rate of 2 per week. Let’s assume the number of accidents follows a Poisson distribution. – What is the probability that in a given week, there will be exactly 3 accidents? – What is the probability that in a given week, there will be 2 or fewer accidents? – What is the probability that in a given week, there will be more than 4 accidents? 18/12/2024 Mod. 2 Slide 60
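The three questions in this exercise can be answered directly from the Poisson probability function with a mean of 2 accidents per week. The sketch below is one way to do the arithmetic; the function and variable names are hypothetical.

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability of exactly x events when the expected number is lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


lam = 2.0  # average number of accidents per week, from the exercise

p_exactly_3 = poisson_pmf(3, lam)
p_at_most_2 = sum(poisson_pmf(x, lam) for x in range(3))            # x = 0, 1, 2
p_more_than_4 = 1.0 - sum(poisson_pmf(x, lam) for x in range(5))    # 1 - P(X <= 4)

print(f"P(X = 3)  = {p_exactly_3:.4f}")
print(f"P(X <= 2) = {p_at_most_2:.4f}")
print(f"P(X > 4)  = {p_more_than_4:.4f}")
```

The results are about 0.180, 0.677 and 0.053 respectively.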
  • 61. San Francisco Data (1974-1975): accidents counted on 1,142 four-leg, stop-sign-regulated intersections in San Francisco
    Number of intersections | Accidents per intersection in 1974 | Average accidents per intersection in 1975
    553 | 0 | 0.54
    296 | 1 | 0.97
    144 | 2 | 1.53
    65 | 3 | 1.97
    31 | 4 | 2.10
    21 | 5 | 3.24
    9 | 6 | 5.67
    13 | 7 | 4.69
    5 | 8 | 3.80
    2 | 9 | 6.50
    (2 intersections had 13 accidents, one had 16)
    Average (1,142 intersections): 1.09
    Slide 61 Mod. 3
  • 62. San Francisco Data (1975-1976) Source: Hauer, E., 1986
    Number of intersections | Accidents per intersection in 1975 | Average accidents per intersection in 1976
    559 | 0 | 0.55
    286 | 1 | 0.98
    144 | 2 | 1.41
    73 | 3 | 1.82
    35 | 4 | 1.97
    18 | 5 | 2.50
    11 | 6 | 3.91
    9 | 7 | 4.22
    3 | 8 | 2.00
    1 | 9 | 3.00
    2 | 10 | 2.50
    1 | 11 | 5.00
    Slide 62 Mod. 3
  • 63. San Francisco Data (1976-1977) Source: Hauer, E., 1986
    Number of intersections | Accidents per intersection in 1976 | Average accidents per intersection in 1977
    562 | 0 | 0.53
    287 | 1 | 0.94
    155 | 2 | 1.37
    74 | 3 | 1.72
    33 | 4 | 2.61
    13 | 5 | 3.00
    11 | 6 | 2.64
    4 | 7 | 2.25
    1 | 8 | 1.00
    2 | 9 | 3.50
    Slide 63 Mod. 3
  • 64. The evolution of the first groups Slide 64 Mod. 3
  • 65. Regression-to-the-mean (RTM) • If, in part or in whole as a result of random variation, an abnormally high or low number of accidents has been recorded in a specific period, the number of accidents in the next period will return to (regress towards) the long-term expected value • High numbers go down, low numbers go up Slide 65 Mod. 3
  • 66. Regression-to-the-mean (RTM) and RTM Bias Slide 66 Mod. 3
  • 67. Another example • We have 100 intersections in the same city with the same characteristics (traffic control, traffic flow, geometry) • The expected (true) number of accidents is 3 per year at each intersection • In reality the counts fluctuate randomly, and a Poisson distribution can be assumed for them: $p(X = x; \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}$ Slide 67 Lesson 3 18/12/2024
  • 68. Expected figures
    Number of accidents X | Probability of an intersection having X accidents | Expected number of intersections with X accidents
    0 | 0.0498 | 5
    1 | 0.1494 | 15
    2 | 0.2240 | 22
    3 | 0.2240 | 22
    4 | 0.1680 | 17
    5 | 0.1008 | 10
    6 | 0.0504 | 5
    7 | 0.0216 | 2
    8 | 0.0081 | 1
    9 | 0.0040 | 1
    Slide 68 Lesson 3 18/12/2024
  • 69. What about treating some of the intersections? (Same table of expected figures as on slide 68.) Slide 69 Lesson 3 18/12/2024
  • 70. Treating some of the intersections... • We replace STOP-sign control with traffic signals at the intersections where the number of accidents is ≥ 5 (19 cases) • Suppose that traffic signals reduce accidents by 10%. • How many accidents will be avoided if traffic signals are installed at the intersections with x ≥ 5? Slide 70 Lesson 3 18/12/2024
  • 71. Results of the intervention (1) • For the 19 signalised intersections, the expected mean for the following year (after the intervention) is 2.7 accidents per year • The total number of accidents in the first year at the 19 signalised intersections was 111 • The total number expected at the same 19 intersections after the traffic signals are introduced is 2.7 x 19 = 51 accidents 18/12/2024 Lesson 3 Slide 71
  • 72. Results of the intervention (2) • The reduction appears to be 54% (from 111 to 51), whereas the actual effect is 10%. • Why? • → If the treatment had no effect and nothing else changed, how many accidents would be expected in the after period? Slide 72 Lesson 3 18/12/2024
  • 73. Results of the intervention • Before: 5*10+6*5+7*2+8*1+9*1 = 111 accidents. • If the treatment is ineffective, the number of accidents expected after the intervention = 19*3 = 57 accidents. • Expected reduction due to the treatment = 19*3*(1-0.9) = 5.7 accidents. • 111-57 = 54: regression to the mean! Slide 73 Lesson 3 18/12/2024
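The arithmetic of this example can be laid out step by step to separate the regression-to-the-mean part from the true treatment effect. The sketch below uses only the numbers given on the slides (19 treated intersections, a true mean of 3 accidents per year, a 10% treatment effect); the variable names are hypothetical.

```python
# Treated intersections: {accidents recorded before: number of intersections}.
intersections_by_count = {5: 10, 6: 5, 7: 2, 8: 1, 9: 1}
true_mean = 3.0          # expected accidents per intersection per year
treatment_effect = 0.10  # assumed true reduction from traffic signals

n_treated = sum(intersections_by_count.values())
before = sum(k * n for k, n in intersections_by_count.items())

# What would be expected after, with and without a real treatment effect.
after_no_effect = n_treated * true_mean
after_treated = after_no_effect * (1.0 - treatment_effect)

apparent_reduction = (before - after_treated) / before
rtm_part = before - after_no_effect          # drop caused by regression to the mean
true_part = after_no_effect - after_treated  # drop caused by the treatment itself

print(f"treated intersections: {n_treated}, accidents before: {before}")
print(f"apparent reduction: {apparent_reduction:.0%}")
print(f"regression to the mean: {rtm_part:.1f} accidents, treatment: {true_part:.1f} accidents")
```

A naive before-after comparison attributes the whole drop to the signals, although most of it would have happened with no treatment at all because the sites were selected for their high counts.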
  • 74. Statistical modelling in accidents • PURE RANDOM VARIATION is usually modelled by the Poisson probability law. • SYSTEMATIC VARIATION is modelled by Multivariate statistical models (also known as Safety Performance Functions) used to analyse factors that explain systematic variation of the number of accidents Slide 74 Mod. 3
  • 76. Investigating accident causation • Case-by-case approach: Accident causes identified through expert judgement based on accident reconstruction and causation analysis • Statistical approach: the causal relation between a risk factor and accident occurrence is not investigated directly, but inferred from the association between these two Slide 76 Mod. 3
  • 77. Measures of association • Chi-square • Risk ratio or Relative risk (RR) • Odds Ratio (OR) Slide 77 Mod. 3
  • 78. Contingency Tables • They allow for: • Analysis of accidents frequency (deaths and injured) relating with two or more variables • Assessment of the association between the variables examined • They can include both category variables and quantitative discrete or continuous variables (divided in classes) Slide 78 Mod. 3
  • 79. Example of Contingency Tables Slide 79
    Use of seatbelt | Accident consequence: Fatal | Accident consequence: Non fatal | Total
    No | 1,600 | 162,500 | 164,100
    Yes | 500 | 412,000 | 412,500
    Total | 2,100 | 574,500 | 576,600
    The cell counts are absolute frequencies; the row and column totals are marginal frequencies. Mod. 3
  • 80. Association between the variables • The contingency tables allow to assess if there is an association between the variables considered • E.g., in the previous case: is the use of seatbelt associated with the accident consequence? Slide 80 Mod. 3
  • 81. Conditions for association (1) • Consider a generic contingency table of variable A (rows) against variable B (columns):
    A \ B | $B_1$ | … | $B_j$ | … | $B_c$ | Total
    $A_1$ | $n_{11}$ | … | $n_{1j}$ | … | $n_{1c}$ | $n_{10}$
    … | … | … | … | … | … | …
    $A_i$ | $n_{i1}$ | … | $n_{ij}$ | … | $n_{ic}$ | $n_{i0}$
    … | … | … | … | … | … | …
    $A_r$ | $n_{r1}$ | … | $n_{rj}$ | … | $n_{rc}$ | $n_{r0}$
    Total | $n_{01}$ | … | $n_{0j}$ | … | $n_{0c}$ | $n$
  • 82. Conditions for association (2) • B is not associated with A if, for each fixed j, the ratio $n_{ij}/n_{i0}$ does not vary with i • Thus, in symbols, for every column j: $\frac{n_{1j}}{n_{10}} = \dots = \frac{n_{ij}}{n_{i0}} = \dots = \frac{n_{rj}}{n_{r0}}$
  • 83. In the previous example • We should compare the same accident consequence (fatal accident) under the two possible conditions of seatbelt use (No / Yes)
    Use of seatbelt | Accident consequence: Fatal | Accident consequence: Non fatal | Total
    No | 1,600 | 162,500 | 164,100
    Yes | 500 | 412,000 | 412,500
    Total | 2,100 | 574,500 | 576,600
  • 84. The conditions are • The variable B (Accident consequence) is associated with the variable A (Use of seatbelt) if the possible conditions of A (Yes or No) provide information on the possible conditions of B (Fatal or Non fatal) • In numbers: 1,600 / 164,100 = 0.0097 ≠ 500 / 412,500 = 0.0012
  • 85. Degree of association • It can be measured through statistical tests based on the difference between the observed frequencies and the expected (theoretical) frequencies: • Observed frequencies: the actual number of observations (accidents) • Theoretical frequencies: the number of observations (accidents) expected under the hypothesis of complete independence between the variables
  • 86. Theoretical frequencies • If the variables are not associated, the following relationship holds for the given frequencies: $\hat{n}_{ij} = \frac{n_{i0} \cdot n_{0j}}{n}$ • This formula is used to estimate the theoretical frequencies • Expected = (row total × column total) / grand total
  • 87. In the previous example
    Use of seatbelt | Accident consequence: Fatal | Accident consequence: Non fatal | Total
    No | 1,600 | 162,500 | 164,100
    Yes | 500 | 412,000 | 412,500
    Total | 2,100 | 574,500 | 576,600
    Theoretical frequency for the cell (No, Non fatal): $\hat{n}_{ij} = \frac{n_{i0} \cdot n_{0j}}{n} = \frac{164{,}100 \times 574{,}500}{576{,}600} = 163{,}502$
  • 88. The Pearson χ² • This statistic is based on the differences between observed and theoretical frequencies • The higher the chi-square, the stronger the association between the variables • Scale: χ² = 0 means the variables are independent; as χ² grows towards ∞ the variables are increasingly associated
  • 89. How to calculate χ² • $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - \hat{n}_{ij})^2}{\hat{n}_{ij}}$ • $n_{ij}$ = observed frequency for the cell (i,j) • $\hat{n}_{ij}$ = theoretical frequency for the cell (i,j) • r = number of rows • c = number of columns
  • 90. In the previous example • Exercise: calculate the chi-square • Result: χ² = 2,358
    Use of seatbelt | Accident consequence: Fatal | Accident consequence: Non fatal | Total
    No | 1,600 | 162,500 | 164,100
    Yes | 500 | 412,000 | 412,500
    Total | 2,100 | 574,500 | 576,600
  • 91. How to interpret χ² • To assess the degree of association, the estimated chi-square has to be compared with a critical chi-square value, tabulated by «degrees of freedom» (df) and significance level (p = 0.001 to 0.05): df = (r − 1) × (c − 1) • r = number of rows • c = number of columns • In the previous example: df = (2 − 1) × (2 − 1) = 1 • There is association if the calculated chi-square is greater than the critical value
  • 92. Example • 2×2 table → df = 1 • χ² critical = 10.83 (p = 0.001) / χ² estimated = 2,358 • Thus the variables are associated • This suggests that not using a seatbelt is associated with a higher likelihood of fatal injuries.
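The chi-square steps from slides 86 to 92 can be sketched in plain Python as follows. This is a minimal illustration using only the standard language; the function name `chi_square` and the constant name are assumptions made for this example, and the critical value 10.83 is the one quoted on slide 92.

```python
def chi_square(table):
    """Pearson chi-square and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Theoretical frequency under independence:
            # (row total x column total) / grand total
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Seatbelt example: rows = seatbelt No / Yes, columns = Fatal / Non fatal
seatbelt = [[1600, 162500],
            [500, 412000]]

chi2, df = chi_square(seatbelt)

CRITICAL_DF1 = 10.83  # critical value quoted on slide 92 for df = 1
associated = chi2 > CRITICAL_DF1

print(f"chi-square = {chi2:.0f}, df = {df}, associated = {associated}")
```

The same function accepts tables with more than two rows or columns, such as the three-row pavement table in Exercise 2, although the critical value then has to be looked up for the corresponding degrees of freedom.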
  • 93. Risk • At the road user level, accident involvement risk is the ratio of two counts, namely the number N* of accident-involved road users and the total number N of all road users exposed to accident involvement risk during the study period of one year: $R = N^{*} / N$
  • 94. The Relative Risk (RR) • RR compares the risk of an event occurring among those exposed to one or more causal factors (e.g. not using a seatbelt) with the risk among those not exposed • Scale from 0 to ∞: RR < 1 negative association; RR = 1 no association; RR > 1 positive association
  • 95. The Relative Risk formula • $RR = \frac{a / r_1}{c / r_2}$ • If more than two groups are distinguished (risk factor measured at several levels), one group (e.g. group 1) may be considered as the reference group (also termed base group) and the analyst may relate the risk of the other groups to that of the reference group.
    Group | Accident | No accident | Total
    Exposed | a | b | r1
    Not exposed | c | d | r2
    Total | c1 | c2 | T
  • 96. Let’s calculate the relative risk!
    Use of seatbelt | Accident consequence: Fatal | Accident consequence: Non fatal | Total
    No | 1,600 | 162,500 | 164,100
    Yes | 500 | 412,000 | 412,500
    Total | 2,100 | 574,500 | 576,600
  • 97. In the example… • $RR = \frac{1{,}600 / 164{,}100}{500 / 412{,}500} = 8.04$ • For a driver not wearing a seatbelt, the risk of death is approximately 8 times higher than the risk of death for a driver wearing a seatbelt.
  • 98. Odds • At the road user level, odds are the ratio between the number N* of accident-involved road users and the number of road users not involved in accidents: $\text{Odds} = N^{*} / (N - N^{*})$
  • 99. The Odds Ratio (OR) • OR represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure • Scale from 0 to ∞: OR < 1 negative association; OR = 1 no association; OR > 1 positive association
  • 100. The Odds Ratio formula • $OR = \frac{a / b}{c / d}$
    Group | Accident | No accident | Total
    Exposed | a | b | r1
    Not exposed | c | d | r2
    Total | c1 | c2 | T
  • 101. Let’s calculate the Odds Ratio!
    Use of seatbelt | Accident consequence: Fatal | Accident consequence: Non fatal | Total
    No | 1,600 | 162,500 | 164,100
    Yes | 500 | 412,000 | 412,500
    Total | 2,100 | 574,500 | 576,600
  • 102. Odds Ratio… • $OR = \frac{1{,}600 / 162{,}500}{500 / 412{,}000} = 8.11$ • For a driver not wearing a seatbelt, the odds of death are approximately 8 times higher than the odds of death for a driver wearing a seatbelt.
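As a quick check of the two measures, the relative risk and odds ratio formulas from slides 95 and 100 can be written as small Python functions. The function names are illustrative assumptions; the cell labels a, b, c, d follow the 2×2 layout used on the slides (exposed = not wearing a seatbelt).

```python
def relative_risk(a, b, c, d):
    """RR = (a / r1) / (c / r2), with r1 = a + b and r2 = c + d."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR = (a / b) / (c / d)."""
    return (a / b) / (c / d)


# Seatbelt example. Exposed = no seatbelt, not exposed = seatbelt used.
a, b = 1600, 162500   # no seatbelt: fatal, non fatal
c, d = 500, 412000    # seatbelt:    fatal, non fatal

rr = relative_risk(a, b, c, d)
or_ = odds_ratio(a, b, c, d)

print(f"RR = {rr:.2f}")   # matches the 8.04 on slide 97
print(f"OR = {or_:.2f}")  # matches the 8.11 on slide 102
```

The two values are close here because fatal outcomes are rare in both groups, so the odds a/b are nearly the same as the risk a/(a+b).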
  • 104. Exercise 1
    Road geometry | Fatal injury | Non-fatal injury | Total
    Straight | 8 | 20 | 28
    Curved | 18 | 12 | 30
    Total | 26 | 32 | 58
    1. Calculate the value of chi-square (χ²) and determine the association between road geometry and crash severity
    2. Calculate the risk ratio and odds ratio
  • 105. Exercise 2
    Pavement condition | Fatal injury | Non-fatal injury | Total
    Wet | 15 | 25 | 40
    Dry | 5 | 45 | 50
    Frozen | 10 | 10 | 20
    Total | 30 | 80 | 110
    1. Calculate the value of chi-square (χ²) and determine the association between road pavement conditions and crash severity
    2. Calculate the odds of wet vs dry pavement
  • 107. Summary • Distribution of accidents – Poisson distribution • Statistical modelling of systematic variation – Estimating the E{μ} and the σ{μ} of a population – Accident prediction models, Safety Performance Functions (SPF) • Statistical modelling of random variation – The Empirical Bayes method Slide 107 Mod. 3
  • 108. Variation in accident counts • Random variation = variation in the recorded number of accidents around a given expected number of accidents • Systematic variation = variation in the expected number of accidents in time or space between given units of observation (drivers, road sections, modes of travel, etc) Slide 108 Mod. 3
  • 109. Statistical modelling of systematic variation Slide 109 Mod. 3
  • 110. Statistical road safety modelling (SRSM) • It is the fitting of a statistical model to data: accident prediction models • Data are about past accidents and traits for a set of road elements • Two uses of SRSM: – To estimate the expected number of accidents over a given time period on an infrastructure based on its traits – To estimate the change in the expected number of accidents over a given time period on an infrastructure caused by a change in its traits
  • 111. What is a Safety Performance Function • An SPF is “a device which for a multitude of populations provides estimates of two elements: 1. E{μ}, the mean of the μ’s in populations; 2. σ{μ}, the standard deviation of the μ’s in these populations.” Hauer, 2014
  • 112. Accident prediction models • Regression models • Use historic accident data collected at sites with similar roadway characteristics • Answer ‘What is the relationship between the variables?’ • Equation used – 1 numerical dependent (response) variable – 1 or more numerical or categorical independent (explanatory) variables
  • 113. Model equations • $Y = f(X_1, X_2, X_3, \dots, X_n, \beta_1, \beta_2, \beta_3, \dots, \beta_n)$, where Y is the dependent (response) variable, the $X_i$ are the independent (explanatory) variables and the $\beta_i$ are the parameters • The SRSM model is that of curve fitting
  • 114. Variables • Numerical (or continuous) • Categorical (or discrete) – Count data (e.g. 0, 1, 2, 3, …) – Binomial data (e.g. 0 or 1) – Censored data (e.g. >0) • The type of dependent variable largely determines the type of model Slide 114 Mod. 3
  • 115. Explanatory variables ($X_i$) • Variables commonly included in accident models: – A measure of traffic volume, usually AADT – Variables describing cross-section (lane width, number of lanes) – Variables describing traffic control (speed limit is most common) – Variables describing type of land use (rural, urban) • Variables often missing from accident models: – Traffic volume of pedestrians and cyclists – Variables describing road user behaviour (speeding, etc.) – Variables describing safety measures on the road
  • 116. Dependent variables ( Y ) • Total number of accidents (mixing all levels of severity) • Groups of accidents formed according to (for example): – Accident severity (property damage, injury, fatal) – Type of accident (pedestrian, cyclist, motor vehicle only) – Type and severity combined • Accident rate (accidents per million vehicle kilometres) Accident rates are rarely used as dependent variable in recent models, as any rate relies on an assumption of linearity that may not be correct Slide 116 Mod. 3
  • 117. Example of Variables
    Inputs: explanatory variables
    • Traffic data
      – Flow data: AADT; hourly and daily traffic; pedestrian flows along and across; vehicle kilometres travelled; traffic flows for all road users (data for night and day)
      – Speed: free flow speed (headway > 5 s); average speed for each road user (data for night and day)
    • Infrastructure data
      – Key parameters: segment length; median type and median width; number of lanes and lane width; shoulder width (right shoulder and left shoulder); vertical grades (%) / degree of hilliness; access density; junction density; degree of horizontal and vertical curvature; bend density; land use type; bus stops
      – Road quality parameters: presence of safety barriers, roadside hazards, road markings and signs, pedestrian crossing facilities, sidewalks
      – Pavement: international roughness coefficient; surface type or pavement type; surface condition or pavement condition index
    • Environment data: rainfall data (average rainy days per year; monthly rainfall events); wind data (average number of windy days per year, if available); a proxy for post-crash care advancement would be included
    Response variable
    • Crash data: total crashes; number of fatal crashes; number of injury crashes; number of casualties (fatal, serious injury) and type (road user); injury casualties; vehicle type for fatalities; collision type; accident location
  • 118. Variable types: examples • Numerical, e.g. speed, volume, length • Categorical (or discrete) – Count data: number of intersections, number of parking areas – Binomial data, e.g. presence of junction, bus stops, safety barriers – Censored data, e.g. a proxy for post-crash care advancement
  • 119. Common measures of exposure • AADT • Entering vehicles (major road), entering vehicles (minor road) • Annual kilometres of driving • Often mixes very different types of road users and may not include all of them (pedestrians and cyclists are rarely counted) • Averages over conditions representing different levels of risk • Relationship to the number of accidents is often highly non-linear • Different composite measures of exposure can be developed
  • 120. How to build a SPF: data bins and computations (Hauer, 2014) • Period: 1994-1998; segment length: 0.5 to 1.0 miles; N = 2,228 segments
    AADT bin | No. of I&F accidents | No. of 0.5-1.5 mile segments | Ê{μ}
    0-1,000 | 376 | 975 | 0.39
    1,000-2,000 | 445 | 466 | 0.95
    ... | ... | ... | ...
    9,000-10,000 | 102 | 19 | 5.37
    10,000-11,000 | 81 | 18 | 4.50
    ... | ... | ... | ...
    An average segment in the 9,000-10,000 bin had 102/19 = 5.37 I&F crashes in 5 years.
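The binning computation on slide 120 amounts to dividing the accident count in each AADT bin by the number of segments in that bin. The following Python sketch reproduces the four bins quoted on the slide; the dictionary layout and names are assumptions made for illustration, and the bins elided on the slide are left out.

```python
# (I&F accidents, number of segments) per AADT bin, as quoted on slide 120
bins = {
    "0-1,000": (376, 975),
    "1,000-2,000": (445, 466),
    "9,000-10,000": (102, 19),
    "10,000-11,000": (81, 18),
}

# Estimate of E{mu}: average number of I&F crashes per segment in each bin
mean_per_segment = {
    aadt_bin: accidents / segments
    for aadt_bin, (accidents, segments) in bins.items()
}

for aadt_bin, mean in mean_per_segment.items():
    print(f"{aadt_bin:>14}: {mean:.2f} crashes per segment in 5 years")
```

Plotting these means against the bin midpoints gives the empirical points through which a model equation is later fitted.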
  • 121. [Figure: Ê{μ} plotted against AADT bin. The ordinate, Ê{μ}, is the estimate of the average number of crashes per segment in the bin, e.g. 0.39 for 0-1,000, 0.95 for 1,000-2,000, 5.37 for 9,000-10,000 and 4.50 for 10,000-11,000]
  • 122. SPF development • Steps: model equation selection; data for selected variables; parameter estimation • General form: $Y = f(X_1, X_2, X_3, \dots, X_n, \beta_1, \beta_2, \beta_3, \dots, \beta_n)$ • Example equations: $N = L \cdot (\beta_1 X_1 + \beta_2 X_2)$; $N = L \cdot \beta_0 X_1^{\beta_1}$; $N = L \cdot e^{\beta_0} e^{\beta_1 X_1}$
  • 123. The estimate of σ{μ} • $\hat{\sigma}\{\mu\} = \sqrt{\text{sample variance of accident counts} - \text{sample mean of accident counts}}$
    AADT bin | I&F acc. | Segments | Ê{μ} | s² | σ̂{μ}
    ... | ... | ... | ... | ... | ...
    9K-10K | 102 | 19 | 5.37 | 35.18 | ±5.46
    ... | ... | ... | ... | ... | ...
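The estimate of σ{μ} on slide 123 can be checked numerically. The sketch below assumes, as is usual for this estimator, that counts at each segment are Poisson distributed around their own mean, so the variance of the counts exceeds the variance of the μ's by the sample mean; the function name is an illustrative assumption.

```python
import math


def sigma_mu_estimate(sample_mean, sample_variance):
    """Estimate of sigma{mu}: sqrt(sample variance - sample mean).

    If the sample variance is smaller than the sample mean the data show
    no over-dispersion, and the estimate is set to zero in this sketch.
    """
    return math.sqrt(max(sample_variance - sample_mean, 0.0))


# 9K-10K AADT bin from slide 123
sample_mean = 5.37       # estimate of E{mu}: crashes per segment
sample_variance = 35.18  # s^2 of the crash counts in the bin

sigma_hat = sigma_mu_estimate(sample_mean, sample_variance)
print(f"sigma{{mu}} estimate = {sigma_hat:.2f}")  # agrees with the 5.46 on the slide
```

Clamping the difference at zero is a choice made for this example so that the function always returns a real number; the slide itself does not discuss that case.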
  • 124. Safety Performance Function and AADT • The most common formulation is $N = a \cdot AADT^{b}$ – The dependent variable N is the predicted crash frequency over a given time period – The explanatory variable AADT is the annual average daily traffic volume – a and b are regression coefficients
  • 125. Examples of functions (Fridstrom, 1999 - Norway)
    Effect of flow | Elasticity b
    Accidents with injuries | 0.911
    Car occupants injured | 0.962
    Injured motorcyclists | 0.749
    Cyclists injured | 1.079
    Pedestrians injured | 1.109
    Multi-vehicle injury accidents | 1.032
    Single-vehicle accidents | 0.804
  • 126. The contribution of traffic volume to explaining systematic variation of the number of accidents
  • 127. Graphically [Figure: relative number of accidents (0 to 100) against relative traffic volume (1 to 99), with curves for injury accidents and fatal accidents of the form $N = AADT^{b}$; labelled values 79.4 and 25.9. Elvik, 2004]
  • 128. Schematically ... • By varying traffic volume we move along the curve • Varying other factors will change the slope and / or the shape of the curve [Figure: accidents against traffic volume]
  • 129. It’s useful to note that: generally, a significant increase in traffic volume corresponds to an increase in the number of accidents but a decrease in the accident rate (the slope of the line from the origin in the figure) [Figure: accidents against traffic volume]
  • 130. Let us test our understanding
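The point made on slide 129 follows directly from the power form N = a·AADT^b when b is below 1. The Python sketch below illustrates it with b = 0.911, the elasticity quoted for accidents with injuries on slide 125. The constant a = 1.0 and the two AADT values are purely hypothetical placeholders: a cancels out in the ratios, so it does not affect the conclusion.

```python
def predicted_accidents(aadt, a=1.0, b=0.911):
    """Power-form SPF: N = a * AADT**b (a is a hypothetical constant)."""
    return a * aadt ** b


def accident_rate(aadt, a=1.0, b=0.911):
    """Accidents per unit of traffic: N / AADT = a * AADT**(b - 1)."""
    return predicted_accidents(aadt, a, b) / aadt


low_aadt, high_aadt = 5_000, 10_000  # hypothetical volumes, traffic doubles

# Ratio of predicted accidents after doubling traffic: 2**0.911, about 1.88
accident_ratio = predicted_accidents(high_aadt) / predicted_accidents(low_aadt)

# Ratio of accident rates after doubling traffic: 2**(0.911 - 1), about 0.94
rate_ratio = accident_rate(high_aadt) / accident_rate(low_aadt)

print(f"Accidents multiply by {accident_ratio:.2f}")
print(f"Accident rate multiplies by {rate_ratio:.2f}")
```

With an elasticity above 1, as quoted for cyclists and pedestrians on slide 125, the same calculation would show the rate increasing with volume, so the statement on slide 129 holds only for b < 1.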
  • 131. Let us test our understanding • This equation was derived by Taylor et al. (2002) for European roads to estimate the effects of speed and other variables on accidents: $A = 3.281 \times 10^{-7} \times Q^{0.727} \times L^{1.000} \times V^{2.479} \times G_i$ • Where A = accident frequency, Q = AADT = traffic volume, L = length of road segment in kilometres, V = mean speed of traffic (miles per hour), G = group of road, equal to 1.000 for group 1, 0.539 for group 2, 0.364 for group 3 and 0.253 for group • 1. What is the general name given to this type of equation?
  • 132. Let us test our understanding (equation and variable definitions as on slide 131) • 1. What is the general name given to this type of equation? • Answer: Safety performance function; crash/accident prediction model
  • 133. Let us test our understanding (equation and variable definitions as on slide 131) • 2. What is the general name given to the variables Q, L, V or G? • a) Dependent variables • b) Explanatory variables • c) Response variable • d) Categorical variables
  • 134. Let us test our understanding (equation and variable definitions as on slide 131) • 2. What is the general name given to the variables Q, L, V or G? • a) Dependent variables • b) Explanatory variables (or predictor variables) • c) Response variable • d) Categorical variables
  • 135. Let us test our understanding (equation and variable definitions as on slide 131) • 3. The variable $G_i$ is best described as a: • a) Numerical variable • b) Categorical variable • c) Continuous variable
  • 137. Let us test our understanding (equation and variable definitions as on slide 131) • 4. Which of these variables has the highest effect on accidents? • a) Volume (Q) • b) Speed (V) • c) Length (L)
  • 139. Let us test our understanding (equation and variable definitions as on slide 131) • 5. Keeping all other variables constant, a 10% change in one of these variables leads to the same 10% change in crashes. Which variable is this? • a) Volume (Q) • b) Speed (V) • c) Length (L)
  • 140. Let us test our understanding (equation and variable definitions as on slide 131) • 5. Keeping all other variables constant, a 10% change in one of these variables leads to the same 10% change in crashes. Which variable is this? • a) Volume (Q): a 10% change leads to a 7.2% change • b) Speed (V): a 10% change leads to a 26.6% change • c) Length (L): a 10% change leads to a 10% change
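The percentages in the answer to question 5 follow from the exponents of the equation: raising a variable by 10% multiplies A by 1.1 to the power of that variable's exponent. The Python sketch below encodes the equation as quoted on the slides; the function names and the input values used in the check (Q = 10,000, L = 1 km, V = 50 mph, G = 1.000) are hypothetical and chosen only for illustration.

```python
def accident_frequency(q, length_km, v_mph, g):
    """Equation quoted on the slides (Taylor et al., 2002)."""
    return 3.281e-7 * q ** 0.727 * length_km ** 1.000 * v_mph ** 2.479 * g


def pct_change_in_accidents(exponent, pct_increase=10.0):
    """Percentage change in A when one variable rises by pct_increase percent."""
    return ((1 + pct_increase / 100) ** exponent - 1) * 100


exponents = {"Volume (Q)": 0.727, "Length (L)": 1.000, "Speed (V)": 2.479}
for name, exponent in exponents.items():
    print(f"{name}: +10% -> {pct_change_in_accidents(exponent):+.1f}% accidents")

# Direct check with hypothetical inputs: raise speed by 10%, keep the rest fixed
base = accident_frequency(10_000, 1.0, 50.0, 1.000)
faster = accident_frequency(10_000, 1.0, 55.0, 1.000)
speed_effect_pct = (faster / base - 1) * 100
```

Because the equation is a product of powers, the result does not depend on the starting values: only the exponent and the size of the relative change matter.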