2023 RS_3 - Data Analysis Methodologies.pdf

Data Collection and Analysis
Methodologies
Road Safety
A.A. 2023-2024
Module 3
Stephen Kome

Mod. 3: Data Analysis Methodologies
• 3-0 Road Safety Data Collection
• 3-1 Basic statistical concepts
• 3-2 Contingency Tables
• 3-3 Safety Performance Functions
Slide 2
Mod. 3

3-0 - ROAD SAFETY DATA
COLLECTION
Slide 3

• Reliable and harmonised road accident data are crucial
for defining evidence-based road safety policies,
monitoring performance and assessing results
• Data on infrastructure, traffic (exposure), accident
costs and Safety Performance Indicators (SPIs) are also needed
• Road users must also be involved in the information
collection and planning process
• The European Union has invested substantial resources in improving
the quality and availability of accident data, mainly through
dedicated research projects
• Observatories of different levels (Continental, National,
Regional, Urban) are fundamental tools
Page 4
The problem of accident data

Why collect data?
• Define problems
• Identify risk factors and priorities
• Formulate strategy
• Set targets
• Monitor performance
Slide 5

Which data to collect?
Only 22% of countries are able to provide information on road traffic
fatalities, non-fatal injuries, economic impact and selected SPIs (WHO)
Slide 6

SOURCE AND TYPES OF
DATA
Slide 7

Who are the main sources?
• Police
• Health authorities
• Transport bodies
• Other stakeholders may include:
– national statistics office,
– the insurance industry,
– non-governmental organizations working for road
safety,
– academic institutions…
Slide 8

Sources and types of data (1)
WHO, 2004
Slide 9
Mod. 2

Sources and types of data (2)
WHO, 2004
Slide 10
Mod. 2

QUALITY OF ACCIDENT DATA
Slide 11

What affects crash data quality?
1. Definitions
2. Reporting/under-reporting of crashes or
injuries
3. Missing data
4. Errors
Slide 12

The definition of road accident fatality
• The classification of the severity of injuries
and crashes varies among countries.
• The range of injury severity categories that
may be used by health professionals or police
officers includes slight/minor, moderate,
serious/severe, and fatal
• The recommended definition of a road traffic
fatality is (WHO 2009):
– “any person killed immediately or dying within 30 days
as a result of a road traffic injury accident, excluding
suicides”
Slide 13

How many countries use the ‘died within
30 days’ definition?
Fewer than half of the 178 countries monitored by WHO use the
recommended definition of a road traffic fatality (WHO, 2009)
Slide 14
When a road fatality is not defined in this way, the reported
number of fatalities can be adjusted to the 30-day definition by
multiplying it by an adjustment factor that depends on the
definition used, as recommended by the European Conference of
Ministers of Transport.

Under-reporting of road accidents
• Not all crashes and injuries that occur are
documented in the data system (particularly
slight injuries and property-damage-only, PDO, crashes).
• Reasons for under-reporting are:
– Police may not be informed when a crash occurs
– Police do not always go to the scene
– Police may go to the crash scene, but not formally
register the crash
– Some data are missing
– Data are not transmitted to Statistical Offices
– Follow-up of health consequences is not monitored
– Deaths occur more than 30 days after the crash
Slide 15

Methods for assessing under-reporting
• Compare the number of police reports filed on
certain events to those captured in the database.
• Compare the number of road traffic fatalities
and/or injuries counted by one data source,
usually the police database, to those counted in a
survey.
• Compare the number of road traffic fatalities
and/or injuries counted in the police database to
the number counted in other databases.
• Use linkage or capture-recapture methods to
match records from different databases
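The capture-recapture idea in the last bullet can be sketched with the two-source Lincoln-Petersen estimator. This is only an illustration: the function name and all the counts below are hypothetical and not taken from the slides, and the estimator assumes the two sources record casualties independently and that matched records are identified without error.

```python
def lincoln_petersen(n_police: int, n_hospital: int, n_matched: int) -> float:
    """Estimate the total number of casualties from two overlapping sources.

    n_police   -- casualties recorded in the police database
    n_hospital -- casualties recorded in the hospital database
    n_matched  -- casualties found in both databases after record linkage
    """
    if n_matched <= 0:
        raise ValueError("at least one matched record is needed")
    return n_police * n_hospital / n_matched


# Hypothetical counts, chosen only to illustrate the arithmetic.
police, hospital, matched = 400, 500, 250

estimated_total = lincoln_petersen(police, hospital, matched)
police_reporting_level = police / estimated_total

print(f"Estimated total casualties: {estimated_total:.0f}")
print(f"Police reporting level: {police_reporting_level:.0%}")
```

With these made-up numbers the police database would capture about half of the estimated casualties.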
Slide 16

Mean level of accident reporting by
injury severity
Elvik & Mysen, 1999
Slide 17

Reporting of road accident injuries in
different countries
[Bar chart: official crash statistics / hospital data (%, axis 0-100) for Australia, Canada, Denmark, Germany, Netherlands, Norway and USA. Bar values as extracted, in source order: 64, 88, 21, 39, 43, 37, 49]
Elvik, 2004
Slide 18

In African Countries 1
Fatality rates per million population (2013)
Country | Reported | Estimated
Congo, Dem. Rep. | 7 | 332
Central African Republic | 12 | 324
Gabon | 27 | 229
Congo, Rep. | 47 | 264
Cameroon | 48 | 276
Chad | 116 | 241
Sao Tome and Principe | 181 | 311
Angola | 238 | 269
Central Africa | 85 | 281
Source: WHO, SAFERAFRICA Project

Actions On Data Loss
Slide 21
Lost information | Direct/indirect actions
Accidents not detected by the police | Collect health and insurance statistics
Accidents involving only property damage | Collect data on all accidents by police authorities
Accidents not reported | Local control of reporting procedures
Data not collected in the field | Use of innovative IT tools; redefine collected data
Health consequences monitoring not controlled | Local control of information exchange procedures with AUSL (local health services)
Deaths occurring more than 30 days after the accident | Collect health statistics data
Field collection errors, transcription errors on ISTAT forms | Use of innovative IT tools
Data difficult to collect | Use of innovative IT tools

Example: Accident data collection in
Italy
• Road accidents are recorded by 3 police
bodies:
– National Traffic Police
– Local Police
– Carabinieri
• The data related to injury accidents
collected from each body are sent to ISTAT
• During the process, some data are lost
Slide 23

Police bodies in Italy
Accident type | Urban area | Outside urban area
Property Damage Only (PDO) accidents | Local Police (on request: Traffic Police and Carabinieri) | Traffic Police, Carabinieri (exception: Local Police)
Injury accidents | Local Police, Traffic Police, Carabinieri | Traffic Police, Carabinieri
Slide 24

Accident data collection in Italy
Slide 25
[Flow diagram of the data path from the recording bodies to the National Institute of Statistics (ISTAT). Boxes: National Traffic Police, Police Data Center, Carabinieri, Carabinieri Headquarters, Local Police, Local Statistics Office, Provincial Monitoring Centers, ISTAT Regional offices, Headquarters, ISTAT]

The ISTAT Ctt/Inc accident form
• Location
• Vehicles involved
• Crash conditions
• Injuries
• Drivers
• Occupants
Up to 200 «variables» per accident
Slide 26

ITS for Data Collection
Page 27

Example of good practices: Cameroon
• Creation and implementation of traffic
accident databases and of an information
system for road safety at national level
• Creation of the National Centre for Analysis of
Traffic Accidents
Coordinated by CTL
Main partners: IBSR, IT, SWOV
Page 28

The network architecture

The modules
Sfinge ©
• Statistical analysis
• Collection and management data module
• Documentation module
• Authentication, roles and security module
• Plan module
• Online help
• Integration and validation data
• Hospital services module
• Geographic images module

Hospital Data Collection screenshot

Road Safety Databases
• Italian level: http://istat.maps.arcgis.com/apps/MapSeries/index.html?appid=b34ba84168da4147b810f0d04f59881d
• European level: https://ec.europa.eu/transport/road_safety/specialist/statistics/map-viewer/
• World level: https://extranet.who.int/roadsafety/death-on-the-roads/#deaths//all

3-1 BASIC STATISTICAL
CONCEPTS
Slide 38
Mod. 3

Main contents
• Safety, units, traits and populations
• Recorded and Expected number of accidents
• Random and Systematic variation in accident
counts
• Regression-to-the-mean
Slide 39
Mod. 3

What is SAFETY?
[Chart: recorded number of accidents per year (axis 0-4), 2001-2009, at an intersection in Perugia]
Here is a count of injury accidents for an
intersection in Perugia.
What is its SAFETY?
Slide 40
Mod. 3

What is a UNIT?
• … “what is its safety?” implies that SAFETY
is a property of UNITS
• A Unit can be:
– a road segment
– an intersection
– Mr. Mario Rossi
– a car
– etc.
Slide 41
Mod. 3

What is the safety of a Unit?
• …The number of accidents that has been reported at
a certain location during a certain period?
Slide 42
[Chart: recorded number of accidents per year (axis 0-4), 2001-2009, at an intersection in Perugia]
• The intersection shows
different values of
accidents, and in general
some fluctuations
• If we use the recorded
number of accidents,
that would mean that
safety improved from
2002 to 2003,
deteriorated from 2003 to
2004 etc.
• The probability that the
intersection is chosen for
interventions depends on
the year taken for
reference
Mod. 3

Observed values: the recorded number
of accidents
• The number of accidents that has been
reported at a certain location during a certain
period
• The recorded number ≠ the “true” number
Slide 43
Mod. 3

[Chart: recorded number of accidents per year (axis 0-4) and annual mean, 2001-2009, at an intersection in Perugia]
Slide 44
There are 3 elements in the graph:
1. Observed values ●
2. The invisible (unknown) safety property μ
3. Our estimate of the unknown property ○
What if we calculate the average value?
Number of years | Average value
1 | 3.0
2 | 2.5
3 | 2.7
4 | 2.3
5 | 2.0
6 | 2.0

Variation in short-term accidents
frequency
Slide 45
Mod. 3

The Recorded Number of accidents is
“not useful” for safety management…
• … because it changes even if there is no
change in safety-relevant traits (exposure,
traffic control, physical features, user
demography, etc.).
• Accidents are (thankfully!) rare events and
their pattern exhibits random fluctuations
• We need a definition of the safety of a unit
such that, as long as the ‘safety-relevant’
traits of the unit do not change, its ‘safety’
does not change.
Slide 46
Mod. 3

What is the safety of a Unit?
The safety property of a unit is “the number of
accidents by type and severity, expected to
occur on it in a specified period of time.”
(Hauer, 1997)
It will always be denoted by μ and its estimate
by μ̂
Slide 47
Mod. 3

The ‘safety’ of a unit depends on its
‘traits’
• Mass
• Height
• Engine capacity
• Stiffness
• Colour
• …
Slide 48
Mod. 3

The ‘safety’ of a unit depends on its
‘traits’
Slide 49
• N°of
approaches
• Type of traffic
control
• AADT
• Number of lanes
• Visibility
• Roadside
conditions
• Road surface
condition
• …..
Mod. 3

What is the link between safety and
traits?
• A trait is ‘safety-related’ (s-r) if when it
changes, μ changes.
• Consequence: Units with the same s-r traits
have the same μ (and of course, units that
differ in some s-r traits differ in μ‘s).
Slide 50
Mod. 3

Populations
• Units that share some traits form a
population of units.
• Example: (1) rural, (2) two-lane road
segments in (3) flat terrain
• Because only some traits are common, the
units differ in many safety-related traits and
therefore differ in their μ
Slide 51
Mod. 3

Parameters of populations
We will describe the safety of a
population by:
Mean of μ’s, E{μ} and
Standard deviation of μ’s, σ{μ}
Slide 52
Mod. 3

Notational conventions to remember
μ - the expected number of accidents for a
unit
μ̂ - estimate of μ. A caret above always
means: estimate of ...
E{μ} - mean of μ’s in a population of units.
σ{μ} - standard deviation of μ’s in a
population of units.
Slide 53
Mod. 3

Variation in accident counts
Random variation
=
variation in the recorded
number of accidents
around a given expected
number of accidents
Systematic variation
=
variation in the expected
number of accidents in
time or space between
given units of observation
(drivers, road sections,
modes of travel, etc)
Slide 54
Mod. 3

Why variation is important
Variation must be considered at two critical
points in safety analyses:
1. Identifying the best entities for investment;
2. Evaluating effectiveness of the action.
Slide 55
Mod. 3

RANDOM VARIATION
Slide 56
Mod. 3

Modeling accidents with the Binomial
distribution
Slide 57
Parameters:
n ∈ {0,1,2,…} - number of trials (number of opportunities for an accident, i.e. exposure)
p ∈ [0,1] - success probability for each trial (i.e. probability of an accident, i.e. accident risk)
The probability of observing k accidents:
$P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}$
“n choose k”, $\binom{n}{k}$, is the number of combinations of k items selected from a set of n distinct items

From the Binomial distribution to the
Poisson distribution
Consider a set of binomial
trials:
1. Each trial has 2 possible
outcomes: success or
failure (accident or
no accident)
2. The probability of
success (or failure) is the
same at each trial
3. The outcome of each trial
is independent of the
outcome of other trials
• When the probability of
success (risk of accident)
goes toward zero, and
• When the number of trials
(exposure) goes toward
infinity, then
• The binomial distribution
will approach the Poisson
distribution
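The limit described above can be checked numerically. The sketch below holds the expected count n·p fixed at an arbitrarily chosen value of 2 (an assumption, not a figure from the slides) and measures how far the binomial probabilities are from the Poisson ones as the number of trials grows.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of k accidents in n independent trials with risk p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of k accidents when the expected count is lam."""
    return lam**k * exp(-lam) / factorial(k)


lam = 2.0  # expected number of accidents, kept constant (illustrative value)
gaps = {}
for n in (10, 100, 10_000):
    p = lam / n  # risk per trial shrinks as exposure grows
    # Largest absolute difference between the two models for k = 0..8
    gaps[n] = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(9))
    print(f"n = {n:>6}, p = {p:.4f}, largest gap = {gaps[n]:.6f}")
```

The gap shrinks as n increases and p decreases, which is why accident counts (many opportunities, very small risk per opportunity) are usually treated as Poisson.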
Slide 58
Mod. 3

Pure random variation: The Poisson
probability model
$p(X = x; \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}$
• The variance of the accident counts equals
the mean (E)
Var(x) = λ = E
x = accident counts
Slide 59
Mod. 3

Exercise
• A city’s traffic department reports that in a
particular busy intersection, accidents occur
at an average rate of 2 per week. Let’s
assume the number of accidents follows a
Poisson distribution.
• What is the probability that in a given week there:
– will be exactly 3 accidents?
– will be 2 or fewer accidents?
– will be more than 4 accidents?
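One possible way to work through the exercise is sketched below, using only the Poisson formula from the previous slide with λ = 2 accidents per week. The helper name `poisson_pmf` is introduced here for illustration.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """p(X = x; lam) = lam^x * e^(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)


lam = 2.0  # average number of accidents per week, from the exercise text

p_exactly_3 = poisson_pmf(3, lam)
p_at_most_2 = sum(poisson_pmf(x, lam) for x in range(3))        # x = 0, 1, 2
p_more_than_4 = 1 - sum(poisson_pmf(x, lam) for x in range(5))  # 1 - P(X <= 4)

print(f"P(X = 3)  = {p_exactly_3:.4f}")
print(f"P(X <= 2) = {p_at_most_2:.4f}")
print(f"P(X > 4)  = {p_more_than_4:.4f}")
```

The three probabilities come out at roughly 0.18, 0.68 and 0.05.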

San Francisco Data (1974-1975)
Accidents counted at 1,142 four-leg, stop-controlled
intersections in San Francisco
Number of intersections | Accidents per intersection in 1974 | Average accidents per intersection in 1975
553 | 0 | 0.54
296 | 1 | 0.97
144 | 2 | 1.53
65 | 3 | 1.97
31 | 4 | 2.10
21 | 5 | 3.24
9 | 6 | 5.67
13 | 7 | 4.69
5 | 8 | 3.80
2 | 9 | 6.50
Average | 1,142 intersections | 1.09
(2 intersections had 13 accidents, one had 16)
Slide 61
Mod. 3

Source: Hauer, E., 1986
Number of intersections | Accidents per intersection in 1975 | Average accidents per intersection in 1976
559 | 0 | 0.55
286 | 1 | 0.98
144 | 2 | 1.41
73 | 3 | 1.82
35 | 4 | 1.97
18 | 5 | 2.50
11 | 6 | 3.91
9 | 7 | 4.22
3 | 8 | 2.00
1 | 9 | 3.00
2 | 10 | 2.50
1 | 11 | 5.00
Slide 62
Mod. 3

Source: Hauer, E., 1986
Number of intersections | Accidents per intersection in 1976 | Average accidents per intersection in 1977
562 | 0 | 0.53
287 | 1 | 0.94
155 | 2 | 1.37
74 | 3 | 1.72
33 | 4 | 2.61
13 | 5 | 3.00
11 | 6 | 2.64
4 | 7 | 2.25
1 | 8 | 1.00
2 | 9 | 3.50
Slide 63
Mod. 3

The evolution of the first groups
Slide 64
Mod. 3

Regression-to-the-mean (RTM)
• If, in part or in whole as a result of random
variation, an abnormally high or low number
of accidents has been recorded in a specific
period, the number of accidents in the next
period will tend to return to (regress towards) the
long-term expected value
• High numbers tend to go down, low numbers tend to go up
Slide 65
Mod. 3

Regression-to-the-mean (RTM) and
RTM Bias
Slide 66
Mod. 3

Another example
• We have 100 intersections in the same city
with the same characteristics (traffic control,
traffic flow, geometry)
• The expected (true) number of accidents is 3
accidents per year at each intersection
• In reality, the counts fluctuate randomly,
and a Poisson distribution can be assumed:
$p(X = x; \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}$
Slide 67

Expected numbers
Number of accidents X | Probability that an intersection has X accidents | Expected number of intersections with X accidents
0 | 0.0498 | 5
1 | 0.1494 | 15
2 | 0.2240 | 22
3 | 0.2240 | 22
4 | 0.1680 | 17
5 | 0.1008 | 10
6 | 0.0504 | 5
7 | 0.0216 | 2
8 | 0.0081 | 1
9 | 0.0040 | 1
Slide 68

What about treating some of the
intersections?
Number of accidents X | Probability that an intersection has X accidents | Expected number of intersections with X accidents
0 | 0.0498 | 5
1 | 0.1494 | 15
2 | 0.2240 | 22
3 | 0.2240 | 22
4 | 0.1680 | 17
5 | 0.1008 | 10
6 | 0.0504 | 5
7 | 0.0216 | 2
8 | 0.0081 | 1
9 | 0.0040 | 1
Slide 69

Treating some of the intersections...
• We install traffic signals in place of
STOP control at the intersections where
5 or more accidents occurred (19 cases)
• Suppose that traffic signals
reduce accidents by 10%.
• How many accidents will be prevented if
traffic signals are installed at the intersections
with x ≥ 5?
Slide 70

Results of the intervention (1)
• For the 19 signalised intersections, the expected
mean for the following year (after the
intervention) is 2.7 accidents / year
• The total number of accidents recorded in the
first year at the 19 signalised
intersections was 111
• The total number expected at the same 19
intersections after the traffic signals are
introduced is 2.7 x 19 = 51.3 ≈ 51 accidents
Slide 71

Results of the intervention (2)
• The reduction appears to be 54% (from
111 to 51), whereas the true effect is
10%.
• Why?
• → If the treatment had no effect and
nothing else changed, how many accidents
would be expected in the after-treatment
period?
Slide 72

Results of the intervention
• Before: 5*10+6*5+7*2+8*1+9*1 = 111
accidents.
• If the treatment is ineffective, the expected number of accidents
after the intervention = 19*3 = 57 accidents.
• Expected reduction due to the treatment = 19*3*(1-0.9) = 5.7
accidents.
• 111-57 = 54 accidents: regression to the mean!
Slide 73
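The arithmetic of this example can be retraced in a few lines. The sketch below reuses the rounded intersection counts from the table of expected numbers, the true mean of 3 accidents per year and the assumed 10% treatment effect; the variable names are introduced here for illustration.

```python
# Expected number of intersections (out of 100) with X accidents,
# as rounded in the table of expected numbers.
expected_sites = {0: 5, 1: 15, 2: 22, 3: 22, 4: 17, 5: 10, 6: 5, 7: 2, 8: 1, 9: 1}

TRUE_MEAN = 3.0        # expected accidents per intersection per year
THRESHOLD = 5          # intersections with X >= 5 are treated
EFFECT_FACTOR = 0.9    # traffic signals assumed to reduce accidents by 10%

treated_sites = sum(n for x, n in expected_sites.items() if x >= THRESHOLD)
before = sum(x * n for x, n in expected_sites.items() if x >= THRESHOLD)

after_no_effect = treated_sites * TRUE_MEAN          # what pure RTM would give
after_with_treatment = after_no_effect * EFFECT_FACTOR

apparent_reduction = 1 - after_with_treatment / before
rtm_share = before - after_no_effect                 # drop due to RTM alone
true_saving = after_no_effect - after_with_treatment # drop due to the treatment

print(f"Treated intersections: {treated_sites}")
print(f"Accidents before: {before}")
print(f"Expected after (no effect): {after_no_effect:.1f}")
print(f"Expected after (10% effect): {after_with_treatment:.1f}")
print(f"Apparent reduction: {apparent_reduction:.0%}")
print(f"Of which regression to the mean: {rtm_share:.0f}, treatment: {true_saving:.1f}")
```

A naive before/after comparison credits the treatment with the whole drop, although most of it would have occurred without any intervention.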

Statistical modelling in accidents
• PURE RANDOM VARIATION is usually
modelled by the Poisson probability law.
• SYSTEMATIC VARIATION is modelled by
Multivariate statistical models (also known
as Safety Performance Functions) used to
analyse factors that explain systematic
variation of the number of accidents
Slide 74
Mod. 3

3-2 CONTINGENCY TABLES
Slide 75
Mod. 3

Investigating accident causation
• Case-by-case approach: Accident causes
identified through expert judgement based on
accident reconstruction and causation
analysis
• Statistical approach: the causal relation
between a risk factor and accident
occurrence is not investigated directly, but
inferred from the association between these
two
Slide 76
Mod. 3

Measures of association
• Chi-square
• Risk ratio or Relative risk (RR)
• Odds Ratio (OR)
Slide 77
Mod. 3

Contingency Tables
• They allow for:
• Analysis of accident frequencies (deaths and
injuries) in relation to two or more variables
• Assessment of the association between the
variables examined
• They can include both categorical variables and
quantitative discrete or continuous variables
(divided into classes)
Slide 78
Mod. 3

Example of Contingency Tables
Slide 79
Use of seatbelt | Accident consequence: Fatal | Accident consequence: Non fatal | Total
No | 1,600 | 162,500 | 164,100
Yes | 500 | 412,000 | 412,500
Total | 2,100 | 574,500 | 576,600
The inner cells are absolute frequencies; the row and column totals are marginal frequencies.
Mod. 3

Association between the variables
• Contingency tables make it possible to assess whether
there is an association between the variables
considered
• E.g., in the previous case: is the use of
seatbelt associated with the accident
consequence?
Slide 80
Mod. 3

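Conditions for association (1)
Slide 81
Consider a generic contingency table with row variable A and column variable B:
A \ B | B1 | … | Bj | … | Bc | Total
A1 | n11 | … | n1j | … | n1c | n10
… | … | … | … | … | … | …
Ai | ni1 | … | nij | … | nic | ni0
… | … | … | … | … | … | …
Ar | nr1 | … | nrj | … | nrc | nr0
Total | n01 | … | n0j | … | n0c | n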
Mod. 3

Conditions for association (2)
• B is not associated with A if, for each fixed j, the ratio nij / ni0
does not vary with i
• Thus, in symbols (shown for j = 1, and likewise for every other column j):
$\frac{n_{11}}{n_{10}} = \ldots = \frac{n_{i1}}{n_{i0}} = \ldots = \frac{n_{r1}}{n_{r0}}$
Slide 82
Mod. 3

In the previous example
Slide 83
Use of seatbelt | Fatal | Non fatal | Total
No | 1,600 | 162,500 | 164,100
Yes | 500 | 412,000 | 412,500
Total | 2,100 | 574,500 | 576,600
We should compare the same accident
consequence (fatal accident) with the two possible
conditions of seatbelt use (No / Yes)
Mod. 3

The conditions are
• The variable B (Accident consequence) is
associated with the variable A (Use of
seatbelt) if the possible conditions of A (Yes
or No) provide information on the possible
conditions of B (Fatal or Non fatal)
• In numbers:
Slide 84
1,600 / 164,100 = 0.0097 ≠ 500 / 412,500 = 0.0012
Mod. 3

Degree of association
• It can be measured through statistical tests
calculated based on the difference between
the observed frequencies and the expected
(theoretical) frequencies:
• Observed frequencies: the actual number of
observations (accidents)
• Theoretical frequencies: the number of
observations (accidents) expected under the
hypothesis of complete independence between
the variables
Slide 85
Mod. 3

Theoretical frequencies
• If the variables are not associated, the following relationship
holds for the theoretical frequency $\hat{n}_{ij}$ of cell (i,j):
$\hat{n}_{ij} = \frac{n_{i0} \cdot n_{0j}}{n}$
• This formula is used to estimate the theoretical
frequencies
• Expected = (row total × column total) / grand total
Slide 86
Mod. 3

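Slide 87
Use of seatbelt | Fatal | Non fatal | Total
No | 1,600 | 162,500 | 164,100
Yes | 500 | 412,000 | 412,500
Total | 2,100 | 574,500 | 576,600
Theoretical frequency for the cell (No, Non fatal):
$\hat{n}_{ij} = \frac{n_{i0} \cdot n_{0j}}{n} = \frac{164{,}100 \times 574{,}500}{576{,}600} = 163{,}502$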
Mod. 3

The Pearson χ²
• This statistic is based on the differences
between observed and theoretical
frequencies
• The larger the chi-square value, the stronger the evidence of
association between the variables
Slide 88
[Scale: χ² = 0 means the variables are independent; increasingly large values (towards +∞) indicate associated variables]
Mod. 3

How to calculate χ²
$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - \hat{n}_{ij})^2}{\hat{n}_{ij}}$
• $n_{ij}$ = observed frequency for the cell (i,j)
• $\hat{n}_{ij}$ = theoretical frequency for the cell (i,j)
• r = number of rows
• c = number of columns
Slide 89
Mod. 3

Slide 90
Exercise: calculate the Chi-square
Use of seatbelt | Fatal | Non fatal | Total
No | 1,600 | 162,500 | 164,100
Yes | 500 | 412,000 | 412,500
Total | 2,100 | 574,500 | 576,600
Result:
Chi-Square = 2,358
Mod. 3

How to interpret χ²
• To assess the degree of association, the calculated
chi-square has to be compared with a critical chi-square
value, tabulated by «degrees of freedom» (df)
and significance level (p = 0.001-0.05):
df = (r-1)*(c-1)
• r = number of rows
• c = number of columns
In the previous example: df = (2-1)*(2-1) = 1
There is association if the calculated chi-square
is greater than the critical value
Slide 91
Mod. 3

Example
Table 2*2 ➔ df = 1
χ² critical = 10.83 (p = 0.001)
χ² calculated = 2,358
The calculated value exceeds the critical value, thus the variables are
associated
Slide 92
This suggests that not using a
seatbelt is associated with a higher
likelihood of fatal injuries.
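The whole chain (marginal totals, theoretical frequencies, chi-square, degrees of freedom, comparison with the critical value) can be sketched for the seatbelt table as follows. The code is a plain illustration with names chosen here; the critical value 10.83 is the one quoted on the slide.

```python
# Observed frequencies: rows = seatbelt No / Yes, columns = Fatal / Non fatal
observed = [
    [1_600, 162_500],
    [500, 412_000],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Theoretical frequency of each cell under independence:
# (row total * column total) / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_VALUE = 10.83  # df = 1, value quoted on the slide
associated = chi2 > CRITICAL_VALUE

print(f"Grand total: {n}")
print(f"Theoretical frequency (No, Non fatal): {expected[0][1]:.0f}")
print(f"Chi-square: {chi2:.0f}, df = {df}, associated: {associated}")
```

The computed chi-square is about 2,358, matching the result given for the exercise.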

Risk
• At the road user level, accident involvement
risk is the ratio of two counts, namely, the
number N* of accident-involved road users
and the total number N of all road users
exposed to accident involvement risk during
the study period (e.g. one year):
$R = N^{*} / N$
Slide 93
Mod. 3

The Relative Risk (RR)
• RR compares the risk of an event among those
exposed to one or more causal factors (e.g. not
using a seatbelt) with the risk among those not exposed
Slide 94
[Scale: RR between 0 and 1 = negative association; RR = 1 = no association; RR above 1 (towards +∞) = positive association]
Mod. 3

The Relative Risk formula
 | Accident | No Accident | Total
Exposed | a | b | r1
Not exposed | c | d | r2
Total | c1 | c2 | T
$RR = \frac{a / r_1}{c / r_2}$
Slide 95
If more than two groups are distinguished (risk factor measured at
several levels), one group (e.g. group 1) may be considered as the
reference group (also termed base group) and the analyst may relate
the risk of the other groups to that of the reference group.
Mod. 3

Let’s calculate the relative risk!
Slide 96
Use of seatbelt | Fatal | Non fatal | Total
No | 1,600 | 162,500 | 164,100
Yes | 500 | 412,000 | 412,500
Total | 2,100 | 574,500 | 576,600

In the example…
$RR = \frac{1{,}600 / 164{,}100}{500 / 412{,}500} = 8.04$
• For a driver not wearing the seatbelt, the risk
of death is approximately 8 times the risk of
death for a driver wearing the seatbelt.
Slide 97
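As a quick check of the figure above, the relative risk can be computed directly from the 2×2 table, using the cell names a, c, r1, r2 of the Relative Risk formula slide. The function name is introduced here for illustration.

```python
def relative_risk(a: int, r1: int, c: int, r2: int) -> float:
    """RR = (a / r1) / (c / r2): risk among the exposed over risk among the unexposed."""
    return (a / r1) / (c / r2)


# Exposed = seatbelt not used; event = fatal outcome
rr = relative_risk(a=1_600, r1=164_100, c=500, r2=412_500)
print(f"RR = {rr:.2f}")
```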
Mod. 3

Odds
• At the road user level, odds are the ratio
between the number N* of accident-involved
road users and the number of road users not
involved in accidents:
$\text{Odds} = N^{*} / (N - N^{*})$
Slide 98
Mod. 3

The Odds Ratio (OR)
• OR represents the odds that an outcome will
occur given a particular exposure, compared
to the odds of the outcome occurring in the
absence of that exposure
Slide 99
[Scale: OR between 0 and 1 = negative association; OR = 1 = no association; OR above 1 (towards +∞) = positive association]
Mod. 3

The Odds Ratio formula
 | Accident | No Accident | Total
Exposed | a | b | r1
Not exposed | c | d | r2
Total | c1 | c2 | T
$OR = \frac{a / b}{c / d}$
Slide 100
Mod. 3

Let’s calculate the Odds Ratio!
Slide 101
Use of seatbelt | Fatal | Non fatal | Total
No | 1,600 | 162,500 | 164,100
Yes | 500 | 412,000 | 412,500
Total | 2,100 | 574,500 | 576,600

Odds Ratio…
$OR = \frac{1{,}600 / 162{,}500}{500 / 412{,}000} = 8.11$
• For a driver not wearing the seatbelt, the
odds of death are approximately 8 times the
odds of death for a driver wearing the
seatbelt.
Slide 102
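The odds ratio can be verified in the same way, using the cell names a, b, c, d of the Odds Ratio formula slide. The sketch also shows the equivalent cross-product form a·d / (b·c); the function name is an illustration only.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = (a / b) / (c / d): odds among the exposed over odds among the unexposed."""
    return (a / b) / (c / d)


# Exposed = seatbelt not used; event = fatal outcome
a, b, c, d = 1_600, 162_500, 500, 412_000

odds_ratio_value = odds_ratio(a, b, c, d)
cross_product = (a * d) / (b * c)  # algebraically the same quantity

print(f"OR = {odds_ratio_value:.2f} (cross-product form: {cross_product:.2f})")
```

Because fatal outcomes are rare in both groups, the odds ratio (8.11) is close to the relative risk (8.04).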
Mod. 3

Levels of association
Slide 103
Mod. 3

Exercise 1
Slide 104
Road geometry | Fatal injury | Non-fatal injury | Total
Straight | 8 | 20 | 28
Curved | 18 | 12 | 30
Total | 26 | 32 | 58
1. Calculate the value of Chi-Square χ²
and determine the association between
road geometry and crash severity
2. Calculate the risk ratio and odds ratio

Exercise 2
Slide 105
Pavement condition | Fatal injury | Non-fatal injury | Total
Wet | 15 | 25 | 40
Dry | 5 | 45 | 50
Frozen | 10 | 10 | 20
Total | 30 | 80 | 110
1. Calculate the value of Chi-Square χ² and
determine the association between road pavement
conditions and crash severity
2. Calculate the odds of wet vs dry pavement

3-3 SAFETY PERFORMANCE
FUNCTIONS
Slide 106
Mod. 3

Summary
• Distribution of accidents
– Poisson distribution
• Statistical modelling of systematic variation
– Estimating the E{μ} and the σ{μ} of a population
– Accident prediction models, Safety Performance
Functions (SPF)
• Statistical modelling of random variation
– The Empirical Bayes method
Slide 107
Mod. 3

Variation in accident counts
• Random variation =
variation in the
recorded number of
accidents around a
given expected number
of accidents
• Systematic variation =
variation in the
expected number of
accidents in time or
space between given
units of observation
(drivers, road sections,
modes of travel, etc)
Slide 108
Mod. 3

Statistical modelling of systematic
variation
Slide 109
Mod. 3

Statistical road safety modelling
(SRSM)
• It is the fitting of a statistical model to data:
Accidents prediction models
• Data are about past accidents and traits for a
set of road elements
• Two uses of SRSM:
– To estimate the expected number of accidents over a
given time period on an infrastructure based on its
traits
– To estimate the change in the expected number of
accidents over a given time period on an
infrastructure caused by a change in its traits
Slide 110
Mod. 3

What is a Safety Performance Function?
An SPF is “a device which for a multitude of
populations provides estimates of two elements:
1. E{μ}, the mean of the μ’s in populations;
2. σ{μ}, the standard deviation of the μ’s in these
populations.”
Hauer, 2014
Slide 111
Mod. 3

Accidents Predictions models
Regression models
• Use historic accident data collected at sites
with similar roadway characteristics
• Answer ‘What Is the Relationship Between
the Variables?’
• Equation Used
– 1 Numerical Dependent (Response) Variable
– 1 or More Numerical or Categorical Independent
(Explanatory) Variables
Slide 112
Mod. 3

Model equations
$Y = f(X_1, X_2, X_3, \ldots, X_n, \beta_1, \beta_2, \beta_3, \ldots, \beta_n)$
• Y: dependent (response) variable
• X_1 … X_n: independent (explanatory) variables
• β_1 … β_n: parameters
• Statistical road safety modelling is essentially curve fitting
Slide 113
Mod. 3

Variables
• Numerical (or continuous)
• Categorical (or discrete)
– Count data (e.g. 0, 1, 2, 3, …)
– Binomial data (e.g. 0 or 1)
– Censored data (e.g. >0)
• The type of dependent variable largely
determines the type of model
Slide 114
Mod. 3

Explanatory variables ( Xi )
• Variables commonly included in accident models:
– A measure of traffic volume, usually AADT
– Variables describing cross-section (lane width, number of
lanes)
– Variables describing traffic control (speed limit is most
common)
– Variables describing type of land use (rural, urban)
• Variables often missing from accident models:
– Traffic volume of pedestrians and cyclists
– Variables describing road user behaviour (exceeding in
speed, etc)
– Variables describing safety measures on the road
Slide 115
Mod. 3

Dependent variables ( Y )
• Total number of accidents (mixing all levels of severity)
• Groups of accidents formed according to (for example):
– Accident severity (property damage, injury, fatal)
– Type of accident (pedestrian, cyclist, motor vehicle only)
– Type and severity combined
• Accident rate (accidents per million vehicle kilometres)
Accident rates are rarely used as dependent variable in
recent models, as any rate relies on an assumption of
linearity that may not be correct
Slide 116
Mod. 3

Example of Variables
Slide 117
Inputs: explanatory variables
• Traffic data: flow data
– AADT
– Hourly and daily traffic
– Pedestrian flows along and across
– Vehicle kilometers travelled
– Traffic flows for all road users
(Data for night and day)
• Traffic data: speed
– Free flow speed (headway > 5 s)
– Average speed for each road user
(Data for night and day)
• Infrastructure data: key parameters
– Segment length
– Median type and median width
– Number of lanes and lane width
– Shoulder width (right shoulder and left shoulder)
– Vertical grades (%) / degree of hilliness
– Access density
– Junction density
– Degree of horizontal and vertical curvature
– Bend density
– Land use type
– Bus stops
• Infrastructure data: road quality parameters (presence of)
– Safety barriers
– Roadside hazards
– Road markings and signs
– Pedestrian crossing facilities
– Sidewalks
• Infrastructure data: pavement
– International roughness coefficient
– Surface type or pavement type
– Surface condition or pavement condition index
• Environment data
– Rainfall data: average rainy days per year; monthly rainfall events
– Wind data: average number of windy days per year (if available)
A proxy for post-crash care advancement would be included
Response variable: crash data
– Total crashes
– Number of fatal crashes
– Number of injury crashes
– Number of casualties (fatal, serious injury) and type (road user)
– Injury casualties
– Vehicle type for fatalities
– Collision type
– Accident location

Examples of variable types
• Numerical, e.g. speed, volume, length
• Categorical (or discrete)
– Count data, e.g. number of intersections, number of
parking spaces
– Binomial data, e.g. presence of junctions, bus stops,
safety barriers
– Censored data, e.g. a proxy for post-crash care
advancement
Slide 118
Mod. 3

Common measures of exposure
• AADT
• Entering vehicles (major road), entering vehicles (minor road)
• Annual kilometres of driving
• Often mixes very different types of road users and may not
include all of them (pedestrians and cyclists are rarely
counted)
• Averages over conditions representing different levels of risk
• Relationship to the number of accidents is often highly non-
linear
• Different composite measures of exposure can be developed
Mod. 3

How to build a SPF
Slide 120
Period: 1994-1998; Segment Length: 0.5 to
1.0 miles; N = 2,228 segments.
AADT bin | No. of I&F accidents | No. of 0.5-1.5 mile segments | Ê{μ}
0-1,000 | 376 | 975 | 0.39
1,000-2,000 | 445 | 466 | 0.95
... | ... | ... | ...
9,000-10,000 | 102 | 19 | 5.37
10,000-11,000 | 81 | 18 | 4.50
... | ... | |
Steps: data; bins and computations of Ê{μ}
An average segment in the 9,000-10,000 bin had
102/19 = 5.37 I&F crashes in 5 years.
Hauer, 2014
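The binning computation can be sketched as follows for the four bins shown in the table; the remaining bins are elided in the source and are not reproduced. The data structure and names are chosen here for illustration.

```python
# (AADT bin label, number of I&F accidents in 1994-1998, number of segments)
bins = [
    ("0-1,000", 376, 975),
    ("1,000-2,000", 445, 466),
    ("9,000-10,000", 102, 19),
    ("10,000-11,000", 81, 18),
]

# Estimate of E{mu} per bin: accidents divided by the number of segments in the bin
estimates = {label: accidents / segments for label, accidents, segments in bins}

for label, value in estimates.items():
    print(f"AADT {label:>13}: {value:.2f} I&F crashes per segment in 5 years")
```

Note how the bins with few segments (19 and 18) give estimates that are not monotonic in AADT, a sign of the random variation discussed earlier.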

AADT bin | Ê{μ}
0-1,000 | 0.39
1,000-2,000 | 0.95
... | ...
9,000-10,000 | 5.37
10,000-11,000 | 4.50
[Plot of Ê{μ} against AADT] The ordinate, Ê{μ}, is the estimate of the
average number of crashes per segment in the bin
Slide 121
Mod. 3

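SPF development
Slide 122
• Model equation selection
• Data for selected variables
• Parameter estimation
General form:
$Y = f(X_1, X_2, X_3, \ldots, X_n, \beta_1, \beta_2, \beta_3, \ldots, \beta_n)$
Examples of model equations:
$N = L \times (\beta_1 X_1 + \beta_2 X_2)$
$N = L \times \beta_0 X_1^{\beta_1}$
$N = L \times e^{\beta_0} e^{\beta_1 X_1}$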
Mod. 3

The estimate of σ{μ}
AADT bin | I&F acc. | Segments | Ê{μ} | ... | S² | σ̂{μ}
... | ... | ... | ... | ... | |
9K-10K | 102 | 19 | 5.37 | ... | 35.18 | ±5.46
... | ... | ... | | | |
$\hat{\sigma}\{\mu\} = \sqrt{\text{sample variance of accident counts} - \text{sample mean of accident counts}}$
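For the 9K-10K bin the estimate can be reproduced from the two tabulated values. The sketch below assumes the usual method-of-moments reasoning (with Poisson counts, the variance of the counts equals the mean of the μ's plus the variance of the μ's); the clamp at zero is an added safeguard for bins where the sample variance happens to fall below the sample mean.

```python
from math import sqrt

sample_mean = 5.37       # estimate of E{mu} for the 9K-10K bin
sample_variance = 35.18  # S^2 of the accident counts in the same bin

# Variance of the mu's = variance of counts minus the Poisson part (the mean)
excess_variance = sample_variance - sample_mean
sigma_hat = sqrt(max(excess_variance, 0.0))

print(f"Estimated sigma{{mu}} = {sigma_hat:.2f}")
```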
Slide 123
Mod. 3

Safety Performance Function and
AADT
• The most common formulation is
$N = a \times \mathrm{AADT}^{b}$
– The dependent variable N is the predicted crash
frequency over a given time period
– The explanatory variable AADT is the annual
average daily traffic volume
– a, b are regression coefficients
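To give a feel for how a and b might be obtained, the sketch below fits the power function by ordinary least squares on the logarithms, using only the four bin averages visible in the SPF-building table and the bin midpoints as AADT values. Both choices are simplifying assumptions made here for illustration: they are not the procedure of the cited source, and in practice count-data regression (for example negative binomial models) is normally preferred.

```python
from math import exp, log

# Bin midpoints (assumed) and average I&F crashes per segment from the table
aadt_midpoints = [500, 1_500, 9_500, 10_500]
mean_crashes = [0.39, 0.95, 5.37, 4.50]

# Taking logs turns N = a * AADT^b into a straight line: ln N = ln a + b * ln AADT
xs = [log(q) for q in aadt_midpoints]
ys = [log(n) for n in mean_crashes]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = exp(y_bar - b * x_bar)


def predict(aadt: float) -> float:
    """Predicted crash frequency per segment for a given AADT."""
    return a * aadt**b


print(f"a = {a:.5f}, b = {b:.3f}")
print(f"Predicted crashes per segment at AADT 5,000: {predict(5_000):.2f}")
```

With these four points the fitted exponent is below 1, so predicted crashes grow less than proportionally with traffic volume.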
Slide 124
Mod. 3

Examples of functions
Effect of flow | Elasticity b
Accidents with injuries | 0.911
Car occupants injured | 0.962
Injured motorcyclists | 0.749
Cyclists injured | 1.079
Pedestrians injured | 1.109
Multi-vehicle injury accidents | 1.032
Single-vehicle accidents | 0.804
Fridstrom, 1999 - Norway
Pagina 125
Mod. 3

The contribution of traffic volume to
explaining systematic variation of the
number of accidents
Pagina 126
Mod. 3

Graphically
[Chart (Elvik, 2004): relative number of accidents (0-100) against relative traffic volume (1-99), with curves for injury accidents and fatal accidents of the form N = AADT^b; labelled values 79.4 and 25.9]
Slide 127

Schematically ...
• By varying traffic volume we move along the curve
• Varying other factors will change the slope and / or
the shape of the curve
Traffic volume
Accidents
Slide 128
Mod. 3

It’s useful to note that:
Generally, a significant increase in traffic volume
corresponds to an increase in the number of accidents but a decrease
in the accident rate (the slope of the line from the origin in the figure)
Traffic volume
Accidents
Slide 129
Mod. 3

18/12/2024
Mod. 1 Pagina 130
Let us test our understanding

Mod. 3
This equation was derived by Taylor et al. (2002) for European roads to estimate the effects of speed and different variables on accidents:
$A = 3.281 \times 10^{-7} \times Q^{0.727} \times L^{1.000} \times V^{2.479} \times G_i$
Where A = accident frequency, Q = AADT = traffic volume, L = length of road segment in kilometres, V = mean speed of traffic (miles per hour), G = group of road, equal to
1.000 for group 1, 0.539 for group 2, 0.364 for group 3 and 0.253 for group

1. What is the general name given to this type of equation?
Ans: Safety performance function; crash/accident prediction model

2. What is the general name given to the variables Q, L, V or G?
a) Dependent variables
b) Explanatory variables (or predictor variables)
c) Response variable
d) Categorical variables
Ans: b)

3. The variable G_i is best described as a:
a) Numerical variable
b) Categorical variable
c) Continuous variable

4. Which of these variables has the highest effect on
accidents?
a) Volume (Q)
b) Speed (V)
c) Length (L)

5. Keeping all other variables constant, a 10% change in
one of these variables leads to the same 10% change in
crashes. Which variable is this?
a) Volume (Q) (10% change leads to 7.2% change)
b) Speed (V) (10% change leads to 26.6% change)
c) Length (L) (10% change leads to 10% change)
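The percentages quoted in question 5 follow directly from the exponents of the equation: with all other variables held constant, multiplying a variable by 1.10 multiplies A by 1.10 raised to that variable's exponent. A short check, with names chosen here for illustration:

```python
# Exponents of the equation by Taylor et al. (2002) quoted above
exponents = {"Q (volume)": 0.727, "L (length)": 1.000, "V (speed)": 2.479}

increase = 0.10  # a 10% increase in one variable, all others held constant

# Relative change in accident frequency A for each variable
change_in_accidents = {
    name: (1 + increase) ** exponent - 1 for name, exponent in exponents.items()
}

for name, change in change_in_accidents.items():
    print(f"10% more {name}: {change:.1%} more accidents")
```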
