FATAL OR INJURY: A CASE OF DECIDING
ON PRIORITIZING RESPONDER RESOURCES
By
Piyush Lohana
Motor vehicles accounted for the largest share of accidents in 2007.
WHY THIS PROJECT
• “Every 12 minutes someone dies in a car crash in the United States due to a car accident or a collision
between two motor vehicles.” (-NCIPC)
• Many accidents are fatal or involve serious injuries, and by the time help arrives at the crash
site, much of the damage has already been done.
• We attempt to build a model that predicts the severity of an accident (i.e. whether it is fatal
or results in injury) from predictors such as rush hour, work zone, weather conditions, speed limit and
whether the crash occurred on an interstate.
• This helps prioritize situations and allocate resources where an accident is highly likely to result
in fatalities or serious injuries.
• It will also help emergency care providers focus on the measures and resources needed
when they arrive at the scene. The accuracy of pre-hospital crash-scene details and crash-victim assessment
has important implications for the care that can be provided at the crash scene.
WHAT ARE WE CONSIDERING
• We will be looking at the characteristics of the environment in which the accident
occurred (weather, road condition, type of road, time of day, the day of the week, and
month of the year) and the characteristics of the crash (direction of accident, speed
limit on the road, work zone area, and how many vehicles were involved).
• All of these variables can affect the kind of accident that occurs (no injury,
injury or fatal). Knowing this can help the medical team come prepared for the
actions needed at the scene.
DATA SOURCE
• http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=1158
• The data set has 24 attributes and 42,183 records
• Predictor and outcome variables were identified from these attributes
CLEAR DESCRIPTION OF DATA SET
Sl. No | Variable | Description
1 | HOUR_I_R | 1 = rush hour, 0 = not (rush = 6-9 am, 4-7 pm)
2 | ALIGN_I | 1 = straight, 2 = curve
3 | STRATUM_R | 1 = NASS crashes involving at least one passenger vehicle towed due to damage from the crash scene and no medium or heavy trucks involved, 0 = not
4 | WRK_ZONE | 1 = yes, 0 = no
5 | WKDY_I_R | 1 = weekday, 0 = weekend
6 | INT_HWY | Interstate? 1 = yes, 0 = no
7 | LGTCON_I_R | Light conditions: 1 = day, 2 = dark (including dawn/dusk), 3 = dark but lighted, 4 = dawn or dusk
8 | MAN_COL_I | 0 = no collision, 1 = head-on, 2 = other form of collision
9 | PED_ACC_R | 1 = pedestrian/cyclist involved, 0 = not
10 | REL_JCT_I_R | 1 = accident at intersection/interchange, 0 = not at intersection
11 | SPD_LIM | Speed limit (miles per hour)
12 | SUR_CON | Surface conditions: 1 = dry, 2 = wet, 3 = snow/slush, 4 = ice, 5 = sand/dirt/oil, 8 = other, 9 = unknown
13 | TRAF_WAY | 1 = two-way traffic, 2 = divided highway, 3 = one-way road
14 | VEH_INVL | Number of vehicles involved
15 | WEATHER_R | 1 = no adverse conditions, 2 = rain, snow or other adverse condition
16 | INJURY_CRASH | 1 = yes, 0 = no
17 | NO_INJ_I | Number of injuries
18 | FATALITIES | 1 = yes, 0 = no
19 | MAX_SEV_IR | 0 = no injury, 1 = non-fatal injury, 2 = fatal injury
FILTERING DATA
• The filtering method used is "Standard Deviations from the Mean".
• It removes observations that lie more than three standard deviations from
their variable's mean.
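The three-standard-deviation rule above can be sketched in plain Python. This is a minimal illustration, not the tool the project used; it assumes rows are dictionaries, that the filter is applied only to interval variables such as SPD_LIM and VEH_INVL (applying it to coded categories would be meaningless), and that the population standard deviation is used. The function name `filter_by_std` is hypothetical.

```python
from statistics import mean, pstdev


def filter_by_std(rows, columns, k=3.0):
    """Drop rows where any listed numeric column lies more than k
    population standard deviations from that column's mean."""
    if not rows:
        return []
    # Mean and standard deviation are computed once per column on the unfiltered data.
    stats = {}
    for col in columns:
        values = [row[col] for row in rows]
        stats[col] = (mean(values), pstdev(values))

    def within_limits(row):
        # A constant column (sigma == 0) cannot contain outliers.
        return all(
            sigma == 0 or abs(row[col] - mu) <= k * sigma
            for col, (mu, sigma) in stats.items()
        )

    return [row for row in rows if within_limits(row)]
```

For example, 20 crashes with VEH_INVL = 2 plus one record with VEH_INVL = 40 keeps only the 20 ordinary crashes, because the extreme record sits about 4.5 standard deviations above the mean.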
DATA PARTITIONING
• We build the model with the training data
• Validate it with the validation data
• Test its accuracy with the test data
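A three-way partition can be sketched as follows. The slides do not give the split proportions or method, so the 40/30/30 fractions, the fixed seed and the choice to stratify on MAX_SEV_IR are all assumptions. Stratifying is shown because fatal crashes are rare (150 of 9,999 records in the baseline count), and a purely random split could leave a partition with very few of them. `stratified_split` is a hypothetical helper.

```python
import random
from collections import defaultdict


def stratified_split(rows, target, fractions=(0.4, 0.3, 0.3), seed=42):
    """Split rows into (train, validation, test), keeping the class mix of
    `target` roughly equal in each partition."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)

    train, valid, test = [], [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_train = round(len(members) * fractions[0])
        n_valid = round(len(members) * fractions[1])
        train.extend(members[:n_train])
        valid.extend(members[n_train:n_train + n_valid])
        test.extend(members[n_train + n_valid:])  # remainder goes to test
    return train, valid, test
```

Each class is shuffled and cut separately, so every partition receives its share of fatal cases.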
PREDICT, CLASSIFY OR CLUSTER?
Because we are predicting the categorical class label MAX_SEV_IR, our analysis is
supervised classification.
Our model aims to discover relationships between the attributes that make it
possible to predict the outcome variable.
MODEL
The following three models were used in our analysis:
• Memory-Based Reasoning (MBR)
• Decision Trees
• Logistic Regression
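Memory-based reasoning classifies a new crash by looking up the most similar past crashes. A minimal k-nearest-neighbour sketch follows; the slides do not state the distance measure or the number of neighbours, so Euclidean distance, a simple majority vote and k = 3 are assumptions, and `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest
    training cases, using Euclidean distance."""
    # Pair each training case's distance to the query with its class label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

In practice, interval inputs such as SPD_LIM would be rescaled first so that the speed limit does not dominate the 0/1 indicator variables in the distance.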
FINAL MODEL
RESULTS AND DISCUSSION
BASELINE MISCLASSIFICATION
• MAX_SEV_IR: 0 = no injury, 1 = non-fatal injury, 2 = fatal injury
• Class 0 (No injury): 4949
• Class 1 (Non-fatal injury): 4900
• Class 2 (Fatal injury): 150
• The majority class is 0 (No injury)
• The majority class makes up 49.49% of the data set (4949/9999)
• The baseline misclassification rate, from always predicting class 0, is therefore 50.51%
• A model is only useful if its misclassification rate is lower than this
baseline.
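The baseline arithmetic above can be reproduced directly from the class counts on the slide. This minimal sketch assumes the baseline is the "always predict the majority class" rule; the helper name is hypothetical.

```python
from collections import Counter


def baseline_misclassification(labels):
    """Return (majority_class, error_rate) for a model that always predicts
    the most frequent class."""
    counts = Counter(labels)
    majority, majority_count = counts.most_common(1)[0]
    return majority, 1 - majority_count / len(labels)


# Class counts of MAX_SEV_IR reported on the slide.
labels = [0] * 4949 + [1] * 4900 + [2] * 150
majority, rate = baseline_misclassification(labels)
print(majority, f"{rate:.2%}")  # 0 50.51%
```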
OUR DEFINITION OF BEST MODEL AS PER BUSINESS REQUIREMENT
• Decision Tree: a supervised, data-driven method for classification
• It separates observations into increasingly homogeneous subgroups by creating splits
on predictors.
• Given our business requirement, this model is the best fit for classifying an accident into
the three severity classes used to prioritize resources.
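"More homogeneous subgroups" can be made concrete by measuring how much a split reduces class impurity. The slides do not say which splitting criterion the tree used, so Gini impurity here is an assumption chosen for illustration; `gini` and `split_gain` are hypothetical helpers for a categorical predictor such as PED_ACC_R or WRK_ZONE.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(rows, predictor, target):
    """Impurity reduction from splitting `rows` on each value of a
    categorical `predictor`."""
    parent = [r[target] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[predictor], []).append(r[target])
    # Child impurity is weighted by the share of rows falling in each branch.
    weighted_child = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return gini(parent) - weighted_child
```

A tree-growing algorithm evaluates candidate splits like this and keeps the one with the largest gain, then repeats inside each branch.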
RESULTS
Misclassification rate (_MISC_):
• Training: 0.40945
• Validation: 0.4113
• Test: 0.42305
• All three rates are below the 50.51% baseline.
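_MISC_ is the share of records whose predicted class differs from the actual class. Because fatal crashes are only 150 of the 9,999 records in the baseline count, misclassifying every one of them would raise the overall rate by at most about 1.5 percentage points, so an error rate per actual class is worth reading alongside it. A minimal sketch; the labels in the example are illustrative, not project data, and the function name is hypothetical.

```python
from collections import Counter


def misclassification_report(actual, predicted):
    """Overall misclassification rate plus the error rate within each
    actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    total = Counter(actual)
    wrong = Counter(a for a, p in zip(actual, predicted) if a != p)
    overall = sum(wrong.values()) / len(actual)
    per_class = {c: wrong[c] / total[c] for c in sorted(total)}
    return overall, per_class


actual = [0, 0, 1, 1, 2]     # illustrative labels, not project data
predicted = [0, 1, 1, 1, 1]
overall, per_class = misclassification_report(actual, predicted)
print(overall, per_class)  # 0.4 {0: 0.5, 1: 0.0, 2: 1.0}
```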
Data Mining Project-Predicting Injury or Fatality in case of an accident
NODE RULES
INTERPRETATION AND IMPLEMENTATION
• Based on these rules, an application or website could be built that takes the five most
important predictors as input and estimates the chances that an accident results in a
fatality, an injury or no injury.
• The emergency service provider can then decide which response team to send to the
accident site.
BLUEPRINT OF IMPLEMENTATION
OUTCOME
• The node rule that the entered values satisfy determines the predicted outcome
• Red Cross predicts an 80% chance of injury
• Red Cross predicts a 10% chance of fatality
• Red Cross predicts a 10% chance of no injury
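The application described above amounts to matching a crash against the tree's node rules and returning the class proportions stored in the matching leaf. A minimal sketch follows. The predictor names come from the data dictionary, but the rules, the leaf percentages (apart from the 80/10/10 example output) and the dispatch thresholds are hypothetical placeholders, not the fitted tree's actual node rules.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    # Class proportions in a leaf: P(no injury), P(injury), P(fatal).
    no_injury: float
    injury: float
    fatal: float


def predict_leaf(case):
    """HYPOTHETICAL node rules for illustration only; real rules would be
    exported from the fitted decision tree."""
    if case["PED_ACC_R"] == 1:
        return Leaf(no_injury=0.10, injury=0.80, fatal=0.10)
    if case["SPD_LIM"] >= 55 and case["MAN_COL_I"] == 1:
        return Leaf(no_injury=0.20, injury=0.60, fatal=0.20)
    return Leaf(no_injury=0.60, injury=0.38, fatal=0.02)


def dispatch_priority(leaf, fatal_threshold=0.05, injury_threshold=0.5):
    """HYPOTHETICAL triage rule: escalate when the fatal share is high."""
    if leaf.fatal >= fatal_threshold:
        return "HIGH"
    if leaf.injury >= injury_threshold:
        return "MEDIUM"
    return "LOW"


leaf = predict_leaf({"PED_ACC_R": 1, "SPD_LIM": 35, "MAN_COL_I": 2})
print(leaf, dispatch_priority(leaf))
```

Keeping the leaf lookup separate from the dispatch rule lets the responder adjust thresholds without retraining the tree.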
SCOPE FOR IMPROVEMENT
• To build a more focused and rigorous model, we are working on identifying more predictors that
help determine accident severity, aiming for a cleaner model with a lower misclassification rate.
• To achieve this, we intend to try a neural network model.
THANK YOU
