Predictive Analysis of Traffic Violations
Group 1
Introduction
Road accidents are a major worldwide threat:
● Up to 1.27 million deaths per year [2].
● Up to 50 million injuries per year [2].
● Over 2.5 million people per year involved in road accidents in the US [2].
● Huge economic and social impact.
Data mining of traffic violations is challenging:
● Large data volume and high dimensionality [2-4].
● Classification methods are the popular choice [3-5].
● Results depend heavily on the collected data [3-5].
● Multiple methods must be tested to choose the best one [3-5].
Flow of Project
The project followed the CRISP-DM model and the provided instructions:
A. Select a dataset from any free source on the web.
B. Study the variables in the selected dataset, draw preliminary conclusions, and develop at least three initial research questions/hypotheses.
C. Develop a business use case.
D. Use visualizations (minimum of 5) to explore the dataset, and present research hypotheses based on them.
E. Produce a dataset satisfying all of the criteria in part C.
F. Present 3 modeling techniques for the hypotheses.
G. Develop models using 3 algorithms for each model.
H. Provide recommendations based on the modeling results.
Dataset Chosen
Traffic Violations of MCP: records from 2012 to 2018,
from https://catalog.data.gov/ (Jan 17, 2018)
Attributes:
● Date of Stop
● Time of Stop
● Belts
● Contributed to Accident
● Description
● Phone Used
● Fatal
● Property damage
● Violation Type
● Hazmat
Data Preprocessing
Missing Data Handling
● Two attributes, "Agency" and "Accident", were removed from the dataset because they are single-valued, with the value 'No' in every record.
● Records with null values were removed using XLMiner's missing-data treatment.
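XLMiner's missing-data treatment is a point-and-click tool, so as a rough illustration only, the plain-Python sketch below applies the two rules above: drop any column that holds a single value across all records, then drop any record with a missing value. It assumes each record is a dict mapping column names to strings, with None or an empty string meaning "missing"; the function names and toy rows are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def drop_single_valued(records: List[Record]) -> List[Record]:
    """Remove columns whose value is identical in every record."""
    if not records:
        return []
    columns = list(records[0].keys())
    keep = [c for c in columns if len({r.get(c) for r in records}) > 1]
    return [{c: r.get(c) for c in keep} for r in records]


def drop_null_records(records: List[Record]) -> List[Record]:
    """Remove any record that contains a missing (None or empty) value."""
    return [r for r in records if all(v not in (None, "") for v in r.values())]


def preprocess(records: List[Record]) -> List[Record]:
    # Prune columns first, so a column that is entirely missing is dropped
    # as single-valued instead of causing every record to be discarded.
    return drop_null_records(drop_single_valued(records))


# Hypothetical toy rows: "Agency" and "Accident" are constant,
# and the second row has a missing "Alcohol" value.
rows = [
    {"Agency": "No", "Accident": "No", "Belts": "Yes", "Alcohol": "No"},
    {"Agency": "No", "Accident": "No", "Belts": "No", "Alcohol": None},
    {"Agency": "No", "Accident": "No", "Belts": "No", "Alcohol": "Yes"},
]
cleaned = preprocess(rows)
print(cleaned)  # [{'Belts': 'Yes', 'Alcohol': 'No'}, {'Belts': 'No', 'Alcohol': 'Yes'}]
```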
New Attributes
● Three new binary variables, "Phone Usage", "Contributed to accident", and "Fatal", were introduced based on the visualizations of the dataset.
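The slides do not say how the three binary variables were derived. One plausible approach, sketched below purely as an assumption, flags "Phone Usage" when the free-text "Description" mentions a phone and recodes existing Yes/No fields as 1/0. The keyword list and the input column names ("Description", "Contributed To Accident", "Fatal") are hypothetical and would need checking against the actual data.

```python
from typing import Dict, Iterable

# Hypothetical keywords; the project's actual rule is not stated.
PHONE_KEYWORDS = ("PHONE", "HANDHELD")


def yes_no_to_int(value: str) -> int:
    """Map a 'Yes'/'No' field to 1/0, ignoring case and surrounding spaces."""
    return 1 if value.strip().lower() == "yes" else 0


def phone_usage(description: str, keywords: Iterable[str] = PHONE_KEYWORDS) -> int:
    """Return 1 if any keyword appears in the violation description, else 0."""
    text = description.upper()
    return int(any(k in text for k in keywords))


def add_binary_attributes(record: Dict[str, str]) -> Dict[str, object]:
    """Return a copy of the record with three 0/1 attributes set."""
    out: Dict[str, object] = dict(record)
    out["Phone Usage"] = phone_usage(record["Description"])
    out["Contributed to accident"] = yes_no_to_int(record["Contributed To Accident"])
    out["Fatal"] = yes_no_to_int(record["Fatal"])  # overwrites the Yes/No string
    return out


# Made-up example record (hypothetical column names and values).
row = {
    "Description": "DRIVER USING HANDHELD TELEPHONE WHILE VEHICLE IN MOTION",
    "Contributed To Accident": "No",
    "Fatal": "No",
}
encoded = add_binary_attributes(row)
```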
Outliers detection
● Outliers are identified for the attribute "Year" and treated.
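Neither the outlier rule nor the treatment applied to "Year" is given. As a stand-in, the sketch below uses Tukey's IQR fences (values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR are flagged) and simply drops flagged values; capping them at the fences would be an equally reasonable treatment. The toy years, including the implausible 0 and 2088, are made up.

```python
import statistics
from typing import List, Tuple


def iqr_fences(values: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Return Tukey's lower and upper fences for the given values."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values: List[float], k: float = 1.5) -> List[float]:
    """Keep only values inside the IQR fences, preserving their order."""
    low, high = iqr_fences(values, k)
    return [v for v in values if low <= v <= high]


# Hypothetical vehicle years with two obvious data-entry errors.
years = [2005, 2008, 2010, 2011, 2012, 2013, 2014, 2015, 0, 2088]
kept = drop_outliers(years)
```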
Filtering Datasets
● A separate dataset was created for each research question.
● R was used to produce balanced datasets in which each class of the target variable carries equal weight.
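The slides only state that R was used to balance each dataset so that the target classes carry equal weight. Random undersampling of the majority class is one common way to achieve that; the Python sketch below illustrates the idea and is not the project's R code. The record layout, the fixed seed, and the toy data are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List


def undersample(records: List[Dict], target: str, seed: int = 0) -> List[Dict]:
    """Randomly downsample every class of `target` to the size of the smallest class."""
    by_class: Dict[Hashable, List[Dict]] = defaultdict(list)
    for r in records:
        by_class[r[target]].append(r)
    n_min = min(len(rows) for rows in by_class.values())
    rng = random.Random(seed)  # fixed seed for reproducibility
    balanced: List[Dict] = []
    for label in sorted(by_class):  # labels must be sortable (e.g. 0/1)
        balanced.extend(rng.sample(by_class[label], n_min))
    rng.shuffle(balanced)
    return balanced


# Toy data: 3 fatal and 97 non-fatal records.
records = [{"id": i, "Fatal": int(i < 3)} for i in range(100)]
balanced = undersample(records, "Fatal")
```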
Visualizations - Violations based on hours
Visualizations - Child violations with year & month
Visualizations - Phone violations with year & month
Visualizations - Violations with Personal injury & seat belt
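The charts themselves are not reproduced here. As a hypothetical illustration of the aggregation behind an hour-of-day chart, the sketch below counts stops per hour, assuming "Time of Stop" is stored as an "HH:MM:SS" string; the sample times are made up.

```python
from collections import Counter
from typing import Iterable


def violations_by_hour(times: Iterable[str]) -> Counter:
    """Count stops per hour of day from 'HH:MM:SS' strings."""
    return Counter(int(t.split(":")[0]) for t in times)


stops = ["08:15:00", "08:47:12", "17:05:30", "23:59:59", "17:30:00", "17:45:10"]
by_hour = violations_by_hour(stops)
peak_hour, peak_count = by_hour.most_common(1)[0]
print(peak_hour, peak_count)  # 17 3
```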
Research Hypotheses
1. Whether a violation contributed to an accident or not
2. Whether a violation that led to an accident was fatal or not
3. The geolocation at which a violation is likely to occur, based on several factors
Modeling
Model 1
Predictors:
● Belt
● Alcohol
● Vehicle Type
● Year
● Phone Usage
Target variable:
Contributed to accident
Model 2
Predictors:
● Belt
● HAZMAT
● Alcohol
● Phone Usage
Target variable:
Fatal
Model 3
Predictors:
● Belt
● Personal Injury
● Property Damage
● Alcohol
● Violation Type
Target variable:
Cluster ID (geo-coordinates)
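The target for Model 3 is a cluster ID derived from geo-coordinates, but the slides give no clustering settings. The sketch below is a minimal k-means (Lloyd's algorithm) in plain Python over (latitude, longitude) pairs. It assumes Euclidean distance on raw degrees, which is only a rough approximation even over a county-sized area, and uses deterministic farthest-point seeding rather than whatever initialization the original tool applied. The coordinates are made up.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance on raw coordinates (a rough approximation)."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def _farthest_point_seeds(points: Sequence[Point], k: int) -> List[Point]:
    """Start at the first point, then repeatedly add the point farthest
    from all seeds chosen so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(_dist2(p, s) for s in seeds)))
    return seeds


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm; returns a cluster ID per point and the final centroids."""
    centroids = _farthest_point_seeds(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _dist2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids


# Made-up coordinates forming two well-separated groups.
coords = [
    (39.00, -77.00), (39.01, -77.01), (39.00, -77.02),
    (39.20, -77.30), (39.21, -77.31), (39.19, -77.29),
]
cluster_ids, centers = kmeans(coords, k=2)
print(cluster_ids)  # [0, 0, 0, 1, 1, 1]
```

The resulting cluster IDs could then serve as the class label for a classifier such as the Single Tree, which is how the Model 3 setup above reads.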
Model I: Contribution to accident
Algorithm     Single Tree   Random Trees   Naïve Bayes
Precision     0.626         0.503          0.621
Sensitivity   0.275         0.996          0.266
Specificity   0.836         0.023          0.838
F1-Score      0.383         0.669          0.373
[Figure: Belts, Phone usage, Alcohol, Type: Bus]
Best model: Random Trees, which has the highest sensitivity and F1-score.
Belts and phone usage are the top predictors of contribution to an accident.
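For reference, the metrics in the table follow from a confusion matrix of true/false positives (TP, FP) and true/false negatives (TN, FN): precision = TP/(TP+FP), sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), and F1 is the harmonic mean of precision and sensitivity. The sketch below implements these and recomputes F1 from the reported precision and sensitivity; the results agree with the table to within 0.001, consistent with rounding. Because F1 ignores true negatives, the Random Trees model's sensitivity of 0.996 earns it the top F1-score even though its specificity of 0.023 means it labels almost every record as positive. The confusion-matrix counts in the first example are made up.

```python
from typing import Dict, Tuple


def f1_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Precision, sensitivity (recall), specificity and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1_score(precision, sensitivity),
    }


# Made-up counts, for illustration only.
example = classification_metrics(tp=40, fp=10, tn=30, fn=20)

# (precision, sensitivity) as reported for Model I.
model_1: Dict[str, Tuple[float, float]] = {
    "Single Tree": (0.626, 0.275),
    "Random Trees": (0.503, 0.996),
    "Naive Bayes": (0.621, 0.266),
}
recomputed = {name: round(f1_score(p, s), 3) for name, (p, s) in model_1.items()}
print(recomputed)  # {'Single Tree': 0.382, 'Random Trees': 0.668, 'Naive Bayes': 0.372}
```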
Model II: Fatality of accident
[Figure: Alcohol, Phone usage, HAZMAT]
Algorithm     Single Tree   Random Trees   Naïve Bayes
Precision     0.579         0.639          0.579
Sensitivity   0.79          0.383          0.543
Specificity   0.404         0.776          0.59
F1-Score      0.668         0.478          0.561
Best model: Single Tree, which has the highest sensitivity and F1-score.
Alcohol and phone usage are the top predictors of fatal accidents.
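Of the three algorithms compared, Naïve Bayes is the simplest to write out by hand. The sketch below is a Bernoulli naive Bayes with Laplace smoothing over the four Model II predictors (Belt, HAZMAT, Alcohol, Phone Usage) coded as 0/1, with Fatal as the target. It is an illustration of the technique, not the XLMiner implementation the project used, and the training rows are made up rather than taken from the MCP data.

```python
import math
from typing import Dict, List, Sequence


class BernoulliNaiveBayes:
    """Naive Bayes for 0/1 features with Laplace (add-alpha) smoothing.

    Feature order is assumed to be: Belt, HAZMAT, Alcohol, Phone Usage.
    """

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.classes: List[int] = []
        self.log_prior: Dict[int, float] = {}
        self.p_one: Dict[int, List[float]] = {}  # smoothed P(feature = 1 | class)

    def fit(self, X: Sequence[Sequence[int]], y: Sequence[int]) -> "BernoulliNaiveBayes":
        self.classes = sorted(set(y))
        n_features = len(X[0])
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(y))
            self.p_one[c] = [
                (sum(r[j] for r in rows) + self.alpha) / (len(rows) + 2 * self.alpha)
                for j in range(n_features)
            ]
        return self

    def predict(self, x: Sequence[int]) -> int:
        """Return the class with the highest log posterior for one record."""
        def log_posterior(c: int) -> float:
            total = self.log_prior[c]
            for xj, p in zip(x, self.p_one[c]):
                total += math.log(p if xj else 1.0 - p)
            return total
        return max(self.classes, key=log_posterior)


# Made-up training rows: [Belt, HAZMAT, Alcohol, Phone Usage] -> Fatal (1/0).
X = [[0, 0, 1, 1], [0, 0, 1, 0], [1, 0, 1, 1],
     [0, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
y = [1, 1, 1, 0, 0, 0]
model = BernoulliNaiveBayes().fit(X, y)
print(model.predict([1, 0, 1, 0]))  # 1
```

Working in log space avoids underflow when many features are multiplied together, and the smoothing keeps a feature value never seen in one class (HAZMAT = 1 here) from zeroing out that class's probability.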
Model III: geolocation for “likely” violations
● K-means clustering and Single Tree classification are the best algorithms, as they gave unbiased results.
● Cluster 5 has the highest number of alcohol violations and personal injuries.
● Cluster 9 has the highest number of belt violations.
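Findings such as "Cluster 5 has the highest number of alcohol violations" come down to counting flagged records per cluster. A minimal sketch of that tally follows, assuming each record carries a "Cluster ID" plus Yes/No flags; the toy rows are made up and do not reproduce the project's clusters.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional


def count_by_cluster(records: Iterable[Dict[str, object]], flag: str) -> Counter:
    """Number of records per cluster whose `flag` field is 'Yes'."""
    return Counter(r["Cluster ID"] for r in records if r[flag] == "Yes")


def top_cluster(records: Iterable[Dict[str, object]], flag: str) -> Optional[Hashable]:
    """Cluster with the most 'Yes' values for `flag`; ties go to the one seen first."""
    counts = count_by_cluster(records, flag)
    return counts.most_common(1)[0][0] if counts else None


# Made-up records, for illustration only.
rows = [
    {"Cluster ID": 0, "Alcohol": "Yes", "Belts": "No"},
    {"Cluster ID": 0, "Alcohol": "Yes", "Belts": "No"},
    {"Cluster ID": 1, "Alcohol": "No", "Belts": "Yes"},
    {"Cluster ID": 1, "Alcohol": "Yes", "Belts": "Yes"},
    {"Cluster ID": 2, "Alcohol": "No", "Belts": "Yes"},
]
print(top_cluster(rows, "Alcohol"), top_cluster(rows, "Belts"))  # 0 1
```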
Recommendations
> Recommend that MCP increase attention to, and enforcement of, belt and phone-usage rules, as these are the main contributors to accidents
> Recommend that MCP pay extra attention to, and take extra measures against, drunk driving and phone use while driving
> Alert Maryland police to the main areas where violations are likely to occur:
>> a high number of alcohol violations with personal injuries in cluster 5
>> multiple belt violations in cluster 9
> Make similar recommendations to insurance companies in the state of Maryland
Conclusion
● 5 visualizations were produced and 3 research hypotheses developed;
● Data were preprocessed into several datasets;
● 3 modeling techniques were applied to the hypotheses;
● Several classification, clustering, and regression models were considered for modeling: Single Tree, Random Trees, K-Means clustering, Multiple regression, etc.;
● Random Trees and Single Tree are the best algorithms for Models 1 and 2, respectively, because high sensitivity and a high F1-score were prioritized;
● K-means clustering and Single Tree classification were considered the best algorithms for Model 3, providing the counts of different violation types and of injuries for each cluster;
● XLMiner, Tableau, and R were used for the analysis.
References
● Larose, D. T., & Larose, C. D. Discovering Knowledge in Data: An Introduction to Data Mining (2nd ed.). Wiley. ISBN 978-0-470-90874-7.
● Abellan, J., Lopez, G., & De Oña, J. (2013). Analysis of traffic accident severity using Decision Rules via Decision Trees. Expert Systems with Applications, 40, 6047–6054.
● Chang, L.-Y., & Chien, J.-T. (2015). Analysis of driver injury severity in truck-involved accidents using a non-parametric classification tree model. Safety Science, 51(1), 17–22.
● Chen, W. H., & Jovanis, P. P. (2012). Method for identifying factors contributing to driver injury severity in traffic crashes. Transportation Research Record, 1717, 1–9.
● Kashani, A. T., Rabieyan, R., & Besharati, M. M. (2014). A data mining approach to investigate the factors influencing the crash severity of motorcycle pillion passengers. Journal of Safety Research, 51, 93–98.
● Kwon, O. H., Rhee, W., & Yoon, Y. (2015). Application of classification algorithms for analysis of road safety risk factor dependencies. Accident Analysis and Prevention, 75, 1–15.
● Xie, Y., Zhang, Y., & Liang, F. (2009). Crash injury severity analysis using Bayesian ordered probit models. Journal of Transportation Engineering (ASCE), 135(1), 18–25.
● Mujalli, M. O., & de Oña, J. (2011). A method for simplifying the analysis of traffic accidents injury severity on two-lane highways using Bayesian networks. Journal of Safety Research, 42, 317–326.
● De Oña, J., Lopez, G., & Abellan, J. (2013). Extracting decision rules from police accident reports through decision trees. Accident Analysis & Prevention, 50, 1151–1160.
Questions?

Editor's Notes

Attribute definitions:
● Belts: whether the traffic violation involved a seat belt violation.
● Personal Injury: whether the traffic violation involved personal injury.
● Property Damage: whether the traffic violation involved property damage.
● Fatal: whether the traffic violation involved a fatality.
● HAZMAT: whether the traffic violation involved hazardous materials.
● Violation Type: the type of violation (examples: Warning, Citation, SERO).
● Geolocation: geo-coded location information.