Jongwook Woo
HiPIC
CalStateLA
KSII The 14th Asia Pacific International Conference on Information Science and Technology (APIC-IST), Beijing, June 24, 2019
Dalya (Dalyapraz) Dauletbak, dmanato@calstatela.edu
Jongwook Woo, PhD
Big Data AI Center (BigDAI)
California State University, Los Angeles
Traffic Data Analysis and Prediction
using Big Data
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Contents
• Introduction
• H/W Specification
• Architecture Chart
• Implementation steps
• Data structure
• Analysis
• Prediction
• Summary
Introduction
About me:
• Graduate Computer Information Systems student at California State University, Los Angeles
  – BS (2015): Mathematics, Nazarbayev University
  – Previously: Senior Consultant/Data Analyst, Management Consulting, KPMG Central Asia
  – Current: Community Manager, International Data Engineering and Science Association (IDEAS)
Data source:
• A GPS navigation mobile application
• Provides real-time directions and up-to-date information on:
  – Traffic
  – Accidents
  – Road closures
  – Weather hazards
  – Lurking police vehicles, etc.
Introduction
Data source:
• Navigation app traffic dataset from an LA City department*
  – Alerts: information reported by users
  – Jams: information captured by users' devices
• We aim to find:
  – Areas with high traffic volume (geography)
  – Peak hours
  – Density of alerts and incidents
  – Traffic volume by road type
  – Predictions of traffic jams
*We had only limited authorization to access the full dataset (100 GB+ of original data), so we used a 9-day subset (Dec 31, 2017 – Jan 8, 2018) of about 2 GB.
H/W Specification
Number of nodes:  6
OCPUs:            12
CPU speed:        2195.196 MHz
Memory:           180 GB
Storage:          682 GB
Architecture Chart
[Figure: architecture chart. Source: Lars George (EMEA Chief Architect, Cloudera), "Hadoop Masterclass, Part 4 of 4: Analyzing Big Data".]
Implementation steps
Local Computer:
• Raw data files (JSON)
• Geo-spatial visualization (3D map)
• Dashboard for analytics
Hadoop/Hive:
• Upload dataset to HDFS
• Parse JSON files using Pandas
• Create tables' schema
• Clean data
• Create sample/summary dataset for prediction and visualization
Microsoft Azure ML Studio:
• Upload sample dataset
• Apply data transformation
• Split dataset for training and scoring
• Train model(s)
• Evaluate model(s)
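The "Parse JSON files using Pandas" step turns nested JSON records into flat rows that fit a Hive table schema. The sketch below is a stdlib-only stand-in, not the authors' code: it mimics the dotted column names that `pandas.json_normalize` produces by default (a nested `location` object becomes `location.x` and `location.y`). The helper names `flatten` and `parse_feed` and the record layout (`jams`, `uuid`, `level`, `speed`, `location`) are hypothetical, since the slides do not show the raw schema.

```python
import json
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with `sep`.

    List values are kept as-is, similar to json_normalize without record_path.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def parse_feed(text: str, record_key: str) -> List[Dict[str, Any]]:
    """Parse one JSON document and return its `record_key` array as flat rows."""
    payload = json.loads(text)
    return [flatten(rec) for rec in payload.get(record_key, [])]


if __name__ == "__main__":
    # Hypothetical file contents; the real feed's field names are not shown in the slides.
    sample = ('{"jams": [{"uuid": "a1", "level": 4, "speed": 6.5, '
              '"location": {"x": -118.24, "y": 34.05}}]}')
    for row in parse_feed(sample, "jams"):
        print(row)
```

Each flat row can then be written out as CSV or JSON lines and loaded into a Hive table whose columns match the dotted names (renamed to valid identifiers such as `location_x`).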
Data structure
Analysis
• Information we are using:
  – Location/time
  – Level of traffic intensity
  – X and Y coordinates (longitude and latitude)
  – Counts of jams/alerts
• Tools we are using:
  – Excel: 3D map
  – Power BI: flow map, pie charts, bar charts
• What we are predicting:
  – Level of traffic (1 to 3: light, medium, heavy)
  – Based on date, time, and location
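Finding peak hours, one of the questions listed in the introduction, comes down to counting jam reports per hour of day. A minimal sketch, assuming each report has already been reduced to its hour (0 to 23); the function names and the sample hours are hypothetical and made up for illustration, not taken from the dataset.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def jams_per_hour(hours: Iterable[int]) -> List[int]:
    """Return a 24-element list with the number of jam reports in each hour of day."""
    counts = Counter(hours)
    return [counts.get(h, 0) for h in range(24)]


def peak_hours(hours: Iterable[int], top: int = 3) -> List[Tuple[int, int]]:
    """Return the `top` busiest (hour, count) pairs, busiest first.

    Ties are broken by the earlier hour so the result is deterministic.
    """
    per_hour = jams_per_hour(hours)
    ranked = sorted(range(24), key=lambda h: (-per_hour[h], h))
    return [(h, per_hour[h]) for h in ranked[:top]]


if __name__ == "__main__":
    sample_hours = [7, 8, 8, 15, 16, 16, 17, 17, 17, 2]  # illustrative only
    print(peak_hours(sample_hours))
```

In the project itself this aggregation would run as a GROUP BY in Hive before the summary is exported to Power BI; the Python version only shows the logic.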
Traffic in LA (captured from users' devices)
Traffic in LA (reported by app users)
Video Simulation of Traffic in LA (captured from users' devices)
Video Simulation of Traffic in LA (reported by app users)
Traffic Analysis Dashboard
[Figure: traffic analysis dashboard with two peaks marked "Peak".]
Traffic Analysis Dashboard
Major areas of traffic: Downtown Los Angeles, Santa Monica, Hollywood, and the highways.
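Identifying the major traffic areas amounts to counting reports per map cell. The dashboards were built in Excel 3D Maps and Power BI; the sketch below is only an illustrative stdlib equivalent that snaps each (longitude, latitude) pair to a grid cell and counts reports per cell. The 0.01-degree cell size (roughly 1 km around Los Angeles), the helper names, and the sample coordinates are assumptions.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Cell = Tuple[int, int]


def to_cell(lon: float, lat: float, size: float = 0.01) -> Cell:
    """Map a coordinate to the integer index of the grid cell that contains it."""
    return (math.floor(lon / size), math.floor(lat / size))


def hotspots(points: Iterable[Tuple[float, float]], size: float = 0.01,
             top: int = 3) -> List[Tuple[Cell, int]]:
    """Count points per grid cell and return the `top` densest cells with counts."""
    counts = Counter(to_cell(lon, lat, size) for lon, lat in points)
    return counts.most_common(top)


if __name__ == "__main__":
    # Hypothetical report locations near Downtown LA, Santa Monica and Hollywood.
    pts = ([(-118.2437, 34.0522)] * 3 + [(-118.4912, 34.0195)] * 2
           + [(-118.3287, 34.0928)])
    print(hotspots(pts, top=2))
```

`math.floor` (rather than `int`) keeps cells consistent on both sides of zero, which matters for the negative longitudes of Los Angeles.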
Prediction of Traffic Congestion with Machine Learning
Data preparation:
• Group label values
• Join additional dataset
• Apply data transformation
• Normalize data
Model building:
• Model(s) selection
• Cross-validation
• Train model(s)
Model evaluation:
• Score model
• Evaluate model (Accuracy, Recall)
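The model-building stage lists cross-validation. In Azure ML Studio this is handled by built-in modules; purely as an illustration, the sketch below shows how k-fold cross-validation partitions row indices so that every row is used for validation exactly once. The function name, the fold count, and the seeded shuffle are assumptions, not settings reported in the slides.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_rows: int, k: int = 10,
                   seed: int = 42) -> List[Tuple[List[int], List[int]]]:
    """Split row indices 0..n_rows-1 into k (train, validation) pairs.

    Rows are shuffled once, then dealt into k folds whose sizes differ by at
    most one. Each fold serves as the validation set exactly once.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, val in enumerate(folds):
        # Training set: every row not in the current validation fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, val))
    return splits


if __name__ == "__main__":
    for train, val in k_fold_indices(10, k=5):
        print(len(train), sorted(val))
```

Averaging a metric over the k validation folds gives a less noisy estimate than a single train/score split, at the cost of training k models.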
Features/columns in the dataset
location_x, location_y:  X and Y coordinates of the location
date_pst*:               Pacific Time of the publication of the traffic report
level:                   jam level, from 1 (almost no jam) to 5 (standstill jam)
speed:                   driver's captured speed, in mph
length:                  length of the traffic ahead on the user's route, in meters
*date_pst is split into month, day, hour, min, sec, and weekday.
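Splitting date_pst into month, day, hour, min, sec, and weekday can be sketched with Python's `datetime`. Assumptions, all hypothetical: the timestamp string looks like `2018-01-02 07:45:30` (the real format is not shown), the helper name is ours, and weekday follows Python's Monday = 0 convention, which may differ from the numbering used in the Hive tables.

```python
from datetime import datetime
from typing import Dict

# Assumed format of date_pst; adjust to the actual feed.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def split_date_pst(date_pst: str) -> Dict[str, int]:
    """Break a Pacific-time timestamp string into the model's date features."""
    ts = datetime.strptime(date_pst, DATE_FORMAT)
    return {
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "min": ts.minute,
        "sec": ts.second,
        "weekday": ts.weekday(),  # Monday = 0 ... Sunday = 6 (assumed convention)
    }


if __name__ == "__main__":
    print(split_date_pst("2018-01-02 07:45:30"))
```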
Data transformation
• Randomly sample ~100 MB of data
• Select relevant features
• Group level into 2 classes (label: 0 and 1)
• Join a holidays dataset
• Add the attribute is_holiday (0 or 1)
• Encode cyclical attributes (weekday, month, day, hour, min, sec) as points on the unit circle: treat each value as a polar angle and convert it to Cartesian (sin, cos) coordinates
• Add is_rush and is_weekend (0 or 1)
• Normalize features
• Make is_rush, is_holiday, is_weekend, and label categorical
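A per-row sketch of the labeling, flag, and normalization steps, with loudly hypothetical choices: the slides do not say where the 1 to 5 jam level is cut into two classes (here level >= 4 becomes label 1), which hours count as rush hour (here 7 to 9 am and 3 to 6 pm, borrowed from the peak times in the Summary), or which normalization method was used (here min-max scaling to [0, 1]). The `HOLIDAYS` set stands in for the joined holidays dataset and lists only New Year's Day 2018, the one US federal holiday inside the 9-day window; the `year` field is also an assumption, needed only for the holiday lookup.

```python
from datetime import date
from typing import Dict, List, Set

# Hypothetical choices; the slides do not state the actual thresholds.
HEAVY_LEVEL = 4                            # level >= 4 -> label 1, else 0
RUSH_WINDOWS = [(7, 9), (15, 18)]          # [start, end) hours
HOLIDAYS: Set[date] = {date(2018, 1, 1)}   # stand-in for the holidays dataset


def add_flags(row: Dict) -> Dict:
    """Return a copy of `row` with label, is_holiday, is_rush and is_weekend added.

    Expects integer fields level, year, month, day, hour and weekday
    (Monday = 0, as produced by datetime.weekday()).
    """
    out = dict(row)
    out["label"] = 1 if row["level"] >= HEAVY_LEVEL else 0
    d = date(row["year"], row["month"], row["day"])
    out["is_holiday"] = 1 if d in HOLIDAYS else 0
    out["is_rush"] = 1 if any(s <= row["hour"] < e for s, e in RUSH_WINDOWS) else 0
    out["is_weekend"] = 1 if row["weekday"] >= 5 else 0  # Saturday=5, Sunday=6
    return out


def min_max(values: List[float]) -> List[float]:
    """Scale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    print(add_flags({"level": 5, "year": 2018, "month": 1, "day": 1,
                     "hour": 16, "weekday": 0}))
    print(min_max([2.0, 4.0, 6.0]))
```

Keeping the flags as 0/1 integers and marking them categorical afterwards, as the slide describes, stops the learner from treating them as continuous magnitudes.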
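SELECT location_x, location_y,
  -- Place each cyclical attribute on the unit circle (one full turn per period),
  -- so the end of a cycle sits next to its start (e.g. hour 23 next to hour 0).
  SIN(weekday * (2*PI()/7))     AS sin_weekday,
  COS(weekday * (2*PI()/7))     AS cos_weekday,
  SIN((month-1) * (2*PI()/12))  AS sin_month,   -- month 1..12 shifted to 0..11
  COS((month-1) * (2*PI()/12))  AS cos_month,
  SIN((day-1) * (2*PI()/31))    AS sin_day,     -- day 1..31 shifted to 0..30;
  COS((day-1) * (2*PI()/31))    AS cos_day,     -- treats every month as 31 days
  SIN(hour * (2*PI()/24))       AS sin_hour,    -- hour 0..23
  COS(hour * (2*PI()/24))       AS cos_hour,
  SIN(min * (2*PI()/60))        AS sin_min,     -- minute 0..59
  COS(min * (2*PI()/60))        AS cos_min,
  SIN(sec * (2*PI()/60))        AS sin_sec,     -- second 0..59
  COS(sec * (2*PI()/60))        AS cos_sec,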
…
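The query above treats each cyclical attribute as an angle with one full turn per period and takes its sine and cosine, which is the polar-to-Cartesian step listed under Data transformation. The same mapping in Python, offered as an illustration (the helper names `encode_cyclical` and `circle_distance` are ours), makes the benefit visible: hour 23 and hour 0 end up about 0.26 apart on the circle, whereas their raw values differ by 23.

```python
import math
from typing import Tuple


def encode_cyclical(value: float, period: float,
                    offset: float = 0.0) -> Tuple[float, float]:
    """Return (sin, cos) of `value` placed on a circle with the given period.

    `offset` shifts 1-based attributes (month, day) so that their first value
    maps to angle 0, mirroring `(month-1)` and `(day-1)` in the query.
    """
    angle = (value - offset) * (2 * math.pi / period)
    return math.sin(angle), math.cos(angle)


def circle_distance(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Euclidean distance between two encoded points (Python 3.8+ for math.dist)."""
    return math.dist(a, b)


if __name__ == "__main__":
    h0, h12, h23 = (encode_cyclical(h, 24) for h in (0, 12, 23))
    print(round(circle_distance(h23, h0), 3), round(circle_distance(h12, h0), 3))
```

Both coordinates are needed: sine alone maps hour 5 and hour 7 to the same value, while the (sin, cos) pair is unique for every point in the cycle.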
Model Evaluation
Model                         Accuracy  Precision  Recall  AUC (ROC)
Logistic Regression (LR)      0.662     0.662      1.0     0.571
Boosted Decision Tree (BDT)   0.805     0.832      0.884   0.868
Decision Forest (DF)          0.832     0.868      0.880   0.885
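The LR row deserves a second look. Recall of 1.0 means there are no false negatives (FN = 0). With FN = 0, accuracy (TP+TN)/(TP+FP+TN) equals precision TP/(TP+FP) only when TN × FP = 0, and since precision is below 1 (so FP > 0), TN must be 0. Up to rounding, then, the LR model labels every row as positive, and about 66.2% of the evaluated rows belong to the positive class; its AUC of 0.571, close to the 0.5 of a random ranking, fits that reading. A short check with the standard formulas, using counts constructed to match that proportion (hypothetical, not taken from the study):

```python
from typing import Dict


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision and recall from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    # 1000 hypothetical rows, 662 positive; the model predicts "1" for every row,
    # so there are no true negatives and no false negatives.
    print(metrics(tp=662, fp=338, tn=0, fn=0))
```

This is why accuracy alone is a weak yardstick here: an all-positive baseline already scores 0.662, so the Decision Forest's 0.832 is best read as an improvement over that baseline.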
Summary of Traffic Prediction with Machine Learning
• The model is built on a sampled dataset of ~1M rows (~100 MB)
• Best model: Decision Forest
  – Accuracy: 0.832
  – Precision: 0.868
  – Recall: 0.880
  – Area under the ROC curve (AUC): 0.885
[Figure: confusion matrix]
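The AUC values can be read as a probability: the chance that a randomly chosen positive (heavy-jam) row receives a higher model score than a randomly chosen negative row. A stdlib sketch of that pairwise (Mann–Whitney) formulation, with ties counted as one half; the function name and scores are illustrative, and Azure ML Studio's own computation is not shown in the slides.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts positive/negative pairs in which the positive scores higher;
    ties count as half a pair. O(P*N), so only suitable for small samples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(auc([1, 1, 0, 0], [0.9, 0.2, 0.5, 0.1]))
```

Unlike accuracy, AUC does not depend on the 0.5 decision threshold, which is why it separates the three models more clearly than the accuracy column does.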
Summary
Traffic is densest on freeways 101, 405, and 10.
The morning rush hour (7 am to 9 am) produces heavy traffic; the heaviest traffic starts at 3 pm and eases after 6 pm.
The major traffic areas are Downtown Los Angeles (DTLA), Santa Monica, and Hollywood.
Applying this analysis framework to a larger dataset could yield further insights into traffic.
The same data and platform also make it possible to predict traffic congestion: a Decision Forest model predicted the heaviest traffic jams with 83% accuracy.
Questions?