2015 Flight Delay Analysis
1. Description of data-
The U.S. Department of Transportation's (DOT) Bureau of Transportation Statistics publishes data on flights and their delays. The data analysed here covers flight delays, cancellations, arrival times and departure times for the year 2015, and was obtained from Kaggle as .csv files. The files are cleaned in R and then imported into Microsoft SQL Server for analysis.
Source of data- https://www.kaggle.com/usdot/flight-delays/data
Tables-
a) Airports
b) Airlines
c) Flights
Metadata-
a. Airports-
This table contains airport details such as IATA_CODE, AIRPORT, CITY, STATE, COUNTRY, LATITUDE and LONGITUDE.
b. Airlines-
This table maps each airline operator's name to its IATA_CODE.
c. Flights-
This table contains arrival and departure records for flights across all airports and airlines. It is the main table of interest and is used for all of the analysis. The Flights table shares the airline IATA_CODE with the Airlines table and the airport IATA_CODE with the Airports table.
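The report does not include its DDL. As a rough illustration of how the three tables relate, here is a minimal sketch in SQLite (via Python's sqlite3) rather than SQL Server. The column names follow the Kaggle CSV headers, but the simplified layout, the FLIGHT_ID key and the sample row are assumptions, not the author's schema.

```python
import sqlite3

# Hypothetical, simplified schema. Column names follow the Kaggle CSV headers;
# the author's actual SQL Server DDL is not shown in the report.
SCHEMA = """
CREATE TABLE airlines (
    IATA_CODE TEXT PRIMARY KEY,
    AIRLINE   TEXT NOT NULL
);
CREATE TABLE airports (
    IATA_CODE TEXT PRIMARY KEY,
    AIRPORT   TEXT NOT NULL,
    CITY      TEXT,
    STATE     TEXT,
    COUNTRY   TEXT,
    LATITUDE  REAL,
    LONGITUDE REAL
);
CREATE TABLE flights (
    FLIGHT_ID           INTEGER PRIMARY KEY,
    AIRLINE             TEXT NOT NULL REFERENCES airlines (IATA_CODE),
    ORIGIN_AIRPORT      TEXT NOT NULL REFERENCES airports (IATA_CODE),
    DESTINATION_AIRPORT TEXT NOT NULL REFERENCES airports (IATA_CODE),
    DEPARTURE_DELAY     REAL,
    CANCELLED           INTEGER NOT NULL DEFAULT 0
);
"""


def build_database() -> sqlite3.Connection:
    """Create an in-memory database with the three linked tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


# Usage with a made-up sample row: resolve the airline code via a join.
conn = build_database()
conn.execute("INSERT INTO airlines VALUES ('DL', 'Delta Air Lines Inc.')")
conn.executemany(
    "INSERT INTO airports (IATA_CODE, AIRPORT) VALUES (?, ?)",
    [("ECP", "Northwest Florida Beaches International Airport"),
     ("PBG", "Plattsburgh International Airport")],
)
conn.execute(
    "INSERT INTO flights (AIRLINE, ORIGIN_AIRPORT, DESTINATION_AIRPORT, DEPARTURE_DELAY) "
    "VALUES ('DL', 'ECP', 'PBG', 5)"
)
row = conn.execute(
    "SELECT a.AIRLINE, f.ORIGIN_AIRPORT, f.DESTINATION_AIRPORT "
    "FROM flights AS f JOIN airlines AS a ON f.AIRLINE = a.IATA_CODE"
).fetchone()
```

With the foreign keys enabled, inserting a flight whose airline or airport code is missing from the parent tables raises `sqlite3.IntegrityError`.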
2. Data Analysis Plan
3. Relational Schema
4. Data Preparation
The data is first cleaned in R: columns for the departure date and time are added, and irrelevant columns are removed. Instead of keeping all the information in one table, the data is split into three tables that can be joined to retrieve the required information. The tables are linked by foreign keys, so the database enforces referential integrity between them.
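The R cleaning code is not shown. As a rough illustration of adding a departure date-and-time column, here is a Python sketch. It assumes the Kaggle flights file stores the date as separate YEAR, MONTH and DAY columns and the scheduled departure as an HHMM integer; the helper name `scheduled_departure` is hypothetical.

```python
from datetime import datetime, timedelta


def scheduled_departure(year: int, month: int, day: int, hhmm: int) -> datetime:
    """Combine YEAR/MONTH/DAY and an HHMM integer into one datetime.

    HHMM encoding examples: 5 -> 00:05, 1435 -> 14:35. A value of 2400
    rolls over to midnight of the next day.
    """
    hours, minutes = divmod(hhmm, 100)
    if hhmm < 0 or hhmm > 2400 or minutes >= 60:
        raise ValueError(f"not a valid HHMM time: {hhmm}")
    # Adding a timedelta (instead of passing hour=24) handles the 2400 rollover.
    return datetime(year, month, day) + timedelta(hours=hours, minutes=minutes)
```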
5. Problems in the data-
Each of the three tables was checked for missing values and bad column names.
a. Airlines-
This table does not have any missing values or bad column names.
b. Airports-
This table does not have any bad column names. However, three airports are missing their latitude and longitude:
select * from dbo.airports where LATITUDE is NULL or LONGITUDE is NULL;
IATA_CODE | AIRPORT | CITY | STATE | COUNTRY | LATITUDE | LONGITUDE
ECP | Northwest Florida Beaches International Airport | Panama City | FL | USA | NULL | NULL
PBG | Plattsburgh International Airport | Plattsburgh | NY | USA | NULL | NULL
UST | Northeast Florida Regional Airport (St. Augustine Airport) | St. Augustine | FL | USA | NULL | NULL
Solution-
The latitude and longitude of these airports were taken from Wikipedia and written into the table to complete it.
/* Fill in the missing coordinates (values taken from Wikipedia) */
update dbo.airports
set LATITUDE = 30.357, LONGITUDE = -85.7938
where IATA_CODE = 'ECP';
update dbo.airports
set LATITUDE = 44.6519, LONGITUDE = -73.4657
where IATA_CODE = 'PBG';
update dbo.airports
set LATITUDE = 29.9547, LONGITUDE = -81.343
where IATA_CODE = 'UST';
/*Checking the existence of NULL values */
select * from dbo.airports where LATITUDE is NULL or LONGITUDE IS NULL;
(0 rows affected)
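The T-SQL above targets SQL Server. The same fix-and-verify step can be replayed in SQLite through Python's sqlite3 as a minimal sketch; the three-column stand-in table and the helper name `fill_missing_coordinates` are assumptions, while the coordinates are the ones used in the report.

```python
import sqlite3

# Coordinates exactly as used in the report's UPDATE statements.
FIXES = {
    "ECP": (30.357, -85.7938),
    "PBG": (44.6519, -73.4657),
    "UST": (29.9547, -81.343),
}


def fill_missing_coordinates(conn: sqlite3.Connection) -> int:
    """Apply the coordinate fixes and return how many rows still lack coordinates."""
    # Parameterized '=' matches the exact code; LIKE without wildcards adds nothing.
    conn.executemany(
        "UPDATE airports SET LATITUDE = ?, LONGITUDE = ? WHERE IATA_CODE = ?",
        [(lat, lon, code) for code, (lat, lon) in FIXES.items()],
    )
    conn.commit()
    (remaining,) = conn.execute(
        "SELECT COUNT(*) FROM airports WHERE LATITUDE IS NULL OR LONGITUDE IS NULL"
    ).fetchone()
    return remaining


# Reduced stand-in for dbo.airports containing only the three affected rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE airports (IATA_CODE TEXT PRIMARY KEY, LATITUDE REAL, LONGITUDE REAL)"
)
conn.executemany(
    "INSERT INTO airports VALUES (?, NULL, NULL)", [("ECP",), ("PBG",), ("UST",)]
)
remaining = fill_missing_coordinates(conn)
```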
c. Flights-
This table has some missing values. Because the table is very large, the missing values are kept for now; whether to remove them will be decided at the data analysis stage.
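Deciding later whether to drop missing values is easier with a per-column count of NULLs. A minimal SQLite sketch follows; the helper `null_counts` and the three-column sample table are hypothetical.

```python
import sqlite3


def null_counts(conn: sqlite3.Connection, table: str, columns: list[str]) -> dict[str, int]:
    """Count NULLs per column with a single scan of the table.

    Identifiers cannot be bound as SQL parameters, so they are quoted and
    interpolated; pass only trusted table and column names.
    """
    select_list = ", ".join(
        f'COALESCE(SUM(CASE WHEN "{col}" IS NULL THEN 1 ELSE 0 END), 0)' for col in columns
    )
    row = conn.execute(f'SELECT {select_list} FROM "{table}"').fetchone()
    return dict(zip(columns, row))


# Made-up sample: the cancelled flight has no departure delay recorded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (AIRLINE TEXT, DEPARTURE_DELAY REAL, CANCELLED INTEGER)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [("DL", 4.0, 0), ("DL", None, 1), ("HA", -2.0, 0)],
)
counts = null_counts(conn, "flights", ["AIRLINE", "DEPARTURE_DELAY"])
# counts == {'AIRLINE': 0, 'DEPARTURE_DELAY': 1}
```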
6. On-time Performance, Market Share Analysis-
According to the Bureau of Transportation Statistics, a flight is considered delayed if it departs 15 minutes or more after its scheduled departure time. Each airline's flights are analysed for on-time performance, and its market share is calculated from its number of flights in 2015.
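The two measures can be sketched in Python as follows. This is not the author's code: the input shape `(airline_code, departure_delay, cancelled)`, the choice of all scheduled flights as the denominator, and the 15-minute cut-off applied as "15 or more" are assumptions.

```python
from collections import Counter

DELAY_THRESHOLD = 15  # minutes; departing this late or later counts as delayed (assumed)


def on_time_and_share(flights):
    """Per-airline on-time % and market share %.

    flights: iterable of (airline_code, departure_delay_minutes, cancelled),
    where departure_delay is None for cancelled flights. Both percentages use
    all scheduled flights (including cancellations) as the denominator.
    """
    total = Counter()
    on_time = Counter()
    for airline, delay, cancelled in flights:
        total[airline] += 1
        if not cancelled and delay is not None and delay < DELAY_THRESHOLD:
            on_time[airline] += 1
    grand_total = sum(total.values())
    return {
        airline: (100.0 * on_time[airline] / n, 100.0 * n / grand_total)
        for airline, n in total.items()
    }


# Made-up sample: DL has one on-time, one delayed and one cancelled flight.
result = on_time_and_share(
    [("DL", 0, False), ("DL", 20, False), ("DL", None, True), ("HA", -3, False)]
)
```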
The airlines are grouped into the following four segments, and the airlines in each segment call for a different strategy.
Exhibit 1
Segment | Airlines | Features | Strategy
A | Delta Air Lines Inc.; Skywest Airlines Inc.; American Airlines Inc. | High market share; better on-time performance | Take steps to increase market share; streamline the operations for better on-time performance
B | Hawaiian Airlines Inc.; Alaska Airlines Inc.; US Airways Inc.; Atlantic Southeast Airlines; Virgin America; American Eagle Airlines Inc. | Low market share; better on-time performance | Take steps to increase market share
C | Southwest Airlines Co. | High market share | Streamline the operations for better on-time performance
D | JetBlue Airways; Frontier Airlines Inc.; United Air Lines Inc.; Spirit Air Lines | Low market share; poor on-time performance | Streamline the operations for better on-time performance
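The report does not state the cut-offs that separate "high" from "low" market share or "better" from "poor" on-time performance. A minimal sketch of the two-by-two segmentation, assuming median cut-offs and reading segment C as high share with weaker on-time performance; the function names are hypothetical.

```python
from statistics import median


def segment(on_time_pct, market_share_pct, on_time_cut, share_cut):
    """Map an airline to segments A-D of Exhibit 1 (one reading of the table)."""
    high_share = market_share_pct >= share_cut
    good_on_time = on_time_pct >= on_time_cut
    if high_share and good_on_time:
        return "A"  # high market share, better on-time performance
    if good_on_time:
        return "B"  # low market share, better on-time performance
    if high_share:
        return "C"  # high market share, weaker on-time performance
    return "D"      # low market share, poor on-time performance


def assign_segments(stats):
    """stats: {airline: (on_time_pct, market_share_pct)}; cut-offs are the medians (assumed)."""
    on_time_cut = median(o for o, _ in stats.values())
    share_cut = median(s for _, s in stats.values())
    return {
        airline: segment(o, s, on_time_cut, share_cut)
        for airline, (o, s) in stats.items()
    }


# Made-up figures for four hypothetical airlines.
segments = assign_segments(
    {"X": (90.0, 40.0), "Y": (90.0, 5.0), "Z": (70.0, 40.0), "W": (70.0, 5.0)}
)
```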
7. Ranking Airlines-
The airlines were ranked on the following criteria:
a. On-time departures are taken into consideration.
b. Cancellations are penalized three times as heavily as delays.
c. Early departures are penalized twice as heavily as delays.
d. An average rank is calculated across all the ranks.
A flight is considered delayed if it leaves 15 minutes or more after its scheduled departure time. Similarly, a departure counts as early if the flight leaves 15 minutes or more before its scheduled departure time.
Punctuality score is calculated as-
$$\text{Punctuality Score} = (\text{On-time Departure \%}) - 2 \times (\text{Early Departure \%}) - 3 \times (\text{Cancellation \%})$$
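The formula can be sketched per airline in Python. The report does not say exactly which flights count as "on time", so this sketch assumes on-time means departing within 15 minutes either side of schedule, early means 15 or more minutes ahead, and all three percentages are taken over every scheduled flight; `punctuality_score` is a hypothetical name.

```python
EARLY_CUTOFF = -15  # minutes; leaving 15+ minutes ahead of schedule counts as early (assumed)
LATE_CUTOFF = 15    # minutes; leaving 15+ minutes behind schedule counts as delayed (assumed)


def punctuality_score(flights):
    """Punctuality score for one airline.

    flights: list of (departure_delay_minutes, cancelled) pairs; the delay is
    None for cancelled flights. Percentages are taken over all scheduled flights.
    """
    n = len(flights)
    if n == 0:
        raise ValueError("no flights to score")

    def pct(count):
        return 100.0 * count / n

    flown = [d for d, cancelled in flights if not cancelled and d is not None]
    cancellations = sum(1 for _, cancelled in flights if cancelled)
    early = sum(1 for d in flown if d <= EARLY_CUTOFF)
    on_time = sum(1 for d in flown if EARLY_CUTOFF < d < LATE_CUTOFF)
    return pct(on_time) - 2 * pct(early) - 3 * pct(cancellations)


# Made-up sample of 10 flights: 8 on time, 1 early, 1 cancelled.
score = punctuality_score([(0, False)] * 8 + [(-20, False), (None, True)])
# score == 80 - 2 * 10 - 3 * 10 == 30.0
```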
Punctuality Score for Airlines-
Rank | Airline Code | Airline Name | Punctuality Score
1 | HA | Hawaiian Airlines Inc. | 90.3647
2 | DL | Delta Air Lines Inc. | 84.5139
3 | VX | Virgin America | 78.8362
4 | US | US Airways Inc. | 77.3082
5 | AA | American Airlines Inc. | 76.9372
6 | AS | Alaska Airlines Inc. | 75.4499
7 | WN | Southwest Airlines Co. | 74.7893
8 | OO | Skywest Airlines Inc. | 74.03
9 | UA | United Air Lines Inc. | 72.0044
10 | EV | Atlantic Southeast Airlines | 70.2602
11 | B6 | JetBlue Airways | 70.1743
12 | F9 | Frontier Airlines Inc. | 68.8725
13 | NK | Spirit Air Lines | 63.6093
14 | MQ | American Eagle Airlines Inc. | 59.4179
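Turning scores into ranks is a descending sort. A minimal sketch using the published scores from the punctuality table; `rank_airlines` is a hypothetical helper, and ties (none occur here) would simply keep insertion order.

```python
# Punctuality scores as published in the report's table.
SCORES = {
    "HA": 90.3647, "DL": 84.5139, "VX": 78.8362, "US": 77.3082,
    "AA": 76.9372, "AS": 75.4499, "WN": 74.7893, "OO": 74.03,
    "UA": 72.0044, "EV": 70.2602, "B6": 70.1743, "F9": 68.8725,
    "NK": 63.6093, "MQ": 59.4179,
}


def rank_airlines(scores):
    """Return (rank, airline_code, score) tuples; rank 1 is the most punctual."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, code, score) for rank, (code, score) in enumerate(ordered, start=1)]


ranking = rank_airlines(SCORES)
```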
Exhibit 2-
Interactive visualization at:
https://public.tableau.com/views/AirlinePunctuality/Dashboard1?:embed=y&:display_count=yes&:showVizHome=no
[Chart: Punctuality Score (axis 50 to 100) for each airline, ordered by Punctuality Ranking from 1 to 14; color shows the punctuality rank.]
8. Infographic Based on Analysis
Exhibit 3
9. Visualizations in Tableau-
Please visit the following link for the interactive visualization story.
https://public.tableau.com/shared/PPKDMCWYX?:display_count=yes
a. Analysis of flights between each pair of source and destination-
The user can select a source and a destination airport to see the status of all flights between them. Hovering the mouse over the routes and airports reveals more information about the flights, such as the cancellation percentage and the average departure delay.
Exhibit 4
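The per-route figures shown on hover can be approximated by grouping flights on the (origin, destination) pair. A minimal Python sketch, assuming directional routes, a delay of None for cancelled flights, and a hypothetical helper name `route_summary`; Tableau's own aggregation may differ.

```python
from collections import defaultdict


def route_summary(flights):
    """Cancellation % and average departure delay per (origin, destination) pair.

    flights: iterable of (origin, destination, departure_delay_minutes, cancelled).
    Routes are directional, so ECP->PBG and PBG->ECP are reported separately.
    The average is None when no flight on the route has a recorded delay.
    """
    # per route: [flights, cancellations, sum of delays, flights with a delay value]
    stats = defaultdict(lambda: [0, 0, 0.0, 0])
    for origin, destination, delay, cancelled in flights:
        s = stats[(origin, destination)]
        s[0] += 1
        if cancelled:
            s[1] += 1
        elif delay is not None:
            s[2] += delay
            s[3] += 1
    return {
        route: (100.0 * n_cancelled / n, delay_sum / n_delay if n_delay else None)
        for route, (n, n_cancelled, delay_sum, n_delay) in stats.items()
    }


# Made-up sample flights.
summary = route_summary([
    ("ECP", "PBG", 10.0, False),
    ("ECP", "PBG", None, True),
    ("ECP", "PBG", -4.0, False),
    ("PBG", "ECP", None, True),
])
```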
b. Airline Performance Matrix-
The performance measures are calculated from the dataset created in R. Hovering over each circle reveals more information about the airline's performance; a darker red color represents a higher cancellation percentage.
Please visit the following link for the interactive visualization story.
https://public.tableau.com/shared/PPKDMCWYX?:display_count=yes
Exhibit 5.
c. Regression of Arrival and Departure Delays-
A regression line with a 95% confidence interval is fitted to the arrival and departure delays.
Please visit the following link for the interactive visualization story.
https://public.tableau.com/shared/PPKDMCWYX?:display_count=yes
Exhibit 6.
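The Tableau trend line is not reproduced here, but the idea can be sketched with ordinary least squares. This sketch assumes departure delay as x and arrival delay as y, and uses the normal critical value 1.96 instead of the exact t value, which is a reasonable approximation only for large samples; `regression_with_band` is a hypothetical name.

```python
import math


def regression_with_band(x, y, x0, z=1.96):
    """Least-squares line y = intercept + slope * x, plus a 95% confidence
    interval for the mean of y at x0.

    z = 1.96 is the normal critical value; it approximates the exact t value
    when n is large (a full year of flights), but is too narrow for small n.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual standard error with n - 2 degrees of freedom.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))
    fitted = intercept + slope * x0
    half_width = z * s * math.sqrt(1 / n + (x0 - mean_x) ** 2 / sxx)
    return slope, intercept, (fitted - half_width, fitted + half_width)


# Made-up delays (minutes) lying exactly on arrival = 3 + 2 * departure.
slope, intercept, band = regression_with_band([0, 1, 2, 3, 4], [3, 5, 7, 9, 11], 10)
```

Evaluating the interval over a grid of x0 values traces out the confidence band drawn around the regression line.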