BANA 6043: PROJECT WORK
OCTOBER 2, 2017
JATIN SAINI
M12382157
SUMMARY:
The goal of this project is to study which factors affect the landing distance of a
commercial flight, and how. For this I received a dataset containing details of BOEING and AIRBUS flights. My
first step was data preparation: removing empty rows, identifying duplicate
rows, finding the sample size for each variable and taking outliers out of the dataset. These
steps helped me understand the distribution of the variables, separate normal values from outliers and form
a new dataset containing only normal values.
To further understand how the variables relate to the landing distance, I conducted a descriptive
study of the normal values, comparing the response variable (distance) with each of the predictor variables. This
study gave me some perspective on the relationship of speed_air and speed_ground with the landing
distance. I used Pearson correlation to check for collinearity among the
variables. It turned out that speed_air is linearly correlated with speed_ground, and there is no
sign of any other pair of variables showing a pattern with each other.
Finally, I fitted a linear regression model with intercept on the normal data after removing
speed_air. The fitted equation is:
DISTANCE = -2826.33022 - 0.10338*(DURATION) - 2.35945*(NO_PASG) +
42.15245*(SPEED_GROUND) + 13.58277*(HEIGHT) + 187.99136*(PITCH)
We can see from the equation that the duration of the flight and the number of passengers have a
negative impact on landing distance, while ground speed, height of the aircraft and pitch have a positive
impact.
VARIABLE DICTIONARY:
Aircraft: The make of an aircraft (Boeing or Airbus).
Duration (in minutes): Flight duration between taking off and landing. The
duration of a normal flight should always be greater than 40 min.
No_pasg: The number of passengers in a flight.
Speed_ground (in miles per hour): The ground speed of an aircraft when
passing over the threshold of the runway. If its value is less than 30 MPH or
greater than 140 MPH, then the landing would be considered as abnormal.
Speed_air (in miles per hour): The air speed of an aircraft when passing over
the threshold of the runway. If its value is less than 30 MPH or greater than
140 MPH, then the landing would be considered as abnormal.
Height (in meters): The height of an aircraft when it is passing over the
threshold of the runway. The landing aircraft is required to be at least 6 meters
high at the threshold of the runway.
Pitch (in degrees): Pitch angle of an aircraft when it is passing over the
threshold of the runway.
Distance (in feet): The landing distance of an aircraft. More specifically, it
refers to the distance between the threshold of the runway and the point
where the aircraft can be fully stopped. The length of the airport runway is
typically less than 6000 feet.
CHAPTER 1: DATA PREPARATION
DESCRIPTION: Data preparation is a very important step in understanding the
sample size. My aim here is to retain as much data as possible in order to obtain the
best-fitting model.
STEP 1: Uploading Data Files:
/* Import the first workbook and preview the first 10 rows. */
FILENAME flight1 '/home/sainijn0/Stat_computing/FAA1.xls';
PROC IMPORT DATAFILE=FLIGHT1
DBMS=XLS
OUT=FLIGHT1;
GETNAMES=YES;
RUN;
PROC PRINT DATA=FLIGHT1(obs=10);
RUN;
/* Import the second workbook the same way. */
FILENAME flight2 '/home/sainijn0/Stat_computing/FAA2.xls';
PROC IMPORT DATAFILE=FLIGHT2
DBMS=XLS
OUT=FLIGHT2;
GETNAMES=YES;
RUN;
PROC PRINT DATA=FLIGHT2(obs=10);
RUN;
STEP 2: Removing Empty Rows From DataSets:
DATA FLIGHT1;
SET FLIGHT1;
IF MISSING(NO_PASG) AND MISSING(DURATION) AND MISSING(AIRCRAFT) AND
MISSING(SPEED_GROUND)
AND MISSING(SPEED_AIR) AND MISSING(HEIGHT) AND MISSING(PITCH) AND
MISSING(DISTANCE)
THEN DELETE;
RUN;
PROC PRINT DATA=flight1(obs=10);
RUN;
DATA FLIGHT2;
SET FLIGHT2;
IF MISSING(NO_PASG) AND MISSING(DURATION) AND MISSING(AIRCRAFT) AND
MISSING(SPEED_GROUND)
AND MISSING(SPEED_AIR) AND MISSING(HEIGHT) AND MISSING(PITCH) AND
MISSING(DISTANCE)
THEN DELETE;
RUN;
PROC PRINT DATA=flight2(obs=10);
RUN;
STEP 3: Combining Data Sets:
/* Stack the two data sets row-wise. */
DATA COMBINED_FLIGHT;
SET flight1 FLIGHT2;
RUN;
PROC PRINT DATA=COMBINED_FLIGHT(OBS=10);
RUN;
STEP 4: Removing Duplicates:
/* Keep one row per combination of the BY variables. */
PROC SORT DATA=COMBINED_FLIGHT NODUPKEY;
BY SPEED_AIR SPEED_GROUND HEIGHT PITCH DISTANCE;
RUN;
PROC PRINT DATA=combined_flight(OBS=10);
RUN;
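PROC SORT with NODUPKEY overwrites COMBINED_FLIGHT in place and discards the duplicate rows without showing them. A minimal sketch, assuming one wants to inspect what was dropped: the OUT= and DUPOUT= options write the de-duplicated rows and the discarded rows to separate data sets. The names DEDUP_FLIGHT and DROPPED_DUPS are hypothetical and do not appear elsewhere in this report.

```sas
/* Sketch only: DEDUP_FLIGHT and DROPPED_DUPS are illustrative names. */
PROC SORT DATA=COMBINED_FLIGHT
          OUT=DEDUP_FLIGHT        /* rows kept: first row of each BY group */
          DUPOUT=DROPPED_DUPS     /* rows discarded as duplicate keys      */
          NODUPKEY;
    BY SPEED_AIR SPEED_GROUND HEIGHT PITCH DISTANCE;
RUN;

/* Review the discarded rows before trusting the de-duplication. */
PROC PRINT DATA=DROPPED_DUPS(OBS=10);
RUN;
```

This is meant as an alternative to the in-place sort, run on the combined data before it has been de-duplicated. If COMBINED_FLIGHT has already been sorted with NODUPKEY, DROPPED_DUPS will simply be empty.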
STEP 5: Finding Missing Values For Each Variable:
PROC MEANS DATA=COMBINED_FLIGHT NMISS;
TITLE 'MISSING VALUES';
RUN;
STEP 6: Observing Variable Distributions:
PROC CHART DATA=COMBINED_FLIGHT;
VBAR SPEED_AIR SPEED_GROUND HEIGHT PITCH DISTANCE;
RUN;
STEP 7: Identifying Abnormal Rows:
DATA VALIDATE1;
SET COMBINED_FLIGHT;
/* FLAG=1 marks a row that breaks one of the rules in the variable dictionary. */
IF DURATION<40 THEN FLAG=1;
ELSE IF HEIGHT<6 THEN FLAG=1;
ELSE IF SPEED_GROUND<30 OR SPEED_GROUND>140 THEN FLAG=1;
ELSE IF DISTANCE>6000 THEN FLAG=1;
ELSE FLAG=0;
RUN;
PROC PRINT DATA=VALIDATE1(OBS=10);
RUN;
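One detail of the flagging step is worth spelling out. In SAS a missing numeric value sorts below every number, so a comparison such as DURATION<40 is also true when DURATION is missing, and such rows receive FLAG=1. The variable dictionary also defines an abnormal range for speed_air, which the step above does not test. The sketch below is one possible variant, not the code behind the results in this report: it records missing durations separately and applies the speed_air range only where speed_air is present. VALIDATE2 and MISSING_DURATION are hypothetical names.

```sas
/* Sketch only: VALIDATE2 and MISSING_DURATION are illustrative names. */
DATA VALIDATE2;
    SET COMBINED_FLIGHT;
    /* 1 when duration was never recorded, 0 otherwise. */
    MISSING_DURATION = MISSING(DURATION);
    FLAG = 0;
    /* Flag short flights only when a duration is actually present. */
    IF NOT MISSING(DURATION) AND DURATION < 40 THEN FLAG = 1;
    IF HEIGHT < 6 THEN FLAG = 1;
    IF SPEED_GROUND < 30 OR SPEED_GROUND > 140 THEN FLAG = 1;
    /* Same range as speed_ground, applied only to recorded values. */
    IF NOT MISSING(SPEED_AIR) AND (SPEED_AIR < 30 OR SPEED_AIR > 140) THEN FLAG = 1;
    IF DISTANCE > 6000 THEN FLAG = 1;
RUN;

/* Cross-tabulate to see how many rows fall in each combination. */
PROC FREQ DATA=VALIDATE2;
    TABLES FLAG*MISSING_DURATION;
RUN;
```

Row counts under this variant may differ from the 781 reported in this document, because rows with a missing duration are no longer flagged automatically.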
STEP 8: Summary Of Abnormal And Normal Data:
/* Abnormal rows. */
DATA FLAGGED_FLIGHTS;
SET VALIDATE1;
IF FLAG=1;
RUN;
PROC MEANS DATA=FLAGGED_FLIGHTS;
TITLE 'ABNORMAL DATA';
RUN;
/* Normal rows, used in the rest of the study. */
DATA NORMAL_FLIGHTS;
SET VALIDATE1;
IF FLAG=0;
RUN;
PROC MEANS DATA=NORMAL_FLIGHTS;
TITLE 'NORMAL DATA';
RUN;
Observation: After removing duplicate and empty rows, and then
removing outliers, we get 195 rows for speed_air and 781 data rows for each of the following
variables:
• Duration
• No_pasg
• Speed_ground
• Height
• Pitch
• Distance
CHAPTER 2: DESCRIPTIVE STUDY
WORKING WITH NORMAL DATA
DESCRIPTION: The purpose of this chapter is to use scatterplots and Pearson
correlation to understand any correlation between variables.
STEP 1: Creating Scatterplots Between Response Variable And Predictor
Variables.
CODE:
PROC PLOT DATA=NORMAL_FLIGHTS;
PLOT DISTANCE*DURATION;
PLOT DISTANCE*NO_PASG;
PLOT DISTANCE*SPEED_GROUND;
PLOT DISTANCE*SPEED_AIR;
PLOT DISTANCE*HEIGHT;
PLOT DISTANCE*PITCH;
RUN;
STEP 2: Pearson Correlation Between Variables:
PROC CORR DATA=NORMAL_FLIGHTS;
VAR DISTANCE DURATION NO_PASG SPEED_GROUND SPEED_AIR HEIGHT
PITCH;
RUN;
OBSERVATION:
This turns out to be a very useful step, as it helps us identify the correlation between each pair of
variables.
The output shows that the correlation between “speed_ground” and “speed_air” is 0.988
(nearly perfect correlation), which tells us that they are highly collinear. Therefore, we can drop
one of these variables in the regression step.
Since “speed_air” has only 195 data rows filled, compared with 781 data rows for “speed_ground”,
we drop the “speed_air” variable from our regression model.
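The pairwise correlation can be cross-checked with variance inflation factors. A minimal sketch, assuming the NORMAL_FLIGHTS data set from Chapter 1: the VIF option on the MODEL statement prints one factor per predictor. This diagnostic fit is an illustration and was not part of the original analysis.

```sas
/* Sketch only: diagnostic fit that keeps both speed variables. */
PROC REG DATA=NORMAL_FLIGHTS;
    MODEL DISTANCE = DURATION NO_PASG SPEED_GROUND SPEED_AIR HEIGHT PITCH / VIF;
RUN;
QUIT;
```

PROC REG excludes any observation with a missing value in a model variable, so including SPEED_AIR restricts this fit to the rows where it is recorded. That loss of rows is a second reason, besides collinearity, to prefer speed_ground. A common rule of thumb treats a VIF above 10 as a sign of problematic collinearity.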
CHAPTER 3: STATISTICAL STUDY
DESCRIPTION: The statistical study is a very important step, as it lets us fit
a regression model with intercept to the present data and see whether each variable is
positively or negatively related to landing distance.
STEP 1: Fitting Linear Regression Model On Our Normal Dataset:
PROC REG DATA=NORMAL_FLIGHTS;
MODEL DISTANCE = DURATION NO_PASG SPEED_GROUND HEIGHT PITCH;
RUN;
OBSERVATION: Equation From Regression Of Normal Data Comes Out To Be:
DISTANCE = -2826.33022 - 0.10338*(DURATION) - 2.35945*(NO_PASG) +
42.15245*(SPEED_GROUND) + 13.58277*(HEIGHT) + 187.99136*(PITCH)
We can see from the equation that the duration of the flight and the number of passengers have a
negative impact on landing distance, while ground speed, height of the aircraft and pitch have a
positive impact.
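To see how the equation is used, the sketch below plugs a single flight into it. The input values are hypothetical, chosen only to fall inside the normal ranges given in the variable dictionary; they are not taken from the dataset. PREDICT_ONE and PRED_DISTANCE are illustrative names.

```sas
/* Sketch only: the five input values are made up for illustration. */
DATA PREDICT_ONE;
    DURATION     = 150;  /* minutes */
    NO_PASG      = 60;   /* passengers */
    SPEED_GROUND = 80;   /* miles per hour */
    HEIGHT       = 30;   /* meters */
    PITCH        = 4;    /* degrees */
    /* Coefficients copied from the fitted equation above. */
    PRED_DISTANCE = -2826.33022 - 0.10338*DURATION - 2.35945*NO_PASG
                    + 42.15245*SPEED_GROUND + 13.58277*HEIGHT + 187.99136*PITCH;
RUN;

PROC PRINT DATA=PREDICT_ONE;
RUN;
```

For these inputs the equation gives a predicted landing distance of about 1548 feet. Read term by term, each additional mile per hour of ground speed adds about 42 feet to the prediction, while each additional passenger subtracts about 2.4 feet.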
Q&A
1. How many observations (flights) do you use to fit your final model? If not all 950 flights, why?
Solution: I used 781 of the 950 observations to fit the model. 100 rows were removed as duplicates
(based on the keys speed_ground, speed_air, height, pitch and distance) and 69 rows were removed
because they contained outliers.
2. What factors impact the landing distance of a flight, and how?
Solution: Regression Equation
DISTANCE = -2826.33022 - 0.10338*(DURATION) - 2.35945*(NO_PASG) +
42.15245*(SPEED_GROUND) + 13.58277*(HEIGHT) + 187.99136*(PITCH)
We can see from the regression equation that the duration of the flight and the number of passengers
have a negative impact on landing distance, while ground speed, height of the aircraft and pitch have a
positive impact.
3. Is there any difference between the two makes, Boeing and Airbus?
Solution:
Regression Equation for Boeing aircraft
DISTANCE = -1796.98903 + 0.45775*(DURATION) - 1.9524*(NO_PASG) +
42.28107*(SPEED_GROUND) + 14.23727*(HEIGHT) - 39.31818*(PITCH)
Regression Equation for Airbus aircraft
DISTANCE = -2788.83858 - 0.30974*(DURATION) - 0.33445*(NO_PASG) +
42.90888*(SPEED_GROUND) + 13.98867*(HEIGHT) + 80.78737*(PITCH)
We can see a wide shift in the intercept and pitch coefficients between the 2 makes, but what really gets my
attention is the change in sign of the coefficients for pitch and duration. This tells us that pitch and
duration have opposite impacts on the landing distance when we compare Boeing and Airbus
aircraft.
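The report does not show the code behind the two per-make equations. One way to obtain separate fits, sketched here under that caveat, is BY-group processing: sort by AIRCRAFT, then let PROC REG fit the same model once per make. SORTED_FLIGHTS is a hypothetical name.

```sas
/* Sketch only: one possible way to fit the model separately per make. */
PROC SORT DATA=NORMAL_FLIGHTS OUT=SORTED_FLIGHTS;
    BY AIRCRAFT;
RUN;

PROC REG DATA=SORTED_FLIGHTS;
    BY AIRCRAFT;   /* one regression per value of AIRCRAFT */
    MODEL DISTANCE = DURATION NO_PASG SPEED_GROUND HEIGHT PITCH;
RUN;
QUIT;
```

Whether the sign changes are meaningful depends on the standard errors and p-values in the PROC REG output for each make, which are not reproduced in this report.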

Modeling and Prediction using SAS
