A Python-based, open-source data analysis project on airline on-time statistics and delay causes
Perform an exploratory analysis of airline on-time statistics and delay causes to give travelers insight into what to expect when flying.
The Airline On-Time and Delay Cause data and the Historical Delay Minutes data were sourced from the US DOT Bureau of Transportation Statistics (BTS). Time-series data was sourced from NASDAQ. The US JSON file was sourced from CareerFoundry.
Data sourcing, data cleaning, data visualizations, geospatial analysis, regression analysis, k-means clustering, time-series analysis
Jupyter Notebooks, Python, Pandas, Excel, Tableau
Sourcing Open Data
Data Cleaning
Data Exploration
Sourcing a Shapefile
Data Wrangling
Geospatial Analysis
Supervised Machine Learning: Regression Analysis
Unsupervised Machine Learning: Clustering
Sourcing Time-Series Data
Data Visualization
Dickey-Fuller Test
Presenting Results in a Tableau Dashboard
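As a rough illustration of the regression step listed above, the sketch below fits a straight line by ordinary least squares. It is not the project's notebook code: the project lists Pandas among its tools, while this version uses only plain Python so it runs without dependencies. The function name `fit_line`, the choice of variables (arriving flights against delayed arrivals), and the numbers are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up monthly totals for illustration only (not the project's data).
arriving = [100, 200, 300, 400]  # hypothetical arriving flights per month
delayed = [12, 22, 32, 42]       # hypothetical delayed arrivals per month

slope, intercept = fit_line(arriving, delayed)
print(f"delayed = {slope:.3f} * arriving + {intercept:.3f}")
```

A slope near zero would suggest that delay counts do not grow with traffic volume. With real data, a library such as scikit-learn or statsmodels would usually be the more practical choice.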
• In 2020-2021, there was a 13.5% chance that an arriving flight would be delayed, and based on historical data, we can expect that trend to continue.
• The leading cause of delays across the US is carrier delays, which make up 5.16% of all delays.
• You have the lowest chance of an arriving flight being delayed when you're flying in Minnesota (9.77%) or Georgia (9.89%).
• The state, airport, and airline all play a role in the likelihood that an arriving flight will be delayed. Flying on Endeavor Air provides the lowest chance of an arriving flight being delayed, at 8.03%.
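Percentages like the ones above can be computed by dividing delayed arrivals by total arrivals within each group. The sketch below shows one way to do that in plain Python. It assumes field names in the style of the BTS delay-cause file (`arr_flights` for arriving flights, `arr_del15` for arrivals delayed 15 minutes or more). The `state` field, the helper `delay_rate_by`, and every number are hypothetical, not the project's data.

```python
from collections import defaultdict

# Illustrative rows only. Field names are assumptions modeled on the BTS
# delay-cause file; "state" is a hypothetical derived column.
rows = [
    {"state": "MN", "carrier_name": "Carrier A", "arr_flights": 1000, "arr_del15": 100},
    {"state": "MN", "carrier_name": "Carrier B", "arr_flights": 500, "arr_del15": 50},
    {"state": "GA", "carrier_name": "Carrier A", "arr_flights": 2000, "arr_del15": 300},
]


def delay_rate_by(rows, key):
    """Return {group: percent of arriving flights delayed} for the given key."""
    flights = defaultdict(float)
    delayed = defaultdict(float)
    for row in rows:
        flights[row[key]] += row["arr_flights"]
        delayed[row[key]] += row["arr_del15"]
    # Skip groups with no arriving flights to avoid dividing by zero.
    return {
        group: 100.0 * delayed[group] / flights[group]
        for group in flights
        if flights[group] > 0
    }


by_state = delay_rate_by(rows, "state")
by_carrier = delay_rate_by(rows, "carrier_name")

# Group with the smallest share of delayed arrivals.
lowest_state = min(by_state, key=by_state.get)
print(by_state, by_carrier, lowest_state)
```

Summing the counts before dividing weights each group by its flight volume. Averaging per-row percentages instead would let small airports skew the result.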
TAKEAWAYS
• Weather delays only account for extreme weather conditions that prevent flying. Because of this, the data analyzed does not accurately depict the true number of weather delays or the amount of delay time they caused.
• Find supplemental data on weather and NAS delays to get a clearer picture of the effect weather has on arriving flight delays.
Copyright © 2022 Shey LeGras - All Rights Reserved.