Apache Spark Practices
Use Apache Spark to answer the following questions.
- Find the number of flights each airline has made from 1987 to the most recent year in the data.
- Find the mean (average) departure delay per origin airport.
- On which day are the delays the worst?
- On which day of the week are the most flights cancelled?
- On which day of the month are the most flights cancelled?
- Find the on-time (ArrTime - CRSArrTime <= 0) performance for each unique carrier.
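As an illustrative sketch only (not the required Spark solution), the per-carrier flight count and the on-time test from the last question can be expressed as plain Python functions of the kind you would pass to `rdd.map` / `reduceByKey`. The three sample rows and the `(UniqueCarrier, ArrTime, CRSArrTime)` layout are assumptions for the example, not real data.

```python
from collections import Counter

# Hypothetical sample rows: (UniqueCarrier, ArrTime, CRSArrTime).
# Column names follow the airline on-time dataset; values are made up.
rows = [
    ("AA", 905, 910),
    ("AA", 935, 910),
    ("DL", 1200, 1200),
]

def is_on_time(arr_time, crs_arr_time):
    """On-time as defined in the assignment: ArrTime - CRSArrTime <= 0."""
    return arr_time - crs_arr_time <= 0

# Per-carrier flight counts (what map + reduceByKey would compute in Spark).
flight_counts = Counter(carrier for carrier, _, _ in rows)

# Per-carrier on-time fraction.
on_time = {
    carrier: sum(is_on_time(a, c) for cr, a, c in rows if cr == carrier)
    / flight_counts[carrier]
    for carrier in flight_counts
}
print(flight_counts)  # Counter({'AA': 2, 'DL': 1})
print(on_time)        # {'AA': 0.5, 'DL': 1.0}
```

In Spark the same logic would look like `rdd.map(lambda r: (r[0], 1)).reduceByKey(lambda a, b: a + b)` for the counts.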
If using Google Colab, attach Google Drive to Colab:

```python
from google.colab import drive
drive.mount('/content/drive')
```

Read all files in a single call:

```python
sc = spark.sparkContext
rdd = sc.textFile('/content/drive/path/to/files/*.csv.bz2')
```

Use `.take()` or `.first()` instead of `.collect()`:

```python
rdd.take(2)
```

The dataset is the same as in assignment 2; use it from the Google Drive folder. See spu-bigdataanalytics-211/assignment-2 for more on the data dictionary.
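One practical detail when reading these CSV files as text: `textFile` keeps the header row of every file, so it must be filtered out before parsing. The header fields shown below are a partial, assumed layout taken from the dataset's data dictionary; the helpers are plain Python of the kind you would pass to `rdd.filter` / `rdd.map`.

```python
# Hypothetical header and one data line, mimicking what sc.textFile yields
# for the airline CSV files (field order assumed from the data dictionary).
header = "Year,Month,DayofMonth,DayOfWeek,DepTime"
line = "1987,10,14,3,741"

def is_header(line):
    # Header rows start with the literal column name "Year", which
    # never appears at the start of a data row.
    return line.startswith("Year,")

def parse(line):
    # Split a CSV row into its raw string fields.
    return line.split(",")

# With Spark this would be:
#   data = rdd.filter(lambda l: not is_header(l)).map(parse)
print(is_header(header), is_header(line))  # True False
print(parse(line))
```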
You can find more information about this dataset on the Statistical Computing website, and more on Airline On-Time Performance data from the Bureau of Transportation Statistics (BTS).
- Download this repository with `git clone https://github.com/spu-bigdataanalytics-211/assignment-3.git`.
- Create a virtual environment and activate it every time you need to use it.
- Install the requirements with `pip install -r requirements.txt`.
- Create a notebook.
The repository is self-descriptive and should guide you through the assignment. Let me know if you have any questions.