assignment-1

Practice Python skills.

Description

We will practice the following areas:

  • Python introduction
  • exploratory data analysis
  • parallel programming

For parallel programming, you have two tasks.

  1. Download images from Unsplash.
  2. Apply some image processing operations to the images.

For comparison, we have already implemented the serial version of both tasks. When the notebook runs, we will measure the processing time of the cell.
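As a rough illustration of how a processing time can be measured, here is a minimal sketch using `time.perf_counter` from the standard library. The `timed` helper and `fake_task` are hypothetical names used only for this example; the notebook may measure time differently (for instance with a cell magic).

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def fake_task(n):
    # Hypothetical stand-in for downloading or processing one image.
    time.sleep(0.01)
    return n * n


timings = {}
with timed("serial", timings):
    serial_output = [fake_task(n) for n in range(5)]

print(f"serial: {timings['serial']:.3f}s")
```

Recording the durations in a dictionary makes it easy to compare a serial run with a parallel run afterwards.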

You can do parallel programming in Python using packages that are available in the standard library. Other packages that achieve parallelism can be installed via pip, but the standard library packages should be sufficient for this assignment.
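To make the idea concrete, here is a minimal sketch that assumes `concurrent.futures` is one of the standard library packages you may use. It does not touch the network: `fake_download` is a hypothetical stand-in that sleeps instead of downloading, and the URLs are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    # Hypothetical stand-in for a network request: sleeps instead of downloading.
    time.sleep(0.05)
    return f"saved:{url}"


# Placeholder URLs; real ones would come from the Unsplash metadata.
urls = [f"https://example.com/image-{i}.jpg" for i in range(8)]

# Serial baseline: one task after another.
start = time.perf_counter()
serial_results = [fake_download(u) for u in urls]
serial_seconds = time.perf_counter() - start

# Parallel version: the waits overlap across worker threads.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    # map() yields results in input order, even if tasks finish out of order.
    parallel_results = list(pool.map(fake_download, urls))
parallel_seconds = time.perf_counter() - start

print(f"serial:   {serial_seconds:.2f}s")
print(f"parallel: {parallel_seconds:.2f}s")
```

Threads suit I/O-bound work such as downloading, because the workers spend most of their time waiting. For CPU-bound work such as image processing, `ProcessPoolExecutor` from the same module is usually the better fit, since threads in CPython share one interpreter lock.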

Dataset

Metadata for random images from Unsplash.

To download the metadata of random images as JSON, use the following.

from data_prep import download_unsplash_json

# downloads to data/ json folder.
download_unsplash_json()

Load the data into a list or a dataframe.

from data_prep import get_images_df, get_images_list

# get as pandas dataframe.
get_images_df()

# get images as list of dictionaries.
get_images_list()

You can also create an images dataset by downloading images.

from data_prep import download_images

# downloads to data/ images folder.
download_images(quality='raw')

Creating a Developer Account in Unsplash

Unsplash is a website that shares freely available, usable images. It hosts a large collection of images from many photographers.

We will use this website to build our images dataset through its API. Go to Unsplash Developers, register, and create an application. You will then receive an access key and a secret key.
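For orientation, here is a minimal sketch of how an access key is typically attached to a request for random photo metadata. It only builds the request and does not send it. The function name `build_random_photos_request` and the environment variable `UNSPLASH_ACCESS_KEY` are assumptions made for this example; how `data_prep` reads your key is not shown here and may differ.

```python
import os
import urllib.parse
import urllib.request


def build_random_photos_request(access_key, count=10):
    """Build (but do not send) a request for random photo metadata."""
    query = urllib.parse.urlencode({"count": count})
    url = f"https://api.unsplash.com/photos/random?{query}"
    request = urllib.request.Request(url)
    # Unsplash's public authentication passes the access key as a Client-ID.
    request.add_header("Authorization", f"Client-ID {access_key}")
    return request


# UNSPLASH_ACCESS_KEY is an assumed variable name, not one defined by this repository.
access_key = os.environ.get("UNSPLASH_ACCESS_KEY", "your-access-key")
request = build_random_photos_request(access_key, count=5)
print(request.full_url)
```

Keeping the key in an environment variable rather than in the notebook avoids committing it to the repository by accident.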

The GIF below shows how I created my developer account.

unsplash-steps

How to work on this Assignment?

  1. Download this repository with git clone https://github.com/spu-bigdataanalytics-211/assignment-1.git.
  2. Create a virtual environment and activate it every time you work on the assignment.
  3. Install the requirements using pip install -r requirements.txt.
  4. Run the notebook.

Questions

The repository should be self-descriptive and should guide you through the assignment. Let me know if you have any questions.
