Practice Python skills.
We will do exercises in the following areas:
- python introduction
- exploratory data analysis
- parallel programming
For parallel programming, you have two tasks:
- Download images from Unsplash over the internet.
- Apply some image processing operations to the downloaded images.
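The second task is CPU-bound, so processes (which side-step the GIL) are usually a better fit than threads. Below is a minimal sketch with `multiprocessing.Pool`; the per-pixel grayscale formula is just a stand-in, not the assignment's actual image operations:

```python
from multiprocessing import Pool

def to_grayscale(pixel):
    # Stand-in image operation: luminance of one (r, g, b) pixel.
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)

if __name__ == "__main__":
    pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
    # Each worker process applies to_grayscale to a chunk of the data;
    # pool.map preserves the input order in its results.
    with Pool(processes=2) as pool:
        gray = pool.map(to_grayscale, pixels)
    print(gray)  # [76, 150, 29]
```

The `if __name__ == "__main__":` guard matters on platforms where `multiprocessing` spawns fresh interpreters, since child processes re-import the module.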
For comparison, we have already implemented the serial process for both of these tasks. Once the notebook runs, we will measure the processing time of each cell.
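Outside of notebook magics such as `%%time`, elapsed time can be measured with `time.perf_counter`. A small sketch, where `slow_task` is a made-up stand-in for one unit of work:

```python
import time

def slow_task(n):
    # Stand-in for one unit of work (e.g. downloading or processing an image).
    total = 0
    for i in range(n):
        total += i * i
    return total

start = time.perf_counter()
results = [slow_task(100_000) for _ in range(8)]  # serial baseline
elapsed = time.perf_counter() - start
print(f"serial run took {elapsed:.3f} seconds for {len(results)} tasks")
```

Timing the serial version this way gives you a baseline to compare your parallel implementation against.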
You can do parallel programming in Python using the following packages from the standard library. There are other packages you can install via pip to achieve parallelism, but for this assignment these should be sufficient:
- threading
- multiprocessing
- concurrent.futures
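For the I/O-bound download task, threads are usually sufficient. Here is a hedged sketch using `concurrent.futures.ThreadPoolExecutor`; the `fetch` function and the URLs are placeholders, not part of the assignment code:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Placeholder for an I/O-bound download; a real version would use
    # urllib.request or the requests package.
    return f"downloaded {url}"

urls = [f"https://example.com/image_{i}.jpg" for i in range(5)]

# Threads suit I/O-bound work like downloads; for CPU-bound work,
# prefer ProcessPoolExecutor or multiprocessing instead.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(fetch, urls))

print(results[0])  # downloaded https://example.com/image_0.jpg
```

`executor.map` returns results in the same order as the inputs, which keeps the parallel version a drop-in replacement for a serial loop.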
Random images meta information from Unsplash
To download the meta information of random images as JSON, use the following:

```python
from data_prep import download_unsplash_json

# downloads to the data/json folder
download_unsplash_json()
```

Get the data into a list or dataframe:

```python
from data_prep import get_images_df, get_images_list

# get the images as a pandas dataframe
get_images_df()

# get the images as a list of dictionaries
get_images_list()
```

You can also create an images dataset by downloading the images themselves:

```python
from data_prep import download_images

# downloads to the data/images folder
download_images(quality='raw')
```

Unsplash is a website that shares freely usable images, with great photos from many photographers.
We will use this website to build our image dataset through their API. You need to go to Unsplash Developers, register, and create an application. Then you need to get an access key and a secret key.
The GIF below shows how I created my developer account.
- Download this repository with `git clone https://github.com/spu-bigdataanalytics-211/assignment-1.git`.
- Create a virtual environment and activate it every time you work on the assignment.
- Install the requirements with `pip install -r requirements.txt`.
- Run the notebook.
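Assuming a Unix-like shell, the setup steps above might look like this (the virtual-environment name `.venv` and the final `jupyter notebook` command are assumptions, not prescribed by the repository):

```shell
# clone the assignment repository
git clone https://github.com/spu-bigdataanalytics-211/assignment-1.git
cd assignment-1

# create and activate a virtual environment
# (re-run the activate line in every new shell session)
python -m venv .venv
source .venv/bin/activate

# install the dependencies and start the notebook server
pip install -r requirements.txt
jupyter notebook
```

On Windows, the activation line is `.venv\Scripts\activate` instead.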
The repository should be self-descriptive and guide you through the assignment. Let me know if you have any questions.
