assignment-1

Practice Python skills.

Description

We will practice the following areas:

Python introduction
exploratory data analysis
parallel programming

For parallel programming, you have two tasks:

Download images from Unsplash.
Apply some image processing operations to the images.

For comparison, the serial version of both tasks is already implemented. When the notebook runs, we measure the processing time of the cell.
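To compare a serial run against a parallel one, it helps to record wall-clock time the same way for both. The following is a minimal sketch, not the notebook's own measurement code: the `timed` helper and the stand-in workload are hypothetical names introduced only for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}

with timed("serial", timings):
    # Stand-in workload (assumption): replace with the task being measured.
    total = sum(i * i for i in range(100_000))

print(f"serial: {timings['serial']:.4f} s")
```

Inside a Jupyter notebook, the `%%time` cell magic reports the duration of a whole cell without any helper code.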

You can do parallel programming in Python using the following packages, which are available in the standard library. Other packages that achieve parallelism can be installed via pip, but for this assignment the following packages should be sufficient.
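As one illustration, assuming `concurrent.futures` is among the standard-library options meant here, the sketch below runs the same I/O-like task serially and then with a thread pool. The `fake_download` function is a hypothetical stand-in that only sleeps; it does not contact Unsplash, and the worker count is an arbitrary choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(image_id):
    """Hypothetical stand-in for a network download: wait, then return a label."""
    time.sleep(0.05)
    return f"image-{image_id}"


image_ids = list(range(8))

# Serial baseline: tasks run one after another.
start = time.perf_counter()
serial_results = [fake_download(i) for i in image_ids]
serial_seconds = time.perf_counter() - start

# Threaded version: up to four tasks wait at the same time.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map returns results in the same order as the inputs.
    threaded_results = list(pool.map(fake_download, image_ids))
threaded_seconds = time.perf_counter() - start

print(f"serial:   {serial_seconds:.2f} s")
print(f"threaded: {threaded_seconds:.2f} s")
```

Threads tend to suit tasks that mostly wait, such as downloads. For CPU-heavy work such as image processing, `ProcessPoolExecutor` from the same module offers the same `map` interface while using separate processes.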

Dataset

Metadata for random images from Unsplash.

To download the metadata of random images as JSON, use the following.

from data_prep import download_unsplash_json

# downloads to data/ json folder.
download_unsplash_json()

Get data into a list or dataframe.

from data_prep import get_images_df, get_images_list

# get as pandas dataframe.
get_images_df()

# get images as list of dictionaries.
get_images_list()
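
The exact keys returned by `get_images_list()` are not shown here. As a sketch, assuming each dictionary follows the shape of an Unsplash API photo object (an `id` plus a `urls` mapping keyed by quality, such as `raw` or `regular`), a download URL could be picked out as below. The `url_for` helper and the sample records are hypothetical and use placeholder URLs.

```python
def url_for(record, quality="regular"):
    """Return the URL stored under `quality`, or None if the record lacks it."""
    return record.get("urls", {}).get(quality)


# Hypothetical records shaped like Unsplash photo objects (assumption).
sample_records = [
    {
        "id": "abc123",
        "urls": {
            "raw": "https://example.com/abc123-raw",
            "regular": "https://example.com/abc123-regular",
        },
    },
    {"id": "def456", "urls": {"raw": "https://example.com/def456-raw"}},
]

# Map each image id to its raw-quality URL.
raw_urls = {record["id"]: url_for(record, "raw") for record in sample_records}
print(raw_urls)
```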

You can also create an image dataset by downloading the images.

from data_prep import download_images

# downloads to data/ images folder.
download_images(quality='raw')

Creating a Developer Account in Unsplash

Unsplash is a website that shares freely usable images. It is a nice website with lots of cool images from many photographers.

We will use this website to build our image dataset through its API. Go to Unsplash Developers, register, and create an application. Then get an access key and a secret key.

The GIF below shows how I created my developer account.
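
For orientation, the Unsplash API accepts the access key in an `Authorization: Client-ID ...` header. The sketch below only builds such a request for the random photos endpoint and does not send it. The `build_random_photos_request` helper and the `UNSPLASH_ACCESS_KEY` environment variable name are assumptions for illustration, and `data_prep.py` may handle the key differently.

```python
import os
import urllib.request
from urllib.parse import urlencode


def build_random_photos_request(access_key, count=10):
    """Build (but do not send) a GET request for random photo metadata."""
    query = urlencode({"count": count})
    url = f"https://api.unsplash.com/photos/random?{query}"
    headers = {"Authorization": f"Client-ID {access_key}"}
    return urllib.request.Request(url, headers=headers)


# Hypothetical variable name; falls back to a placeholder if it is not set.
access_key = os.environ.get("UNSPLASH_ACCESS_KEY", "YOUR_ACCESS_KEY")
request = build_random_photos_request(access_key, count=5)
print(request.full_url)
```

Reading the key from the environment keeps it out of the notebook and out of version control.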

How to work on this Assignment?

Clone this repository with git clone https://github.com/spu-bigdataanalytics-211/assignment-1.git.
Create a virtual environment and activate it every time you work on the assignment.
Install the requirements with pip install -r requirements.txt.
Run the notebook.

Questions

The repository should be self-descriptive and guide you through the assignment. Let me know if you have any questions.

