Class Materials

Hi!

This repository contains everything for the big data analytics class: the tutorials we did in class, class notes, Python files, etc.

The content of this repository will be updated weekly from March 1st to May 22nd, 2021.

Contents

The content in this repository is organized by topic.

Directory                 Description
contents/python-warmup    Revisiting Python: functions, classes, decorators, generators, etc.
contents/parallelism      Introduction to parallelism
Upcoming...               Introduction to clustering systems, MongoDB and Cassandra, cloud computing

How to use this Repository?

You can use this repository either locally or with Binder.

Binder is a tool that turns a repository into a collection of interactive notebooks. It installs the packages listed in requirements.txt on the backend server, so the notebooks already have every package they need.

Use the following link to create a binder for this repository.

Binder

Note that when we start with Spark, Binder may not provide all the right tools.

To install locally, follow the steps below.

  1. Clone this repository to your local machine using git clone https://github.com/spu-bigdataanalytics-211/class-materials.git.
  2. Download Python 3 from python.org if you don't already have Python on your computer.
  3. Create a virtual environment and activate it every time you need to use it.
  4. Install the packages listed in requirements.txt using pip install -r requirements.txt.
  5. That's it.
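
After the install step above, it can be useful to confirm that the packages are visible to the interpreter. The following is a minimal sketch using the standard-library importlib.metadata module (Python 3.8+). The helper name missing_packages and the package names passed to it are placeholders for illustration, not the actual contents of requirements.txt.

```python
from importlib import metadata


def missing_packages(names):
    """Return the names from `names` that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            # metadata.version raises PackageNotFoundError for unknown packages.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Placeholder names (assumption): replace them with the packages you expect.
print(missing_packages(["pip", "a-package-that-does-not-exist-0000"]))
```

An empty list means every name was found in the active environment.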

The following steps are recommended.

  1. Read the README file for each section.
  2. For some sections, there will be Examples.ipynb and Examples-Solutions.ipynb files. These contain examples on the related topic and their solutions.
  3. There will also be Notes.ipynb, which may have some more content about the topic.
  4. Follow up with the assignments to practice more.

How to create a new Virtual Environment?

A virtual environment is an isolated container in which your application runs safely, without interfering with other applications on the same machine.

Make sure you have Python installed on your system.

python --version; pip --version

To create a virtual environment, you need a virtual environment manager. Python ships with the venv module by default. Alternatives such as virtualenv also exist.

The command below uses Python to invoke the venv module, which creates a new virtual environment folder named .venv in your current directory.

python -m venv .venv
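
The same venv module can also be called from Python code. The sketch below is an illustration only: the helper name create_env and the temporary directory are assumptions, and with_pip=False is chosen to keep the demo quick and offline, whereas the command-line form above installs pip by default.

```python
import tempfile
import venv
from pathlib import Path


def create_env(parent: Path, name: str = ".venv") -> Path:
    """Create a virtual environment folder called `name` inside `parent`."""
    env_dir = parent / name
    # with_pip=False skips bootstrapping pip into the new environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Demonstrate in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_env(Path(tmp))
    # Every environment created by venv contains a pyvenv.cfg file.
    has_config = (env_dir / "pyvenv.cfg").is_file()
    print("pyvenv.cfg created:", has_config)
```

For everyday use, the one-line command above is simpler. The programmatic form mainly helps when environment creation is part of a larger script.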

To activate the environment, run one of the following.

# on Windows
.\.venv\Scripts\Activate

# on macOS or Linux
source .venv/bin/activate

# your prompt will change to this
(.venv) me@MacBook-Pro ~
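
Besides the changed prompt, you can check from inside Python whether an environment created with venv is active. This is a minimal sketch, and the function name in_virtualenv is an assumption chosen for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

If this reports False after activation, the python command on your PATH is probably not the one inside .venv.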

Instructor's Note

All topics will be shared here. Please review the content and course materials here first. If you still have questions, feel free to reach me by email or on GitHub at @metinsenturk.