Hi!
This is the repository where you can find everything about big data analytics. You will find the tutorials we did in class, class notes, python files, etc.
The content of this repository will be updated weekly from March 1st to May 22nd, 2021.
The content of this repository is organized by topic.
| Directory | Description |
|---|---|
| contents/python-warmup | Revisiting python, functions, classes, decorators, generators, etc. |
| contents/parallelism | Introduction to parallelism |
| Upcoming... | Introduction to clustering systems, MongoDB, Cassandra, and cloud computing |
You can use this repository either locally or with Binder.
Binder is a tool that turns a repository into a collection of interactive notebooks. It installs the packages listed in `requirements.txt` on its backend server, so the notebooks already have all the packages they need.
Use the following link to create a binder for this repository.
Note that when we start with Spark, Binder may not provide all the right tools.
To install locally, follow the steps below.
- Download this repository to your local machine using `git clone https://github.com/spu-bigdataanalytics-211/class-materials.git`.
- Download python 3 from python.org, if you don't have python already on your computer.
- Create a virtual environment and activate this environment every time you need to use it.
- Install the packages listed in `requirements.txt` using `pip install -r requirements.txt`.
- That's it.
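To confirm that the install step worked, a small check like the one below can help. It is only a sketch: the helper names `read_requirement_names` and `missing_packages` are hypothetical (not part of this repository), the parser only understands simple lines such as `numpy` or `pandas==1.2.3`, and it assumes Python 3.8 or newer for `importlib.metadata`.

```python
from importlib import metadata


def read_requirement_names(text):
    """Extract bare package names from requirements-style text.

    Only simple lines are handled; comments, blank lines, and option
    lines (starting with "-") are skipped.
    """
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue
        # Cut the line at the first version, extras, or marker separator.
        for sep in ("==", ">=", "<=", "~=", "!=", ">", "<", "[", ";", " "):
            line = line.split(sep, 1)[0]
        names.append(line.strip())
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, reading `requirements.txt` into a string, passing it to `read_requirement_names`, and then passing the result to `missing_packages` should give an empty list once everything is installed.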
Here are some recommendations.
- Read the README file for each section.
- For some sections, there will be `Examples.ipynb` and `Examples-Solutions.ipynb` files. These contain examples on the related topic and their solutions.
- There will also be `Notes.ipynb`, which may have some more content about the topic.
- Follow up with the assignments to practice more.
A virtual environment is an isolated container for your application: it lets the application run safely, with its own packages, without interfering with other applications on the same machine.
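One way to see this isolation from inside Python is to compare `sys.prefix` with `sys.base_prefix`. The sketch below assumes an environment created with the standard `venv` module (or a recent `virtualenv`); the function name `in_virtualenv` is made up for illustration.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside an environment, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("environment prefix:", sys.prefix)
    print("base installation: ", sys.base_prefix)
    print("inside a virtual environment:", in_virtualenv())
```

If the last line prints `False`, the environment is most likely not activated in the current console.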
Make sure you have python installed on your system.
python --version; pip --version
To create a virtual environment, you need a virtual environment manager. By default, python comes with the venv module. There are other managers as well, such as virtualenv.
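The venv module can also be called from Python code rather than from the console. The following is a minimal sketch, not the approach used in class: `create_environment` is a hypothetical helper that wraps the standard-library function `venv.create`.

```python
import venv
from pathlib import Path


def create_environment(env_dir=".venv", with_pip=True):
    """Create a virtual environment in env_dir and return its path."""
    path = Path(env_dir)
    # venv.create builds the environment folder; with_pip=True also
    # bootstraps pip into the new environment.
    venv.create(path, with_pip=with_pip)
    return path
```

Calling `create_environment()` with no arguments would create a `.venv` folder in the current directory. Activating the environment is still done from the console.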
In the command below, you use python to invoke the venv module, which creates a new virtual environment folder named .venv in your current directory.
python -m venv .venv
To activate the environment, do the following.
# on windows
me@MacBook-Pro ~ .\.venv\Scripts\Activate
# on mac
me@MacBook-Pro ~ source .venv/bin/activate
# your console will change to this
(.venv) me@MacBook-Pro ~
All topics will be shared here. Please review the content and course materials here first. If you still have questions, feel free to reach me by email or on GitHub at @metinsenturk.