

Gradient Descent & SGD Solver — Course Implementation


🚀 Professional educational project for understanding and implementing Gradient Descent and Stochastic Gradient Descent (SGD) from scratch in Python.


🔥 Project Overview

This repository demonstrates the mathematical foundation and practical implementation of:

  • Gradient Descent (GD)
  • Stochastic Gradient Descent (SGD)
  • Mini-Batch Gradient Descent
  • Convergence analysis
  • Optimization visualization

Keywords

  • gradient descent
  • stochastic gradient descent
  • sgd solver python
  • optimization algorithm
  • machine learning optimization
  • gradient descent from scratch
  • mini batch gradient descent
  • convex optimization
  • loss function minimization
  • python optimization implementation

📚 Mathematical Foundation

Gradient Descent Update Rule

For parameters $$\theta$$, the update rule is:

$$ \theta_{t+1} = \theta_t - \eta \nabla J(\theta_t) $$

Where:

  • $$\eta$$ — learning rate
  • $$J(\theta)$$ — loss function
  • $$\nabla J(\theta)$$ — gradient
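
As a quick sanity check of the rule, take the toy loss $$J(\theta) = \theta^2$$, whose gradient is $$2\theta$$, and apply one update by hand. The numbers below are illustrative, not part of the repo:

```python
# One gradient descent step on the toy loss J(theta) = theta^2,
# whose gradient is 2 * theta. Starting from theta = 1.0 with
# learning rate eta = 0.1, the rule gives 1.0 - 0.1 * 2.0 = 0.8.
eta = 0.1
theta = 1.0

grad = 2 * theta
theta = theta - eta * grad

print(theta)  # 0.8
```

Iterating this step shrinks $$\theta$$ toward the minimizer at 0 by a factor of $$1 - 2\eta = 0.8$$ per step.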

Stochastic Gradient Descent

Instead of the full-dataset gradient, each update uses the gradient of a single randomly chosen sample $$i$$:

$$ \theta_{t+1} = \theta_t - \eta \nabla J_i(\theta_t) $$

Where:

  • Gradient computed on a single sample
  • Faster but noisier updates
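
The per-sample update above can be sketched for least-squares regression as follows; the function name and signature are illustrative assumptions and need not match the repo's src/sgd.py:

```python
import numpy as np

def sgd(X, y, lr=0.01, epochs=100, seed=0):
    """Plain SGD for least squares: one sample per parameter update.

    Illustrative sketch of the update rule above, not the repo's API.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)

    for _ in range(epochs):
        for i in rng.permutation(m):      # visit samples in random order
            error = X[i] @ theta - y[i]   # scalar residual of sample i
            theta -= lr * error * X[i]    # gradient of J_i is error * x_i
    return theta
```

Because each step uses a single sample, updates are cheap but noisy; reshuffling every epoch keeps them unbiased on average.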

Mini-Batch Version

$$ \theta_{t+1} = \theta_t - \eta \frac{1}{|B|} \sum_{i \in B} \nabla J_i(\theta_t) $$

Balances:

  • Stability
  • Speed
  • Computational efficiency
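
The averaged update can be sketched the same way, again assuming a least-squares loss and an illustrative interface:

```python
import numpy as np

def minibatch_gd(X, y, lr=0.1, epochs=100, batch_size=8, seed=0):
    """Mini-batch gradient descent for least squares (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)

    for _ in range(epochs):
        idx = rng.permutation(m)                  # reshuffle each epoch
        for start in range(0, m, batch_size):
            batch = idx[start:start + batch_size]
            error = X[batch] @ theta - y[batch]
            # Average the per-sample gradients over the batch
            theta -= lr * X[batch].T @ error / len(batch)
    return theta
```

Larger batches reduce gradient noise at the cost of more work per update; `batch_size=1` recovers SGD and `batch_size=m` recovers full-batch GD.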

🧠 Why This Project Matters

Gradient-based optimization is used in:

  • Neural networks
  • Linear regression
  • Logistic regression
  • Deep learning
  • Large-scale optimization

This project explains these algorithms at both the mathematical and the implementation level.


πŸ— Project Structure


gradient-descent-sgd-solver/
│
├── README.md
├── LICENSE
├── requirements.txt
│
├── src/
│   ├── gradient_descent.py
│   ├── sgd.py
│   ├── loss_functions.py
│   └── optimizer.py
│
├── examples/
│   └── demo.py
│
├── docs/
│   └── theory.md
│
├── images/
│   └── convergence_plot.png
│
└── index.html


Clean structure improves:

✔ Discoverability
✔ Professional appearance
✔ Portfolio quality


🐍 Example Implementation — Gradient Descent

import numpy as np

def gradient_descent(X, y, lr=0.01, epochs=1000):
    """Batch gradient descent for least-squares linear regression."""
    m, n = X.shape
    theta = np.zeros(n)

    for _ in range(epochs):
        predictions = X @ theta
        error = predictions - y
        gradient = (1/m) * X.T @ error  # average gradient over all m samples
        theta -= lr * gradient

    return theta
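
A quick smoke test on noise-free synthetic data; this usage is hypothetical, with the function repeated so the snippet runs standalone:

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, epochs=1000):
    # Same routine as above, repeated so this snippet is self-contained.
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        theta -= lr * X.T @ (X @ theta - y) / m
    return theta

# Recover known coefficients from exactly linear targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta

theta = gradient_descent(X, y, lr=0.1, epochs=2000)
```

On noise-free linear data, GD recovers the true coefficients to high precision; with noisy targets it converges to the least-squares solution instead.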

🚀 Installation

pip install -r requirements.txt

Run example:

python examples/demo.py

📊 Visualization (Recommended)

Add:

  • Loss curve plot
  • Parameter convergence graph
  • 3D loss surface

Example:

import matplotlib.pyplot as plt

# loss_history: per-epoch loss values collected during training
plt.plot(loss_history)
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Convergence")
plt.show()

About

Stochastic Gradient Descent (SGD) is an optimization algorithm that updates model parameters iteratively using small, random subsets (batches) of the data rather than the entire dataset. It significantly speeds up training on large datasets, though the noise it introduces can cause heavy fluctuations in the loss. SGD is the workhorse optimizer for deep learning and neural networks.
