This project is an implementation of a character-level diffusion model (LLaDA) for text generation, based on the principles outlined in the take-home exercise. It also includes a standard autoregressive Transformer model as a baseline for comparison.
The entire project has been refactored from a monolithic script into a modular, clean, and testable structure that supports training and inference from the command line and tracks experiments using Weights & Biases.
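For readers new to LLaDA-style training, the core idea is to hide a random fraction of the characters and train the model to recover them. The snippet below is a minimal, dependency-free sketch of that masking step only. The `MASK` symbol and the `mask_sequence` helper are illustrative names, not taken from this repository, and the real implementation in `model.py` / `train.py` may operate on token-ID tensors instead.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol; the project may use a dedicated token id


def mask_sequence(chars, t, rng):
    """Replace each character with MASK independently with probability t."""
    masked, is_masked = [], []
    for ch in chars:
        hide = rng.random() < t
        masked.append(MASK if hide else ch)
        is_masked.append(hide)
    return masked, is_masked


rng = random.Random(0)
text = list("O Romeo, Romeo!")
t = rng.uniform(0.0, 1.0)  # one masking ratio sampled per training sequence
noisy, targets = mask_sequence(text, t, rng)
# The model sees `noisy` and is trained to predict the original characters
# at the positions where `targets` is True.
```

In LLaDA, the loss is computed only on the masked positions and scaled by 1/t, so sequences with few masked characters are not under-weighted.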
/
├── configs/ # Centralized configuration files
├── data/ # Raw data files (e.g., tinyshakespeare.txt)
├── outputs/ # Saved models (.pth) and plots (.png)
├── tests/ # Unit tests for the project
├── .gitignore
├── data_utils.py # Tokenizer and PyTorch Dataset classes
├── model.py # Model architectures (LLaDA and Autoregressive)
├── train.py # Training script with wandb integration
├── generate.py # Inference script for text generation
├── main.py # Main entry point for the CLI
├── README.md # This file
├── requirements.txt # Python dependencies
└── EVALUATION_REPORT.md # Analysis of the model performance
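The tree above places the tokenizer in `data_utils.py`. For a character-level model, a tokenizer can be as small as a lookup table built from the training text. The class below is a sketch under that assumption; the name `CharTokenizer` and its methods are hypothetical and may not match the actual code.

```python
class CharTokenizer:
    """Map each distinct character in a corpus to an integer id and back."""

    def __init__(self, text):
        self.chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(self.chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}

    @property
    def vocab_size(self):
        return len(self.chars)

    def encode(self, s):
        # Raises KeyError for characters that were not in the corpus.
        return [self.stoi[ch] for ch in s]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)


tok = CharTokenizer("O Romeo, Romeo!")
ids = tok.encode("Romeo")
assert tok.decode(ids) == "Romeo"
```

A diffusion model would typically reserve one extra id for the mask token on top of `vocab_size`; whether this project does so is not stated here.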
- Clone the repository:
git clone <repository_url>
cd <repository_directory>
- Create a virtual environment:
python3 -m venv .venv
source .venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- (Optional but recommended) Log in to Weights & Biases: to enable experiment tracking, log in to your W&B account. You will be prompted for your API key.
wandb login
The project is controlled via the main.py script with command-line arguments.
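As a rough illustration of how such an entry point can be wired up, here is an `argparse` sketch that accepts the three flags used in this README (`--mode`, `--model_type`, `--prompt`). The `build_parser` function, the `choices` lists, the `required` settings, and the empty default prompt are assumptions for the example; the real `main.py` may define more options or different defaults.

```python
import argparse


def build_parser():
    """Build a parser for the flags shown in this README (sketch only)."""
    parser = argparse.ArgumentParser(description="Train or generate with a text model.")
    parser.add_argument("--mode", choices=["train", "generate"], required=True)
    parser.add_argument("--model_type", choices=["llada", "autoregressive"], required=True)
    # Assumed default: an empty prompt when none is given.
    parser.add_argument("--prompt", type=str, default="")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--mode", "generate", "--model_type", "llada", "--prompt", "O Romeo, Romeo!"]
)
```

Restricting `--mode` and `--model_type` with `choices` makes a typo fail immediately with a usage message rather than partway through a run.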
To train a model, use the --mode train argument and specify the model type.
The script will download the dataset, train the model, save the best version to outputs/models/, and log the experiment to Weights & Biases.
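"Save the best version" usually means writing a checkpoint only when a tracked metric improves. The helper below sketches that pattern, assuming the metric is a validation loss where lower is better. `BestCheckpoint`, the `save_fn` callback, and the file name `llada_best.pth` are hypothetical; the actual criterion and file names used by `train.py` are not specified in this README.

```python
import math
import tempfile
from pathlib import Path


class BestCheckpoint:
    """Call save_fn(path) only when the loss improves on the best seen so far."""

    def __init__(self, path, save_fn):
        self.path = Path(path)
        self.save_fn = save_fn
        self.best_loss = math.inf

    def update(self, val_loss):
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.path.parent.mkdir(parents=True, exist_ok=True)
            self.save_fn(self.path)
            return True
        return False


# Stand-in usage: with PyTorch, save_fn could be
#   lambda p: torch.save(model.state_dict(), p)
with tempfile.TemporaryDirectory() as tmp:
    ckpt = BestCheckpoint(
        Path(tmp) / "models" / "llada_best.pth",
        save_fn=lambda p: p.write_bytes(b"weights"),
    )
    saved = [ckpt.update(loss) for loss in (2.0, 3.0, 1.5)]
```

Here `saved` records which epochs triggered a write, so only the first and third stand-in losses overwrite the file.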
Train the LLaDA model:
python3 main.py --mode train --model_type llada
Train the Autoregressive model:
python3 main.py --mode train --model_type autoregressive
To generate text with a trained model, use the --mode generate argument. The script will automatically load the best saved model weights.
Generate with the LLaDA model:
python3 main.py --mode generate --model_type llada --prompt "O Romeo, Romeo!"
Generate with the Autoregressive model:
python3 main.py --mode generate --model_type autoregressive --prompt "O Romeo, Romeo!"
To ensure all components are working correctly, run the unit tests:
python3 -m unittest discover tests