Dissertation codebase for:
“Pricing Short-Dated U.S. Equity Options: A Comparative Study”
(Equities: AAPL, MSFT, NVDA, AMZN, GOOGL, META, JPM, XOM, TSLA; SPX as benchmark)
This repository contains the full research pipeline used in the dissertation:
- Cleaning and merging OptionMetrics / underlying / macro data
- Building forward prices (SOFR OIS + dividends)
- Constructing volatility panels: market IV, GARCH, SABR, Neural SDE
- Pricing American options via binomial, PDE, and Neural SDE Monte Carlo
- Evaluation, model risk metrics, and trading-style diagnostics
- Generating all figures (PNG) and animations (GIF) used in the thesis
Execution is orchestrated via VS Code tasks in .vscode/tasks.json (no CLI main.py driver).
High-level layout (non-exhaustive):
.
├── README.md
├── requirements.txt
├── pytest.ini
├── config/
│ └── config.yaml # Global settings (tickers, dates, buckets, etc.)
├── data/
│ ├── README.md # Notes on raw/processed data locations
│ ├── raw/ # WRDS / Bloomberg / FRED / etc. exports (NOT in git)
│ ├── interim/ # Intermediate cleaned files
│ └── processed/ # Final research panels, IV surfaces, eval tables
│ ├── panel*.parquet
│ ├── iv_surfaces/
│ ├── iv_true/
│ ├── eval/
│ └── (other helper CSV/Parquet files)
├── outputs/
│ └── figures/ # All PNG/GIF figures used in the dissertation
│ ├── iv_smiles/
│ ├── iv_surfaces/
│ ├── surface3d_png/
│ ├── smile_gif/
│ ├── surface_gif/
│ ├── smiles_gif/
│ └── atm_iv_vs_vix/ # ATM IV vs VIX per ticker
├── cpp/
│ ├── CMakeLists.txt
│ ├── binomial.cpp # EEP pricing (C++ / pybind-style DLL)
│ ├── neural_mc.cpp # Neural SDE Monte Carlo accelerator (C++)
│ └── build/
│ ├── libbinomial.dylib
│ └── libneural_mc.dylib
├── fortran/
│ ├── heston_pde.f90 # Heston PDE solver
│ ├── heston_pde_mod.mod
│ └── libheston_pde.dylib # Shared library used from Python
├── src/
│ ├── __init__.py
│ ├── main.py # (optional) helper; core work via module entrypoints
│ ├── cleaning.py # Raw OptionMetrics -> cleaned panel
│ ├── rates_divs.py # SOFR OIS curve, forwards, dividend/earnings flags
│ ├── eep.py # Early exercise premium utilities
│ ├── eval.py # Core evaluation, risk metrics, PnL tables
│ ├── utils.py # Logging, config loading, shared helpers
│ ├── native.py # Thin wrappers around C++/Fortran libraries
│ ├── viz.py # Summary figures (core performance plots)
│ ├── viz_atm_iv_vs_vix.py
│ ├── viz_iv_smiles.py
│ ├── viz_iv_surfaces.py
│ ├── viz_iv_surfaces_3d.py
│ ├── viz_iv_animation.py
│ ├── viz_iv_surface_animation.py
│ ├── viz_model_smiles_animation.py
│ └── models/
│ ├── garch.py # GARCH(1,1) volatility panel
│ ├── sabr.py # SABR volatility panel
│ ├── market_iv.py # Market-implied IV + basic EEP for benchmark
│ ├── neural_sde.py # Neural SDE training and path simulation
│ ├── pricing_binomial.py # Binomial + C++ lib, EEP corrections
│ ├── pricing_garch.py # American pricing under GARCH vol panel
│ ├── pricing_sabr.py # American pricing under SABR vol panel
│ ├── pricing_neural_sde.py # Neural SDE American pricing
│ ├── heston_calib.py # (optional) Heston calibration
│ ├── heston_pde.py # Heston PDE pricing using Fortran DLL
│ ├── iv_surfaces.py # IV surface panel construction
│ ├── iv_true_vs_panel.py # True vs panel IV comparison
│ ├── regime_buckets.py # Volatility regimes (VIX-based)
│ ├── early_exercise_region.py# EEP sign/size analysis, regions in (K,T)
│ ├── model_cost_benchmark.py # Runtime / efficiency benchmarks
│ ├── model_pnl_ranking.py # Hedged PnL style ranking
│ └── tail_risk_eval.py # ES / underpricing tail metrics
└── tests/
├── test_schema.py
├── test_data_integrity.py
├── test_forward.py
├── test_eep.py
├── test_heston_cos.py # Not used
├── test_models_panels.py
├── test_model_relations.py
├── test_additional_outputs.py
└── test_smoke.py
Inside the repo root:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
The VS Code tasks assume the Python interpreter lives at:
${workspaceFolder}/.venv/bin/python
If your environment path differs, update .vscode/tasks.json accordingly.
Expected directory layout under data/:
data/
raw/ # WRDS, Bloomberg, FRED etc. exports (not tracked in git)
interim/ # automatically filled with intermediate cleaned data
processed/ # final research panels, IV surfaces, evaluation tables
For local runs, set:
export DATA_DIR="<absolute-path-to>/data"
export OUT_DIR="<absolute-path-to>/outputs"
config/config.yaml stores the core parameter choices (tickers, sample period, DTE buckets, etc.). Adjust it if you want to rerun the pipeline on a different sample or equity subset.
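As a rough illustration of how the two environment variables might be consumed, the sketch below resolves both directories with the standard library only. The function name `resolve_dirs` and the fallback to `<repo_root>/data` and `<repo_root>/outputs` are assumptions made for this example; they are not taken from `src/utils.py`.

```python
import os
from pathlib import Path


def resolve_dirs(repo_root: Path):
    """Return (data_dir, out_dir), preferring DATA_DIR / OUT_DIR when they are set."""
    # Assumption: fall back to <repo_root>/data and <repo_root>/outputs.
    # The real pipeline may instead require both variables to be exported.
    data_dir = Path(os.environ.get("DATA_DIR", str(repo_root / "data")))
    out_dir = Path(os.environ.get("OUT_DIR", str(repo_root / "outputs")))
    return data_dir.expanduser().resolve(), out_dir.expanduser().resolve()
```

Calling `resolve_dirs(Path.cwd())` from the repo root returns absolute paths whether or not the variables are exported.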
The project is driven through VS Code tasks defined in .vscode/tasks.json.
Each task calls a specific Python module (or build tool) in the correct order.
To run a task:
- Open the workspace in VS Code (folder root = this repo).
- Make sure the .venv is created and selected as the Python interpreter.
- Press Ctrl+Shift+B / Cmd+Shift+B, or open the Command Palette: “Tasks: Run Task”.
- Choose the desired task label (e.g. 99-full-pipeline).
0-env-show
Quick sanity check: prints the Python version of .venv.
1-cleaning
Entry point: src.cleaning
- Loads raw OptionMetrics and underlying data from data/raw
- Applies filters (DTE, moneyness, volume, OI, spread)
- Outputs cleaned option panel(s) into data/interim and data/processed
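The filter step listed above can be pictured as a per-quote predicate. The following is a minimal sketch only: the `Quote` fields and every threshold (30 days, a 0.8 to 1.2 moneyness band, a 50% relative spread cap, and so on) are hypothetical placeholders, not the values used in `config/config.yaml` or `src/cleaning.py`.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    dte: int              # days to expiry
    moneyness: float      # strike / spot
    volume: int
    open_interest: int
    bid: float
    ask: float


def passes_filters(q, max_dte=30, moneyness_band=(0.8, 1.2),
                   min_volume=1, min_oi=1, max_rel_spread=0.5):
    """Return True if the quote survives all (hypothetical) cleaning filters."""
    # Reject non-positive bids and crossed markets before computing the spread.
    if q.bid <= 0 or q.ask < q.bid:
        return False
    mid = 0.5 * (q.bid + q.ask)
    rel_spread = (q.ask - q.bid) / mid
    lo, hi = moneyness_band
    return (0 < q.dte <= max_dte
            and lo <= q.moneyness <= hi
            and q.volume >= min_volume
            and q.open_interest >= min_oi
            and rel_spread <= max_rel_spread)
```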
2-rates-divs-forwards
Entry point: src.rates_divs
- Builds SOFR OIS discount curves
- Integrates dividends and earnings dates
- Produces forward-adjusted panel_with_forwards.parquet
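For intuition, a forward with discrete cash dividends can be written as F = (S - PV(dividends)) / DF(T). The sketch below uses a single flat continuously compounded rate as a stand-in for the SOFR OIS curve; that simplification and the function names are assumptions for illustration, not the construction in `src/rates_divs.py`.

```python
import math


def discount_factor(rate, t):
    """Discount factor under a flat, continuously compounded rate (a simplification)."""
    return math.exp(-rate * t)


def forward_price(spot, rate, maturity, dividends=()):
    """Forward price with discrete cash dividends given as (time_in_years, amount) pairs."""
    # Only dividends paid on or before the option maturity reduce the forward.
    pv_divs = sum(amount * discount_factor(rate, t)
                  for t, amount in dividends if 0 < t <= maturity)
    return (spot - pv_divs) / discount_factor(rate, maturity)
```

With a zero rate and one dividend of 1.0 before expiry, `forward_price(100.0, 0.0, 0.5, [(0.25, 1.0)])` reduces to spot minus the dividend.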
3-market-iv-american
Entry point: src.models.market_iv
- Inverts BS to obtain market-implied IV
- Applies simple American adjustment where relevant
- Produces panel_market_iv_american.parquet (benchmark)
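The inversion step can be sketched with a simple bisection. This example uses the Black (1976) call formula on a forward and ignores the American adjustment entirely; the function names, the volatility bracket of 0.0001 to 5.0, and the choice of bisection are assumptions for illustration rather than the method in `src/models/market_iv.py`.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_call(forward, strike, t, vol, df):
    """European call price on a forward (Black 1976), discounted with df."""
    sd = vol * math.sqrt(t)
    d1 = (math.log(forward / strike) + 0.5 * sd * sd) / sd
    d2 = d1 - sd
    return df * (forward * norm_cdf(d1) - strike * norm_cdf(d2))


def implied_vol(price, forward, strike, t, df, lo=1e-4, hi=5.0, tol=1e-8):
    """Bisection on volatility; returns NaN if the price is outside the bracket."""
    if not black_call(forward, strike, t, lo, df) <= price <= black_call(forward, strike, t, hi, df):
        return float("nan")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # The call price is increasing in vol, so move the bracket accordingly.
        if black_call(forward, strike, t, mid, df) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

A round trip (price a call at a known volatility, then invert) is a convenient sanity check for a routine like this.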
4-garch-vol-panel
Entry point: src.models.garch
- Estimates GARCH(1,1) per ticker
- Produces panel_garch.parquet with conditional volatility forecasts
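The conditional variance in a GARCH(1,1) model follows the recursion sigma2[t+1] = omega + alpha * r[t]^2 + beta * sigma2[t]. The sketch below filters a return series for given parameters; it does not estimate them, and starting the recursion at the unconditional variance is an assumption of this example, not necessarily what `src/models/garch.py` does.

```python
def garch11_variance_path(returns, omega, alpha, beta):
    """Conditional variances for each return plus a one-step-ahead forecast."""
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    # Assumption: initialise at the unconditional variance omega / (1 - alpha - beta).
    var = omega / (1.0 - alpha - beta)
    path = []
    for r in returns:
        path.append(var)
        var = omega + alpha * r * r + beta * var
    path.append(var)  # forecast for the period after the last observed return
    return path
```

The returned list has one more element than the input: the final entry is the forecast for the next period.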
5-sabr-vol-panel
Entry point: src.models.sabr
- Calibrates SABR to cross-sections
- Produces panel_sabr.parquet
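For reference, the widely used Hagan et al. (2002) lognormal approximation maps SABR parameters to an implied volatility for one strike. Whether `src/models/sabr.py` uses this exact expansion is an assumption; the sketch below shows only the formula, with no calibration step.

```python
import math


def sabr_lognormal_vol(forward, strike, t, alpha, beta, rho, nu):
    """Hagan et al. (2002) lognormal implied-vol approximation for SABR."""
    omb = 1.0 - beta
    log_fk = math.log(forward / strike)
    fk_pow = (forward * strike) ** (0.5 * omb)
    # Strike-dependent denominator from the expansion in log(F/K).
    denom = fk_pow * (1.0 + omb ** 2 / 24.0 * log_fk ** 2
                      + omb ** 4 / 1920.0 * log_fk ** 4)
    # First-order correction in time to expiry.
    correction = 1.0 + (omb ** 2 / 24.0 * alpha ** 2 / fk_pow ** 2
                        + 0.25 * rho * beta * nu * alpha / fk_pow
                        + (2.0 - 3.0 * rho ** 2) / 24.0 * nu ** 2) * t
    z = (nu / alpha) * fk_pow * log_fk
    if abs(z) < 1e-12:
        z_over_x = 1.0  # limit of z / x(z) as z -> 0 (at the money or nu = 0)
    else:
        x = math.log((math.sqrt(1.0 - 2.0 * rho * z + z * z) + z - rho) / (1.0 - rho))
        z_over_x = z / x
    return alpha / denom * z_over_x * correction
```

With beta = 1 and nu = 0 the formula collapses to a flat volatility equal to alpha, and a negative rho produces a downward-sloping skew.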
6-neural-sde-train
Entry point: src.models.neural_sde
- Trains neural SDE model(s)
- Saves weights: data/neural_sde_state*.pt
- Produces a basic neural volatility/panel output
7-build-cpp-libs
Command:
cmake --build cpp/build --config Release -j8
- Builds libbinomial.dylib and libneural_mc.dylib
- Required for fast binomial/MC pricing
8-pricing-binomial-eep
Entry point: src.models.pricing_binomial
- Uses the C++ binomial library
- Computes early exercise premia and sanity checks
- Produces panel_*_binom_*.parquet and related diagnostics
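The early exercise premium (EEP) is the American price minus the European price on the same tree. The pure-Python Cox-Ross-Rubinstein sketch below illustrates the idea for a put with a continuous dividend yield; the repository itself relies on the C++ library for speed, and the function names and the continuous-yield treatment here are assumptions for illustration.

```python
import math


def crr_put(spot, strike, rate, div_yield, vol, t, steps=200, american=True):
    """Cox-Ross-Rubinstein binomial put price (American or European)."""
    dt = t / steps
    u = math.exp(vol * math.sqrt(dt))
    d = 1.0 / u
    disc = math.exp(-rate * dt)
    p = (math.exp((rate - div_yield) * dt) - d) / (u - d)  # risk-neutral up probability
    # Terminal payoffs, indexed by the number of up moves j.
    values = [max(strike - spot * u ** j * d ** (steps - j), 0.0)
              for j in range(steps + 1)]
    for i in range(steps - 1, -1, -1):
        for j in range(i + 1):
            value = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            if american:
                # Compare continuation value with immediate exercise at node (i, j).
                value = max(value, strike - spot * u ** j * d ** (i - j))
            values[j] = value
    return values[0]


def early_exercise_premium(spot, strike, rate, div_yield, vol, t, steps=200):
    """EEP = American price minus European price on the same tree."""
    amer = crr_put(spot, strike, rate, div_yield, vol, t, steps, american=True)
    euro = crr_put(spot, strike, rate, div_yield, vol, t, steps, american=False)
    return amer - euro
```

For an in-the-money put with a positive interest rate, the premium should be strictly positive, which is the kind of sign check the task description mentions.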
9-build-heston-fortran-lib
Command:
gfortran -O3 -shared -fPIC heston_pde.f90 -o libheston_pde.dylib
- Compiles the Heston PDE solver to a shared library
10-pricing-heston-american
Entry point: src.models.heston_pde
- Calls the Fortran DLL to price under Heston
- Produces panel_heston_american.parquet
11-pricing-garch-american
Entry point: src.models.pricing_garch
- Uses vol from panel_garch.parquet + EEP logic
- Produces panel_garch_american.parquet
12-pricing-sabr-american
Entry point: src.models.pricing_sabr
- Uses SABR vol panel and binomial/EEP machinery
- Produces panel_sabr_american.parquet
13-pricing-neural-sde-american
Entry point: src.models.pricing_neural_sde
- Uses Neural SDE paths (and, where relevant, neural_mc C++ lib)
- Produces panel_neural_sde_american.parquet
Hint: once the volatility panels are built, you can re-run any pricing task individually if you change a parameter.
14-eval-core
Entry point: src.eval
- Aggregates pricing errors into eval_by_* CSVs
- Computes eval_overall.csv, large error lists,
tail_risk_metrics.csv, and model_cost_benchmark.csv (numeric part)
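The aggregate error measures can be sketched in a few lines. The definitions below (error = model minus market, MAPE relative to the market price, expected shortfall as the mean of the worst tail of losses) are common conventions assumed for this example; the exact definitions in `src/eval.py` and `tail_risk_eval.py` may differ.

```python
import math


def error_metrics(model_prices, market_prices):
    """RMSE, MAE and MAPE of model prices against market prices."""
    errors = [m - p for m, p in zip(model_prices, market_prices)]
    n = len(errors)
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": sum(abs(e) / p for e, p in zip(errors, market_prices)) / n,
    }


def expected_shortfall(losses, level=0.95):
    """Average of the worst (1 - level) share of losses (at least one observation)."""
    ordered = sorted(losses, reverse=True)
    k = max(1, round((1.0 - level) * len(ordered)))
    return sum(ordered[:k]) / k
```

For underpricing tail metrics, `losses` would be market minus model price, so that large positive values correspond to the model pricing too low.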
15-iv-surfaces-panel
Entry point: src.models.iv_surfaces
- Builds IV surfaces and panel across (K/F, T) for each ticker
- Outputs iv_surface_<TICKER>.parquet and iv_panel_all.parquet
16-iv-true-vs-panel
Entry point: src.models.iv_true_vs_panel
- Compares “true” IV (from market) with panel reconstructions
- Outputs iv_true_vs_panel.* under data/processed/iv_true/
17-regime-buckets
Entry point: src.models.regime_buckets
- Creates volatility regimes based on VIX levels
- Produces daily regime tagging: regime_daily.* in data/processed/regimes/
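A VIX-based regime tag can be as simple as a threshold rule. The cut-offs of 15 and 25 and the three labels below are hypothetical placeholders chosen for this sketch; the actual bucket boundaries live in the project configuration and `regime_buckets.py`.

```python
def vix_regime(vix_level, low=15.0, high=25.0):
    """Map a VIX close to a regime label using two (hypothetical) thresholds."""
    if vix_level < low:
        return "low"
    if vix_level < high:
        return "medium"
    return "high"


def tag_regimes(vix_by_date, low=15.0, high=25.0):
    """Return {date: regime} for a mapping of date -> VIX close."""
    return {day: vix_regime(v, low, high) for day, v in vix_by_date.items()}
```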
18-viz-core
Entry point: src.viz
- Produces core summary plots:
overall RMSE/MAE/MAPE, EEP behaviour, cost vs error, etc.
19-viz-iv-smiles
Entry point: src.viz_iv_smiles
- Static IV smile plots at selected dates and regimes
20-viz-iv-surfaces-2d
Entry point: src.viz_iv_surfaces
- 2D surface heatmaps in (moneyness, DTE) for each ticker
21-viz-iv-surfaces-3d
Entry point: src.viz_iv_surfaces_3d
- 3D surface plots (surface3d_png/)
22-viz-iv-smile-gifs
Entry point: src.viz_iv_animation
- Animated IV smiles over time (smile_gif/)
23-viz-iv-surface-gifs
Entry point: src.viz_iv_surface_animation
- Animated IV surfaces (surface_gif/)
24-viz-model-smiles-gifs
Entry point: src.viz_model_smiles_animation
- GIFs comparing model vs market smiles (smiles_gif/)
25-viz-atm-iv-vs-vix
Entry point: src.viz_atm_iv_vs_vix
- ATM IV vs VIX scatter/time-series by ticker (atm_iv_vs_vix/)
26-model-pnl-ranking
Entry point: src.models.model_pnl_ranking
- Simple hedged PnL-based ranking and supporting figures
27-model-cost-benchmark
Entry point: src.models.model_cost_benchmark
- Runtime benchmarking and cost-performance trade-off plots
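A basic runtime measurement for one pricing call might look like the sketch below. The helper name `time_call` and the best-of-N convention are assumptions for illustration; `model_cost_benchmark.py` may measure cost differently.

```python
import time


def time_call(fn, *args, repeats=5, **kwargs):
    """Best wall-clock time in seconds over several repeats of fn(*args, **kwargs)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best
```

Taking the minimum rather than the mean reduces the influence of one-off noise such as cold caches or background processes.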
28-tests
Command:
${workspaceFolder}/.venv/bin/python -m pytest -q
- Runs the full pytest suite in tests/
For convenience:
99-full-pipeline
dependsOn:
1-cleaning
2-rates-divs-forwards
3-market-iv-american
4-garch-vol-panel
5-sabr-vol-panel
6-neural-sde-train
7-build-cpp-libs
8-pricing-binomial-eep
9-build-heston-fortran-lib
10-pricing-heston-american
11-pricing-garch-american
12-pricing-sabr-american
13-pricing-neural-sde-american
14-eval-core
15-iv-surfaces-panel
16-iv-true-vs-panel
17-regime-buckets
18-viz-core
19-viz-iv-smiles
20-viz-iv-surfaces-2d
21-viz-iv-surfaces-3d
22-viz-iv-smile-gifs
23-viz-iv-surface-gifs
24-viz-model-smiles-gifs
25-viz-atm-iv-vs-vix
26-model-pnl-ranking
27-model-cost-benchmark
28-tests
Run this when you have raw data in place and want to regenerate
everything, from raw data cleaning all the way to final figures
and tests.
98-semifull-pipeline
dependsOn:
14-eval-core
15-iv-surfaces-panel
16-iv-true-vs-panel
17-regime-buckets
18-viz-core
19-viz-iv-smiles
20-viz-iv-surfaces-2d
21-viz-iv-surfaces-3d
22-viz-iv-smile-gifs
23-viz-iv-surface-gifs
24-viz-model-smiles-gifs
25-viz-atm-iv-vs-vix
26-model-pnl-ranking
27-model-cost-benchmark
28-tests
Use this if all pricing panels already exist (e.g. after a previous
full run) and you want to regenerate evaluation + visualisations
without redoing GARCH/SABR/Neural SDE or raw data cleaning.
Apart from the two build tasks (7-build-cpp-libs and 9-build-heston-fortran-lib), every task simply invokes a Python module. To run them manually from the shell:
source .venv/bin/activate
python -m src.cleaning
python -m src.rates_divs
python -m src.models.garch
python -m src.models.pricing_garch
python -m src.eval
python -m src.viz
and so on, following the same order as defined in .vscode/tasks.json.
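The same sequence can be scripted instead of typed. The sketch below is a hypothetical helper, not a file in this repository: it builds `python -m <module>` commands for the modules listed above and stops at the first failure.

```python
import subprocess
import sys

# Modules taken from the manual example above; extend in the order of tasks.json.
MODULES = [
    "src.cleaning",
    "src.rates_divs",
    "src.models.garch",
    "src.models.pricing_garch",
    "src.eval",
    "src.viz",
]


def build_commands(modules, python=sys.executable):
    """Return one [python, '-m', module] command per module, preserving order."""
    return [[python, "-m", module] for module in modules]


def run_pipeline(modules=MODULES, dry_run=False):
    """Run each module in order; check=True aborts on the first non-zero exit code."""
    for cmd in build_commands(modules):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline(dry_run=True)` from the repo root prints the commands without executing them, which is a cheap way to confirm the order first.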
You can either:
- Run the 28-tests task inside VS Code, or
- From the repo root (with .venv activated): pytest -q
The test suite checks:
- Data schema & integrity
- Forwards and discount factors
- EEP convergence and sign in typical regions
- Heston vs Black–Scholes limiting behaviour
- Consistency between model panels (GARCH vs SABR vs MIV)
- Existence and basic properties of final outputs (tables + figures)
- Clone the repository.
- Create .venv and pip install -r requirements.txt.
- Place data exports under data/raw as described in data/README.md.
- Set DATA_DIR and OUT_DIR environment variables.
- Adjust config/config.yaml if you want a different sample or asset set.
- In VS Code, open the folder, select .venv as the interpreter.
- Run task 99-full-pipeline.
- All final panels, evaluation tables, and figures should now be available in data/processed/ and outputs/figures/.
- Optionally, re-run 28-tests manually to verify the environment.
The codebase is intended to be released under the MIT License (or the licence specified alongside the dissertation submission). Vendor data (WRDS, Bloomberg, FRED, etc.) remain subject to their respective licences and are not distributed in this repository.
Author: Kirill Papka
Supervisor: Giuseppe Cavaliere
Institution: University of Exeter
For questions about replication, extensions, or potential collaborations, please contact the author via university email: kp604@exeter.ac.uk