
Commit a902630

Technologicat and claude committed
Phase 8: README / CHANGELOG / LICENSE / CLAUDE.md for v1.0
README rewrite:
- Replace the obsolete `python setup.py install` section with the modern `pip install wlsqm` path, source-install instructions, CFLAGS=-march=native performance tip, and a macOS-specific note covering both the published wheels (which bundle conda-forge's libomp via delocate-wheel) and source installs (which need `brew install libomp` for Apple Clang's missing omp.h).
- Add the standard shields.io badge strip (CI / top language / supported Python versions / PyPI / license / open issues), plus the project's semver + AI-contributions paragraphs right under the title, matching pylu.
- Rewrite the intro and feature list to reflect the current state: working test suite, WLSQM explicitly NOT described as a Taylor series, noise-robust first derivatives (with a pointer to the "fit once, differentiate twice" recipe for second derivatives), and the list of aliases this method goes by in the literature (MLS, WLSQM, diffuse approximation).
- Drop the old "Experiencing crashes?" BLAS-conflict troubleshooting section, which was a pre-2020 Debian-distro-libblas-confusion artifact and is no longer relevant with pip-installed SciPy on any supported platform.
- Update dev-setup section with the meson-python + PDM flow and the `--no-build-isolation` explanation.
- Update the "Running tests" count to the actual 57 the new suite contains.

CHANGELOG: add a v1.0.0 entry above the historical 0.1.x tail. User-facing scope per Juha's changelog policy: what changed, not how it was implemented. Highlights the data-race fix in `fit_1D_many_parallel`, the singular-input LinAlgError fix in `rescale_dgeequ`, the shift from `setup.py` to `pip install`, and the "no longer described as a Taylor series" wording change (because users were affected by the misleading framing too). Internal details go under an "Internal" subsection per Juha's changelog convention.

LICENSE: copyright year range 2016-2017 → 2016-2026, and affiliation adds JAMK University of Applied Sciences alongside the University of Jyväskylä.

CLAUDE.md (new): follows the pylu template, adapted for wlsqm's architecture. Covers:
- What wlsqm is (with the "not a Taylor series" framing up front).
- Build and dev workflow, OpenMP per-platform story.
- Running tests.
- Architecture walkthrough: simple vs. expert API, C structs vs cdef classes, the .pxd/.pyx split, the inter-module cimport graph, the `defs` constant system ("is NOT an enum, and here is why"), protocol constants in simple.pyx and expert.pyx, the lapackdrivers layer, and the .pxd-install invariant pinned by test_cimport.py.
- Linting: two-pass ruff, non-blocking cython-lint.
- Key rules: do not refactor the numerical algorithms, do not rename the public C-level API (in particular `taylor_*` stays even though it is not a Taylor series), do not remove OpenMP, do not convert C structs to cdef classes, do not change the Cython compiler directives, do not describe WLSQM as a Taylor series method, do not convert the defs constants to cdef enum, keep all six scaling algorithms in lapackdrivers.pyx.

TODO.md prune: remove the obsolete "create unit tests" item (done, 57 tests in tests/), remove the obsolete "fix TODOs in setup.py" item (setup.py no longer exists), update the "Documentation" section to reference the current surrogate-model framing and the current file locations, and add a pointer from TODO.md to TODO_DEFERRED.md for modernization-pass findings. The remaining items (docstring duplication, sudoku_lhs extraction, doc PDF updates, ExpertSolver copy/pickle support, more 3D testing, performance profiling, the driver/expert deduplication, and the various `svd_c` / dtype ergonomics improvements) are kept with their full context.

Also marks wlsqm ✓ in ~/.claude/CLAUDE.md's project list; it now has a project-local CLAUDE.md config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 2841bf9 commit a902630

5 files changed: +464 -119 lines changed


CHANGELOG.md

Lines changed: 78 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,83 @@
11
## Changelog
22

3+
### v1.0.0
4+
5+
First release under modern Python and modern packaging. Python 2.7 and 3.4
6+
are no longer supported.
7+
8+
#### New
9+
10+
- **Python 3.11 – 3.14 supported.** Pre-built wheels on PyPI for Linux,
11+
macOS, and Windows, across all four Python versions, with OpenMP
12+
parallelism enabled in every wheel.
13+
- **Finite-difference stencil reproduction.** A WLSQM fit on a classical
14+
central-difference stencil (3-point in 1D, 5-point "plus" in 2D, 7-point "plus" in 3D)
15+
now reproduces the hand-coded stencil result to machine precision on any
16+
smooth input, not just polynomials. This is the natural generalization of
17+
WLSQM's polynomial-recovery property and is pinned by the test suite.
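
As a rough illustration of the stencil-reproduction claim, the pure-Python sketch below fits a quadratic by unweighted least squares on a 3-point 1D stencil and compares the fitted derivatives with the classical central-difference formulas. It does not call wlsqm: `solve_linear` and `fit_quadratic` are hypothetical helpers, and the basis `[1, d, d²/2]` is an assumption chosen so that the coefficients read directly as value, first derivative, and second derivative.

```python
import math


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_quadratic(dx, f):
    """Least-squares fit of f ~ c0 + c1*d + c2*d^2/2 via the normal equations."""
    rows = [[1.0, d, 0.5 * d * d] for d in dx]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atf = [sum(r[i] * fk for r, fk in zip(rows, f)) for i in range(3)]
    return solve_linear(ata, atf)


x0, h = 0.3, 0.1
f_minus, f_0, f_plus = math.sin(x0 - h), math.sin(x0), math.sin(x0 + h)

value, d1, d2 = fit_quadratic([-h, 0.0, h], [f_minus, f_0, f_plus])

# Classical hand-coded central differences on the same three samples.
central_d1 = (f_plus - f_minus) / (2.0 * h)
central_d2 = (f_plus - 2.0 * f_0 + f_minus) / h**2

print(d1, central_d1)
print(d2, central_d2)
```

With three points and three unknowns the least-squares fit interpolates the samples, which is why the two derivative estimates should agree up to rounding error even though `sin` is not a polynomial.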
18+
19+
#### Fixed
20+
21+
- **Data race in `fit_1D_many_parallel`**, pre-existing from 2016. The 1D
22+
branch of the basic parallel many-case fitter passed a compile-time
23+
constant `TASKID = 0` to `impl.solve()` instead of the per-thread
24+
`taskid = openmp.omp_get_thread_num()`. Every OpenMP worker clobbered
25+
thread-0's work buffer, producing silently wrong fits whenever the
26+
parallel 1D many-case path ran with `ntasks > 1`. The 2D/3D branch, the
27+
iterative parallel variant, and the serial variant were always correct.
28+
A regression test (64 cases × 4 threads) now pins this.
29+
- **`rescale_dgeequ` no longer silently accepts singular matrices.** It now
30+
checks LAPACK's `info` return and raises `numpy.linalg.LinAlgError` when
31+
a row or column is exactly zero, instead of returning nonsense scaling
32+
factors that would poison the downstream solve.
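
The `rescale_dgeequ` fix can be sketched in pure Python as follows. This is a simplified stand-in, not the library's code: `equilibration_info` and `checked_scaling` are hypothetical names, the function mimics LAPACK DGEEQU's `info` convention (`info = i` for a zero row `i`, `info = m + j` for a zero column `j`, both 1-based) without DGEEQU's overflow clamping, and `SingularMatrixError` stands in for `numpy.linalg.LinAlgError` so the sketch needs only the standard library.

```python
class SingularMatrixError(ValueError):
    """Local stand-in for numpy.linalg.LinAlgError (keeps the sketch stdlib-only)."""


def equilibration_info(a):
    """Return (r, c, info) for a dense matrix given as a list of rows.

    r and c are row and column scale factors; info follows the DGEEQU
    convention: 0 on success, i if row i is exactly zero, m + j if
    column j is exactly zero (1-based).
    """
    m, n = len(a), len(a[0])
    row_max = [max(abs(v) for v in row) for row in a]
    for i, rmax in enumerate(row_max):
        if rmax == 0.0:
            return None, None, i + 1
    r = [1.0 / rmax for rmax in row_max]
    col_max = [max(r[i] * abs(a[i][j]) for i in range(m)) for j in range(n)]
    for j, cmax in enumerate(col_max):
        if cmax == 0.0:
            return r, None, m + j + 1
    c = [1.0 / cmax for cmax in col_max]
    return r, c, 0


def checked_scaling(a):
    """Return (r, c), raising instead of handing back unusable scale factors."""
    r, c, info = equilibration_info(a)
    if info > 0:
        raise SingularMatrixError(f"exactly zero row or column (info={info})")
    return r, c


print(checked_scaling([[2.0, 0.0], [0.0, 4.0]]))
```

The point of the sketch is only the control flow: the `info` value is inspected before the scale factors are used, so a zero row or column surfaces as an exception at the call site.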
33+
34+
#### Changed
35+
36+
- **Installation is now `pip install wlsqm`.** The old `python setup.py
37+
install` path is gone; `setup.py` has been removed. The build system is
38+
[meson-python](https://meson-python.readthedocs.io/), and dev environments
39+
are managed with [PDM](https://pdm-project.org/).
40+
- **Language change on "Taylor series."** The package's internal storage
41+
layout still uses the same slots a Taylor expansion would (function
42+
value, first derivatives, second derivatives divided by `2!`, …), but
43+
the comments and docstrings no longer call the model a "Taylor series."
44+
The coefficients come from a least-squares fit, not from analytic
45+
differentiation, and the error behavior is much better than Taylor
46+
truncation would predict. The internal C-API function names
47+
`taylor_1D/2D/3D` are kept for backwards compatibility of downstream
48+
`cimport`s — see [`wlsqm/fitter/polyeval.pyx`](wlsqm/fitter/polyeval.pyx).
49+
- **Comprehensive pytest suite.** 57 tests covering polynomial recovery
50+
across dimensions and orders, `ExpertSolver` prepare/solve round-trips,
51+
interpolation accuracy at interior points, parallel ≡ serial equivalence,
52+
finite-difference stencil reproduction, first-derivative robustness to
53+
Gaussian noise, edge cases, the LAPACK driver layer, and `.pxd`
54+
installability for downstream `cimport` users.
55+
56+
#### Internal
57+
58+
- Port from Cython 0.29 to Cython 3.x. `noexcept` audit on every `cdef
59+
... nogil` function, split between pure computational helpers
60+
(`noexcept`) and LAPACK wrappers / fit dispatchers (`except -1`). `fma`
61+
now imported from `libc.math` instead of a manual `cdef extern` hack
62+
that worked around a long-fixed bug in Cython 0.20.1.
63+
- All `DEF` compile-time constants replaced with module-level `cdef`
64+
constants or inlined as literals at call sites (Cython 3 deprecated
65+
`DEF`). Protocol constants like `TASKID`, `NTASKS`, and
66+
`MODE_BASIC` / `MODE_ITERATIVE` live at module scope in `simple.pyx`
67+
where the value is a project-wide convention, and inside each function
68+
where the value is per-function.
69+
- `ScalingAlgo` is now a proper `enum.IntEnum`, replacing the old bare-
70+
class Python 2 workaround.
71+
- GitHub Actions CI: lint (ruff + cython-lint), test matrix (3 OSes × 4
72+
Python versions), cibuildwheel for Linux/macOS/Windows wheels,
73+
meson-python sdist, and trusted-publisher PyPI publishing on `v*` tag
74+
push.
75+
- Copyright updated to 2016–2026 and affiliation updated to JAMK
76+
University of Applied Sciences.
77+
78+
79+
## Pre-v1.0 history
80+
381
### [v0.1.5]
482
- support both Python 3.4 and 2.7
583

@@ -18,4 +96,3 @@
1896

1997
### [v0.1.0]
2098
- initial version
21-

CLAUDE.md

Lines changed: 163 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,163 @@
1+
# CLAUDE.md
2+
3+
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4+
5+
## What is wlsqm
6+
7+
Cython library that constructs a piecewise polynomial surrogate model on scattered point-cloud data in 1D, 2D, or 3D, by weighted least squares. From the surrogate you read off the function value and any derivative up to the polynomial order. BSD-2-Clause licensed.
8+
9+
The algorithm fits a local polynomial over the neighborhood of each query point, using the same monomial basis a Taylor expansion would use (so the DOF layout reads as "F, X, Y, X², XY, Y², …"). **It is not a Taylor series.** The coefficients come from a weighted least-squares linear solve, not from analytic differentiation — which is why the error behavior is much better than Taylor truncation would predict, and why the method works at all on noisy data. Downstream code that uses wlsqm should think of the result as "least-squares-optimal local derivative estimates," not "exact analytic derivatives."
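
To make the "least-squares fit, not analytic differentiation" point concrete, here is a minimal pure-Python sketch of a weighted first-order fit on noisy 1D samples. It is not wlsqm's implementation or API: `weighted_linear_fit` is a hypothetical helper, the weight function is an arbitrary illustrative choice, and the model is only first order, whereas the library handles higher orders and up to three dimensions.

```python
import math
import random


def weighted_linear_fit(x0, xk, fk, radius):
    """Weighted least-squares fit of f(x) ~ a + b*(x - x0) over neighbors xk.

    Returns (a, b): estimates of the function value and first derivative at x0.
    """
    sw = swd = swdd = swf = swdf = 0.0
    for x, f in zip(xk, fk):
        d = x - x0
        w = max(0.0, 1.0 - (d / radius) ** 2)  # illustrative weight, not wlsqm's
        sw += w
        swd += w * d
        swdd += w * d * d
        swf += w * f
        swdf += w * d * f
    # Closed-form solution of the 2x2 normal equations.
    det = sw * swdd - swd * swd
    a = (swdd * swf - swd * swdf) / det
    b = (sw * swdf - swd * swf) / det
    return a, b


rng = random.Random(0)  # fixed seed so the run is repeatable
x0, radius, n = 0.3, 0.2, 41
xk = [x0 - radius + 2.0 * radius * k / (n - 1) for k in range(n)]
fk = [math.sin(x) + rng.gauss(0.0, 1e-3) for x in xk]  # noisy samples of sin

value, slope = weighted_linear_fit(x0, xk, fk, radius)
print("value:", value, "exact:", math.sin(x0))
print("slope:", slope, "exact:", math.cos(x0))
```

The derivative estimate comes out of the linear solve over all neighbors, so the noise in individual samples is averaged rather than amplified the way a two-point difference quotient would amplify it.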
10+
11+
The method appears in the literature under several names: MLS (moving least squares), WLSQM (weighted least squares meshless), "diffuse approximation." They are essentially the same idea.
12+
13+
Runtime dependencies are NumPy and SciPy. SciPy is needed at both build time (the Cython `cimport scipy.linalg.cython_lapack` requires the headers at compile time) and runtime (the compiled extensions' `cimport` chain resolves through SciPy's module-level C API at import time).
14+
15+
## Build and Development
16+
17+
Uses meson-python as build backend, PDM for dependency management. Python ≥ 3.11.
18+
19+
```bash
20+
pdm config venv.in_project true
21+
pdm use 3.14
22+
pdm install # dev deps into .venv
23+
export PATH="$(pwd)/.venv/bin:$PATH" # meson / ninja must be on PATH
24+
pip install --no-build-isolation -e . # editable install
25+
```
26+
27+
After editing a `.pyx` or `.pxd` file, the next `import wlsqm` auto-rebuilds the changed extension.
28+
29+
**Why `--no-build-isolation`:** meson-python's editable loader rebuilds the extension on import, so it needs `meson`, `ninja`, Cython, NumPy, and SciPy to remain available in the venv — not just in a throwaway PEP 517 overlay. PDM's default `pdm install` runs the backend in an isolated overlay whose `ninja` path gets burned into the loader and then disappears, causing `FileNotFoundError: .../ninja` on import. The `pip install --no-build-isolation -e .` form reuses the venv directly and produces a loader with stable paths.
30+
31+
**Version:** single source of truth is `wlsqm/VERSION`. Read by `meson.build` (build time), `pyproject.toml` (dynamic), and `wlsqm/__init__.py` (runtime). Only edit `wlsqm/VERSION` when bumping.
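
A minimal sketch of the runtime side of this arrangement, assuming the `VERSION` file holds a single line with the version string. The helper name `read_version` is hypothetical and may not match what `wlsqm/__init__.py` actually does.

```python
from pathlib import Path


def read_version(package_dir):
    """Return the version string stored in <package_dir>/VERSION, stripped of whitespace."""
    return (Path(package_dir) / "VERSION").read_text(encoding="utf-8").strip()


# Inside a package's __init__.py this could be used as:
#     __version__ = read_version(Path(__file__).parent)
```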
32+
33+
**OpenMP.** Linux (GCC libgomp), macOS (Apple Clang + Homebrew libomp), and Windows (MSVC vcomp140) are all supported. The meson build uses `dependency('openmp', required: false)` so that a build can in principle proceed without OpenMP — but in practice the `.pyx` sources `cimport openmp` and Cython unconditionally emits `#include <omp.h>`, so `omp.h` must be available at compile time regardless. On macOS, that means `brew install libomp` for source installs; published wheels bundle conda-forge's `libomp.dylib` via `delocate-wheel`.
34+
35+
## Running Tests
36+
37+
```bash
38+
pdm run pytest tests/ -v
39+
```
40+
41+
57 tests covering polynomial recovery (1D/2D/3D, orders 0–4), `ExpertSolver` prepare/solve round-trips, interpolation accuracy, `_many_parallel` ≡ `_many` serial equivalence, classical finite-difference stencil equivalence on non-polynomial inputs, first-derivative robustness to Gaussian noise, the LAPACK driver layer, and `.pxd` installability.
42+
43+
## Architecture
44+
45+
### Simple API vs. Expert API
46+
47+
Two Python-facing APIs on top of the same internal machinery:
48+
49+
- **`wlsqm.fit_1D / fit_2D / fit_3D`** — one-shot fit of a single local model. The `*_many` variants loop over many independent fits. The `*_many_parallel` variants run the loop across OpenMP threads.
50+
- **`wlsqm.ExpertSolver`** — prepare/solve separation. `prepare(xi, xk)` generates and LU-factorizes the problem matrices for every fit in the batch; `solve(fk, fi)` reuses the factored matrices against new function-value data. Fast path for time-stepping an IBVP over a fixed point cloud.
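
The prepare/solve split can be pictured with the following pure-Python sketch. It is a stand-in for the idea only: `PreparedLinearFit`, `lu_factor`, and `lu_solve` are hypothetical names, the model is a first-order 1D fit solved via the normal equations, and none of it reflects the real `ExpertSolver` signature or internals.

```python
def lu_factor(a):
    """LU-factorize a small dense matrix with partial pivoting; returns (lu, perm)."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(lu[r][col]))
        if lu[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        lu[col], lu[pivot] = lu[pivot], lu[col]
        perm[col], perm[pivot] = perm[pivot], perm[col]
        for r in range(col + 1, n):
            lu[r][col] /= lu[col][col]  # multiplier, stored in the L part
            for k in range(col + 1, n):
                lu[r][k] -= lu[r][col] * lu[col][k]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve using a factorization from lu_factor (forward, then back substitution)."""
    n = len(b)
    x = [b[p] for p in perm]
    for i in range(n):
        x[i] -= sum(lu[i][k] * x[k] for k in range(i))
    for i in reversed(range(n)):
        x[i] = (x[i] - sum(lu[i][k] * x[k] for k in range(i + 1, n))) / lu[i][i]
    return x


class PreparedLinearFit:
    """Hypothetical illustration of prepare-once, solve-many."""

    def __init__(self, dx):
        # "prepare": everything here depends only on the point geometry.
        self.rows = [[1.0, d] for d in dx]
        ata = [[sum(r[i] * r[j] for r in self.rows) for j in range(2)] for i in range(2)]
        self.lu, self.perm = lu_factor(ata)

    def solve(self, fk):
        # "solve": only the right-hand side depends on the function values.
        atf = [sum(r[i] * f for r, f in zip(self.rows, fk)) for i in range(2)]
        return lu_solve(self.lu, self.perm, atf)


dx = [-0.2, -0.1, 0.05, 0.15, 0.3]
fit = PreparedLinearFit(dx)
coeffs_a = fit.solve([2.0 + 3.0 * d for d in dx])   # first data set
coeffs_b = fit.solve([-1.0 + 0.5 * d for d in dx])  # new data, same factorization
print(coeffs_a, coeffs_b)
```

Each call to `solve` costs only a right-hand-side assembly and two triangular substitutions, which is the reason the pattern pays off when the same point cloud is reused across many time steps.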
51+
52+
### C structs, not cdef classes
53+
54+
The hot-path data structures — `Allocator`, `BufferSizes`, `CaseManager`, `Case` — are all `cdef struct` in `wlsqm/fitter/infra.pxd`, with C-style constructor/destructor functions (`Case_new`, `Case_del`, `CaseManager_new`, `CaseManager_commit`, `CaseManager_del`, …). The only real `cdef class` in the codebase is `PointerWrapper` in `wlsqm/utils/ptrwrap.pyx`, a trivial void-pointer carrier used in one place by `ExpertSolver`.
55+
56+
**Do not convert the C structs to `cdef class`.** The struct layout is what lets `Case_new` be called from inside `nogil` parallel loops without allocating Python objects.
57+
58+
### The .pxd / .pyx split
59+
60+
Extension modules under `wlsqm/fitter/` and `wlsqm/utils/` come in matched `.pxd` + `.pyx` pairs (except `expert.pyx` which is Python-API-only and has no `.pxd`, and `defs.pxd` which exposes module-level `cdef int` constants rather than function declarations). The `.pxd` is the Cython-level API that downstream `cimport` users consume; the `.pyx` contains the implementation. Both must be installed alongside the compiled `.so` / `.pyd` for `cimport wlsqm.fitter.*` to work from other Cython projects.
61+
62+
### Inter-module cimport graph
63+
64+
Build order (leaf → root):
65+
66+
```
67+
defs, ptrwrap → infra, polyeval → lapackdrivers → interp → impl → simple, expert
68+
```
69+
70+
`defs` is a pure-constants leaf. `lapackdrivers` cimports `scipy.linalg.cython_lapack`. `simple` and `expert` sit at the top of the chain and `cimport` everything below them.
71+
72+
### The `defs` constant system is NOT an enum
73+
74+
`wlsqm/fitter/defs.pxd` declares module-level `cdef int` variables (e.g. `i2_X2_c`, `b2_XY_c`, `ALGO_BASIC_c`); `wlsqm/fitter/defs.pyx` assigns their values and also exports Python-accessible copies (`i2_X2`, `b2_XY`, `ALGO_BASIC`). This **looks** like an enum and the `ALGO_*` / `WEIGHT_*` constants even act like one, but it is not — and must not be converted to either `cdef enum` or Python `enum.Enum`:
75+
76+
- **`i1_*`, `i2_*`, `i3_*`** are **array indices** into the `fi` DOF vector. Their specific numerical values (0, 1, 2, …) are load-bearing: the fitting code writes `fi[ i2_X2_c ] = ...` on the assumption that `i2_X2_c` is a specific integer slot in a dense array whose layout is known to every `make_c_*`, `make_A`, `solve`, `interpolate_*` function.
77+
- **`b1_*`, `b2_*`, `b3_*`** are **bitmasks** for the `knowns` parameter. `b2_F_c = 1 << i2_F_c`. They combine via `|`. An enum would force every bit to fit into a single named value.
78+
- **`SIZE1_c`, `SIZE2_c`, `SIZE3_c`** are array sizes: one-past-end of each order's DOF range.
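
The difference between index constants and bitmask constants can be sketched in plain Python. The names below are local stand-ins modeled on the Python-accessible copies mentioned above; the slot values are assumed from the "F, X, Y, X², XY, Y²" layout order and are not read from `defs.pyx`, and `N_DOFS_ORDER2` and `is_known` are hypothetical.

```python
# Array indices into a DOF vector (assumed values, following the layout order).
i2_F, i2_X, i2_Y, i2_X2, i2_XY, i2_Y2 = range(6)
N_DOFS_ORDER2 = 6  # hypothetical: number of DOFs of a second-order 2D model

# Bitmasks derived from the indices, one bit per DOF slot.
b2_F = 1 << i2_F
b2_X = 1 << i2_X
b2_X2 = 1 << i2_X2
b2_XY = 1 << i2_XY

# Indices address slots in a dense array.
fi = [0.0] * N_DOFS_ORDER2
fi[i2_X2] = 1.5

# Bitmasks combine with | to describe a set of known DOFs.
knowns = b2_F | b2_X2


def is_known(knowns, index):
    """Return True if the DOF at the given array index is flagged in the bitmask."""
    return bool(knowns & (1 << index))


print(knowns, is_known(knowns, i2_X2), is_known(knowns, i2_XY))
```

A combined value such as `b2_F | b2_X2` is not itself one of the named constants, which is the practical reason a plain enum is a poor fit for the bitmask family.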
79+
80+
The one place where an enum conversion **did** make sense was `ScalingAlgo` in `wlsqm/utils/lapackdrivers.pyx`, which is now a proper `enum.IntEnum`. Its previous form was a Python 2 compatibility workaround (a bare class with integer class attributes); the old comment literally said "TODO: use real enum type for Python 3.4+".
81+
82+
### Protocol constants in simple.pyx
83+
84+
The serial fitting routines in `wlsqm/fitter/simple.pyx` share a set of protocol constants — `TASKID = 0`, `NTASKS = 1`, and `MODE_BASIC = 0` / `MODE_ITERATIVE = 1` — that live at module scope at the top of the file. They express project-wide conventions ("serial path has a single task at work buffer 0") and are passed as positional arguments to `CaseManager_new` / `Case_new` / `impl.solve`. Do not inline them as bare literals at call sites; the named constants document what the argument position means.
85+
86+
`expert.pyx` has an analogous local `SERIAL_TASKID = 0` inside `ExpertSolver.solve`, used only in the `ntasks == 1` serial-fallback branches of that one method. Local scope is correct there because the parallel and serial paths share a function, and the name `SERIAL_TASKID` self-documents the context at each call site.
87+
88+
### The lapackdrivers layer
89+
90+
`wlsqm/utils/lapackdrivers.pyx` is a thin Cython wrapper over SciPy's Cython LAPACK bindings, exposing:
91+
92+
- Single-matrix solvers (`general`, `symmetric`, `tridiag`, `svd`).
93+
- Batched solvers for many independent systems (`mgeneral_c`, `msymmetric_c`, and their `*p_c` parallel variants that iterate with OpenMP `prange`).
94+
- Six preconditioning / scaling algorithms (`rescale_columns_c`, `rescale_rows_c`, `rescale_twopass_c`, `rescale_dgeequ_c`, `rescale_ruiz2001_c`, `rescale_scalgm_c`), dispatched through a function-pointer table by `do_rescale`. **Keep all six.** They were compared experimentally for wlsqm's own use (Ruiz 2001 won and is the default), and the library is a legitimate home for the full comparison.
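
For readers unfamiliar with table-driven dispatch, the following Python sketch shows the general shape of selecting a scaling routine through an `IntEnum` key. Everything in it is hypothetical: the member names and values of `ScalingAlgoSketch`, the two toy scaling bodies, and `do_rescale_sketch` are illustrations only and do not reproduce the Cython function-pointer table or the actual algorithms in `lapackdrivers.pyx`.

```python
import enum


class ScalingAlgoSketch(enum.IntEnum):
    # Hypothetical members; only the IntEnum base is taken from the text.
    COLUMNS = 1
    ROWS = 2


def rescale_columns(a):
    """Toy scaling: divide each column by its largest absolute entry."""
    col_max = [max(abs(row[j]) for row in a) for j in range(len(a[0]))]
    return [[row[j] / col_max[j] for j in range(len(row))] for row in a]


def rescale_rows(a):
    """Toy scaling: divide each row by its largest absolute entry."""
    return [[v / max(abs(u) for u in row) for v in row] for row in a]


# Dispatch table: one entry per algorithm, looked up by enum member.
_RESCALERS = {
    ScalingAlgoSketch.COLUMNS: rescale_columns,
    ScalingAlgoSketch.ROWS: rescale_rows,
}


def do_rescale_sketch(a, algo):
    """Validate the algorithm id and call the matching routine."""
    return _RESCALERS[ScalingAlgoSketch(algo)](a)


a = [[1.0, 2.0], [3.0, 4.0]]
print(do_rescale_sketch(a, ScalingAlgoSketch.COLUMNS))
print(do_rescale_sketch(a, 2))  # plain ints work because the key is an IntEnum
```

Keeping every algorithm behind one table entry is what makes it cheap to retain all of them side by side for comparison.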
95+
96+
The LAPACK wrapper functions return `int` and use `except -1 nogil` to propagate LAPACK errors as Python exceptions. The pure computational helpers (`cimin`, `cimax`, `copygeneral_c`, `symmetrize_c`, `init_scaling_c`, `apply_scaling_c`, the `basic_scale_up/down_*` family) use `noexcept nogil` and never raise. Do not mix the two styles — they are chosen per-function based on whether the function can fail.
97+
98+
### Critical constraint: .pxd installation
99+
100+
Every `.pxd` file must be installed alongside the compiled extension so downstream `cimport wlsqm.fitter.*` works. Handled by `py.install_sources(...)` in each subpackage's `meson.build`. `tests/test_cimport.py` pins this invariant by generating a minimal `.pyx` per module and asking `cython -3` to compile it; the test fails if any `.pxd` is missing or unreachable.
101+
102+
## Linting
103+
104+
**Python files** (ruff, blocking):
105+
106+
```bash
107+
ruff check . --ignore SIM103
108+
```
109+
110+
Plus a non-blocking advisory pass for the return-condition-directly rule:
111+
112+
```bash
113+
ruff check . --select SIM103 || true
114+
```
115+
116+
**Cython files** (cython-lint, non-blocking in CI, blocking-clean in practice):
117+
118+
```bash
119+
cython-lint wlsqm/fitter/*.pyx wlsqm/fitter/*.pxd \
120+
wlsqm/utils/*.pyx wlsqm/utils/*.pxd
121+
```
122+
123+
Config for both lives in `pyproject.toml`. The canonical lint config and the rationale behind each ignore are in `~/.claude/PROJECT-SETUP-NOTES.md` under "Lint and style configuration."
124+
125+
## Code Conventions
126+
127+
- **Line width:** ~130 for Python, ~200 for Cython signatures (many `cdef` signatures are legitimately long because of memoryview type declarations).
128+
- **Docstring format:** the `.pyx` files use a custom `def name(args):\n"""def name(args):\n\n...` header-echo convention. Leave existing docstrings in that style.
129+
- **Comments can carry math.** Derivations, accuracy-bound sketches, and back-of-the-envelope FLOP counts in comments are the project style. Don't prune them.
130+
- **Dependencies:** NumPy and SciPy are the runtime deps. OpenMP is a build/runtime system dependency. Do not add other runtime deps.
131+
132+
## Python Version Compatibility
133+
134+
When adding support for a new Python version:
135+
136+
1. Update `requires-python` in `pyproject.toml` (if changing the floor).
137+
2. Add the version classifier in `pyproject.toml`.
138+
3. Add to the CI matrix in `.github/workflows/ci.yml` (both `test` and `build-wheels` jobs).
139+
4. Add to the `cibuildwheel` `build = "..."` line in `pyproject.toml`.
140+
5. Run the full test suite on the new version and verify the cimport test passes.
141+
142+
NumPy, SciPy, and Cython compatibility with the new Python version are the main risk factors.
143+
144+
## Key Rules
145+
146+
- **Do not refactor the numerical algorithms.** The WLSQM fit, the iterative refinement loop, the Ruiz 2001 scaling, and the polynomial evaluators are mathematically correct and performance-tested.
147+
- **Do not rename public C-level API functions.** The `taylor_1D/2D/3D`, `interpolate_*`, `make_c_*`, `make_A`, `preprocess_A`, `solve`, and the LAPACK wrappers are all part of the Cython API that downstream projects `cimport`. Renaming them would break existing users. In particular, `taylor_*` stays `taylor_*` even though the comment block at the top of `polyeval.pyx` explains at length that it is not a Taylor series.
148+
- **Do not remove OpenMP.** Parallel fitting across independent local problems is one of the headline features.
149+
- **Do not convert `cdef struct` to `cdef class`.** The struct layout is what lets `Case_new` run from inside `nogil` parallel loops without touching the Python object allocator.
150+
- **Do not change the Cython compiler directives** (`wraparound = False`, `boundscheck = False`, `cdivision = True`). They are intentional performance settings for numerical code.
151+
- **Do not describe WLSQM as a "Taylor series" method.** The DOF layout looks like one, but the coefficients come from a least-squares fit — not analytic differentiation — and the error behavior is correspondingly much better than Taylor truncation would predict. `polyeval.pyx`'s header block has the full framing.
152+
- **Do not convert the `defs.pxd` / `defs.pyx` constant system to `cdef enum` or Python `enum`.** The index and bitmask values are load-bearing — see "The `defs` constant system is NOT an enum" above. `ScalingAlgo` in `lapackdrivers.pyx` is the one exception; it is a genuine enum and IS now `IntEnum`.
153+
- **Keep all six scaling algorithms in `lapackdrivers.pyx`.** They were compared experimentally; the library is a legitimate home for the comparison.
154+
155+
## Technical documentation
156+
157+
The `doc/` directory contains theory PDFs:
158+
159+
- `doc/wlsqm_gen.pdf` — the generalized version (including unknown function values), the accuracy analysis, and why WLSQM works.
160+
- `doc/wlsqm.pdf` — older writeup for the pure-Python version originally written for FREYA, plus the sensitivity calculation.
161+
- `doc/eulerflow.pdf` — application example in compressible flow, cleaner presentation of the original version.
162+
163+
A future documentation / tutorial pass is deferred — see `TODO_DEFERRED.md`.

LICENSE.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,4 +1,4 @@
1-
Copyright (c) 2016-2017, Juha Jeronen and University of Jyväskylä.
1+
Copyright (c) 2016-2026, Juha Jeronen, University of Jyväskylä, and JAMK University of Applied Sciences.
22
All rights reserved.
33

44
Redistribution and use in source and binary forms, with or without
