This guide explains how to replace or modify the curve fitting engine in RegressionLab. By default, RegressionLab uses SciPy's curve_fit function, but you may want to use a different optimization library for specific needs.
RegressionLab's fitting architecture is modular, making it relatively easy to swap out the optimization backend. This guide covers:
- Understanding the current architecture
- Why you might want to change it
- How to replace SciPy with another library
- Alternative optimization libraries
- Performance considerations
RegressionLab's fitting pipeline:
User Interface (Tkinter/Streamlit)
↓
Fitting Wrapper Functions (ajlineal, ajsin, etc.)
↓
Generic Fit Utility (generic_fit)
↓
SciPy curve_fit
↓
Results (parameters, uncertainties, fitted curve)
The core fitting logic is in src/fitting/fitting_utils.py:
def generic_fit(func, x, y, uy=None, p0=None):
"""
Generic fitting function using scipy.optimize.curve_fit.
Args:
func: Mathematical function to fit
x: Independent variable data
y: Dependent variable data
uy: Uncertainties in y (optional)
p0: Initial parameter guess (optional)
Returns:
params: Fitted parameters
y_fitted: Fitted y values
r_squared: Coefficient of determination
"""
from scipy.optimize import curve_fit
# Prepare weights from uncertainties
sigma = uy if uy is not None else None
# Perform optimization
popt, pcov = curve_fit(
func, x, y,
p0=p0,
sigma=sigma,
absolute_sigma=True if sigma is not None else False
)
# Calculate fitted values
y_fitted = func(x, *popt)
# Calculate R² and RMSE (residuals computed once and reused)
residuals_sq = (y - y_fitted) ** 2
ss_res = np.sum(residuals_sq)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r_squared = 1 - (ss_res / ss_tot)
rmse = float(np.sqrt(np.mean(residuals_sq)))
return popt, y_fitted, r_squaredThis function is the single point of control for the optimization backend. Equation lookups (get_equation_format_for_function, get_equation_param_names_for_function) use O(1) reverse dicts built at module load, so there is no performance cost when resolving equations by function name.
- Performance: Some libraries are faster for specific problem types
- Robustness: Better handling of difficult optimization problems
- Features: Access to different optimization algorithms (global optimization, constraints, etc.)
- Licensing: Some libraries have different licenses
- Custom needs: Specialized algorithms for your domain
| Library | Pros | Cons |
|---|---|---|
| lmfit | High-level API, parameter constraints, many models | Additional dependency |
| scipy.optimize.least_squares | More options than curve_fit, robust loss functions | Lower-level API |
| scipy.optimize.minimize | General optimization, many algorithms | Not specialized for curve fitting |
| PyGMO | Global optimization, parallel evaluation | Complex, heavy dependency |
| NLopt | Fast, many algorithms, constraints | C library binding, harder to install |
| TensorFlow/PyTorch | GPU acceleration, automatic differentiation | Overkill for simple fits, steep learning curve |
This is the recommended approach as it requires minimal changes to the codebase.
Step 1: Install lmfit
pip install lmfitAdd to requirements.txt:
lmfit>=1.2.0
Step 2: Create Alternative Generic Fit
Create a new function in fitting_utils.py:
def generic_fit_lmfit(func, x, y, uy=None, p0=None):
"""
Generic fitting using lmfit library.
lmfit provides more control over parameter bounds, constraints,
and fitting statistics compared to scipy.curve_fit.
Args:
func: Mathematical function to fit
x: Independent variable data
y: Dependent variable data
uy: Uncertainties in y (optional, used as weights)
p0: Initial parameter guess (optional)
Returns:
params: Fitted parameters (as ndarray)
y_fitted: Fitted y values
r_squared: Coefficient of determination
"""
from lmfit import Model
import inspect
# Create lmfit Model from function
model = Model(func, independent_vars=['x'])
# Get parameter names from function signature
sig = inspect.signature(func)
param_names = [p for p in sig.parameters.keys() if p != 'x' and p != 't']
# Create parameters object
params = model.make_params()
# Set initial values if provided
if p0 is not None:
for name, value in zip(param_names, p0):
params[name].value = value
# Perform fit
if uy is not None:
# Weighted fit
result = model.fit(y, params, x=x, weights=1.0/uy)
else:
# Unweighted fit
result = model.fit(y, params, x=x)
# Extract results
popt = np.array([result.params[name].value for name in param_names])
y_fitted = result.best_fit
r_squared = 1 - result.residual.var() / np.var(y)
return popt, y_fitted, r_squaredStep 3: Switch to New Implementation
Option A - Replace original:
# In fitting_utils.py, rename functions:
def generic_fit_scipy(func, x, y, uy=None, p0=None):
# ... original scipy implementation ...
pass
# Make generic_fit use lmfit
generic_fit = generic_fit_lmfitOption B - Use configuration flag:
# In config/constants.py or config/env.py
USE_LMFIT = True # Set to False to use SciPy
# In fitting_utils.py
def generic_fit(func, x, y, uy=None, p0=None):
from config import USE_LMFIT
if USE_LMFIT:
return generic_fit_lmfit(func, x, y, uy, p0)
else:
return generic_fit_scipy(func, x, y, uy, p0)For more control and robust loss functions:
def generic_fit_least_squares(func, x, y, uy=None, p0=None, loss='linear'):
"""
Generic fitting using scipy.optimize.least_squares.
Provides more robust loss functions and better convergence for
difficult problems.
Args:
func: Mathematical function to fit
x: Independent variable data
y: Dependent variable data
uy: Uncertainties in y (used as weights)
p0: Initial parameter guess (required!)
loss: Loss function ('linear', 'soft_l1', 'huber', 'cauchy', 'arctan')
Returns:
params: Fitted parameters
y_fitted: Fitted y values
r_squared: Coefficient of determination
"""
from scipy.optimize import least_squares
import inspect
# Validate inputs
if p0 is None:
raise ValueError("least_squares requires initial parameter guess (p0)")
# Get number of parameters
sig = inspect.signature(func)
n_params = len(sig.parameters) - 1 # Subtract independent variable
# Define residual function
def residuals(params):
y_model = func(x, *params)
if uy is not None:
# Weighted residuals
return (y - y_model) / uy
else:
return y - y_model
# Perform optimization
result = least_squares(
residuals,
p0,
loss=loss, # Robust loss function
f_scale=1.0, # Scale parameter for loss
max_nfev=10000 # Maximum function evaluations
)
# Extract results
popt = result.x
y_fitted = func(x, *popt)
# Calculate R²
ss_res = np.sum((y - y_fitted) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r_squared = 1 - (ss_res / ss_tot)
return popt, y_fitted, r_squaredRobust Loss Functions:
'linear': Standard least squares (default, same as curve_fit)'soft_l1': Robust to outliers, smooth transition'huber': Robust to outliers, less aggressive'cauchy': Very robust to outliers, but can converge slowly'arctan': Similar to Cauchy
For specialized needs, create domain-specific fitting functions:
def generic_fit_global(func, x, y, uy=None, bounds=None):
"""
Global optimization for multimodal functions.
Uses differential evolution to find global minimum,
useful when curve_fit gets stuck in local minima.
Args:
func: Mathematical function to fit
x: Independent variable data
y: Dependent variable data
uy: Uncertainties (optional)
bounds: Parameter bounds as list of (min, max) tuples (required!)
Returns:
params: Fitted parameters
y_fitted: Fitted y values
r_squared: R² value
"""
from scipy.optimize import differential_evolution
if bounds is None:
raise ValueError("Global optimization requires parameter bounds")
# Define objective function (minimize sum of squared residuals)
def objective(params):
y_model = func(x, *params)
if uy is not None:
residuals = (y - y_model) / uy
else:
residuals = y - y_model
return np.sum(residuals ** 2)
# Run global optimization
result = differential_evolution(
objective,
bounds,
seed=42, # Reproducibility
maxiter=1000,
atol=1e-7,
tol=0.01
)
# Extract results
popt = result.x
y_fitted = func(x, *popt)
# Calculate R²
ss_res = np.sum((y - y_fitted) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r_squared = 1 - (ss_res / ss_tot)
return popt, y_fitted, r_squaredFor uncertainty quantification with complex models:
def generic_fit_mcmc(func, x, y, uy=None, p0=None):
"""
Bayesian fitting using MCMC (requires emcee library).
Provides full posterior distributions for parameters,
not just point estimates.
Args:
func: Mathematical function to fit
x: Independent variable data
y: Dependent variable data
uy: Uncertainties in y (required!)
p0: Initial parameter guess
Returns:
params: Median parameter values
y_fitted: Fitted y values
r_squared: R² value
"""
try:
import emcee
except ImportError:
raise ImportError("MCMC fitting requires emcee: pip install emcee")
if uy is None:
uy = np.ones_like(y) * 0.1 * np.std(y)
n_params = len(p0)
# Define log-likelihood
def log_likelihood(params):
y_model = func(x, *params)
return -0.5 * np.sum(((y - y_model) / uy) ** 2)
# Define log-prior (uniform priors)
def log_prior(params):
# Simple bounds check
if np.all(np.isfinite(params)):
return 0.0
return -np.inf
# Define log-probability
def log_probability(params):
lp = log_prior(params)
if not np.isfinite(lp):
return -np.inf
return lp + log_likelihood(params)
# Initialize walkers
n_walkers = max(32, 2 * n_params)
pos = p0 + 1e-4 * np.random.randn(n_walkers, n_params)
# Run MCMC
sampler = emcee.EnsembleSampler(n_walkers, n_params, log_probability)
sampler.run_mcmc(pos, 5000, progress=False)
# Get results (discard burn-in)
samples = sampler.get_chain(discard=1000, flat=True)
popt = np.median(samples, axis=0)
# Calculate fitted values and R²
y_fitted = func(x, *popt)
ss_res = np.sum((y - y_fitted) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r_squared = 1 - (ss_res / ss_tot)
return popt, y_fitted, r_squaredModify individual fitting functions to include bounds:
def ajlineal_con_n_bounded(data, x_name: str, y_name: str):
"""Linear fit with slope constrained to be positive."""
from scipy.optimize import curve_fit
x = data[x_name].values
y = data[y_name].values
uy = data[f'u{y_name}'].values if f'u{y_name}' in data.columns else None
# Constrain: m > 0, n unrestricted
bounds = ([0, -np.inf], [np.inf, np.inf])
popt, _ = curve_fit(
func_lineal_con_n,
x, y,
sigma=uy,
bounds=bounds,
absolute_sigma=True if uy is not None else False
)
# ... rest of function ...Add bounds to function metadata:
# In config/constants.py
EQUATION_BOUNDS = {
'exponential_decay': ([0, 0], [np.inf, np.inf]), # a > 0, b > 0
'sine_function': ([-np.inf, 0], [np.inf, np.inf]), # b > 0
# Add bounds for other equations
}
# In fitting_utils.py
def generic_fit_with_bounds(func, x, y, uy=None, p0=None, equation_name=None):
"""Generic fit with optional bounds from configuration."""
from scipy.optimize import curve_fit
from config import EQUATION_BOUNDS
# Get bounds if available
bounds = EQUATION_BOUNDS.get(equation_name, (-np.inf, np.inf))
sigma = uy if uy is not None else None
popt, _ = curve_fit(
func, x, y,
p0=p0,
sigma=sigma,
bounds=bounds,
absolute_sigma=True if sigma is not None else False
)
y_fitted = func(x, *popt)
r_squared = 1 - (np.sum((y - y_fitted)**2) / np.sum((y - np.mean(y))**2))
return popt, y_fitted, r_squaredCreate a benchmark script:
# benchmark_fitting.py
import time
import numpy as np
import pandas as pd
from fitting.fitting_utils import generic_fit, generic_fit_lmfit
# Generate test data
x = np.linspace(0, 10, 1000)
y = 2.5 * x + 1.3 + np.random.normal(0, 0.5, 1000)
def func_linear(t, m, n):
return m * t + n
# Benchmark scipy
start = time.time()
for _ in range(100):
params, y_fit, r2 = generic_fit(func_linear, x, y)
scipy_time = time.time() - start
# Benchmark lmfit
start = time.time()
for _ in range(100):
params, y_fit, r2 = generic_fit_lmfit(func_linear, x, y)
lmfit_time = time.time() - start
print(f"SciPy: {scipy_time:.3f} seconds (100 fits)")
print(f"lmfit: {lmfit_time:.3f} seconds (100 fits)")
print(f"Speedup: {lmfit_time/scipy_time:.2f}x")- Vectorize operations: Use NumPy arrays, not loops
- Good initial guesses: Reduces iterations
- Appropriate tolerances: Don't over-optimize
- Warm starts: Reuse previous fits as initial guesses
Example with warm starts:
def batch_fit_with_warm_start(files, equation_type):
"""Fit multiple files using previous result as initial guess."""
p0 = None # Start with no guess
results = []
for file in files:
data = load_data(file)
x, y = data['x'].values, data['y'].values
params, y_fit, r2 = generic_fit(func, x, y, p0=p0)
# Use current result as next initial guess
p0 = params
results.append((params, r2))
return resultsFor automatic error propagation:
def generic_fit_with_uncertainties(func, x, y, uy=None, p0=None):
"""
Fit and return parameter uncertainties using uncertainties package.
Returns parameters as ufloat objects with uncertainty.
"""
try:
from uncertainties import ufloat, correlated_values
except ImportError:
raise ImportError("Install uncertainties: pip install uncertainties")
from scipy.optimize import curve_fit
# Perform fit
popt, pcov = curve_fit(func, x, y, p0=p0, sigma=uy, absolute_sigma=True if uy is not None else False)
# Create uncertain parameters
params_with_uncertainty = correlated_values(popt, pcov)
# Calculate fitted values
y_fitted = func(x, *popt)
r_squared = 1 - (np.sum((y - y_fitted)**2) / np.sum((y - np.mean(y))**2))
return params_with_uncertainty, y_fitted, r_squared
# Example usage
params_unc, y_fit, r2 = generic_fit_with_uncertainties(func_linear, x, y, uy)
m, n = params_unc
print(f"Slope: {m}") # Prints: 2.50+/-0.03
print(f"Intercept: {n}") # Prints: 1.30+/-0.15Create a comprehensive configuration system:
# config/constants.py or new config/fitting_config.py
FITTING_CONFIG = {
'backend': 'scipy', # Options: 'scipy', 'lmfit', 'least_squares'
'scipy_options': {
'method': 'lm', # Levenberg-Marquardt
'max_nfev': 10000,
},
'least_squares_options': {
'loss': 'soft_l1', # Robust loss
'f_scale': 1.0,
},
'lmfit_options': {
'method': 'leastsq',
},
}
# fitting_utils.py
def generic_fit(func, x, y, uy=None, p0=None):
"""Generic fit that uses configured backend."""
from config import FITTING_CONFIG
backend = FITTING_CONFIG['backend']
if backend == 'scipy':
return generic_fit_scipy(func, x, y, uy, p0)
elif backend == 'lmfit':
return generic_fit_lmfit(func, x, y, uy, p0)
elif backend == 'least_squares':
return generic_fit_least_squares(func, x, y, uy, p0)
else:
raise ValueError(f"Unknown fitting backend: {backend}")Users can then switch backends by editing the config package (e.g. config/constants.py) without modifying code.
Always test thoroughly when changing the fitting core:
# tests/test_fitting_backends.py
import pytest
import numpy as np
from fitting.fitting_utils import generic_fit, generic_fit_lmfit
def test_backends_agree():
"""Test that different backends give similar results."""
# Generate perfect linear data
x = np.linspace(0, 10, 50)
y = 2.0 * x + 3.0
def func(t, a, b):
return a * t + b
# Fit with scipy
params_scipy, _, _ = generic_fit(func, x, y)
# Fit with lmfit
params_lmfit, _, _ = generic_fit_lmfit(func, x, y)
# Check agreement
np.testing.assert_array_almost_equal(params_scipy, params_lmfit, decimal=6)
def test_with_noise():
"""Test fitting with noisy data."""
x = np.linspace(0, 10, 100)
y = 2.5 * x + 1.0 + np.random.normal(0, 0.5, 100)
def func(t, a, b):
return a * t + b
params, y_fit, r2 = generic_fit(func, x, y)
# Check reasonable results
assert 2.0 < params[0] < 3.0 # Slope near 2.5
assert 0.5 < params[1] < 1.5 # Intercept near 1.0
assert r2 > 0.95 # Good fit| Approach | Difficulty | Flexibility | When to Use |
|---|---|---|---|
Modify generic_fit |
Easy | Low | Simple backend swap |
| Add backend option | Medium | Medium | Support multiple backends |
| Custom fitting functions | Medium | High | Specialized needs per equation |
| Configuration system | Hard | Very High | Production deployments |
Recommendations:
- Start simple: Modify
generic_fitdirectly - For production: Use configuration system
- For research: Create specialized fitting functions
- Always test: Verify results match expectations
- Test thoroughly: Use known data to validate new backend
- Benchmark: Compare performance with original
- Document: Explain why you chose a different backend
- Contribute: Share your improvements with the community!
For more technical details, see the API Documentation.