[TESTING]: Automated end-to-end tests for SQL Sanitizer plugin via plugin management API #4222

## Summary

Build an automated end-to-end test suite for the SQL Sanitizer plugin (originally delivered via #1065, currently at `plugins/sql_sanitizer/`) that drives the plugin management API to activate, configure, and deactivate the plugin for a given user/tenant, and verifies that SQL injection patterns in tool arguments and outputs are handled correctly across the full matrix of configuration settings.

> **Packaging prerequisite**: before (or alongside) this work, the plugin should be **moved out of the main repo into the `cpex-plugins` repository** as a standalone `cpex-sql-sanitizer` PyPI package (following the #3965 pattern used for `pii_filter`, `secrets_detection`, etc.), and **hardened** — audit of detection patterns against a current SQLi corpus, multi-dialect coverage review, clearer severity/mode semantics, and performance/latency characterization. The e2e suite should target the hardened, `cpex-*`-packaged version.

## Motivation

- SQL injection prevention is a high-stakes security feature; manual verification is not sufficient and does not catch regressions.
- The dynamic plugin configuration and per-tool bindings added in #4068 / #4143 are now the supported way to turn plugins on and off at runtime. We need tests that exercise that real control plane, not just in-process plugin unit tests.
- An end-to-end suite also doubles as executable documentation for how operators are expected to enable and tune the plugin.
- Hardening + repo relocation aligns SQL Sanitizer with every other first-party plugin's distribution and review model.

## Scope

### Packaging & Hardening (prerequisite)

- Move `plugins/sql_sanitizer/` to the `cpex-plugins` repo as `cpex-sql-sanitizer`.
- Audit detection rules against a current SQLi corpus (union-based, boolean-based, time-based, stacked-query, second-order, NoSQL-injection variants where applicable).
- Review and document supported SQL dialects (PostgreSQL, MySQL, SQLite, MSSQL, Oracle) and their known gaps.
- Define explicit modes (`block`, `sanitize`, `flag-only`) with unambiguous semantics.
- Benchmark per-invocation overhead; publish in the plugin README.
- Add the package to the `[plugins]` extra in `pyproject.toml`; update any `kind:` references and docs.
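One way to make the mode semantics unambiguous is to write them down as a small table that both the plugin docs and the e2e tests can share. The sketch below assumes the three mode names listed above. It also assumes that `sanitize` and `flag-only` both record a violation and that benign input is never blocked, altered, or flagged. `SanitizerMode`, `Outcome`, and `expected_outcome` are hypothetical names, not part of the plugin.

```python
from enum import Enum
from typing import NamedTuple


class SanitizerMode(str, Enum):
    """Hypothetical mode names; the hardened plugin may expose different ones."""

    BLOCK = "block"          # reject the invocation when a pattern matches
    SANITIZE = "sanitize"    # neutralize the matched fragment, then continue
    FLAG_ONLY = "flag-only"  # pass the payload through untouched, record a violation


class Outcome(NamedTuple):
    blocked: bool
    payload_changed: bool
    violation_recorded: bool


def expected_outcome(mode: SanitizerMode, malicious: bool) -> Outcome:
    """Expected behavior for one payload under one mode (assumed semantics)."""
    if not malicious:
        # Assumption: benign input is never blocked, altered, or flagged.
        return Outcome(blocked=False, payload_changed=False, violation_recorded=False)
    if mode is SanitizerMode.BLOCK:
        return Outcome(blocked=True, payload_changed=False, violation_recorded=True)
    if mode is SanitizerMode.SANITIZE:
        return Outcome(blocked=False, payload_changed=True, violation_recorded=True)
    return Outcome(blocked=False, payload_changed=False, violation_recorded=True)
```

Keeping the table as data means a semantic change to a mode produces a single, reviewable diff that the e2e suite picks up automatically.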

### Test Harness

- Stand up a ContextForge instance (test container or in-process app) with observability enabled.
- Create a test user/tenant, mint a JWT, and drive all plugin state changes through the plugin management / bindings API — no static YAML edits.
- Wrap common actions (enable plugin, set config, bind to tool, invoke tool, assert on response) in reusable pytest fixtures / helpers.
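A minimal sketch of such a helper layer follows, assuming a JSON-over-HTTP management API. Every path below (`/plugins/{name}`, `/plugins/{name}/config`, `/plugins/{name}/bindings`, `/tools/{id}/invoke`) and every request body is a hypothetical placeholder, not the real ContextForge API. The transport is injected so the wrapper can sit on top of whatever HTTP client the shared harness from #4221 uses.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, Iterator, Optional

# (method, path, json_body, headers) -> decoded JSON response.
# In the real suite this would wrap an HTTP client pointed at the gateway.
Transport = Callable[[str, str, Optional[dict], dict], dict]


@dataclass
class PluginApiClient:
    """Thin wrapper over the plugin management API; all paths are hypothetical."""

    send: Transport
    token: str  # JWT minted for the test user/tenant

    def _call(self, method: str, path: str, body: Optional[dict] = None) -> dict:
        headers = {"Authorization": f"Bearer {self.token}"}
        return self.send(method, path, body, headers)

    def set_plugin_state(self, plugin: str, enabled: bool) -> dict:
        return self._call("PATCH", f"/plugins/{plugin}", {"enabled": enabled})

    def set_config(self, plugin: str, config: dict) -> dict:
        return self._call("PUT", f"/plugins/{plugin}/config", config)

    def bind_to_tool(self, plugin: str, tool_id: str) -> dict:
        return self._call("POST", f"/plugins/{plugin}/bindings", {"tool_id": tool_id})

    def invoke_tool(self, tool_id: str, arguments: dict) -> dict:
        return self._call("POST", f"/tools/{tool_id}/invoke", {"arguments": arguments})


@contextmanager
def plugin_enabled(
    client: PluginApiClient, plugin: str, config: dict
) -> Iterator[PluginApiClient]:
    """Configure and enable a plugin for one block, then always disable it again."""
    client.set_config(plugin, config)
    client.set_plugin_state(plugin, True)
    try:
        yield client
    finally:
        client.set_plugin_state(plugin, False)
```

The `plugin_enabled` context manager disables the plugin in a `finally` block, so a failing assertion does not leave the plugin active for the next scenario. Wrapping it in a pytest fixture that uses `yield` gives the same guarantee.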

### Configuration Matrix

At minimum, each scenario should vary and assert on:

- **Plugin state**: disabled → enabled → disabled (confirm state transitions take effect on subsequent invocations without restart).
- **Mode**: `block`, `sanitize`, `flag-only` (or whatever modes the hardened plugin exposes).
- **SQL dialect**: PostgreSQL, MySQL, SQLite, MSSQL (at minimum) — tested individually where dialect config is supported.
- **Attack categories**: classic union-based, boolean-blind, time-based-blind, stacked queries, comment-injection, encoded payloads — tested individually and in combinations.
- **Scope / binding**: plugin bound globally vs. per-tool vs. per-tenant; confirm non-bound tools/tenants are unaffected.
- **Hook coverage**: `tool_pre_invoke` (argument sanitization) and `tool_post_invoke` (output scrubbing of reflected SQL) — verify handling on both inbound args and outbound responses as configured.
- **False-positive guard**: benign queries and natural-language strings that merely look SQL-adjacent must pass through unaltered and unflagged in `sanitize` / `flag-only` modes, and must not be blocked in `block` mode.
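The matrix can be generated rather than written out by hand. This sketch takes the Cartesian product of the dimensions listed above with `itertools.product`. The dimension values, the payload strings, and the `Scenario` / `build_matrix` names are illustrative assumptions. The payloads are toy examples, not the audited corpus.

```python
import itertools
from typing import NamedTuple

MODES = ("block", "sanitize", "flag-only")
DIALECTS = ("postgresql", "mysql", "sqlite", "mssql")
SCOPES = ("global", "per-tool", "per-tenant")
HOOKS = ("tool_pre_invoke", "tool_post_invoke")

# Toy payloads for illustration only; the real corpus comes from the hardening audit.
ATTACKS = {
    "union": "1 UNION SELECT username, password FROM users--",
    "boolean_blind": "1' AND 1=1--",
    "time_blind": "1'; SELECT pg_sleep(5)--",
    "stacked": "1; DROP TABLE users;",
    "comment": "admin'--",
    "encoded": "1%27%20OR%20%271%27%3D%271",
}
# SQL-adjacent natural language that must not trigger detection.
BENIGN = (
    "Please select the union of both reading lists",
    "Drop me a line when the table is ready",
)


class Scenario(NamedTuple):
    mode: str
    dialect: str
    scope: str
    hook: str
    category: str  # attack category, or "benign<N>" for false-positive cases
    payload: str
    malicious: bool

    @property
    def test_id(self) -> str:
        return "-".join((self.mode, self.dialect, self.scope, self.hook, self.category))


def build_matrix() -> list[Scenario]:
    """Cross every configuration dimension with every attack and benign case."""
    cases = [(name, payload, True) for name, payload in ATTACKS.items()]
    cases += [(f"benign{i}", text, False) for i, text in enumerate(BENIGN)]
    return [
        Scenario(mode, dialect, scope, hook, name, payload, malicious)
        for mode, dialect, scope, hook in itertools.product(MODES, DIALECTS, SCOPES, HOOKS)
        for name, payload, malicious in cases
    ]
```

Each scenario can feed `@pytest.mark.parametrize("scenario", build_matrix(), ids=lambda s: s.test_id)`. The full product here is 576 cases (72 configurations times 8 payloads). A scheduled CI job or a pairwise subset may therefore be needed to keep runtime reasonable.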

### Assertions

For each scenario:

- The response / tool-call payload matches the expected block / sanitize / flag behavior.
- Violations (when applicable) are recorded with the expected category, dialect, and confidence.
- Disabling the plugin mid-test restores pass-through behavior on the next invocation.
- Other users / tools with different bindings are unaffected (isolation check).
- Observability signals (spans, structured logs) reflect plugin activity, which also serves as a useful smoke test that the plugin ran at all.
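The per-scenario checks above could be centralized in one helper. The response shape it reads (`blocked`, `payload`, `violations`, and a per-violation `category`) is an assumption about what the gateway returns, not a documented schema. `assert_sanitizer_behavior` is a hypothetical name.

```python
from typing import Any, Mapping, Optional


def assert_sanitizer_behavior(
    mode: str,
    expect_detection: bool,
    sent: str,
    response: Mapping[str, Any],
    expected_category: Optional[str] = None,
) -> None:
    """Check one invocation against the expected mode behavior.

    The response fields read here are assumptions, not a documented schema.
    """
    violations = response.get("violations") or []
    if not expect_detection:
        # Benign input, unbound tool, other tenant, or disabled plugin: pure pass-through.
        assert not response.get("blocked"), "payload was unexpectedly blocked"
        assert response.get("payload") == sent, "payload was unexpectedly altered"
        assert not violations, "payload was unexpectedly flagged"
        return
    assert violations, "expected at least one recorded violation"
    if expected_category is not None:
        categories = {v.get("category") for v in violations}
        assert expected_category in categories, f"missing category {expected_category!r}"
    if mode == "block":
        assert response.get("blocked"), "block mode let the payload through"
    elif mode == "sanitize":
        assert not response.get("blocked"), "sanitize mode blocked instead of rewriting"
        assert response.get("payload") != sent, "sanitize mode left the payload unchanged"
    elif mode == "flag-only":
        assert not response.get("blocked"), "flag-only mode blocked the payload"
        assert response.get("payload") == sent, "flag-only mode altered the payload"
    else:
        raise ValueError(f"unknown mode: {mode!r}")
```

Passing `expect_detection=False` covers both benign strings and the isolation and disable cases. An unbound tool, another tenant, or a just-disabled plugin should all show plain pass-through, even for a malicious payload.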

## Proposed Location

- `tests/e2e/plugins/test_sql_sanitizer_e2e.py` (new), with shared fixtures in `tests/e2e/plugins/conftest.py` (reusing the harness established for the PII e2e suite in #4221).
- A dedicated `make test-e2e-plugins` target (or an extension of the existing e2e target) so the suite can run independently of the unit tests.
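To run the suite on its own, one option is a dedicated pytest marker registered in the shared `conftest.py` through pytest's `pytest_configure` hook. The marker name `e2e_plugins` is an assumption, and the #4221 harness may already provide an equivalent.

```python
# tests/e2e/plugins/conftest.py (sketch); the marker name is an assumption.
def pytest_configure(config) -> None:
    """Register a marker so the plugin e2e suite can be selected or skipped on its own."""
    config.addinivalue_line(
        "markers",
        "e2e_plugins: end-to-end plugin tests that need a running gateway",
    )
```

Test modules can opt in with a module-level `pytestmark = pytest.mark.e2e_plugins`. The Make target would then run `pytest -m e2e_plugins tests/e2e/plugins`, and unit-test runs can exclude the suite with `-m "not e2e_plugins"`.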

## Acceptance Criteria

- [ ] `plugins/sql_sanitizer/` relocated to `cpex-plugins` repo as `cpex-sql-sanitizer`, published to PyPI, and pinned in the `[plugins]` extra.
- [ ] Hardening review completed: updated SQLi corpus coverage, explicit dialect support, documented modes, published latency benchmarks.
- [ ] End-to-end test file exercises activate → configure → invoke → assert → deactivate flow entirely through the plugin management API.
- [ ] Test matrix covers at least: all supported modes, 4+ attack categories, 3+ dialects, 2+ binding scopes, and state-transition (enable/disable) cases.
- [ ] False-positive scenarios explicitly asserted (benign strings pass through unaltered).
- [ ] Isolation check confirms changes to one user/tenant/tool binding do not affect others.
- [ ] Suite runs in CI (on a schedule or gated target if runtime is long); failures are actionable.
- [ ] Tests use real JWTs and real API calls — no monkey-patching of the plugin manager internals.

## References

- #1065 — original SQL Sanitizer plugin (closed as completed; source of the current in-tree implementation)
- #3965 — `cpex-*` plugin packaging pattern
- #4068 — multi-tenant plugin configuration with per-tool plugin config
- #4143 — `binding_reference_id` and expanded plugin config schemas
- #4221 — sibling e2e testing issue for the PII plugin (shared harness)
