[TESTING]: Automated end-to-end tests for SQL Sanitizer plugin via plugin management API #4222

@jonpspri

Summary

Build an automated end-to-end test suite for the SQL Sanitizer plugin (originally delivered via #1065, currently at plugins/sql_sanitizer/). The suite drives the plugin management API to activate, configure, and deactivate the plugin for a given user/tenant, and verifies that SQL injection patterns in tool arguments and outputs are handled correctly across the full matrix of configuration settings.

Packaging prerequisite: before (or alongside) this work, the plugin should be moved out of the main repo into the cpex-plugins repository as a standalone cpex-sql-sanitizer PyPI package (following the #3965 pattern used for pii_filter, secrets_detection, etc.) and hardened. Hardening covers an audit of detection patterns against a current SQLi corpus, a multi-dialect coverage review, clearer severity/mode semantics, and performance/latency characterization. The e2e suite should target the hardened, cpex-*-packaged version.

Motivation

Scope

Packaging & Hardening (prerequisite)

  • Move plugins/sql_sanitizer/ to the cpex-plugins repo as cpex-sql-sanitizer.
  • Audit detection rules against a current SQLi corpus (union-based, boolean-based, time-based, stacked-query, second-order, NoSQL-injection variants where applicable).
  • Review and document supported SQL dialects (PostgreSQL, MySQL, SQLite, MSSQL, Oracle) and their known gaps.
  • Define explicit modes (block, sanitize, flag-only) with unambiguous semantics.
  • Benchmark per-invocation overhead; publish in the plugin README.
  • Add the package to the [plugins] extra in pyproject.toml; update any kind: references and docs.
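
One way to approach the per-invocation benchmark is a small timing loop around whatever callable the hardened plugin exposes. The sketch below is only an illustration: `measure_overhead` and the stand-in callable in the usage note are hypothetical names, not part of the plugin, and a published benchmark would probably also need to control for payload size and warm caches.

```python
import statistics
import time
from typing import Callable, Dict


def measure_overhead(fn: Callable[[str], object], payload: str,
                     warmup: int = 100, iterations: int = 1000) -> Dict[str, float]:
    """Return per-call latency statistics (microseconds) for fn(payload)."""
    # Warm-up calls are discarded so one-time setup cost does not skew results.
    for _ in range(warmup):
        fn(payload)

    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn(payload)
        samples.append((time.perf_counter_ns() - start) / 1000.0)

    samples.sort()
    # Nearest-rank style 95th percentile on the sorted samples.
    p95_index = max(0, int(len(samples) * 0.95) - 1)
    return {
        "median_us": statistics.median(samples),
        "p95_us": samples[p95_index],
        "max_us": samples[-1],
    }
```

For example, `measure_overhead(lambda s: s.upper(), "1 UNION SELECT 1")` times a trivial stand-in; swapping in the real sanitizer entry point (whatever it turns out to be) would give the numbers for the README.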

Test Harness

  • Stand up a ContextForge instance (test container or in-process app) with observability enabled.
  • Create a test user/tenant, mint a JWT, and drive all plugin state changes through the plugin management / bindings API — no static YAML edits.
  • Wrap common actions (enable plugin, set config, bind to tool, invoke tool, assert on response) in reusable pytest fixtures / helpers.
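
The reusable helpers described above could look roughly like the following sketch. Everything API-shaped here is an assumption: the endpoint paths (`/plugins/...`, `/tools/.../invoke`), the request bodies, and the `PluginApiClient` name are hypothetical placeholders, since this issue does not specify the plugin management API routes. The transport is injected so that the same helper can wrap a real HTTP client in the suite.

```python
import contextlib
from typing import Callable, Dict, Iterator, Optional, Tuple

# A transport takes (method, path, headers, json_body) and returns
# (status_code, json_body). In the real suite it would wrap an HTTP client.
Transport = Callable[[str, str, Dict[str, str], Optional[dict]], Tuple[int, dict]]


class PluginApiClient:
    """Thin helper around a plugin management API (paths are HYPOTHETICAL)."""

    def __init__(self, transport: Transport, jwt: str, plugin: str = "sql_sanitizer"):
        self._transport = transport
        self._headers = {"Authorization": f"Bearer {jwt}"}
        self._plugin = plugin

    def _call(self, method: str, path: str, body: Optional[dict] = None) -> dict:
        status, payload = self._transport(method, path, dict(self._headers), body)
        if status >= 400:
            raise RuntimeError(f"{method} {path} returned {status}: {payload}")
        return payload

    def set_state(self, enabled: bool, config: Optional[dict] = None) -> dict:
        # Hypothetical endpoint: enable/disable and configure in one call.
        body = {"enabled": enabled, "config": config or {}}
        return self._call("PUT", f"/plugins/{self._plugin}", body)

    def bind_to_tool(self, tool: str) -> dict:
        # Hypothetical endpoint: create a per-tool binding.
        return self._call("POST", f"/plugins/{self._plugin}/bindings", {"tool": tool})

    def invoke_tool(self, tool: str, arguments: dict) -> dict:
        # Hypothetical endpoint: invoke a tool through the gateway.
        return self._call("POST", f"/tools/{tool}/invoke", {"arguments": arguments})

    @contextlib.contextmanager
    def enabled(self, config: dict) -> Iterator["PluginApiClient"]:
        """Enable the plugin for the duration of a test, then always disable it."""
        self.set_state(True, config)
        try:
            yield self
        finally:
            self.set_state(False)
```

Wrapping `enabled(...)` in a pytest fixture would keep the enable/disable pair together, so a failing assertion cannot leave the plugin active for the next test.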

Configuration Matrix

At minimum, each scenario should vary and assert on:

  • Plugin state: disabled → enabled → disabled (confirm state transitions take effect on subsequent invocations without restart).
  • Mode: block, sanitize, flag-only (or whatever modes the hardened plugin exposes).
  • SQL dialect: PostgreSQL, MySQL, SQLite, MSSQL (at minimum) — tested individually where dialect config is supported.
  • Attack categories: classic union-based, boolean-blind, time-based-blind, stacked queries, comment-injection, encoded payloads — tested individually and in combinations.
  • Scope / binding: plugin bound globally vs. per-tool vs. per-tenant; confirm non-bound tools/tenants are unaffected.
  • Hook coverage: tool_pre_invoke (argument sanitization) and tool_post_invoke (output scrubbing of reflected SQL) — verify handling on both inbound args and outbound responses as configured.
  • False-positive guard: benign queries and natural-language strings that look SQL-adjacent must pass through unaltered in sanitize / flag-only modes.
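
The matrix above lends itself to generated scenarios rather than hand-written cases. The sketch below builds the cross product of mode, dialect, binding scope, and attack category. The payload strings are illustrative examples chosen for this sketch (a real suite would draw them from the audited SQLi corpus and vary them per dialect), and the `Scenario` and `build_matrix` names are hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import List
from urllib.parse import quote

MODES = ("block", "sanitize", "flag-only")
DIALECTS = ("postgresql", "mysql", "sqlite", "mssql")
SCOPES = ("global", "per-tool", "per-tenant")

# Illustrative payloads only; one per attack category from the matrix.
ATTACKS = {
    "union": "1 UNION SELECT username, password FROM users",
    "boolean-blind": "1' OR '1'='1",
    "time-blind": "1' AND SLEEP(5)-- ",
    "stacked": "1; DROP TABLE users",
    "comment": "admin'-- ",
    # URL-encoded form of the boolean-blind payload.
    "encoded": quote("1' OR '1'='1", safe=""),
}


@dataclass(frozen=True)
class Scenario:
    mode: str
    dialect: str
    scope: str
    attack: str
    payload: str

    @property
    def test_id(self) -> str:
        """Readable identifier, usable as a parametrized test id."""
        return "/".join((self.mode, self.dialect, self.scope, self.attack))


def build_matrix() -> List[Scenario]:
    """Return one Scenario per (mode, dialect, scope, attack) combination."""
    return [
        Scenario(mode, dialect, scope, attack, ATTACKS[attack])
        for mode, dialect, scope, attack in itertools.product(
            MODES, DIALECTS, SCOPES, ATTACKS
        )
    ]
```

The resulting list can be passed to `pytest.mark.parametrize` with `ids=[s.test_id for s in scenarios]`. If the full product is too slow for every CI run, a sampled subset could run per commit with the complete matrix on a schedule.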

Assertions

For each scenario:

  • The response / tool-call payload matches the expected block / sanitize / flag behavior.
  • Violations (when applicable) are recorded with the expected category, dialect, and confidence.
  • Disabling the plugin mid-test restores pass-through behavior on the next invocation.
  • Other users / tools with different bindings are unaffected (isolation check).
  • Observability signals (spans, structured logs) reflect plugin activity, which doubles as a smoke test that the plugin ran at all.
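
The per-scenario checks can be centralized in one helper so every test applies the same rules. The sketch below encodes assumed mode semantics, since defining them precisely is part of the hardening work: block rejects the call, sanitize forwards altered arguments, flag-only forwards the original arguments and records a violation. The `Observed` shape and `check_outcome` name are hypothetical. The violation fields (category, dialect, confidence) come from the assertion list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

REQUIRED_VIOLATION_FIELDS = ("category", "dialect", "confidence")


@dataclass
class Observed:
    """What the test saw for one invocation (hypothetical shape)."""
    blocked: bool
    forwarded: Optional[str]  # argument the tool received; None if blocked
    violations: List[dict] = field(default_factory=list)


def check_outcome(mode: str, original: str, malicious: bool,
                  seen: Observed) -> List[str]:
    """Return a list of failure messages; an empty list means the scenario passed."""
    failures: List[str] = []

    if not malicious:
        # False-positive guard (assumed to hold in every mode).
        if seen.blocked:
            failures.append("benign input was blocked")
        if seen.forwarded != original:
            failures.append("benign input was altered")
        if seen.violations:
            failures.append("benign input produced violations")
        return failures

    if not seen.violations:
        failures.append("malicious input produced no violation record")
    for violation in seen.violations:
        missing = [k for k in REQUIRED_VIOLATION_FIELDS if k not in violation]
        if missing:
            failures.append(f"violation missing fields: {missing}")

    if mode == "block":
        if not seen.blocked:
            failures.append("block mode did not block the call")
    elif mode == "sanitize":
        if seen.blocked or seen.forwarded in (None, original):
            failures.append("sanitize mode did not forward altered arguments")
    elif mode == "flag-only":
        if seen.blocked or seen.forwarded != original:
            failures.append("flag-only mode changed or blocked the call")
    else:
        raise ValueError(f"unknown mode: {mode}")
    return failures
```

A test would then reduce to `assert check_outcome(...) == []`, and the returned messages make a CI failure actionable without re-running the scenario.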

Proposed Location

Acceptance Criteria

  • plugins/sql_sanitizer/ relocated to cpex-plugins repo as cpex-sql-sanitizer, published to PyPI, and pinned in the [plugins] extra.
  • Hardening review completed: updated SQLi corpus coverage, explicit dialect support, documented modes, published latency benchmarks.
  • End-to-end test file exercises activate → configure → invoke → assert → deactivate flow entirely through the plugin management API.
  • Test matrix covers at least: all supported modes, 4+ attack categories, 3+ dialects, 2+ binding scopes, and state-transition (enable/disable) cases.
  • False-positive scenarios explicitly asserted (benign strings pass through unaltered).
  • Isolation check confirms changes to one user/tenant/tool binding do not affect others.
  • Suite runs in CI (on a schedule or gated target if runtime is long); failures are actionable.
  • Tests use real JWTs and real API calls — no monkey-patching of the plugin manager internals.

References

Labels

plugins, security, testing, triage
