🐞 Bug Report
Two pre-existing silent-success traps surfaced during the PR #4204 review cycle for #4202. Both cause metrics_buffer.record_tool_metric(success=True) to be recorded for responses the caller sees as errors, inflating tool success rates and masking real failures in observability. Neither is a #4202 regression — they predate that PR — but they share the same "success flag diverges from tool_result.is_error" root cause the #4202 REST-branch fix addressed, and are natural follow-ups.
🔍 Trap 1 — A2A 200-OK branch ignores JSONRPC error envelopes
Location: mcpgateway/services/tool_service.py — the A2A branch around the elif tool_integration_type == "A2A" dispatch (grep for a2a_agent_endpoint_url). The 200-OK branch sets success = True and tool_result = ToolResult(..., is_error=False) unconditionally, regardless of whether the agent's response_data carries a JSONRPC {"error": {...}} envelope or an application-level failure payload.
Impact: Any A2A agent that legitimately surfaces errors in the body of a 200-OK response (common for JSONRPC-over-HTTP) has every call recorded as success=True. The non-200 HTTP branch also never re-sets success after initialising it to False, so it relies on scope fall-through rather than explicit assignment.
Proposed fix: Route A2A upstream responses through _coerce_to_tool_result (the canonical helper introduced in #4204 / #4210 step 0) and then compute success = not tool_result.is_error, symmetric with the fix applied to REST (success = not getattr(tool_result, "is_error", False) after the optional validator call) and direct-proxy (success = not tool_result.is_error after _coerce_to_tool_result). Separately, teach the A2A branch to detect {"error": ...} / {"isError": true} envelopes in the body and promote them to is_error=True before computing success.
Regression guard: Mirror test_rest_payload_shaped_error_records_success_false_in_metrics for A2A — mock the A2A HTTP client to return a 200 with a JSONRPC error body and assert record_tool_metric receives success=False.
🔍 Trap 2 — Plugin TOOL_POST_INVOKE hook can silently flip error state
Location: mcpgateway/services/tool_service.py — the TOOL_POST_INVOKE plugin invocation after the branch dispatch (roughly lines 5446-5473, grep for TOOL_POST_INVOKE). The plugin returns a modified_payload that can replace tool_result wholesale; the reconstruction at that site calls ToolResult(content=modified_result["content"], structured_content=structured) which drops is_error entirely (Pydantic defaults it to False).
Impact:
- A plugin that legitimately transforms a successful result into an error payload (e.g. a validation plugin detecting an un-caught bad response) has its intent silently erased —
is_error defaults to False and success — already computed earlier — isn't re-synced.
- The reverse: a plugin that flips an upstream error into a successful recovery still records
success=False because success was frozen before the plugin ran.
Proposed fix:
- Preserve plugin intent during reconstruction:
is_error=modified_result.get("isError", modified_result.get("is_error", False)).
- After the post-invoke block, recompute
success = not tool_result.is_error so metrics and span attributes reflect the plugin-modified state.
Regression guards: Two tests — one with a plugin that flips is_error True→False (asserts success=True), one that flips False→True (asserts success=False).
📎 Related
🧠 Scope note
Both traps are small, local fixes individually (a dozen lines each) but touching the A2A and plugin-post-invoke paths is enough of a behavioural shift to warrant a dedicated PR, a dedicated regression test, and explicit review — rather than folding them into an already-large #4202 PR.
🐞 Bug Report
Two pre-existing silent-success traps surfaced during the PR #4204 review cycle for #4202. Both cause
metrics_buffer.record_tool_metric(success=True)to be recorded for responses the caller sees as errors, inflating tool success rates and masking real failures in observability. Neither is a #4202 regression — they predate that PR — but they share the same "successflag diverges fromtool_result.is_error" root cause the #4202 REST-branch fix addressed, and are natural follow-ups.🔍 Trap 1 — A2A 200-OK branch ignores JSONRPC error envelopes
Location:
mcpgateway/services/tool_service.py— the A2A branch around theelif tool_integration_type == "A2A"dispatch (grep fora2a_agent_endpoint_url). The 200-OK branch setssuccess = Trueandtool_result = ToolResult(..., is_error=False)unconditionally, regardless of whether the agent'sresponse_datacarries a JSONRPC{"error": {...}}envelope or an application-level failure payload.Impact: Any A2A agent that legitimately surfaces errors in the body of a 200-OK response (common for JSONRPC-over-HTTP) has every call recorded as
success=True. The non-200 HTTP branch also never re-setssuccessafter initialising it toFalse, so it relies on scope fall-through rather than explicit assignment.Proposed fix: Route A2A upstream responses through
_coerce_to_tool_result(the canonical helper introduced in #4204 / #4210 step 0) and then computesuccess = not tool_result.is_error, symmetric with the fix applied to REST (success = not getattr(tool_result, "is_error", False)after the optional validator call) and direct-proxy (success = not tool_result.is_errorafter_coerce_to_tool_result). Separately, teach the A2A branch to detect{"error": ...}/{"isError": true}envelopes in the body and promote them tois_error=Truebefore computingsuccess.Regression guard: Mirror
test_rest_payload_shaped_error_records_success_false_in_metricsfor A2A — mock the A2A HTTP client to return a 200 with a JSONRPC error body and assertrecord_tool_metricreceivessuccess=False.🔍 Trap 2 — Plugin
TOOL_POST_INVOKEhook can silently flip error stateLocation:
mcpgateway/services/tool_service.py— theTOOL_POST_INVOKEplugin invocation after the branch dispatch (roughly lines 5446-5473, grep forTOOL_POST_INVOKE). The plugin returns amodified_payloadthat can replacetool_resultwholesale; the reconstruction at that site callsToolResult(content=modified_result["content"], structured_content=structured)which dropsis_errorentirely (Pydantic defaults it toFalse).Impact:
is_errordefaults toFalseandsuccess— already computed earlier — isn't re-synced.success=Falsebecausesuccesswas frozen before the plugin ran.Proposed fix:
is_error=modified_result.get("isError", modified_result.get("is_error", False)).success = not tool_result.is_errorso metrics and span attributes reflect the plugin-modified state.Regression guards: Two tests — one with a plugin that flips
is_errorTrue→False (assertssuccess=True), one that flips False→True (assertssuccess=False).📎 Related
ToolResult. Both traps dissolve naturally when A2A routes through the unified_post_invoke_pipelineand plugin post-invoke is framed as a stage of that pipeline with explicitToolResultin/out contracts.🧠 Scope note
Both traps are small, local fixes individually (a dozen lines each) but touching the A2A and plugin-post-invoke paths is enough of a behavioural shift to warrant a dedicated PR, a dedicated regression test, and explicit review — rather than folding them into an already-large #4202 PR.