Problem
When a tool call is rejected via the approval flow, buildApprovalRejectionResult produces a function_call_result with status: 'completed'. The only signal that the tool was rejected is the text content of the output.
I believe this is why the model frequently hallucinates that a rejected tool call succeeded, especially in multi-tool scenarios (e.g. approve one email, reject another, and the model claims both were sent).
Root cause
In runner/toolExecution.mjs, getToolCallOutputItem hardcodes status: 'completed'. buildApprovalRejectionResult passes the rejection message through getToolCallOutputItem, so rejected tools get status: 'completed' identical to successful ones.
The model sees:
function_call_result { name: "SendEmail", callId: "abc", status: "completed", output: "Email sent to bob" }
function_call_result { name: "SendEmail", callId: "def", status: "completed", output: "Tool execution was not approved." }
The structural status: 'completed' contradicts the rejection text. Strengthening the rejection message via ToolErrorFormatter mitigates this, but it can't remove the contradiction: the status field still tells the model the call completed, so the model keeps receiving two opposing signals.
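To make the contradiction concrete, here is a minimal sketch of the behavior described above. The two functions are simplified stand-ins I wrote for illustration (the `sketch` prefix marks them as hypothetical); they are not the actual code in runner/toolExecution.mjs, and the item shape is reduced to the fields shown in this issue.

```javascript
// Hypothetical, simplified stand-in for getToolCallOutputItem.
// Assumption: the real helper builds a similar item; only the hardcoded
// status matters for this illustration.
function sketchGetToolCallOutputItem(toolCall, output) {
  return {
    type: 'function_call_result',
    name: toolCall.name,
    callId: toolCall.callId,
    status: 'completed', // hardcoded, regardless of what actually happened
    output,
  };
}

// Hypothetical stand-in for buildApprovalRejectionResult: the rejection
// message goes through the same helper, so it inherits status: 'completed'.
function sketchBuildApprovalRejectionResult(toolCall) {
  return sketchGetToolCallOutputItem(toolCall, 'Tool execution was not approved.');
}

const approvedItem = sketchGetToolCallOutputItem(
  { name: 'SendEmail', callId: 'abc' },
  'Email sent to bob',
);
const rejectedItem = sketchBuildApprovalRejectionResult(
  { name: 'SendEmail', callId: 'def' },
);

// Structurally, the two results are indistinguishable apart from the text.
console.log(approvedItem.status === rejectedItem.status); // true
```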
Suggested fix
Consider using status: 'incomplete' for rejected tool calls, or ideally adding a dedicated status: 'rejected' value to the function_call_result spec so the model gets a consistent signal.
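A rough sketch of what the first option could look like, assuming the output helper can take an optional status argument. The function names (prefixed `proposed`) and the extra parameter are hypothetical, not the SDK's current API, and I have not verified how every downstream consumer treats status: 'incomplete' on a function_call_result.

```javascript
// Hypothetical variant of the output helper with an overridable status.
// Defaulting to 'completed' keeps existing call sites unchanged.
function proposedGetToolCallOutputItem(toolCall, output, status = 'completed') {
  return {
    type: 'function_call_result',
    name: toolCall.name,
    callId: toolCall.callId,
    status,
    output,
  };
}

// Hypothetical rejection builder: same message as today, but the structural
// status no longer claims the call completed.
function proposedBuildApprovalRejectionResult(
  toolCall,
  message = 'Tool execution was not approved.',
) {
  return proposedGetToolCallOutputItem(toolCall, message, 'incomplete');
}

const sentItem = proposedGetToolCallOutputItem(
  { name: 'SendEmail', callId: 'abc' },
  'Email sent to bob',
);
const notApprovedItem = proposedBuildApprovalRejectionResult(
  { name: 'SendEmail', callId: 'def' },
);

console.log(sentItem.status); // completed
console.log(notApprovedItem.status); // incomplete
```

With this shape, a dedicated 'rejected' value would only require changing the string passed by the rejection builder, if the spec ever adds one.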