Feat: Add Anthropic CUA adaptive thinking#1912
Feat: Add Anthropic CUA adaptive thinking#1912chromiebot wants to merge 4 commits intobrowserbase:mainfrom
Conversation
Add comprehensive tests for the new adaptive thinking API used by Claude 4.6 models (claude-opus-4-6, claude-sonnet-4-6). Tests verify: - Adaptive thinking uses thinking.type: 'adaptive' (not 'enabled') - Effort levels are passed via output_config.effort (not budget_tokens) - All effort levels: low, medium, high, max - Older models continue using deprecated budget_tokens API - Model name detection handles provider-prefixed names These tests define the expected API contract per: https://platform.claude.com/docs/en/build-with-claude/adaptive-thinking Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update AnthropicCUAClient to use the correct API contract for
Claude 4.6 models (claude-opus-4-6, claude-sonnet-4-6).
Claude 4.6 models use adaptive thinking:
- thinking: { type: "adaptive" }
- output_config: { effort: "low" | "medium" | "high" | "max" }
This replaces the deprecated API:
- thinking: { type: "enabled", budget_tokens: N }
Changes:
- Add ThinkingEffort type for effort levels
- Add thinkingEffort option to ClientOptions
- Detect 4.6 models and use adaptive thinking with output_config
- Keep backward compatibility with thinkingBudget for older models
- Add deprecation notice for thinkingBudget on 4.6 models
The implementation follows the API contract documented at:
https://platform.claude.com/docs/en/build-with-claude/adaptive-thinking
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
|
This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run. |
There was a problem hiding this comment.
No issues found across 3 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Architecture diagram
sequenceDiagram
participant App as Application/User
participant Client as AnthropicCUAClient
participant SDK as Anthropic SDK (Beta)
participant API as Anthropic API
Note over App,API: Request Flow for Claude 4.6+ (Adaptive Thinking)
App->>Client: getAction(messages, options)
Note right of Client: Includes NEW: thinkingEffort (low/medium/high/max)
Client->>Client: Identify Model Version
alt Model is Claude 4.6+ (Opus/Sonnet)
Client->>Client: NEW: Set thinking.type = "adaptive"
Client->>Client: NEW: Set output_config.effort = thinkingEffort
Client->>Client: NEW: Select computer_20251124 tool
else Model is Legacy (Claude 4.5 or older)
Client->>Client: CHANGED: Set thinking.type = "enabled"
Client->>Client: CHANGED: Set thinking.budget_tokens = thinkingBudget
Client->>Client: Select legacy computer tool version
end
Client->>SDK: beta.messages.create(payload)
Note right of SDK: Payload contains specific<br/>thinking config & tool versions
SDK->>API: POST /v1/messages (Beta Headers)
API-->>SDK: Response with thinking block + tool use
SDK-->>Client: Message object
Client-->>App: Action/Response
- Add test for default "medium" effort when thinkingEffort not set - Add tests verifying temperature=1 is set for adaptive thinking - Add test that older models don't force temperature=1 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Always set temperature=1 when adaptive thinking is enabled (required by API) - Default to "medium" effort for Claude 4.6 models when thinkingEffort not set - This ensures adaptive thinking works out of the box for 4.6 models Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/tests/unit/anthropic-cua-adaptive-thinking.test.ts">
<violation number="1" location="packages/core/tests/unit/anthropic-cua-adaptive-thinking.test.ts:179">
P3: Duplicate coverage: the new claude-sonnet-4-6 temperature test repeats the existing low-effort adaptive-thinking test with identical setup and assertions, adding no new behavior coverage.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| ); | ||
| }); | ||
|
|
||
| it("should set temperature to 1 for claude-sonnet-4-6 with adaptive thinking", async () => { |
There was a problem hiding this comment.
P3: Duplicate coverage: the new claude-sonnet-4-6 temperature test repeats the existing low-effort adaptive-thinking test with identical setup and assertions, adding no new behavior coverage.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/tests/unit/anthropic-cua-adaptive-thinking.test.ts, line 179:
<comment>Duplicate coverage: the new claude-sonnet-4-6 temperature test repeats the existing low-effort adaptive-thinking test with identical setup and assertions, adding no new behavior coverage.</comment>
<file context>
@@ -142,9 +146,57 @@ describe("AnthropicCUAClient adaptive thinking", () => {
+ );
+ });
+
+ it("should set temperature to 1 for claude-sonnet-4-6 with adaptive thinking", async () => {
+ const client = new AnthropicCUAClient(
+ "anthropic",
</file context>
|
This PR was approved by @miguelg719 and mirrored to #1954. All further discussion should happen on that PR. |
why
what changed
test plan
Summary by cubic
Add adaptive thinking for Anthropic Claude 4.6 models in the CUA client. Uses
thinkingEffortwithoutput_config.effort, sets temperature=1, and keepsthinkingBudgetfor older models.New Features
claude-opus-4-6,claude-sonnet-4-6) and usesthinking: { type: "adaptive" }withoutput_config.effort."medium"effort when not set and forcestemperature: 1for adaptive thinking.ThinkingEffortandthinkingEffortinClientOptions("low" | "medium" | "high" | "max"); falls back tothinking: { type: "enabled", budget_tokens }withthinkingBudgeton older models.computer_20251124for 4.6 andclaude-opus-4-5-20251101; tests cover effort levels, defaults and temperature, provider-prefixed names, and legacy budget behavior.Migration
thinkingEffort;thinkingBudgetis deprecated. Adaptive thinking setstemperature: 1automatically.thinkingBudgeton older Claude models. No changes needed if you don’t use thinking.Written for commit 0ea0332. Summary will update on new commits. Review in cubic