Draft
3 changes: 3 additions & 0 deletions README.md
@@ -92,6 +92,9 @@ await stagehand.act("click on the stagehand repo");
// Use agent() for multi-step tasks
const agent = stagehand.agent();
await agent.execute("Get to the latest PR");
await agent.execute(
'Upload the file at "/Users/me/Documents/resume.pdf" to the resume file input',
);

// Use extract() to get structured data from the page
const { author, title } = await stagehand.extract(
39 changes: 31 additions & 8 deletions packages/core/lib/v3/agent/prompts/agentSystemPrompt.ts
@@ -48,21 +48,30 @@ function buildToolsSection(
{
name: "click",
description:
"Click on an element (PREFERRED - more reliable when element is visible in viewport)",
"Click on an element (PREFERRED - more reliable when element is visible in viewport). Never use this for file upload buttons or file inputs.",
Review comment (Contributor, Author):
These "never use this for file upload" annotations on the tool descriptions seem overly prescriptive and are not the right pattern to add.

},
{
name: "type",
description:
"Type text into an element (PREFERRED - more reliable when element is visible in viewport)",
"Type text into an element (PREFERRED - more reliable when element is visible in viewport). Never use this for file inputs or local file paths.",
},
{
name: "act",
description:
"Perform a specific atomic action (click, type, etc.) - ONLY use when element is in ariaTree but NOT visible in screenshot. Less reliable but can interact with out-of-viewport elements.",
"Perform a specific atomic action (click, type, etc.) - ONLY use when element is in ariaTree but NOT visible in screenshot. Less reliable but can interact with out-of-viewport elements. Never use this for file upload buttons, file inputs, or local file paths.",
},
{
name: "upload",
description:
"Upload one or more local files into a file input when the user has provided a file path",
},
{ name: "dragAndDrop", description: "Drag and drop an element" },
{ name: "clickAndHold", description: "Click and hold on an element" },
{ name: "keys", description: "Press a keyboard key" },
{
name: "keys",
description:
"Press a keyboard key or type into the currently focused element. Never use this to enter local file paths for uploads.",
},
{
name: "fillFormVision",
description: "Fill out a form using coordinates",
@@ -87,9 +96,19 @@ function buildToolsSection(
},
{
name: "act",
description: "Perform a specific atomic action (click, type)",
description:
"Perform a specific atomic action (click, type). Never use this for file upload buttons, file inputs, or local file paths.",
},
{
name: "upload",
description:
"Upload one or more local files into a file input when the user has provided a file path",
},
{
name: "keys",
description:
"Press a keyboard key or type into the currently focused element. Never use this to enter local file paths for uploads.",
},
{ name: "keys", description: "Press a keyboard key" },
{ name: "fillForm", description: "Fill out a form" },
{ name: "think", description: "Think about the task" },
{ name: "extract", description: "Extract structured data" },
@@ -147,13 +166,17 @@ export function buildAgentSystemPrompt(
`<item>Tool selection priority: Use specific tools (click, type) when elements are visible in viewport for maximum reliability.</item>`,
`<item>Always use screenshot to get proper grounding of the coordinates you want to type/click into.</item>`,
`<item>When interacting with an input, always use the type tool to type into the input, over clicking and then typing into it.</item>`,
`<item>When the task requires uploading a file and a local path is available, use the upload tool instead of clicking the visible upload button.</item>`,
`<item>Never use click, type, act, fillForm, or fillFormVision to interact with a file input or upload button. Those tools cannot complete native file selection reliably.</item>`,
`<item>Use ariaTree as a secondary tool when elements aren't visible in screenshot or to get full page context.</item>`,
`<item>Only use act when element is in ariaTree but NOT visible in screenshot.</item>`,
]
: [
`<item>Tool selection priority: Use act tool for all clicking and typing on a page.</item>`,
`<item>Always check ariaTree first to understand full page content without scrolling - it shows all elements including those below the fold.</item>`,
`<item>When interacting with an input, always use the act tool to type into the input, over clicking and then typing.</item>`,
`<item>When the task requires uploading a file and a local path is available, use the upload tool instead of clicking the visible upload button.</item>`,
`<item>Never use click, type, act, fillForm, or fillFormVision to interact with a file input or upload button. Those tools cannot complete native file selection reliably.</item>`,
`<item>If an element is present in the ariaTree, use act to interact with it directly - this eliminates the need to scroll.</item>`,
`<item>Use screenshot for visual confirmation when needed, but rely primarily on ariaTree for element detection.</item>`,
];
@@ -213,8 +236,8 @@ export function buildAgentSystemPrompt(
// Build variables section only if variables are provided
const hasVariables = variables && Object.keys(variables).length > 0;
const variableToolsNote = isHybridMode
? "Use %variableName% syntax in the type, fillFormVision, or act tool's value/text/action fields."
: "Use %variableName% syntax in the act or fillForm tool's action fields.";
? "Use %variableName% syntax in the type, fillFormVision, act, or upload tool's text/action/path fields."
: "Use %variableName% syntax in the act, fillForm, or upload tool's action/path fields.";
const variableEntries = getVariablePromptEntries(variables);
const variablesSection = hasVariables
? `<variables>
9 changes: 9 additions & 0 deletions packages/core/lib/v3/agent/tools/act.ts
@@ -4,6 +4,7 @@ import type { V3 } from "../../v3.js";
import type { Action } from "../../types/public/methods.js";
import type { AgentModelConfig, Variables } from "../../types/public/agent.js";
import { TimeoutError } from "../../types/public/sdkErrors.js";
import { getFileUploadGuardError } from "../utils/fileUploadGuard.js";

export const actTool = (
v3: V3,
@@ -24,6 +25,14 @@ export const actTool = (
}),
execute: async ({ action }) => {
try {
const fileUploadGuardError = getFileUploadGuardError(action);
if (fileUploadGuardError) {
return {
success: false,
error: fileUploadGuardError,
};
}

v3.logger({
category: "agent",
message: `Agent calling tool: act`,
9 changes: 9 additions & 0 deletions packages/core/lib/v3/agent/tools/click.ts
@@ -7,6 +7,7 @@ import type {
ModelOutputContentItem,
} from "../../types/public/agent.js";
import { processCoordinates } from "../utils/coordinateNormalization.js";
import { getFileUploadGuardError } from "../utils/fileUploadGuard.js";
import { ensureXPath } from "../utils/xpath.js";
import { waitAndCaptureScreenshot } from "../utils/screenshotHandler.js";

@@ -26,6 +27,14 @@ export const clickTool = (v3: V3, provider?: string) =>
}),
execute: async ({ describe, coordinates }): Promise<ClickToolResult> => {
try {
const fileUploadGuardError = getFileUploadGuardError(describe);
if (fileUploadGuardError) {
return {
success: false,
error: fileUploadGuardError,
};
}

const page = await v3.context.awaitActivePage();
const processed = processCoordinates(
coordinates[0],
11 changes: 11 additions & 0 deletions packages/core/lib/v3/agent/tools/fillFormVision.ts
@@ -8,6 +8,7 @@ import type {
Variables,
} from "../../types/public/agent.js";
import { processCoordinates } from "../utils/coordinateNormalization.js";
import { getFileUploadGuardError } from "../utils/fileUploadGuard.js";
import { ensureXPath } from "../utils/xpath.js";
import { waitAndCaptureScreenshot } from "../utils/screenshotHandler.js";
import { substituteVariables } from "../utils/variables.js";
@@ -62,6 +63,16 @@ MANDATORY USE CASES (always use fillFormVision for these):
}),
execute: async ({ fields }): Promise<FillFormVisionToolResult> => {
try {
const fileUploadGuardError = getFileUploadGuardError(
...fields.flatMap((field) => [field.action, field.value]),
);
Comment on lines +66 to +68
Review comment from @cubic-dev-ai (bot, Contributor), Mar 31, 2026:
P2: The upload guard only inspects the raw field values; any %variable% that expands to a local file path will bypass the guard even though the substituted value is what gets typed. Run the guard against the substituted value so file uploads can’t slip through fillFormVision.
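
The bypass described in this comment can be sketched in isolation. The diff does not show `getFileUploadGuardError` or `substituteVariables`, so `guardSketch`, `substituteSketch`, the path regex, and the flat `Variables` shape below are all hypothetical stand-ins chosen for illustration, not the project's implementation.

```typescript
// Hypothetical stand-ins: the real guard and substitution helpers live in
// ../utils/ and are not shown in this diff.
type Variables = Record<string, string>;

// Assumed heuristic: absolute POSIX paths, home-relative paths, and
// Windows drive paths count as local file paths.
const LOCAL_PATH = /^(?:\/|~\/|[A-Za-z]:\\)\S/;

// Returns an error message if any input looks like a local path, else null.
function guardSketch(...inputs: Array<string | undefined>): string | null {
  const hit = inputs.find(
    (s) => s !== undefined && LOCAL_PATH.test(s.trim()),
  );
  return hit === undefined ? null : "Use the upload tool for local file paths.";
}

// Replaces %name% tokens with their values; unknown tokens are left as-is.
function substituteSketch(text: string, variables: Variables = {}): string {
  return text.replace(/%(\w+)%/g, (match, name: string) =>
    name in variables ? variables[name] : match,
  );
}

const variables: Variables = { resumePath: "/Users/me/Documents/resume.pdf" };
const field = { action: "type into the resume field", value: "%resumePath%" };

// Guarding the raw value sees only the "%resumePath%" token.
const rawResult = guardSketch(field.action, field.value);

// Guarding the substituted value sees the actual path.
const substitutedResult = guardSketch(
  field.action,
  substituteSketch(field.value, variables),
);
```

Under these assumptions `rawResult` is `null` while `substitutedResult` carries the error, which is the gap the comment points at: the check has to run on the same string that ends up being typed.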

Suggested change
- const fileUploadGuardError = getFileUploadGuardError(
- ...fields.flatMap((field) => [field.action, field.value]),
- );
+ const fileUploadGuardError = getFileUploadGuardError(
+ ...fields.flatMap((field) => [
+ field.action,
+ substituteVariables(field.value, variables),
+ ]),
+ );

if (fileUploadGuardError) {
return {
success: false,
error: fileUploadGuardError,
};
}

const page = await v3.context.awaitActivePage();

// Process coordinates and substitute variables for each field
11 changes: 11 additions & 0 deletions packages/core/lib/v3/agent/tools/fillform.ts
@@ -4,6 +4,7 @@ import type { V3 } from "../../v3.js";
import type { Action } from "../../types/public/methods.js";
import type { AgentModelConfig, Variables } from "../../types/public/agent.js";
import { TimeoutError } from "../../types/public/sdkErrors.js";
import { getFileUploadGuardError } from "../utils/fileUploadGuard.js";

export const fillFormTool = (
v3: V3,
@@ -30,6 +31,16 @@ export const fillFormTool = (
}),
execute: async ({ fields }) => {
try {
const fileUploadGuardError = getFileUploadGuardError(
...fields.map((field) => field.action),
);
if (fileUploadGuardError) {
return {
success: false,
error: fileUploadGuardError,
};
}

v3.logger({
category: "agent",
message: `Agent calling tool: fillForm`,
34 changes: 33 additions & 1 deletion packages/core/lib/v3/agent/tools/index.ts
@@ -16,6 +16,7 @@ import { fillFormVisionTool } from "./fillFormVision.js";
import { thinkTool } from "./think.js";
import { searchTool as browserbaseSearchTool } from "./browserbaseSearch.js";
import { searchTool as braveSearchTool } from "./braveSearch.js";
import { uploadTool } from "./upload.js";

import type { ToolSet, InferUITools } from "ai";
import type { V3 } from "../../v3.js";
@@ -33,7 +34,7 @@ export interface V3AgentToolOptions {
logger?: (message: LogLine) => void;
/**
* Tool mode determines which set of tools are available.
* - 'dom' (default): Uses DOM-based tools (act, fillForm) - removes coordinate-based tools
* - 'dom' (default): Uses DOM-based tools (act, fillForm, upload) - removes coordinate-based tools
* - 'hybrid': Uses coordinate-based tools (click, type, dragAndDrop, etc.) - removes fillForm
*/
mode?: AgentToolMode;
@@ -156,6 +157,8 @@ export function createAgentTools(v3: V3, options?: V3AgentToolOptions) {
extract: "— try using a smaller or simpler schema",
fillForm:
"(it may continue executing in the background) — try filling fewer fields at once or use a different tool",
upload:
"— make sure the path exists locally and the target describes the actual file input element",
};

const unwrappedTools: ToolSet = {
@@ -173,6 +176,7 @@ export function createAgentTools(v3: V3, options?: V3AgentToolOptions) {
screenshot: screenshotTool(v3),
scroll: mode === "hybrid" ? scrollVisionTool(v3, provider) : scrollTool(v3),
type: typeTool(v3, provider, variables),
upload: uploadTool(v3, executionModel, variables, toolTimeout),
};

if (options?.useSearch && options.browserbaseApiKey) {
@@ -206,6 +210,33 @@ export function createAgentTools(v3: V3, options?: V3AgentToolOptions) {

export type AgentTools = ReturnType<typeof createAgentTools>;

export function createCuaAgentTools(
v3: V3,
tools: ToolSet = {},
options?: Pick<
V3AgentToolOptions,
"executionModel" | "toolTimeout" | "variables"
>,
): ToolSet {
const builtInUploadTool = wrapToolWithTimeout(
uploadTool(
v3,
options?.executionModel,
options?.variables,
options?.toolTimeout,
),
"upload()",
v3,
options?.toolTimeout,
"— make sure the path exists locally and the target describes the actual file input element",
);

return {
upload: builtInUploadTool,
...tools,
};
}
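
A note on the return statement above: because `...tools` is spread after the `upload` key, a caller-supplied tool that is also named `upload` would replace the built-in one. The diff does not say whether that precedence is intended. The sketch below shows the behavior with hypothetical stand-in types (`ToolLike`, `ToolSetLike`, `mergeSketch`) rather than the real `ToolSet` from the `ai` package.

```typescript
// Hypothetical stand-ins for tool objects; the real values come from
// uploadTool(...) and from the caller's ToolSet.
type ToolLike = { description: string };
type ToolSetLike = Record<string, ToolLike>;

const builtInUpload: ToolLike = { description: "built-in upload" };

// Same shape as the return statement in createCuaAgentTools:
// built-in key first, caller tools spread last.
function mergeSketch(tools: ToolSetLike = {}): ToolSetLike {
  return { upload: builtInUpload, ...tools };
}

// No name clash: the built-in upload tool is kept alongside the caller's tool.
const withDefaults = mergeSketch({ search: { description: "custom search" } });

// Name clash: the later spread wins, so the caller's upload replaces the built-in.
const withOverride = mergeSketch({ upload: { description: "custom upload" } });
```

If the built-in tool were meant to take priority instead, the spread would have to come first (`{ ...tools, upload: builtInUpload }`).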

/**
* Type map of all agent tools for strong typing of tool calls and results.
* Note: `search` is optional — enabled via useSearch: true (Browserbase) or BRAVE_API_KEY env var (legacy).
@@ -229,6 +260,7 @@ export type AgentToolTypesMap = {
| ReturnType<typeof braveSearchTool>;
think: ReturnType<typeof thinkTool>;
type: ReturnType<typeof typeTool>;
upload: ReturnType<typeof uploadTool>;
wait: ReturnType<typeof waitTool>;
};

14 changes: 13 additions & 1 deletion packages/core/lib/v3/agent/tools/keys.ts
@@ -1,14 +1,17 @@
import { tool } from "ai";
import { z } from "zod";
import type { V3 } from "../../v3.js";
import { getFileUploadGuardError } from "../utils/fileUploadGuard.js";

export const keysTool = (v3: V3) =>
tool({
description: `Send keyboard input to the page without targeting a specific element. Unlike the type tool which clicks then types into coordinates, this sends keystrokes directly to wherever focus currently is.

Use method="type" to enter text into the currently focused element. Preferred when: input is already focused, text needs to flow across multiple fields (e.g., verification codes)

Use method="press" for navigation keys (Enter, Tab, Escape, Backspace, arrows) and keyboard shortcuts (Cmd+A, Ctrl+C, Shift+Tab).`,
Use method="press" for navigation keys (Enter, Tab, Escape, Backspace, arrows) and keyboard shortcuts (Cmd+A, Ctrl+C, Shift+Tab).

Never use this tool to type local file paths for uploads. Use the upload tool instead.`,
inputSchema: z.object({
method: z.enum(["press", "type"]),
value: z
@@ -20,6 +23,15 @@ Use method="press" for navigation keys (Enter, Tab, Escape, Backspace, arrows) a
}),
execute: async ({ method, value, repeat }) => {
try {
const fileUploadGuardError =
method === "type" ? getFileUploadGuardError(value) : null;
if (fileUploadGuardError) {
return {
success: false,
error: fileUploadGuardError,
};
}

const page = await v3.context.awaitActivePage();
v3.logger({
category: "agent",
9 changes: 9 additions & 0 deletions packages/core/lib/v3/agent/tools/type.ts
@@ -8,6 +8,7 @@ import type {
Variables,
} from "../../types/public/agent.js";
import { processCoordinates } from "../utils/coordinateNormalization.js";
import { getFileUploadGuardError } from "../utils/fileUploadGuard.js";
import { ensureXPath } from "../utils/xpath.js";
import { waitAndCaptureScreenshot } from "../utils/screenshotHandler.js";
import { substituteVariables } from "../utils/variables.js";
@@ -38,6 +39,14 @@ export const typeTool = (v3: V3, provider?: string, variables?: Variables) => {
text,
}): Promise<TypeToolResult> => {
try {
const fileUploadGuardError = getFileUploadGuardError(describe, text);
Review comment from @cubic-dev-ai (bot, Contributor), Mar 31, 2026:
P2: Run the file-upload guard on the substituted text; %variable% file paths currently bypass the guard and allow the type tool to enter local file paths.

Suggested change
- const fileUploadGuardError = getFileUploadGuardError(describe, text);
+ const fileUploadGuardError = getFileUploadGuardError(
+ describe,
+ substituteVariables(text, variables),
+ );

if (fileUploadGuardError) {
return {
success: false,
error: fileUploadGuardError,
};
}

const page = await v3.context.awaitActivePage();
const processed = processCoordinates(
coordinates[0],