fix: skip CSS pseudo-elements when generating XPath segments by dstlmrk · Pull Request #2000 · browserbase/stagehand

dstlmrk · 2026-04-13T22:03:44Z

why

Stagehand's XPath generation produces selectors that include CSS pseudo-elements (::before, ::after), which don't exist in the DOM and can never be resolved by Playwright. When these XPaths get stored in the action cache, replays fail deterministically with "Could not find an element for the given xPath(s)".

Concrete example of a failing cached XPath:

xpath=/html[1]/body[1]/div[1]/div[1]/div[2]/div[1]/div[4]/div[1]/div[1]/div[1]/form[1]/div[1]/div[6]/label[1]/div[1]/div[1]/span[1]/label[1]/*[name()='::after'][1]

The trailing *[name()='::after'][1] segment is syntactically valid XPath but matches no nodes — pseudo-elements are a CSS rendering construct, not part of the DOM tree. Once cached, every subsequent run hits this entry and fails on the same step.
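For anyone auditing an existing action cache, the pattern above is easy to spot mechanically. The following is a minimal sketch, not part of this PR: `hasPseudoElementStep` is a hypothetical helper, and it assumes simple absolute XPaths of the shape shown in the example (steps separated by `/`, optional `xpath=` prefix).

```typescript
// Hypothetical helper (not part of Stagehand): flags XPath steps that name a
// CSS pseudo-element, e.g. *[name()='::after'][1].
const PSEUDO_STEP = /^\*\[name\(\)='::[^']*'\](\[\d+\])?$/;

function hasPseudoElementStep(xpath: string): boolean {
  // Drop an optional "xpath=" prefix, then split into location steps.
  // Assumes no "/" appears inside predicates, which holds for the
  // positional paths shown in this PR.
  const steps = xpath.replace(/^xpath=/, "").split("/").filter(Boolean);
  return steps.some((step) => PSEUDO_STEP.test(step));
}

console.log(hasPseudoElementStep("xpath=/html[1]/body[1]/label[1]/*[name()='::after'][1]")); // true
console.log(hasPseudoElementStep("/html[1]/body[1]/span[2]")); // false
```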

This is analogous to the trailing text-node bug fixed in #824, where text()[n] segments were stripped via trimTrailingTextNode. It is the same class of "impossible XPath" issue, applied to pseudo-elements instead of text nodes.

what changed

Root cause

According to the CDP spec, pseudo-elements should be returned in Protocol.DOM.Node.pseudoElements, separate from node.children. In practice — particularly when DOM.describeNode is called with pierce: true during hydrateDomTree — Chromium also returns pseudo-element nodes inside node.children. Those nodes have nodeName values like ::before and ::after.

buildChildXPathSegments in xpathUtils.ts iterates kids without filtering these out. Because ::before / ::after contain a colon, they fall into the namespaced-element branch and produce segments like *[name()='::after'][1].
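To illustrate how a colon in the node name leads to the bad segment, here is a simplified stand-in for the pre-fix branch. It is an assumption about the shape of the logic, not the actual xpathUtils.ts source; `naiveSegment` is a hypothetical name.

```typescript
// Simplified stand-in for the pre-fix segment logic; an assumption, not the
// actual xpathUtils.ts source.
function naiveSegment(nodeName: string, index: number): string {
  const name = nodeName.toLowerCase();
  // Names containing ":" are treated as namespaced and matched via name().
  return name.includes(":")
    ? `*[name()='${name}'][${index}]`
    : `${name}[${index}]`;
}

console.log(naiveSegment("SPAN", 1));     // span[1]
console.log(naiveSegment("svg:rect", 2)); // *[name()='svg:rect'][2]
console.log(naiveSegment("::after", 1));  // *[name()='::after'][1]
```

The last call shows the problem: a pseudo-element name takes the same path as a legitimately namespaced element.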

Fix

buildChildXPathSegments now skips pseudo-element nodes (those with nodeName starting with ::) via continue and returns Array<{ child, segment }> pairs instead of a plain string[]. This way callers get only real DOM nodes with their corresponding XPath segments, without needing to maintain index alignment with the original kids array.
Both call sites in domTree.ts (domMapsForSession and buildSessionDomIndex) iterate over the returned pairs directly, which means pseudo-element nodes never enter the XPath map and are never pushed onto the traversal stack.
Pseudo-elements are excluded from sibling counting, so positional indexes (e.g. span[1], span[2]) remain correct even when ::before/::after nodes appear between real siblings.
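The behavior described above can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the code in this PR: `SketchNode`, `ChildSegment`, and `buildChildSegmentsSketch` are hypothetical names, and the node shape is reduced to the two fields the logic needs (the real function works on Protocol.DOM.Node).

```typescript
// Minimal node shape for illustration only.
interface SketchNode {
  nodeName: string;
  nodeType: number; // 1 = element, 3 = text, 8 = comment
}

interface ChildSegment {
  child: SketchNode;
  segment: string;
}

function buildChildSegmentsSketch(kids: SketchNode[]): ChildSegment[] {
  const counts = new Map<string, number>();
  const out: ChildSegment[] = [];
  for (const child of kids) {
    const name = child.nodeName.toLowerCase();
    // Pseudo-element: skip it and do not let it affect sibling counts.
    if (name.startsWith("::")) continue;

    const key = `${child.nodeType}:${name}`;
    const index = (counts.get(key) ?? 0) + 1;
    counts.set(key, index);

    let segment: string;
    if (child.nodeType === 3) segment = `text()[${index}]`;
    else if (child.nodeType === 8) segment = `comment()[${index}]`;
    else if (name.includes(":")) segment = `*[name()='${name}'][${index}]`;
    else segment = `${name}[${index}]`;

    out.push({ child, segment });
  }
  return out;
}

const pairs = buildChildSegmentsSketch([
  { nodeName: "::before", nodeType: 1 },
  { nodeName: "SPAN", nodeType: 1 },
  { nodeName: "::after", nodeType: 1 },
  { nodeName: "SPAN", nodeType: 1 },
]);
console.log(pairs.map((p) => p.segment)); // [ 'span[1]', 'span[2]' ]
```

Returning pairs rather than a parallel string[] means a caller cannot accidentally match a segment to the wrong child once entries have been filtered out.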

Summary by cubic

Skip CSS pseudo-elements (::before, ::after) when generating XPath segments to avoid caching impossible selectors and breaking replays. XPaths now resolve to real DOM nodes and sibling indexing stays correct.

Bug Fixes
- buildChildXPathSegments now returns filtered { child, segment } pairs and omits pseudo-elements; domTree callers updated to use pairs.
- Unit tests cover pseudo-element skipping and correct indexing for same-tag siblings.
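A caller-side check in the spirit of those unit tests might look like the sketch below. It is not the test code from this PR: `TreeNode`, `childPairs`, and `collectXPaths` are hypothetical stand-ins, only element children are modeled, and paths are joined with a plain `/`.

```typescript
interface TreeNode {
  nodeName: string;
  children?: TreeNode[];
}

// Hypothetical stand-in for the pair-returning helper: skips "::" nodes and
// indexes the remaining same-tag siblings from 1.
function childPairs(kids: TreeNode[]): Array<{ child: TreeNode; segment: string }> {
  const counts = new Map<string, number>();
  const result: Array<{ child: TreeNode; segment: string }> = [];
  for (const child of kids) {
    const name = child.nodeName.toLowerCase();
    if (name.startsWith("::")) continue;
    const index = (counts.get(name) ?? 0) + 1;
    counts.set(name, index);
    result.push({ child, segment: `${name}[${index}]` });
  }
  return result;
}

// Stack-based walk: because it iterates the filtered pairs directly,
// pseudo-element nodes never get a path and are never pushed for traversal.
function collectXPaths(root: TreeNode): string[] {
  const rootPath = `/${root.nodeName.toLowerCase()}[1]`;
  const paths: string[] = [rootPath];
  const stack: Array<{ node: TreeNode; path: string }> = [{ node: root, path: rootPath }];
  while (stack.length > 0) {
    const { node, path } = stack.pop()!;
    for (const { child, segment } of childPairs(node.children ?? [])) {
      const childPath = `${path}/${segment}`;
      paths.push(childPath);
      stack.push({ node: child, path: childPath });
    }
  }
  return paths;
}

const tree: TreeNode = {
  nodeName: "HTML",
  children: [{
    nodeName: "BODY",
    children: [{
      nodeName: "LABEL",
      children: [{ nodeName: "::before" }, { nodeName: "SPAN" }, { nodeName: "::after" }],
    }],
  }],
};
console.log(collectXPaths(tree));
```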

Written for commit 5a54be7. Summary will update on new commits.

changeset-bot · 2026-04-13T22:03:50Z

⚠️ No Changeset found

Latest commit: 936be68

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


github-actions · 2026-04-13T22:03:59Z

This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run.
Approving the latest commit mirrors it into an internal PR owned by the approver.
If new commits are pushed later, the internal PR stays open but is marked stale until someone approves the latest external commit and refreshes it.

cubic-dev-ai

No issues found across 3 files

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

Architecture diagram

sequenceDiagram
    participant CDP as Chromium (CDP)
    participant DT as domTree.ts
    participant XU as xpathUtils.ts
    participant Cache as Action Cache
    participant PW as Playwright (Replay)

    Note over DT, XU: DOM Snapshot / Indexing Flow

    DT->>CDP: Request nodes (pierce: true)
    CDP-->>DT: Return Protocol.DOM.Node[] (includes ::before/::after)

    DT->>XU: buildChildXPathSegments(kids)
    
    loop For each child node
        XU->>XU: Check nodeName
        alt NEW: nodeName starts with "::"
            XU->>XU: Return null (Skip pseudo-element)
        else Standard Node
            XU->>XU: Calculate positional index (e.g., div[1])
        end
    end
    
    XU-->>DT: Return (string | null)[]
    
    loop For each segment/child pair
        alt CHANGED: segment is null
            DT->>DT: Skip node processing
        else segment is valid
            DT->>DT: Build full XPath segment
            DT->>DT: Push node to traversal stack
        end
    end

    Note over DT, Cache: Action Storage

    DT->>Cache: Store action with generated XPaths
    Note right of Cache: XPaths are now guaranteed to<br/>point to real DOM nodes only.

    Note over Cache, PW: Replay Phase

    PW->>Cache: Retrieve cached XPath
    PW->>CDP: locator.element(xpath)
    
    alt Success Path
        CDP-->>PW: Node Found
    else Unhappy Path (Old Behavior)
        Note over CDP, PW: If XPath contained ::after, resolution failed.
        CDP-->>PW: Error: "Could not find element"
    end

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cubic-dev-ai

No issues found across 4 files

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

Architecture diagram

sequenceDiagram
    participant CDP as Browser (CDP)
    participant DT as DOM Tree Service
    participant XU as XPath Utilities
    participant Cache as Action Cache

    Note over DT, XU: DOM Indexing & XPath Generation Flow

    DT->>CDP: describeNode(pierce: true)
    CDP-->>DT: Return node.children (may include ::before, ::after)

    DT->>XU: buildChildXPathSegments(kids)
    
    loop For each child in kids
        XU->>XU: Check nodeName
        alt NEW: nodeName starts with "::"
            Note right of XU: Skip pseudo-element
        else Valid DOM Node
            XU->>XU: CHANGED: Increment positional index (skipping pseudo-elements)
            XU->>XU: Create XPath segment (e.g., "div[2]")
            XU->>XU: Map child node to segment
        end
    end

    XU-->>DT: CHANGED: Return Array of { child, segment } pairs

    loop For each { child, segment } pair
        DT->>DT: joinXPath(parentPath, segment)
        DT->>DT: Push child to traversal stack
    end

    Note over DT, Cache: Only valid DOM XPaths are now stored
    DT->>Cache: Store action step with resolvable XPath
    
    opt Subsequent Replay
        Cache-->>DT: Retrieve XPath
        DT->>CDP: Find element by XPath
        Note over CDP: Success (No ::after segments)
    end

fix: skip CSS pseudo-elements when generating XPath segments

936be68

github-actions bot added the external-contributor (tracks PRs mirrored from external contributor forks) and external-contributor:awaiting-approval (waiting for a stagehand team member to approve the latest external commit) labels Apr 13, 2026

cubic-dev-ai bot reviewed Apr 13, 2026


dstlmrk marked this pull request as draft April 13, 2026 22:26

refactor: return filtered pairs from buildChildXPathSegments

5a54be7

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dstlmrk marked this pull request as ready for review April 14, 2026 15:18

cubic-dev-ai bot reviewed Apr 14, 2026


dstlmrk wants to merge 2 commits into browserbase:main from dstlmrk:fix/xpath-pseudo-elements
