fix: skip CSS pseudo-elements when generating XPath segments #2000

Open
dstlmrk wants to merge 2 commits into browserbase:main from dstlmrk:fix/xpath-pseudo-elements

@dstlmrk dstlmrk commented Apr 13, 2026

why

Stagehand's XPath generation produces selectors that include CSS pseudo-elements (::before, ::after), which don't exist in the DOM and can never be resolved by Playwright. When these XPaths get stored in the action cache, replays fail deterministically with "Could not find an element for the given xPath(s)".

Concrete example of a failing cached XPath:

xpath=/html[1]/body[1]/div[1]/div[1]/div[2]/div[1]/div[4]/div[1]/div[1]/div[1]/form[1]/div[1]/div[6]/label[1]/div[1]/div[1]/span[1]/label[1]/*[name()='::after'][1]

The trailing *[name()='::after'][1] segment is syntactically valid XPath but matches no nodes — pseudo-elements are a CSS rendering construct, not part of the DOM tree. Once cached, every subsequent run hits this entry and fails on the same step.
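A cached selector of this kind can be spotted with a simple pattern check. The sketch below is illustrative only: `hasPseudoElementStep` is a hypothetical helper, not something Stagehand exposes, and it assumes the impossible step always has the `*[name()='::…']` shape shown above.

```typescript
// Hypothetical helper (not part of Stagehand): reports whether any step of an
// XPath string targets a CSS pseudo-element such as ::before or ::after.
function hasPseudoElementStep(xpath: string): boolean {
  // Matches steps like *[name()='::after'] with either quote style.
  // A genuinely namespaced name such as 'svg:rect' does not match,
  // because the quoted name must begin with "::".
  return /\*\[name\(\)=(['"])::[a-z-]+\1\]/i.test(xpath);
}

const cachedXPath =
  "xpath=/html[1]/body[1]/div[1]/span[1]/label[1]/*[name()='::after'][1]";

console.log(hasPseudoElementStep(cachedXPath)); // true
console.log(hasPseudoElementStep("/html[1]/body[1]/div[1]")); // false
```

A check like this could flag stale cache entries written before the fix, though whether to evict or repair them is outside the scope of this PR.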

This is analogous to the trailing text-node bug fixed in #824, where text()[n] segments were stripped via trimTrailingTextNode. It is the same class of "impossible XPath" issue, applied to pseudo-elements instead of text nodes.

what changed

Root cause

According to the CDP spec, pseudo-elements should be returned in Protocol.DOM.Node.pseudoElements, separate from node.children. In practice — particularly when DOM.describeNode is called with pierce: true during hydrateDomTree — Chromium also returns pseudo-element nodes inside node.children. Those nodes have nodeName values like ::before and ::after.

buildChildXPathSegments in xpathUtils.ts iterates kids without filtering these out. Because ::before / ::after contain a colon, they fall into the namespaced-element branch and produce segments like *[name()='::after'][1].
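The mechanism can be reproduced in isolation. The following is a simplified reconstruction, not the actual pre-fix source: `NodeLike` and `buildSegmentsUnfiltered` are illustrative names, and the sketch assumes that pseudo-element nodes arrive with `nodeType` 1 and that any name containing a colon is routed to the `*[name()='…']` form.

```typescript
// Minimal stand-in for the fields of Protocol.DOM.Node used here.
interface NodeLike {
  nodeType: number; // 1 = element, 3 = text, 8 = comment
  nodeName: string;
}

// Simplified reconstruction of the unfiltered behaviour (an assumption, not
// the real Stagehand code): every child gets a segment, nothing is skipped.
function buildSegmentsUnfiltered(kids: NodeLike[]): string[] {
  const counts = new Map<string, number>();
  return kids.map((child) => {
    const tag = child.nodeName.toLowerCase();
    const key = `${child.nodeType}:${tag}`;
    const idx = (counts.get(key) ?? 0) + 1;
    counts.set(key, idx);
    if (child.nodeType === 3) return `text()[${idx}]`;
    if (child.nodeType === 8) return `comment()[${idx}]`;
    // Names containing a colon are treated as namespaced elements,
    // which is where "::after" ends up.
    return tag.includes(":") ? `*[name()='${tag}'][${idx}]` : `${tag}[${idx}]`;
  });
}

const segments = buildSegmentsUnfiltered([
  { nodeType: 1, nodeName: "SPAN" },
  { nodeType: 1, nodeName: "::after" },
]);
console.log(segments); // ["span[1]", "*[name()='::after'][1]"]
```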

Fix

  • buildChildXPathSegments now skips pseudo-element nodes (those with nodeName starting with ::) via continue and returns Array<{ child, segment }> pairs instead of a plain string[]. This way callers get only real DOM nodes with their corresponding XPath segments, without needing to maintain index alignment with the original kids array.
  • Both call sites in domTree.ts (domMapsForSession and buildSessionDomIndex) iterate over the returned pairs directly, which means pseudo-element nodes never enter the XPath map and are never pushed onto the traversal stack.
  • Pseudo-elements are excluded from sibling counting, so positional indexes (e.g. span[1], span[2]) remain correct even when ::before/::after nodes appear between real siblings.
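
The three points above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `buildChildXPathSegmentsSketch`, `collectXPaths`, and the `NodeLike` shape are hypothetical, and plain string concatenation stands in for whatever path-joining helper the real callers use.

```typescript
// Minimal stand-in for the fields of Protocol.DOM.Node used here.
interface NodeLike {
  nodeType: number; // 1 = element, 3 = text, 8 = comment
  nodeName: string;
  children?: NodeLike[];
}

interface ChildSegment {
  child: NodeLike;
  segment: string;
}

// Sketch of the filtered builder: pseudo-elements are skipped before they are
// counted, so they neither produce a segment nor shift sibling indexes.
function buildChildXPathSegmentsSketch(kids: NodeLike[]): ChildSegment[] {
  const counts = new Map<string, number>();
  const out: ChildSegment[] = [];
  for (const child of kids) {
    if (child.nodeName.startsWith("::")) continue; // ::before, ::after, ...
    const tag = child.nodeName.toLowerCase();
    const key = `${child.nodeType}:${tag}`;
    const idx = (counts.get(key) ?? 0) + 1;
    counts.set(key, idx);
    let segment: string;
    if (child.nodeType === 3) segment = `text()[${idx}]`;
    else if (child.nodeType === 8) segment = `comment()[${idx}]`;
    else if (tag.includes(":")) segment = `*[name()='${tag}'][${idx}]`;
    else segment = `${tag}[${idx}]`;
    out.push({ child, segment });
  }
  return out;
}

// Sketch of a caller: iterating the pairs means skipped nodes never reach
// the XPath list or the traversal stack.
function collectXPaths(root: NodeLike, rootPath: string): string[] {
  const paths: string[] = [];
  const stack = [{ node: root, xpath: rootPath }];
  while (stack.length > 0) {
    const { node, xpath } = stack.pop()!;
    paths.push(xpath);
    for (const { child, segment } of buildChildXPathSegmentsSketch(node.children ?? [])) {
      stack.push({ node: child, xpath: `${xpath}/${segment}` });
    }
  }
  return paths;
}

const label: NodeLike = {
  nodeType: 1,
  nodeName: "LABEL",
  children: [
    { nodeType: 1, nodeName: "::before" },
    { nodeType: 1, nodeName: "SPAN" },
    { nodeType: 1, nodeName: "::after" },
    { nodeType: 1, nodeName: "SPAN" },
  ],
};

const pairs = buildChildXPathSegmentsSketch(label.children ?? []);
console.log(pairs.map((p) => p.segment)); // ["span[1]", "span[2]"]
```

Returning pairs rather than a parallel array means a caller cannot accidentally pair a segment with the wrong child after nodes are filtered out.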

Summary by cubic

Skip CSS pseudo-elements (::before, ::after) when generating XPath segments to avoid caching impossible selectors and breaking replays. XPaths now resolve to real DOM nodes and sibling indexing stays correct.

  • Bug Fixes
    • buildChildXPathSegments now returns filtered { child, segment } pairs and omits pseudo-elements; domTree callers updated to use pairs.
    • Unit tests cover pseudo-element skipping and correct indexing for same-tag siblings.

Written for commit 5a54be7. Summary will update on new commits.

@changeset-bot

changeset-bot bot commented Apr 13, 2026

⚠️ No Changeset found

Latest commit: 936be68

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@github-actions
Contributor

This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run.
Approving the latest commit mirrors it into an internal PR owned by the approver.
If new commits are pushed later, the internal PR stays open but is marked stale until someone approves the latest external commit and refreshes it.

@github-actions github-actions bot added external-contributor Tracks PRs mirrored from external contributor forks. external-contributor:awaiting-approval Waiting for a stagehand team member to approve the latest external commit. labels Apr 13, 2026
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 3 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.
Architecture diagram
sequenceDiagram
    participant CDP as Chromium (CDP)
    participant DT as domTree.ts
    participant XU as xpathUtils.ts
    participant Cache as Action Cache
    participant PW as Playwright (Replay)

    Note over DT, XU: DOM Snapshot / Indexing Flow

    DT->>CDP: Request nodes (pierce: true)
    CDP-->>DT: Return Protocol.DOM.Node[] (includes ::before/::after)

    DT->>XU: buildChildXPathSegments(kids)
    
    loop For each child node
        XU->>XU: Check nodeName
        alt NEW: nodeName starts with "::"
            XU->>XU: Return null (Skip pseudo-element)
        else Standard Node
            XU->>XU: Calculate positional index (e.g., div[1])
        end
    end
    
    XU-->>DT: Return (string | null)[]
    
    loop For each segment/child pair
        alt CHANGED: segment is null
            DT->>DT: Skip node processing
        else segment is valid
            DT->>DT: Build full XPath segment
            DT->>DT: Push node to traversal stack
        end
    end

    Note over DT, Cache: Action Storage

    DT->>Cache: Store action with generated XPaths
    Note right of Cache: XPaths are now guaranteed to<br/>point to real DOM nodes only.

    Note over Cache, PW: Replay Phase

    PW->>Cache: Retrieve cached XPath
    PW->>CDP: locator.element(xpath)
    
    alt Success Path
        CDP-->>PW: Node Found
    else Unhappy Path (Old Behavior)
        Note over CDP, PW: If XPath contained ::after, resolution failed.
        CDP-->>PW: Error: "Could not find element"
    end

@dstlmrk dstlmrk marked this pull request as draft April 13, 2026 22:26
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@dstlmrk dstlmrk marked this pull request as ready for review April 14, 2026 15:18
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 4 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.
Architecture diagram
sequenceDiagram
    participant CDP as Browser (CDP)
    participant DT as DOM Tree Service
    participant XU as XPath Utilities
    participant Cache as Action Cache

    Note over DT, XU: DOM Indexing & XPath Generation Flow

    DT->>CDP: describeNode(pierce: true)
    CDP-->>DT: Return node.children (may include ::before, ::after)

    DT->>XU: buildChildXPathSegments(kids)
    
    loop For each child in kids
        XU->>XU: Check nodeName
        alt NEW: nodeName starts with "::"
            Note right of XU: Skip pseudo-element
        else Valid DOM Node
            XU->>XU: CHANGED: Increment positional index (skipping pseudo-elements)
            XU->>XU: Create XPath segment (e.g., "div[2]")
            XU->>XU: Map child node to segment
        end
    end

    XU-->>DT: CHANGED: Return Array of { child, segment } pairs

    loop For each { child, segment } pair
        DT->>DT: joinXPath(parentPath, segment)
        DT->>DT: Push child to traversal stack
    end

    Note over DT, Cache: Only valid DOM XPaths are now stored
    DT->>Cache: Store action step with resolvable XPath
    
    opt Subsequent Replay
        Cache-->>DT: Retrieve XPath
        DT->>CDP: Find element by XPath
        Note over CDP: Success (No ::after segments)
    end
