fix: skip CSS pseudo-elements when generating XPath segments #2000
dstlmrk wants to merge 2 commits into browserbase:main
This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run.
No issues found across 3 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Architecture diagram
sequenceDiagram
participant CDP as Chromium (CDP)
participant DT as domTree.ts
participant XU as xpathUtils.ts
participant Cache as Action Cache
participant PW as Playwright (Replay)
Note over DT, XU: DOM Snapshot / Indexing Flow
DT->>CDP: Request nodes (pierce: true)
CDP-->>DT: Return Protocol.DOM.Node[] (includes ::before/::after)
DT->>XU: buildChildXPathSegments(kids)
loop For each child node
XU->>XU: Check nodeName
alt NEW: nodeName starts with "::"
XU->>XU: Return null (Skip pseudo-element)
else Standard Node
XU->>XU: Calculate positional index (e.g., div[1])
end
end
XU-->>DT: Return (string | null)[]
loop For each segment/child pair
alt CHANGED: segment is null
DT->>DT: Skip node processing
else segment is valid
DT->>DT: Build full XPath segment
DT->>DT: Push node to traversal stack
end
end
Note over DT, Cache: Action Storage
DT->>Cache: Store action with generated XPaths
Note right of Cache: XPaths are now guaranteed to<br/>point to real DOM nodes only.
Note over Cache, PW: Replay Phase
PW->>Cache: Retrieve cached XPath
PW->>CDP: locator.element(xpath)
alt Success Path
CDP-->>PW: Node Found
else Unhappy Path (Old Behavior)
Note over CDP, PW: If XPath contained ::after, resolution failed.
CDP-->>PW: Error: "Could not find element"
end
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
No issues found across 4 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Architecture diagram
sequenceDiagram
participant CDP as Browser (CDP)
participant DT as DOM Tree Service
participant XU as XPath Utilities
participant Cache as Action Cache
Note over DT, XU: DOM Indexing & XPath Generation Flow
DT->>CDP: describeNode(pierce: true)
CDP-->>DT: Return node.children (may include ::before, ::after)
DT->>XU: buildChildXPathSegments(kids)
loop For each child in kids
XU->>XU: Check nodeName
alt NEW: nodeName starts with "::"
Note right of XU: Skip pseudo-element
else Valid DOM Node
XU->>XU: CHANGED: Increment positional index (skipping pseudo-elements)
XU->>XU: Create XPath segment (e.g., "div[2]")
XU->>XU: Map child node to segment
end
end
XU-->>DT: CHANGED: Return Array of { child, segment } pairs
loop For each { child, segment } pair
DT->>DT: joinXPath(parentPath, segment)
DT->>DT: Push child to traversal stack
end
Note over DT, Cache: Only valid DOM XPaths are now stored
DT->>Cache: Store action step with resolvable XPath
opt Subsequent Replay
Cache-->>DT: Retrieve XPath
DT->>CDP: Find element by XPath
Note over CDP: Success (No ::after segments)
end
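The pair-based traversal in the diagram above could look roughly like the following sketch. It is an illustration only: TreeNode, Pair, buildPairs, indexXPaths, and this version of joinXPath are simplified stand-ins written for the example, not the actual domTree.ts or xpathUtils.ts code.

```typescript
// Hypothetical tree node; the real code walks Protocol.DOM.Node objects.
interface TreeNode {
  nodeName: string;
  children?: TreeNode[];
}

interface Pair {
  child: TreeNode;
  segment: string;
}

// Simplified stand-in for buildChildXPathSegments: skips pseudo-elements and
// numbers the remaining siblings per tag name.
function buildPairs(kids: TreeNode[]): Pair[] {
  const counts: Record<string, number> = {};
  const pairs: Pair[] = [];
  for (const child of kids) {
    if (child.nodeName.indexOf("::") === 0) continue; // ::before, ::after
    const tag = child.nodeName.toLowerCase();
    counts[tag] = (counts[tag] || 0) + 1;
    pairs.push({ child, segment: `${tag}[${counts[tag]}]` });
  }
  return pairs;
}

// Simplified stand-in for joinXPath (assumed behavior).
function joinXPath(parent: string, segment: string): string {
  return parent === "/" ? `/${segment}` : `${parent}/${segment}`;
}

// Iterative walk: only real DOM nodes get an XPath or reach the stack.
function indexXPaths(root: TreeNode): string[] {
  const xpaths: string[] = [];
  const stack: Array<{ node: TreeNode; xpath: string }> = [
    { node: root, xpath: "/" },
  ];
  while (stack.length > 0) {
    const { node, xpath } = stack.pop()!;
    for (const { child, segment } of buildPairs(node.children || [])) {
      const childPath = joinXPath(xpath, segment);
      xpaths.push(childPath);
      stack.push({ node: child, xpath: childPath });
    }
  }
  return xpaths;
}

const doc: TreeNode = {
  nodeName: "#document",
  children: [
    {
      nodeName: "HTML",
      children: [
        {
          nodeName: "BODY",
          children: [
            { nodeName: "::before" },
            { nodeName: "BUTTON", children: [{ nodeName: "::after" }] },
          ],
        },
      ],
    },
  ],
};

const xpaths = indexXPaths(doc);
console.log(xpaths);
```

Because the caller consumes pairs rather than two parallel arrays, there is no index to keep aligned with the original kids list, and a skipped pseudo-element simply never appears in the loop.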
why
Stagehand's XPath generation produces selectors that include CSS pseudo-elements (::before, ::after), which don't exist in the DOM and can never be resolved by Playwright. When these XPaths get stored in the action cache, replays fail deterministically with "Could not find an element for the given xPath(s)".
Concrete example of a failing cached XPath:
The trailing *[name()='::after'][1] segment is syntactically valid XPath but matches no nodes: pseudo-elements are a CSS rendering construct, not part of the DOM tree. Once cached, every subsequent run hits this entry and fails on the same step.
This is analogous to the trailing text-node bug fixed in #824 (where text()[n] segments were stripped via trimTrailingTextNode): the same class of "impossible XPath" issue, just for pseudo-elements instead of text nodes.
what changed
Root cause
According to the CDP spec, pseudo-elements should be returned in Protocol.DOM.Node.pseudoElements, separate from node.children. In practice, particularly when DOM.describeNode is called with pierce: true during hydrateDomTree, Chromium also returns pseudo-element nodes inside node.children. Those nodes have nodeName values like ::before and ::after.
buildChildXPathSegments in xpathUtils.ts iterates kids without filtering these out. Because ::before and ::after contain a colon, they fall into the namespaced-element branch and produce segments like *[name()='::after'][1].
Fix
- buildChildXPathSegments now skips pseudo-element nodes (those with nodeName starting with ::) via continue and returns Array<{ child, segment }> pairs instead of a plain string[]. This way callers get only real DOM nodes with their corresponding XPath segments, without needing to maintain index alignment with the original kids array.
- domTree.ts (domMapsForSession and buildSessionDomIndex) iterates over the returned pairs directly, which means pseudo-element nodes never enter the XPath map and are never pushed onto the traversal stack.
- Sibling indices (span[1], span[2]) remain correct even when ::before and ::after nodes appear between real siblings.
Summary by cubic
Skip CSS pseudo-elements (::before, ::after) when generating XPath segments to avoid caching impossible selectors and breaking replays. XPaths now resolve to real DOM nodes and sibling indexing stays correct.
- buildChildXPathSegments now returns filtered { child, segment } pairs and omits pseudo-elements; domTree callers updated to use pairs.
Written for commit 5a54be7.
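The root cause and the fix can be contrasted in a small sketch. This is a minimal illustration, not the code in xpathUtils.ts: the ChildNode type, segmentFor, buildSegmentsUnfiltered, and buildSegmentsFiltered are hypothetical names, and the colon check is an assumed simplification of the namespaced-element branch. Only the "::" nodeName test and the { child, segment } return shape are taken from the description.

```typescript
// Hypothetical, simplified node shape; the real code uses Protocol.DOM.Node.
interface ChildNode {
  nodeName: string;
}

// Assumed segment rule: names containing a colon use the name() form,
// everything else uses the lowercase tag name.
function segmentFor(name: string, index: number): string {
  return name.indexOf(":") !== -1
    ? `*[name()='${name}'][${index}]`
    : `${name}[${index}]`;
}

// Old behavior (sketch): every child gets a segment, pseudo-elements included.
function buildSegmentsUnfiltered(kids: ChildNode[]): string[] {
  const counts: Record<string, number> = {};
  return kids.map((child) => {
    const name = child.nodeName.toLowerCase();
    counts[name] = (counts[name] || 0) + 1;
    return segmentFor(name, counts[name]);
  });
}

// New behavior (sketch): skip pseudo-elements and return child/segment pairs.
function buildSegmentsFiltered(
  kids: ChildNode[],
): Array<{ child: ChildNode; segment: string }> {
  const counts: Record<string, number> = {};
  const pairs: Array<{ child: ChildNode; segment: string }> = [];
  for (const child of kids) {
    if (child.nodeName.indexOf("::") === 0) continue; // ::before, ::after
    const name = child.nodeName.toLowerCase();
    counts[name] = (counts[name] || 0) + 1;
    pairs.push({ child, segment: segmentFor(name, counts[name]) });
  }
  return pairs;
}

// Pseudo-elements interleaved with real siblings, as Chromium may return them.
const kids: ChildNode[] = [
  { nodeName: "::before" },
  { nodeName: "SPAN" },
  { nodeName: "::after" },
  { nodeName: "SPAN" },
];

const unfiltered = buildSegmentsUnfiltered(kids);
const filtered = buildSegmentsFiltered(kids).map((p) => p.segment);
console.log(unfiltered); // includes *[name()='::after'][1]
console.log(filtered); // only span[1] and span[2]
```

In this sketch the unfiltered variant emits the unresolvable *[name()='::after'][1] segment, while the filtered variant yields only span[1] and span[2], so the real siblings keep consecutive indices.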