Conversation
WalkthroughThe PR enhances the interpreter with operation-type awareness (extract, scrape, crawl, search) and introduces sophisticated navigation/stability utilities including network-wait strategies, image-loading detection, and DOM stabilization. TypeScript configuration is refined with proper root directory and type constraints. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
maxun-core/src/interpret.ts (1)
759-781:⚠️ Potential issue | 🟠 MajorWait for dynamic content before non-paginated
scrapeList.The paginated path waits in
scrapeCurrentPage, but the non-paginated path evaluateswindow.scrapeListimmediately after script loading. With fasterdomcontentloadednavigation, client-rendered list items can be missed.🐛 Proposed fix
if (!config.pagination) { + await this.waitForDynamicStability(page, [{ + action: 'scrapeList', + args: [config] + } as any]); + scrapeResults = await page.evaluate((cfg) => { try { return window.scrapeList(cfg); } catch (error: any) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@maxun-core/src/interpret.ts` around lines 759 - 781, The non-paginated branch calls window.scrapeList immediately after ensureScriptsLoaded and can miss client-rendered items; change that branch to wait the same way the paginated path does by invoking the same waiting/scrape helper (e.g., call and await this.scrapeCurrentPage(page, config, actionName) or the equivalent wait-for-list-render function used by handlePagination) before evaluating/collecting results, then assign scrapeResults from that awaited call (or fall back to page.evaluate(window.scrapeList) only if the wait returns nothing).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@maxun-core/src/interpret.ts`:
- Around line 470-486: The getUpcomingExtractionSelector function currently only
looks for selectors in actions 'scrapeList' and 'scrapeSchema', causing
standalone scrape(selector) targets to be missed; update
getUpcomingExtractionSelector to also handle s.action === 'scrape' by extracting
the selector from s.args (same logic used for firstArg: check
firstArg.listSelector, firstArg.fields → first field.selector, then
firstArg.selector or if s.args is a string/selector use that) so it returns the
selector for scrape actions as well; apply the same change to the analogous
location around the other occurrence (the later block referenced) to keep both
selector-aware stability waits consistent.
- Around line 447-458: waitForNetworkQuiet currently only updates
lastRequestTime and can report "quiet" while long requests are still in-flight;
change it to track an in-flight counter by incrementing on 'request' and
decrementing on 'requestfinished'/'requestfailed', and only consider the network
quiet when inFlight === 0 and Date.now() - lastRequestTime > quietWindow (use
the same check loop and clear listeners on return). In waitForDynamicStability
avoid Promise.race(signals) which allows early return on a single signal;
instead await both network and DOM stability together (e.g.,
Promise.all([waitForNetworkQuiet(...), waitForDomStable(...)]) or require both
boolean flags be true for the stability window) so DOM text changes and
in-flight requests are both resolved before proceeding; apply the same
counter/listener and combined-wait pattern to the duplicate implementation
around the other occurrence (the block referenced in the review).
- Around line 495-499: In waitForImagesLoaded, page.waitForFunction currently
passes the timeout object as the second argument (arg) instead of the third
(options), so Playwright ignores the 5s timeout; update the call to pass the
predicate as the first arg, pass undefined (or any intended arg) as the second
arg, and supply { timeout: 5000 } as the third argument to ensure the 5s timeout
is honored (keep the existing catch handling); locate the call in the
waitForImagesLoaded method and adjust the page.waitForFunction invocation
accordingly.
- Around line 1871-1899: The goto branch incorrectly assumes step.args is an
array, force-overwrites user-specified waitUntil values, and swallows navigation
failures; update the methodName === 'goto' handling so it: 1) normalizes
step.args the same way as click/waitForLoadState (treat non-array args as
single-arg array) before reading url and opts; 2) only change
existingOpts.waitUntil to 'domcontentloaded' when there was no explicit
waitUntil (i.e., preserve 'networkidle'/'load' if provided) while still applying
the default timeout fallback; 3) after logging navigation errors from
executeAction, rethrow the error so the workflow can observe the failure (keep
the existing waitForDynamicStability call and checks like needsDataSoon,
blockNeedsVisualRender, remainingWorkflowNeedsVisualRender intact).
---
Outside diff comments:
In `@maxun-core/src/interpret.ts`:
- Around line 759-781: The non-paginated branch calls window.scrapeList
immediately after ensureScriptsLoaded and can miss client-rendered items; change
that branch to wait the same way the paginated path does by invoking the same
waiting/scrape helper (e.g., call and await this.scrapeCurrentPage(page, config,
actionName) or the equivalent wait-for-list-render function used by
handlePagination) before evaluating/collecting results, then assign
scrapeResults from that awaited call (or fall back to
page.evaluate(window.scrapeList) only if the wait returns nothing).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e6d93e60-84e3-4bdd-a465-cac436d8fe41
📒 Files selected for processing (2)
maxun-core/src/interpret.tsmaxun-core/tsconfig.json
| private async waitForNetworkQuiet(page: Page, timeout: number = 4000, quietWindow: number = 600): Promise<void> { | ||
| let lastRequestTime = Date.now(); | ||
| const onRequest = () => { lastRequestTime = Date.now(); }; | ||
| page.on('request', onRequest); | ||
| page.on('requestfinished', onRequest); | ||
| page.on('requestfailed', onRequest); | ||
| try { | ||
| const checkInterval = 100; | ||
| const start = Date.now(); | ||
| while (Date.now() - start < timeout) { | ||
| if (Date.now() - lastRequestTime > quietWindow) return; | ||
| await new Promise(r => setTimeout(r, checkInterval)); |
There was a problem hiding this comment.
Avoid resolving “dynamic stability” on the first partial signal.
Promise.race(signals) lets waitForDynamicStability continue after only network quiet, even if DOM text is still changing. Also, waitForNetworkQuiet does not track in-flight requests, so a long request can still be active when it reports quiet. This can produce incomplete scrape/crawl results.
🐛 Proposed fix
private async waitForNetworkQuiet(page: Page, timeout: number = 4000, quietWindow: number = 600): Promise<void> {
let lastRequestTime = Date.now();
- const onRequest = () => { lastRequestTime = Date.now(); };
+ let inFlight = 0;
+ const onRequest = () => {
+ inFlight += 1;
+ lastRequestTime = Date.now();
+ };
+ const onRequestDone = () => {
+ inFlight = Math.max(0, inFlight - 1);
+ lastRequestTime = Date.now();
+ };
page.on('request', onRequest);
- page.on('requestfinished', onRequest);
- page.on('requestfailed', onRequest);
+ page.on('requestfinished', onRequestDone);
+ page.on('requestfailed', onRequestDone);
try {
const checkInterval = 100;
const start = Date.now();
while (Date.now() - start < timeout) {
- if (Date.now() - lastRequestTime > quietWindow) return;
+ if (inFlight === 0 && Date.now() - lastRequestTime > quietWindow) return;
await new Promise(r => setTimeout(r, checkInterval));
}
} finally {
page.off('request', onRequest);
- page.off('requestfinished', onRequest);
- page.off('requestfailed', onRequest);
+ page.off('requestfinished', onRequestDone);
+ page.off('requestfailed', onRequestDone);
}
}- const signals: Promise<any>[] = [
- this.waitForNetworkQuiet(page, 10000, 1000),
- page.evaluate(async () => {
+ const networkQuiet = this.waitForNetworkQuiet(page, 10000, 1000);
+ const domStable = page.evaluate(async () => {
let lastLen = 0;
let stableIterations = 0;
for (let i = 0; i < 60; i++) {
const currentLen = document.body.innerText.length;
@@
}
return false;
- }).catch(() => {}),
- new Promise(resolve => setTimeout(resolve, 10000))
- ];
+ }).catch(() => false);
+ const hardTimeout = new Promise(resolve => setTimeout(resolve, 10000));
@@
- await Promise.race(signals);
+ await Promise.race([
+ Promise.all([networkQuiet, domStable]),
+ hardTimeout
+ ]);
await new Promise(resolve => setTimeout(resolve, 1500));Also applies to: 506-536
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@maxun-core/src/interpret.ts` around lines 447 - 458, waitForNetworkQuiet
currently only updates lastRequestTime and can report "quiet" while long
requests are still in-flight; change it to track an in-flight counter by
incrementing on 'request' and decrementing on 'requestfinished'/'requestfailed',
and only consider the network quiet when inFlight === 0 and Date.now() -
lastRequestTime > quietWindow (use the same check loop and clear listeners on
return). In waitForDynamicStability avoid Promise.race(signals) which allows
early return on a single signal; instead await both network and DOM stability
together (e.g., Promise.all([waitForNetworkQuiet(...), waitForDomStable(...)])
or require both boolean flags be true for the stability window) so DOM text
changes and in-flight requests are both resolved before proceeding; apply the
same counter/listener and combined-wait pattern to the duplicate implementation
around the other occurrence (the block referenced in the review).
| private getUpcomingExtractionSelector(remainingWorkflow: Workflow): string | null { | ||
| if (!remainingWorkflow || remainingWorkflow.length === 0) return null; | ||
|
|
||
| for (let i = remainingWorkflow.length - 1; i >= 0; i--) { | ||
| const pair = remainingWorkflow[i]; | ||
| for (const s of pair.what) { | ||
| if (s.action === 'goto') return null; | ||
|
|
||
| if (s.action === 'scrapeList' || s.action === 'scrapeSchema') { | ||
| const firstArg = Array.isArray(s.args) ? s.args[0] : s.args; | ||
| if (firstArg?.listSelector) return firstArg.listSelector; | ||
| if (firstArg?.fields) { | ||
| const firstField = Object.values(firstArg.fields)[0] as any; | ||
| if (firstField?.selector) return firstField.selector; | ||
| } | ||
| if (firstArg?.selector) return firstArg.selector; | ||
| } |
There was a problem hiding this comment.
Include scrape(selector) in selector-aware stability waits.
wawActions.scrape passes the target selector into waitForDynamicStability, but getUpcomingExtractionSelector only extracts selectors for scrapeList/scrapeSchema. A late-rendered scrape(selector) target can still be missed.
🐛 Proposed fix
for (let i = remainingWorkflow.length - 1; i >= 0; i--) {
const pair = remainingWorkflow[i];
for (const s of pair.what) {
if (s.action === 'goto') return null;
+ if (s.action === 'scrape') {
+ const firstArg = Array.isArray(s.args) ? s.args[0] : s.args;
+ if (typeof firstArg === 'string' && firstArg.trim()) return firstArg;
+ if (firstArg?.selector) return firstArg.selector;
+ }
+
if (s.action === 'scrapeList' || s.action === 'scrapeSchema') {
const firstArg = Array.isArray(s.args) ? s.args[0] : s.args;
if (firstArg?.listSelector) return firstArg.listSelector;
if (firstArg?.fields) {Also applies to: 643-646
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@maxun-core/src/interpret.ts` around lines 470 - 486, The
getUpcomingExtractionSelector function currently only looks for selectors in
actions 'scrapeList' and 'scrapeSchema', causing standalone scrape(selector)
targets to be missed; update getUpcomingExtractionSelector to also handle
s.action === 'scrape' by extracting the selector from s.args (same logic used
for firstArg: check firstArg.listSelector, firstArg.fields → first
field.selector, then firstArg.selector or if s.args is a string/selector use
that) so it returns the selector for scrape actions as well; apply the same
change to the analogous location around the other occurrence (the later block
referenced) to keep both selector-aware stability waits consistent.
| private async waitForImagesLoaded(page: Page): Promise<void> { | ||
| await page.waitForFunction( | ||
| () => Array.from(document.images).every(img => img.complete), | ||
| { timeout: 5000 } | ||
| ).catch(() => {}); |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Playwright Page.waitForFunction signature pageFunction arg options timeout
💡 Result:
The signature of Page.waitForFunction in Playwright (Node.js) is: await page.waitForFunction(pageFunction, arg?, options?) - pageFunction: string | ((arg?: any) => any | Promise) - Function to be evaluated in the page context until it returns a truthy value. - arg?: any - Optional argument passed to pageFunction (Serializable or JSHandle). - options?: { timeout?: number; polling?: 'raf' | 'mutation' | number } - Optional wait options. - timeout: Maximum time to wait in milliseconds (defaults to 30 seconds). - polling: Polling interval or strategy ('raf' for requestAnimationFrame, 'mutation' for MutationObserver, or number in ms). It returns Promise of the truthy value. Example from docs: const watchDog = page.waitForFunction( => window.innerWidth < 100); await page.setViewportSize({ width: 50, height: 50 }); await watchDog; With arg: const selector = '.foo'; await page.waitForFunction(selector => !!document.querySelector(selector), selector); Note: page.waitForFunction is still available as of 2026 (present in current docs and source code), added before v1.9, not deprecated/removed unlike some other wait methods. Prefer locators/expect for most waiting needs due to auto-waiting. The function executes in browser context.
Citations:
- 1: https://playwright.dev/docs/api/class-page
- 2: https://playwright.dev/docs/next/api/class-page
- 3: https://playwright.help/docs/next/api/class-page
- 4: https://stackoverflow.com/questions/66426138/page-waitforfunction-not-working-as-expected-playwright-js
- 5: [BUG] Wait for function ignores timeout option microsoft/playwright#3074
- 6: https://github.com/microsoft/playwright/blob/main/packages/playwright-core/src/client/page.ts
🏁 Script executed:
# Read the file at the specified lines
cat -n maxun-core/src/interpret.ts | sed -n '490,505p'Repository: getmaxun/maxun
Length of output: 641
Pass waitForFunction options as the third argument.
{ timeout: 5000 } is currently passed as the second argument (the optional arg parameter), but it should be passed as the third argument (the options parameter). Without the correct options argument, Playwright ignores the 5-second timeout and uses the default 30-second timeout instead, causing screenshot actions to wait much longer when images never complete.
🐛 Proposed fix
private async waitForImagesLoaded(page: Page): Promise<void> {
await page.waitForFunction(
() => Array.from(document.images).every(img => img.complete),
+ null,
{ timeout: 5000 }
).catch(() => {});
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| private async waitForImagesLoaded(page: Page): Promise<void> { | |
| await page.waitForFunction( | |
| () => Array.from(document.images).every(img => img.complete), | |
| { timeout: 5000 } | |
| ).catch(() => {}); | |
| private async waitForImagesLoaded(page: Page): Promise<void> { | |
| await page.waitForFunction( | |
| () => Array.from(document.images).every(img => img.complete), | |
| null, | |
| { timeout: 5000 } | |
| ).catch(() => {}); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@maxun-core/src/interpret.ts` around lines 495 - 499, In waitForImagesLoaded,
page.waitForFunction currently passes the timeout object as the second argument
(arg) instead of the third (options), so Playwright ignores the 5s timeout;
update the call to pass the predicate as the first arg, pass undefined (or any
intended arg) as the second arg, and supply { timeout: 5000 } as the third
argument to ensure the 5s timeout is honored (keep the existing catch handling);
locate the call in the waitForImagesLoaded method and adjust the
page.waitForFunction invocation accordingly.
| if (methodName === 'goto') { | ||
| try { | ||
| const gotoArgs = step.args || []; | ||
| const url = gotoArgs[0]; | ||
| const existingOpts: Record<string, any> = (typeof gotoArgs[1] === 'object' && gotoArgs[1] !== null) | ||
| ? { ...gotoArgs[1] } | ||
| : {}; | ||
|
|
||
| const requestedWait = existingOpts.waitUntil; | ||
| const remaining = (currentWorkflow || []).slice(0, -1); | ||
| const needsDataSoon = this.blockNeedsVisualRender(steps) || this.remainingWorkflowNeedsVisualRender(remaining); | ||
|
|
||
| if (!requestedWait || requestedWait === 'networkidle' || requestedWait === 'load') { | ||
| existingOpts.waitUntil = 'domcontentloaded'; | ||
| this.log( | ||
| `goto: navigation speed-optimized to 'domcontentloaded' + surgical-ready midground`, | ||
| Level.LOG | ||
| ); | ||
| } | ||
| if (!existingOpts.timeout) existingOpts.timeout = 15000; | ||
|
|
||
| await executeAction(invokee, methodName, [url, existingOpts]); | ||
|
|
||
| if (needsDataSoon) { | ||
| await this.waitForDynamicStability(page, (currentWorkflow || []).slice(0, -1)); | ||
| } | ||
| } catch (error: any) { | ||
| this.log(`goto failed: ${error.message}`, Level.WARN); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Inspect repository usage of goto arguments and explicit waitUntil settings.
rg -nP -C3 '(["'\'']action["'\'']\s*:\s*["'\'']goto["'\'']|action\s*:\s*["'\'']goto["'\'']|waitUntil\s*:\s*["'\''](networkidle|load)["'\''])'Repository: getmaxun/maxun
Length of output: 4915
🏁 Script executed:
#!/bin/bash
# Search for goto actions with explicit waitUntil in args (second argument)
rg -nP -C2 'action\s*:\s*["\']goto["\']' | head -50Repository: getmaxun/maxun
Length of output: 126
🏁 Script executed:
#!/bin/bash
# Look for any step.args or args assignments that might be non-array
fd -t f '\.(ts|js)$' | xargs grep -l 'goto' | head -10 | xargs grep -nP 'args\s*:\s*(?!\[)' | head -20Repository: getmaxun/maxun
Length of output: 1870
🏁 Script executed:
#!/bin/bash
# Check if workflows ever pass waitUntil in the second argument to goto
rg -nP 'goto.*args.*waitUntil|goto["\'].*\[.*waitUntil' | head -30Repository: getmaxun/maxun
Length of output: 126
🏁 Script executed:
#!/bin/bash
sed -n '1840,1920p' maxun-core/src/interpret.tsRepository: getmaxun/maxun
Length of output: 3641
🏁 Script executed:
#!/bin/bash
# Check how other handlers deal with non-array args
sed -n '1845,1855p' maxun-core/src/interpret.ts
sed -n '1915,1925p' maxun-core/src/interpret.tsRepository: getmaxun/maxun
Length of output: 1226
🏁 Script executed:
#!/bin/bash
# Search for any workflow definitions that might use goto with waitUntil in args
rg -n 'action.*goto' maxun-core/src -A2 | grep -E '(action|args|waitUntil)' | head -30Repository: getmaxun/maxun
Length of output: 337
Preserve workflow goto semantics and surface navigation failure.
This special case assumes step.args is always an array (inconsistent with how click and waitForLoadState handle non-array args in the same file), unconditionally overwrites explicit waitUntil: 'networkidle' | 'load' options regardless of user intent, and swallows navigation errors without rethrowing—allowing failed navigation to be marked "done" in the workflow.
Proposed fix
if (methodName === 'goto') {
try {
- const gotoArgs = step.args || [];
+ const gotoArgs = Array.isArray(step.args)
+ ? step.args
+ : step.args !== undefined
+ ? [step.args]
+ : [];
const url = gotoArgs[0];
const existingOpts: Record<string, any> = (typeof gotoArgs[1] === 'object' && gotoArgs[1] !== null)
? { ...gotoArgs[1] }
: {};
const requestedWait = existingOpts.waitUntil;
const remaining = (currentWorkflow || []).slice(0, -1);
const needsDataSoon = this.blockNeedsVisualRender(steps) || this.remainingWorkflowNeedsVisualRender(remaining);
- if (!requestedWait || requestedWait === 'networkidle' || requestedWait === 'load') {
- existingOpts.waitUntil = 'domcontentloaded';
+ if (!requestedWait) {
+ existingOpts.waitUntil = this.getNavigationWaitStrategy(needsDataSoon);
this.log(
- `goto: navigation speed-optimized to 'domcontentloaded' + surgical-ready midground`,
+ `goto: navigation using '${existingOpts.waitUntil}'`,
Level.LOG
);
}
if (!existingOpts.timeout) existingOpts.timeout = 15000;
await executeAction(invokee, methodName, [url, existingOpts]);
if (needsDataSoon) {
await this.waitForDynamicStability(page, (currentWorkflow || []).slice(0, -1));
}
} catch (error: any) {
this.log(`goto failed: ${error.message}`, Level.WARN);
+ throw error;
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@maxun-core/src/interpret.ts` around lines 1871 - 1899, The goto branch
incorrectly assumes step.args is an array, force-overwrites user-specified
waitUntil values, and swallows navigation failures; update the methodName ===
'goto' handling so it: 1) normalizes step.args the same way as
click/waitForLoadState (treat non-array args as single-arg array) before reading
url and opts; 2) only change existingOpts.waitUntil to 'domcontentloaded' when
there was no explicit waitUntil (i.e., preserve 'networkidle'/'load' if
provided) while still applying the default timeout fallback; 3) after logging
navigation errors from executeAction, rethrow the error so the workflow can
observe the failure (keep the existing waitForDynamicStability call and checks
like needsDataSoon, blockNeedsVisualRender, remainingWorkflowNeedsVisualRender
intact).
What this PR does?
Summary by CodeRabbit
New Features
Improvements