
feat: Add input field summaries to Actor search results #737

Open
MQ37 wants to merge 3 commits into master from feat/search-actors-input-fields


Contributor

@MQ37 MQ37 commented Apr 21, 2026

Summary

  • Include input field names, types, and required status in search-actors results so LLMs can prepare Actor calls without a separate fetch-actor-details round-trip
  • Fetch Actor build definitions in parallel (concurrency=10) after search, with dual-layer caching (inputFieldsCache + actorDefinitionPrunedCache)
  • Display as TypeScript-like notation in text: url: string, maxResults?: number
  • Only show "main" input fields using sectionCaption heuristic — fields before the first section caption are the core inputs, everything after is advanced config. Reduces 20-35 fields down to 4-8 per actor.
  • Show total field count when truncated: (4 of 35) so the LLM knows to use fetch-actor-details for the full schema

Text output example

- **Input fields (4 of 35):** searchStringsArray?: array, locationQuery?: string, maxCrawledPlacesPerSearch?: integer, language?: string
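
A line like the one above could be rendered from the structured fields roughly as follows. This is a minimal sketch, not the PR's actual code: the `InputFieldSummary` interface and the `formatInputFields` function are hypothetical names, chosen to mirror the `name`, `type`, and `required` keys shown in this description.

```typescript
// Hypothetical shape, mirroring the keys in the structured output example.
interface InputFieldSummary {
    name: string;
    type: string;
    required: boolean;
}

// Renders fields in TypeScript-like notation; optional fields get a `?` suffix.
function formatInputFields(fields: InputFieldSummary[], totalInputFields: number): string {
    const rendered = fields
        .map((field) => `${field.name}${field.required ? '' : '?'}: ${field.type}`)
        .join(', ');
    // Show the "(shown of total)" count only when the list was truncated.
    const count = fields.length < totalInputFields
        ? ` (${fields.length} of ${totalInputFields})`
        : '';
    return `- **Input fields${count}:** ${rendered}`;
}

const line = formatInputFields(
    [
        { name: 'searchStringsArray', type: 'array', required: false },
        { name: 'locationQuery', type: 'string', required: false },
    ],
    35,
);
console.log(line);
```

Omitting the count when nothing was truncated is an assumption of this sketch; the description only specifies the truncated case.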

Structured output example

"inputFields": [
  { "name": "searchStringsArray", "type": "array", "required": false },
  { "name": "locationQuery", "type": "string", "required": false }
],
"totalInputFields": 35

Input field selection heuristic

Apify Actor input schemas use sectionCaption to group fields into UI sections. Fields before the first sectionCaption are the "main" inputs (search query, URL, limit). Fields after are advanced config (proxy, custom code, deprecated). We return only the first section. Fallback: cap at 10 fields if no sections exist.
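
The heuristic can be sketched as below. This is an illustration under stated assumptions, not the code in this branch: `InputSchema`, `SchemaProperty`, `FALLBACK_MAX_FIELDS`, and `selectMainInputFields` are hypothetical names, and the sketch assumes `sectionCaption` sits on the property that opens a section and that property order in the schema is meaningful.

```typescript
// Hypothetical, simplified view of an Actor input schema.
interface SchemaProperty {
    type: string;
    sectionCaption?: string; // assumed to mark the first field of a UI section
}

interface InputSchema {
    properties: Record<string, SchemaProperty>;
    required?: string[];
}

// Fallback cap from the description: 10 fields when no sections exist.
const FALLBACK_MAX_FIELDS = 10;

function selectMainInputFields(schema: InputSchema) {
    const entries = Object.entries(schema.properties);
    const firstSectionIndex = entries.findIndex(
        ([, prop]) => prop.sectionCaption !== undefined,
    );
    // Assumption of this sketch: if there is no caption, or the very first
    // field already carries one, fall back to the fixed cap.
    const end = firstSectionIndex > 0
        ? firstSectionIndex
        : Math.min(entries.length, FALLBACK_MAX_FIELDS);
    const required = new Set(schema.required ?? []);
    return {
        inputFields: entries.slice(0, end).map(([name, prop]) => ({
            name,
            type: prop.type,
            required: required.has(name),
        })),
        totalInputFields: entries.length,
    };
}

const summary = selectMainInputFields({
    properties: {
        url: { type: 'string' },
        maxResults: { type: 'integer' },
        proxy: { type: 'object', sectionCaption: 'Advanced' },
        customCode: { type: 'string' },
    },
    required: ['url'],
});
console.log(summary.inputFields.map((f) => f.name), summary.totalInputFields);
```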

Testing

  • Type check, lint, and 84 unit tests pass
  • Manually verified via mcpc against local server for Google Maps and Instagram actors

@github-actions github-actions Bot added t-ai Issues owned by the AI team. tested Temporary label used only programmatically for some analytics. labels Apr 21, 2026
@MQ37 MQ37 force-pushed the feat/search-actors-input-fields branch 3 times, most recently from 5f31e83 to a873a4a Compare April 21, 2026 14:46
@MQ37 MQ37 marked this pull request as ready for review April 21, 2026 16:23
@MQ37 MQ37 marked this pull request as draft April 21, 2026 16:36
Include input field names, types, and required status in search-actors
results so LLMs can prepare Actor calls without a separate
fetch-actor-details round-trip.

- Fetch Actor build definitions in parallel (concurrency=10) after search
- Check actorDefinitionPrunedCache + dedicated inputFieldsCache for hits
- Display as TypeScript-like notation: name: type, name?: type
- Add reusable runWithConcurrency utility to src/utils/generic.ts
- Add constants rule to CLAUDE.md
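
For readers unfamiliar with the pattern, a bounded-concurrency helper of the kind the commit message mentions might look like this. The actual signature of `runWithConcurrency` in src/utils/generic.ts is not shown in this conversation, so the parameter order and return type here are assumptions.

```typescript
// Sketch of a bounded-concurrency map; the signature is assumed, not taken from the PR.
async function runWithConcurrency<T, R>(
    items: readonly T[],
    concurrency: number,
    worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
    const results = new Array<R>(items.length);
    let next = 0;

    // Each runner repeatedly claims the next unprocessed index until none remain.
    const runWorker = async (): Promise<void> => {
        while (next < items.length) {
            const index = next;
            next += 1;
            results[index] = await worker(items[index] as T, index);
        }
    };

    // Never start more runners than there are items, and always at least one.
    const workerCount = Math.max(1, Math.min(concurrency, items.length));
    await Promise.all(Array.from({ length: workerCount }, runWorker));
    return results;
}
```

Results keep the input order. In this sketch a rejected worker rejects the whole call, so per-item error handling would have to live inside `worker`.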
@MQ37 MQ37 force-pushed the feat/search-actors-input-fields branch from a873a4a to 9c1de65 Compare April 22, 2026 11:50
@MQ37 MQ37 marked this pull request as ready for review April 22, 2026 12:00
@MQ37 MQ37 requested a review from jirispilka April 22, 2026 12:00
@jirispilka
Collaborator

jirispilka commented Apr 23, 2026

I'm afraid this PR adds a lot of complexity. It was supposed to be a simple addition to the existing logic.

The branch adds: a new cache, a generic semaphore utility, a parallel fetch pipeline that duplicates defaultBuild()+schema, and a third argument.

I can see three approaches:

  1. Provide minimal code and reuse existing things as much as possible. Fetch inputs only for the top 10 search results, not for all of them. Use Promise.allSettled. Remove the semaphore. Extract a helper for the defaultBuild()+schema logic and reuse it in both places.

  2. Add this to apify-core /store/search endpoint with responseFormat=agent. It already uses an $in join in fetchAgentSafetyContext (Acts2 + UsersActs2 + UserBackgroundChecks in parallel). Adding one more $in against ActorBuilds could be fine imo.

  3. Store pre-computed input info in the Algolia index. We would not need to hit Mongo at all, as the schema would be stored there. I'm not sure how often we sync the data, so there is a risk we might get stale input fields.

My order of preference would be 3 -> 2 -> 1 but let me pull @Jkuzz for his opinion here.
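
For reference, approach 1 could be sketched like this. It is an illustration only: `fetchInputFieldsForTopResults`, `MAX_ACTORS_WITH_INPUT_FIELDS`, and the `fetchInputFields` callback are hypothetical names, with the callback standing in for the shared defaultBuild()+schema helper.

```typescript
// "Top 10" limit from approach 1; the constant name is hypothetical.
const MAX_ACTORS_WITH_INPUT_FIELDS = 10;

// Fetches input fields for the first few results only. A failed fetch yields
// `undefined` for that Actor instead of failing the whole search.
async function fetchInputFieldsForTopResults<A, F>(
    actors: readonly A[],
    fetchInputFields: (actor: A) => Promise<F>,
): Promise<(F | undefined)[]> {
    const top = actors.slice(0, MAX_ACTORS_WITH_INPUT_FIELDS);
    const settled = await Promise.allSettled(top.map((actor) => fetchInputFields(actor)));
    return settled.map((result) => (result.status === 'fulfilled' ? result.value : undefined));
}
```

With at most 10 requests in flight, no semaphore is needed, and one failing build lookup does not affect the other results.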

Collaborator

@jirispilka jirispilka left a comment


See my comment here: #737

@MQ37
Contributor Author

MQ37 commented Apr 23, 2026

Makes sense, it is a large and inefficient change as I did not want to touch the API or MongoDB - let's touch the API then.

I would go for option 2, so we do the input fields lookup from MongoDB for the default build and there is no need to deal with Algolia. Right, @Jkuzz?

3 participants