
feat: add Agent/Team/Workflow factories for multi-tenant AgentOS #7549

Open
ashpreetbedi wants to merge 18 commits into v2.6.0 from feat/agent-factories

Conversation

@ashpreetbedi
Contributor

Summary

Add first-class factory support to AgentOS for per-request, context-driven agent/team/workflow construction. This enables multi-tenant deployments where the agent's tools, instructions, model, or database scope depend on who is calling — driven by JWT claims, request body input, or any other request-time context.

DX example:

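from agno.agent import Agent, AgentFactory
from agno.models.openai import OpenAIResponses
from agno.os import AgentOS

# read_docs_tool and manage_members_tool are placeholder tool constructors
# assumed to be defined elsewhere in the application.

def build_tenant_agent(ctx):
    role = ctx.trusted.claims["role"]  # from verified JWT middleware
    tools = [read_docs_tool()]
    if role == "admin":
        # Only verified admins get the member-management tool
        tools.append(manage_members_tool())
    return Agent(
        id=f"tenant_{ctx.user_id}",
        model=OpenAIResponses(id="gpt-5.4"),
        tools=tools,
    )

# Register the factory in place of a prebuilt Agent; AgentOS invokes it per request
agent_os = AgentOS(
    agents=[AgentFactory(id="tenant-agent", factory=build_tenant_agent)],
)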

Key changes

  • New classes: AgentFactory, TeamFactory, WorkflowFactory — registered callables that produce components per request
  • RequestContext with TrustedContext — separates verified middleware claims from untrusted client input at the type level
  • factory_input form field on all run endpoints — JSON payload validated against optional input_schema
  • Async factory support via get_*_by_id_async variants
  • Factory discovery in list endpoints with type: "factory" and factory_input_schema
  • Exception hierarchy: FactoryError → 500, FactoryValidationError → 400, FactoryPermissionError → 403
  • 22 unit tests covering invocation, input validation, error handling, trust split
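
To make the error-to-status mapping above concrete, here is a minimal sketch in plain Python. It assumes that FactoryValidationError and FactoryPermissionError subclass FactoryError (the description calls it a hierarchy but does not show the class definitions), and `status_for` is a hypothetical helper, not a function from this PR.

```python
class FactoryError(Exception):
    """Base class for factory failures (500 in the PR description)."""


class FactoryValidationError(FactoryError):
    """factory_input failed validation (400). Subclassing is an assumption."""


class FactoryPermissionError(FactoryError):
    """Caller may not build this component (403). Subclassing is an assumption."""


# Most specific classes first, so subclasses match before the base class.
_STATUS_BY_ERROR = (
    (FactoryValidationError, 400),
    (FactoryPermissionError, 403),
    (FactoryError, 500),
)


def status_for(exc: Exception) -> int:
    """Hypothetical helper: pick the HTTP status for an exception raised by a factory."""
    for error_type, status in _STATUS_BY_ERROR:
        if isinstance(exc, error_type):
            return status
    # Assumption: anything that is not a factory error is also reported as 500.
    return 500
```

Ordering the table from most to least specific matters here: if FactoryError came first, every subclass would be reported as a 500.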

Files changed

| File | Change |
|---|---|
| agno/agent/factory.py | New. Core types + exceptions |
| agno/team/factory.py | New. TeamFactory |
| agno/workflow/factory.py | New. WorkflowFactory |
| agno/os/utils.py | Factory resolution branch + async variants + build_request_context helper |
| agno/os/app.py | Widened types, factory skips in init/registry/tracing/db-discovery |
| agno/os/routers/agents/router.py | factory_input param, factory error handling |
| agno/os/routers/teams/router.py | Same |
| agno/os/routers/workflows/router.py | Same |
| agno/os/routers/agents/schema.py | type, factory_input_schema fields, from_factory() |
| agno/os/routers/teams/schema.py | Same |
| tests/unit/os/test_factories.py | New. 22 tests |

The existing prototype deep_copy path is completely unchanged. All changes are additive.

Type of change

  • New feature

Checklist

  • Code complies with style guidelines
  • Ran format/validation scripts (ruff format and ruff check)
  • Self-review completed
  • Documentation updated (comments, docstrings)
  • Tested in clean environment
  • Tests added/updated (if applicable)

Duplicate and AI-Generated PR Check

  • I have searched existing open pull requests and confirmed that no other PR already addresses this issue
  • Check if this PR was entirely AI-generated (by Copilot, Claude Code, Cursor, etc.)

Additional Notes

  • Design spec: specs/agno/features/agent-factories/design.md
  • Out of scope for v1: caching/memoization, nested factory composition (team members that are factories), playground UI form generation, registry/DB rehydration of factories
  • Security: RequestContext.trusted vs .input split forces factory authors to explicitly reach into verified claims for authorization decisions — the most likely multi-tenancy bug (client-supplied field deciding tools) is visible at code review time
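
The trusted/input split described in the security note can be sketched with two frozen dataclasses. This is an illustration under assumptions, not the PR's actual definitions: the field names `claims`, `user_id`, `trusted`, and `input` come from the description and example above, while the defaults and the `tools_for` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Mapping, Optional


@dataclass(frozen=True)
class TrustedContext:
    # Populated only by verified middleware (for example, decoded JWT claims).
    claims: Mapping[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class RequestContext:
    user_id: Optional[str] = None
    trusted: TrustedContext = field(default_factory=TrustedContext)
    # Untrusted, client-supplied payload (the factory_input form field).
    input: Any = None


def tools_for(ctx: RequestContext) -> List[str]:
    """Hypothetical helper: authorization reads ctx.trusted, never ctx.input."""
    tools = ["read_docs"]
    if ctx.trusted.claims.get("role") == "admin":
        tools.append("manage_members")
    return tools


# A client that puts role=admin in the request body gains nothing,
# because the decision never looks at ctx.input.
spoofed = RequestContext(user_id="u1", input={"role": "admin"})
print(tools_for(spoofed))
```

Because the authorization branch has to spell out `ctx.trusted.claims`, a reviewer can spot a factory that reads `ctx.input["role"]` instead.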

🤖 Generated with Claude Code

…agent construction

Enable multi-tenant AgentOS deployments where the agent's tools, instructions,
model, or database scope depend on who is calling. Factories are registered
callables that AgentOS invokes on each request with a RequestContext, returning
a freshly built Agent/Team/Workflow.

Key additions:
- AgentFactory, TeamFactory, WorkflowFactory classes
- RequestContext with TrustedContext for JWT-claim-driven authorization
- factory_input form field on all run endpoints
- Async factory support via get_*_by_id_async variants
- Factory discovery in list endpoints (type: "factory")
- Input validation via pydantic input_schema
- Exception hierarchy: FactoryError, FactoryValidationError,
  FactoryPermissionError, FactoryContextRequired
- 22 unit tests covering invocation, validation, error handling

The existing prototype deep_copy path is completely unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ashpreetbedi ashpreetbedi requested a review from a team as a code owner April 16, 2026 12:43
@kausmeows kausmeows changed the base branch from main to v2.6.0 April 19, 2026 20:00
Comment thread cookbook/05_agent_os/factories/agent/03_jwt_role_factory.py
Comment thread cookbook/05_agent_os/factories/agent/01_basic_factory.py Outdated
Comment thread cookbook/05_agent_os/factories/agent/02_input_schema_factory.py Outdated
Comment thread libs/agno/agno/agent/factory.py Outdated
Comment thread libs/agno/agno/os/routers/workflows/router.py Outdated
@ysolanky
Member

AgentFactory, TeamFactory, and WorkflowFactory are near-identical (~80 lines each). The only difference is the class name and docstring references. This should be a single generic Factory[T] base class, with the three as thin subclasses (or even just type aliases). Right now any bug fix (e.g., in validate_input) needs to be applied in 3 places.
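
One way the suggested refactor could look is sketched below. The `Agent` and `Team` classes are stand-ins so the example runs on its own, and the `validator` callable is a hypothetical substitute for the PR's input_schema handling; only the class names AgentFactory and TeamFactory and the method name validate_input come from the PR.

```python
from dataclasses import dataclass
from typing import Any, Callable, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Agent:  # stand-in for agno's Agent
    name: str = "agent"


class Team:  # stand-in for agno's Team
    pass


@dataclass
class Factory(Generic[T]):
    """Shared logic lives here once, so a fix to validate_input applies everywhere."""

    id: str
    factory: Callable[[Any], T]
    # Hypothetical: a callable that validates/normalizes raw input.
    validator: Optional[Callable[[Any], Any]] = None

    def validate_input(self, raw: Any) -> Any:
        return self.validator(raw) if self.validator is not None else raw

    def build(self, ctx: Any) -> T:
        return self.factory(ctx)


class AgentFactory(Factory[Agent]):
    """Thin subclass: only the produced type differs."""


class TeamFactory(Factory[Team]):
    """Thin subclass: only the produced type differs."""
```

Keeping the subclasses (rather than bare type aliases) preserves `isinstance(x, AgentFactory)` checks such as the ones used in app.py.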

Comment thread libs/agno/agno/agent/factory.py Outdated
Comment thread libs/agno/agno/agent/factory.py Outdated
Comment thread libs/agno/agno/os/routers/agents/router.py Outdated
    agent = get_agent_by_id(
        agent_id=agent_id, agents=os.agents, db=os.db, registry=os.registry, create_fresh=True
    )
except FactoryContextRequired:
Member

Error handling here should be consistent with the async path. Again, it should not cover only factory errors.

Contributor

kausmeows commented Apr 21, 2026

> Errors here should be consistent with async

We can add a generic Exception handler too, yes, but the other factory errors are not needed: these endpoints (cancel, continue, get_run, list_runs, get_config) operate on an already-running or already-existing agent, so there is no factory to invoke. That is why we only handle FactoryContextRequired here.

Comment thread libs/agno/agno/os/routers/agents/router.py Outdated
)
async def get_agent(agent_id: str, request: Request) -> AgentResponse:
agent = get_agent_by_id(agent_id=agent_id, agents=os.agents, db=os.db, registry=os.registry, create_fresh=True)
# Check if it's a factory first — return factory info without requiring a RequestContext
Member

Here again, the errors can be raised by the get_agent_by_id function. We cannot just raise the FactoryContextRequired error.

Comment thread libs/agno/agno/os/routers/agents/router.py Outdated
    agent = get_agent_by_id(
        agent_id=agent_id, agents=os.agents, db=os.db, registry=os.registry, create_fresh=True
    )
except FactoryContextRequired:
Member

Flagging this as well, but all the exceptions need to be reworked.

Comment thread libs/agno/agno/os/routers/agents/schema.py Outdated
Comment thread libs/agno/agno/agent/factory.py Outdated
Comment thread libs/agno/agno/os/routers/teams/router.py Outdated
Comment thread libs/agno/agno/os/routers/teams/router.py
Comment thread libs/agno/agno/os/routers/workflows/router.py
Comment thread libs/agno/agno/os/app.py Outdated
get_eval_router(dbs=self.dbs, agents=self.agents, teams=self.teams),
get_eval_router(
    dbs=self.dbs,
    agents=[a for a in self.agents if not isinstance(a, AgentFactory)] if self.agents else None,
Member

Factory primitives can be provided to the eval router, no?

Contributor

This will need to be supported separately. The eval runner accesses agent.model directly, and there is no RequestContext available in the eval context. Supporting factories in evals would require the eval config to specify mock context values (user_id, claims, factory_input), which is new API surface. Maybe for evals 2.0.

Comment thread libs/agno/agno/os/app.py Outdated
db: Optional[Union[BaseDb, AsyncBaseDb, RemoteDb]] = None

for agent in self.agents or []:
    if isinstance(agent, AgentFactory):
Member

Rather than adding these checks inside the setup tracing function, we should have already filtered these factory primitives out.

Contributor

Pushed an update here: 2f1022e

Comment thread libs/agno/agno/os/mcp.py Outdated
Comment thread libs/agno/agno/os/router.py Outdated
Comment thread libs/agno/agno/os/utils.py Outdated
validated_input = agent.validate_input(ctx.input)
from dataclasses import replace

ctx_with_input = replace(ctx, input=validated_input)
Member

Business logic of how a factory should work should not be inside OS utils.
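
A minimal sketch of what moving this logic onto the factory might look like follows. The `resolve` method name, the `validator` field, and the simplified `RequestContext` are all assumptions made for illustration; the PR only shows `validate_input` and the `dataclasses.replace` call.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class RequestContext:  # simplified stand-in for the PR's RequestContext
    user_id: Optional[str] = None
    input: Any = None


@dataclass
class AgentFactory:  # simplified stand-in for the PR's AgentFactory
    id: str
    factory: Callable[[RequestContext], Any]
    # Hypothetical: stands in for validation against input_schema.
    validator: Optional[Callable[[Any], Any]] = None

    def validate_input(self, raw: Any) -> Any:
        return self.validator(raw) if self.validator is not None else raw

    def resolve(self, ctx: RequestContext) -> Any:
        """Hypothetical method: validate ctx.input, then build with the validated context."""
        validated = self.validate_input(ctx.input)
        # replace() returns a new context, so the caller's ctx is left untouched.
        return self.factory(replace(ctx, input=validated))
```

With something like this in place, the OS utility would only need a single call such as `agent.resolve(ctx)`, and `replace` would be imported once at module level.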

@ysolanky
Member

ysolanky commented Apr 21, 2026

  • Factory-backed runs cannot be managed after the initial POST /runs. The new lookup helpers raise FactoryContextRequired whenever a factory is resolved without a request context (utils.py, three locations). The create-run endpoints pass ctx, but every follow-up route still calls the plain helper, so factory IDs stop working for cancel/poll/list/continue flows once the run has been created (two locations each in agents/router.py, teams/router.py, and workflows/router.py). Since the produced instance ID is explicitly rewritten back to the factory registration ID, this is not an edge case; it breaks normal background-run and approval lifecycles for every factory resource. Team /continue is worse and will likely surface as a 500 because it does not catch FactoryContextRequired at all (teams/router.py).

  • Workflow factories are not discoverable with the same contract as agent/team factories. Agents and teams now expose type="factory" plus factory_input_schema in their response models (agents/schema.py, teams/schema.py), but workflows never got equivalent fields in either the detail or summary schemas (workflows/schema.py, os/schema.py). The router therefore returns only bare id/name/description for workflow factories (workflows/router.py, two locations). A client cannot tell that a workflow is factory-backed or discover the expected factory_input shape, which is inconsistent with the agent/team behavior and with the PR’s stated discovery story.

  • result.id = agent_id silently discards the factory author’s ID. The SSE matching rationale is documented in code, but factory authors will be confused when their IDs vanish.

  • build_request_context swallows malformed JSON: it passes raw broken strings through as ctx.input when no input_schema is set.

  • from dataclasses import replace is imported 6 times inside function bodies. Minor, but a symptom of the duplication problem.

  • No continue_agent_run support for factories. These endpoints can never work with factory agents, and it is unclear whether that is intentional.
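
For the malformed-JSON point, a stricter parse could look roughly like the sketch below. `parse_factory_input` is a hypothetical helper rather than the PR's build_request_context, and the local FactoryValidationError is a stand-in for the PR's exception of the same name.

```python
import json
from typing import Any, Optional


class FactoryValidationError(Exception):
    """Stand-in for the PR's exception of the same name (400 in the description)."""


def parse_factory_input(raw: Optional[str]) -> Any:
    """Hypothetical helper: decode the factory_input form field or fail loudly."""
    if raw is None or raw.strip() == "":
        # No payload was sent; the factory sees no input.
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Reject broken payloads instead of passing the raw string through as ctx.input.
        raise FactoryValidationError(f"factory_input is not valid JSON: {exc.msg}") from exc
```

Raising here means a factory without an input_schema still never receives a half-parsed string, and the client gets a validation error instead of surprising downstream behavior.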

4 participants