Skip to content

feat: Slack HITL multi-row approvals with all pause types#7574

Open
Mustafa-Esoofally wants to merge 13 commits intomainfrom
feat/slack-hitl
Open

feat: Slack HITL multi-row approvals with all pause types#7574
Mustafa-Esoofally wants to merge 13 commits intomainfrom
feat/slack-hitl

Conversation

@Mustafa-Esoofally
Copy link
Copy Markdown
Contributor

Summary

Extends the Slack AgentOS interface so a single RunPausedEvent surfaces N RunRequirements as interactive rows in one thread message. Each row branches on its pause type and resolves independently; the run continues after the last row is decided.

  • confirmation → Approve / Reject / Reject with reason buttons on the row
  • user_input → "Provide input…" opens a typed modal (str/int/float/bool/list/dict with per-field validation)
  • user_feedback → "Answer…" opens a modal with StaticSelect / Checkboxes per question
  • external_execution → "Submit result…" opens a text modal

The aggregate approval.status flips to approved/rejected only after acontinue_run succeeds, so a failed continuation leaves the approval pending and rows re-clickable instead of orphaning the paused run.

Type of change

  • New feature

What's new

New files

  • libs/agno/agno/os/interfaces/slack/resolutions.py — pure helpers (classify_requirement, compute_aggregate_status, apply_resolutions_to_requirements, initialize_requirement_resolutions)
  • libs/agno/agno/os/interfaces/slack/blocks.pypause_request_blocks (multi-row Block Kit builder) + reject_reason_modal_view, user_input_modal_view, user_feedback_modal_view, external_execution_modal_view
  • libs/agno/agno/os/interfaces/slack/interactions.py/slack/interactions route with per-row dispatch, synchronous modal views_open, inline view_submission validation (response_action: errors), and background _continue_run_and_finalize
  • libs/agno/agno/os/interfaces/slack/pause_handler.pyrender_pause_ui (seeds per-row state in resolution_data.requirement_resolutions, CAS-reserves Slack metadata, posts the block message)

Modified

  • router.py — call render_pause_ui from both streaming and non-streaming paths
  • slack.py, state.py, events.py, tools/slack.py, pyproject.toml — HITL plumbing (hitl_enabled, paused-event capture, slack_sdk>=3.41.0 for Block Kit modal dialogs)

Cookbooks

  • cookbook/05_agent_os/interfaces/slack/approval_flow.py — now also passes db=agent_db to AgentOS so /approvals is available alongside Slack; instructions encourage tool batching so Slack can render multi-row pauses
  • cookbook/05_agent_os/approvals/approval_multi_tool.py — web-UI counterpart kept for dashboard comparison

Design notes

Per-row state lives inside approval.resolution_data.requirement_resolutions as a dict keyed by RunRequirement.id. Status CAS remains on the top-level status column — aggregate "pending" → "approved"/"rejected" is the locking primitive for "which worker gets to call acontinue_run". JSON merge inside resolution_data is last-write-wins in a narrow ~1ms window (accepted for v1; _version field reserved for future CAS hardening).

ask_user tools set both requires_user_input=True and user_feedback_schema. The classifier checks feedback first so those rows render with a question modal instead of a plain text-input modal (this is a subtle but important distinction that the core pause_type enum doesn't currently express).

chat.postMessage has a 50-block cap (modals allow 100). Confirmation rows use Section + Actions (2 blocks); other types flatten to Section with an accessory Button (1 block). The renderer clamps to _MAX_ROWS_PER_MESSAGE = 20 to keep the worst case safely under 50 blocks.

Out of scope (documented)

  • Cross-surface sync: if an admin resolves via the os.agno.com Approvals dashboard while Slack buttons are still visible, the dashboard's /resolve doesn't call acontinue_run — the run stays paused. Will be addressed when approvals move inline into the chat page.
  • Team HITL: slack.py still gates hitl_enabled=True to Agent (not Team / Workflow).
  • MongoDB: approvals table isn't yet supported there; SQLite + Postgres only.
  • Per-requirement _version CAS: deferred; last-write-wins accepted for v1 given the narrow concurrent-click window.

Checklist

  • Code complies with style guidelines
  • Ran format/validation scripts (./scripts/format.sh and ./scripts/validate.sh — ruff clean, mypy clean on touched files)
  • Self-review completed
  • Documentation updated (module + function docstrings, cross-reference comments)
  • Examples and guides: approval_flow.py updated; approval_multi_tool.py web-UI cookbook preserved
  • Tested in clean environment — pending live E2E with ngrok + Slack app (in progress)
  • Tests added/updated — deferred (no unit tests in this PR; integration tests exist for the single-row path in libs/agno/tests/integration/agent/human_in_the_loop/)

Duplicate and AI-Generated PR Check

  • I have searched existing open pull requests and confirmed that no other PR already addresses this
  • Check if this PR was entirely AI-generated (by Copilot, Claude Code, Cursor, etc.) — this PR was drafted with AI assistance following the Anthropic PR conventions; human review and live E2E validation are part of the ship criteria

Additional notes

  • The commit message and this PR body describe v1 scope. Phase 2 candidates: per-row _version CAS, dashboard→Slack resume hook, moving approvals inline into the chat-page UI.
  • Codex design + code reviews were run against the implementation; the top findings (classifier order, block-limit 50 vs 100, synchronous modal validation, status-flip ordering, subset-of-requirements safety) are all addressed in this PR.

Extends the Slack interface's human-in-the-loop flow so a single
RunPausedEvent can surface N RunRequirements as interactive rows in
one thread message. Each row branches on its pause type:

  - confirmation      -> Approve / Reject / Reject with reason buttons
  - user_input        -> "Provide input..." button opens a typed modal
  - user_feedback     -> "Answer..." button opens a question modal
  - external_execution-> "Submit result..." button opens a text modal

Aggregate approval.status flips to approved/rejected only AFTER
acontinue_run succeeds, so a failed continue leaves the approval
pending and rows re-clickable rather than orphaning the paused run.

New files:
  resolutions.py     pure helpers: classifier / aggregator / applier
  blocks.py          multi-row Block Kit builder + 4 modal views
  interactions.py    /interactions dispatcher, modal flow, continuation
  pause_handler.py   render_pause_ui (replaces render_approval_ui)

Other changes:
  router.py          call render_pause_ui; drop unused kwargs from
                     attach_interaction_routes
  approval_flow.py   pass db=agent_db to AgentOS so /approvals works
                     alongside Slack; instructions now encourage batching

Slack message block cap is 50 (not 100 like modals); rows use
Section+Actions (2 blocks) for confirmations and a single Section
with accessory Button (1 block) for modal-opening rows to fit.

ask_user tools set requires_user_input=True AND populate
user_feedback_schema; the classifier checks feedback first so they
render as feedback modals instead of plain text inputs.

Dashboard cross-surface sync (admin resolves via os.agno.com while
Slack still shows buttons) is out of scope for v1 and will be
addressed when approvals move inline into the chat page.
@Mustafa-Esoofally Mustafa-Esoofally requested a review from a team as a code owner April 17, 2026 21:07
Replace the non-streaming ``acontinue_run(stream=False)`` +
``chat.postMessage`` chunking in ``_continue_run_and_finalize`` with
the same ``chat_stream`` + ``process_event`` + ``StreamState`` pipeline
the initial run uses.

User-visible effect:

  - "Thinking..." typing indicator appears under the resolved row while
    the agent resumes (was: dead air for the full continuation latency)
  - Tool calls render as task cards live (was: nothing until final post)
  - Response text streams in as it's generated (was: one postMessage)
  - Media produced during the resumed run is uploaded (was: dropped)

Ordering fix: ``approval.status`` is now flipped to
``approved``/``rejected`` only after ``stream.stop`` returns cleanly.
A failed continuation leaves the approval pending so a pending-row
re-click can retry; the UI stays in a coherent state instead of
rendering rows as resolved while the run is stranded paused.

Packaging: the 4 streaming kwargs (entity_name, entity_type,
task_display_mode, buffer_size) that the /interactions dispatcher
threads to the continuation are now bundled in a ``StreamingConfig``
dataclass instead of trailing through 4 levels of helper signatures.
Mirror the per-tool card pattern from
agno-os/src/pages/Chat/components/DynamicHITLComponent — each
RunRequirement renders as its own card with Confirm/Reject buttons,
decisions resolve independently, and there is no global "Approve all"
shortcut.

Button renames:

  Approve  → Confirm    (matches ConfirmationDialog icon button semantics)
  Reject                 (unchanged label, simplified dispatch)
  Reject with reason    → dropped as a 3rd button; Reject now routes
                          straight to the reason modal when the mode
                          allows a note, mirroring the inline textarea
                          the web UI reveals on reject-click

Other chat-page parity:
  - ACTION_APPROVE_ALL footer removed (chat page is per-tool)
  - Per-row resolutions stay independent; each click resolves one row,
    continuation fires when all rows are resolved

Team support: hitl_enabled now accepts Agent OR Team. The render and
continuation paths were already entity-agnostic (aget_session,
acontinue_run, paused_run.requirements), so the unlock is a
validation gate + type widening:

  - slack.py::_validate_hitl_backend accepts agent or team and
    rejects only RemoteAgent/RemoteTeam
  - router.py replaces the isinstance(Agent) gate with
    isinstance((Agent, Team))
  - interactions.py widens `entity: Agent` to `entity: Agent | Team`

Workflow HITL stays out of scope — workflows don't expose
requirements[] at the top level; it surfaces via inner agent steps
which need separate routing (phase 2).

AgentOS dashboard logic (single APPROVE/DENY for the whole approval)
is a different pattern used by the admin panel; this PR follows the
chat-page pattern as agreed.
Replace the bare ``Section(text=...)`` + ``Actions(...)`` layout with a
richer composition that uses more of Slack's Block Kit primitives:

  - Header block for "Approval required"
  - Context block subtitle ("N actions need your decision...")
  - Divider between rows
  - Emoji prefix per pause kind (:wrench: confirmation, :pencil2:
    user_input, :speech_balloon: user_feedback, :rocket: external)
  - Section.fields 2-column grid for tool args (replaces bullet list)

Terminal resolved message gets the same treatment: Header with the
aggregate status + Context subtitle with the run count and a status
emoji (:white_check_mark: / :x:) + per-row resolved lines.

Row count cap drops to 15 to stay under Slack's 50-block message cap
now that each row is 3 blocks (Divider + Section + optional fields +
Actions) instead of 2.

No behavior or contract change — purely visual.
Removes code that was kept "for one release" or as future-hook
metadata but never actually participates in the live flow:

  - ACTION_APPROVE_ALL constant (button was removed with AgentOS
    chat-page parity; the action_id was left as deprecated baggage)
  - pending_confirmation_count / non_confirmation_pending counters
    (their only reader was the approve-all footer, also gone)
  - _apply_modal_resolution_and_maybe_continue's `payload` kwarg
    (threaded through but never read)
  - resolution_data["_version"] bumping in pause_handler and
    interactions (the CAS is on approval.status, never on _version,
    so the field only lied about concurrency protection)

Pure removal — no behavior change. Ruff + mypy clean, block rendering
and button set verified unchanged by a smoke render.
Per the repo convention (docstrings belong in toolkit files only;
comments capture WHY not WHAT), remove:

  - All module docstrings in the four slack HITL files
  - All function + class docstrings
  - Section-banner comments (``# === Router attachment ===`` and
    ``# --- header ---`` style dividers)
  - Comments that narrate what the next line does

Preserve the WHY comments that capture hidden invariants:

  - classify_tool_exec_dict: user_feedback must win over user_input
    so ``ask_user`` tools render as a feedback modal
  - compute_aggregate_status: empty resolutions treated as pending
  - apply_resolutions_to_requirements: external_execution raises in
    _tools.py:533 if result is None, so we enforce non-empty here
  - apply_resolutions_to_requirements: "resolved" on a confirmation
    row is safer than implicit approval

Line totals across the four files: 2280 → 1957 (-323).
Improves identifier clarity and replaces several loose ``str``
annotations with Literals so mypy/reviewers see intent.

Renames:

  clicker                    -> actor_slack_user_id
  classify_tool_exec_dict    -> classify_tool_execution
  _views_open                -> _open_view_safely
  req_dict / req_id          -> requirement / requirement_id
  pre_post_data              -> reserved_resolution_data
  post_patch                 -> message_ts_resolution_data
  _MAX_ROWS_PER_MESSAGE      -> _MAX_REQUIREMENTS_PER_MESSAGE

Types:

  RequirementKind = Literal["confirmation", "user_input",
                             "user_feedback", "external_execution"]
  EntityType      = Literal["agent", "team", "workflow"]

  - REQ_* constants are now typed as RequirementKind
  - classify_tool_execution + classify_requirement return RequirementKind
  - _classify_tool_flags (new) factors the shared branching so object
    and dict callers share one implementation
  - _KIND_EMOJI: Dict[RequirementKind, str] replaces .get(kind, ":wrench:")
    with direct indexing now that kind is Literal-constrained
  - StreamingConfig.entity_type: EntityType (was str, required a
    "# type: ignore[arg-type]" when passed to StreamState)

The wire JSON keys ("q" for requirement_id, "k" for kind, etc.) are
unchanged — only Python-side identifier naming shifted.
Splits a handful of long if/elif branches into dispatch tables and
factors out duplicated JSON construction:

  blocks.py
    - _encode_private_metadata dedupes the 4 identical modal
      private_metadata JSON blocks
    - _MODAL_ROW_UI dispatch table replaces the 4-way if/elif in
      _modal_row_blocks (button text, action_id, type_label, act_kind)

  resolutions.py
    - _apply_{confirmation,user_input,user_feedback,external_execution}
      replace the 60-line branched body of
      apply_resolutions_to_requirements
    - _APPLY_BY_KIND maps RequirementKind -> applicator; the public
      function is now a ~12-line dispatcher

  pause_handler.py
    - _most_recent inlined at its one call site (trivial
      one-line helper)

Verified: every applicator path (approve/reject/user_input/feedback/
external) still produces the same RunRequirement mutations.
Small consistency fixes:

  - interactions.py: replace two `import traceback; log_error(
    f"... {traceback.format_exc()}")` blocks with `log_exception(...)`
    from agno.utils.log — matches the project logging style
  - slack.py: collapse the 7-line "check capability not type" paragraph
    in _validate_hitl_backend into one comment; narrow the bare
    `# type: ignore` on APIRouter construction to `[arg-type]`
  - router.py: drop the redundant 2-line "HITL is gated at
    _validate_hitl_backend" comment above the hitl isinstance check —
    the gate itself documents the invariant
The continuation orchestrator was 190+ lines mixing: session load,
requirement hydration, resolution application, streaming with cleanup,
approval status flip, and nested-pause rendering. Split into four
helpers plus a thin orchestrator so each step has one concern and
reads in one pass:

  _hydrate_requirements(raw)
      Coerce session-serialized requirement dicts back into
      RunRequirement instances (handles the DB-round-trip case).

  _stream_continuation(...) -> StreamState | None
      Set "Thinking..." status, open chat_stream, pipe acontinue_run
      events through process_event, close cleanly, upload media.
      Returns the populated StreamState on success, None if the
      continuation raised (after cleaning up the stream + status).

  _finalize_approval_status(...)
      CAS-flip approval.status to the aggregate and refresh the
      pause card with the resolved summary.

  _render_nested_pause(...)
      Post a new pause UI for any RunPausedEvent the continuation
      produced.

  _continue_run_and_finalize(...) is now ~60 lines of orchestration.

Also strips the remaining `# ==== ... ===` section banners in this
file (Codex finding 3/31).
The rendering orchestrator was one ~130-line function mixing approval
lookup, metadata reservation, message posting, ts patching, and
stale-race handling. Split into four async helpers so each step has
one concern:

  _reserve_slack_delivery(...)       CAS-write Slack meta into
                                     resolution_data BEFORE posting
  _post_pause_message(...)           build blocks + chat.postMessage
  _patch_message_ts(...)             store the posted ts for later
                                     chat.update calls
  _replace_stale_pause_message(...)  handle the race where the
                                     approval resolved during the
                                     chat.postMessage window

  render_pause_ui(...)               thin orchestrator (~40 lines)

Renames ``_build_pre_post_resolution_data`` to
``_build_reserved_resolution_data`` to match the returned name, and
drops the ``message_ts`` parameter (always None at build time — it's
added later by _patch_message_ts).
Both _extract_user_input_row and _extract_user_feedback_row did the
same three-step walk: get_approval -> find_requirement -> pull schema
from tool_execution. Extract the shared lookup into
_load_modal_tool_schema(schema_key=...) so each extractor keeps only
its own per-field/per-question logic.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant