
[Bug]: Parent session stuck "waiting for background agents" — notification delivery failures silently swallowed, polling stops prematurely #2627

@neocody

Prerequisites

  • I will write this issue in English (see our Language Policy)
  • I have searched existing issues to avoid duplicates
  • I am using the latest version of oh-my-opencode
  • I have read the documentation or asked an AI coding agent with this project's GitHub URL loaded and couldn't find the answer

Bug Description

Parent sessions frequently get permanently stuck showing "waiting for background agents to complete" with a spinning indicator but no stop button. The absence of the stop button indicates no agent is actually running, yet the parent session never recovers. This requires starting a new session.

This is distinct from #1517 (fast-completing tasks / MIN_IDLE_TIME_MS) and partially overlaps with #2292 (task lifecycle bugs), but focuses specifically on the notification delivery path, which can fail silently even when task completion detection works correctly.

Root Cause Analysis

Investigated the compiled source in dist/index.js (v3.11.2) and found 4 interrelated bugs in the notification delivery pipeline:

Bug 1 (P0): tryCompleteTask() changes status before notification succeeds

// ~L76736-76768
async tryCompleteTask(task, source) {
    task.status = "completed";       // ← Status changed IMMEDIATELY
    task.completedAt = new Date();
    // ...
    await this.enqueueNotificationForParent(task.parentSessionID,
      () => this.notifyParentSession(task));  // ← Async, can fail
    return true;
}

Once task.status = "completed", hasRunningTasks() returns false → stop button disappears. But if notifyParentSession() fails, the parent session never learns the task completed → stuck forever.
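One way to decouple "task finished" from "parent was told" is a separate delivery flag. The following is a minimal sketch, not a patch against dist/index.js: `TaskTracker`, `notified`, `lastNotifyError`, and `hasUndeliveredNotifications()` are hypothetical names, and the injected `notify` callback stands in for notifyParentSession().

```javascript
// Hypothetical sketch: only `status` and `completedAt` come from the excerpt above.
class TaskTracker {
  constructor(notify) {
    this.tasks = [];
    this.notify = notify; // stand-in for notifyParentSession(task)
  }

  async tryCompleteTask(task) {
    task.status = "completed";
    task.completedAt = new Date();
    task.notified = false; // delivery is tracked separately from completion
    try {
      await this.notify(task);
      task.notified = true;
    } catch (error) {
      task.lastNotifyError = error; // kept so a later retry can inspect it
    }
    return task.notified;
  }

  hasRunningTasks() {
    return this.tasks.some((t) => t.status === "running");
  }

  hasUndeliveredNotifications() {
    return this.tasks.some((t) => t.status === "completed" && t.notified === false);
  }
}
```

With this shape, the stop button can still follow hasRunningTasks(), while recovery logic keys off hasUndeliveredNotifications().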

Bug 2 (P0): Polling stops while notifications are still undelivered

// ~L76978-77035
async pollRunningTasks() {
    // ...
    if (!this.hasRunningTasks()) {
        this.stopPolling();  // ← Stops even if notifications failed
    }
}

hasRunningTasks() only checks task status, not pending notifications. Once polling stops, there is no recovery mechanism to retry failed notifications.
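A stop condition that also accounts for undelivered notifications could look roughly like this. It is a sketch under assumptions: `PollingGuard`, `runningTaskIDs`, and the `pendingNotifications` map (parent session ID to an array of queued notifications) are illustrative and may not match the plugin's internal data structures.

```javascript
// Hypothetical sketch of a polling guard; field names are assumptions.
class PollingGuard {
  constructor() {
    this.runningTaskIDs = new Set();
    this.pendingNotifications = new Map(); // parentSessionID -> notification[]
    this.polling = false;
  }

  startPolling() { this.polling = true; }
  stopPolling() { this.polling = false; }

  hasRunningTasks() {
    return this.runningTaskIDs.size > 0;
  }

  hasPendingNotifications() {
    for (const queue of this.pendingNotifications.values()) {
      if (queue.length > 0) return true;
    }
    return false;
  }

  pollRunningTasks() {
    // ... per-task checks elided ...
    // Stop only when nothing is running AND nothing is waiting for delivery.
    if (!this.hasRunningTasks() && !this.hasPendingNotifications()) {
      this.stopPolling();
    }
  }
}
```

The poll loop then stays alive as a retry vehicle until every queued notification has been drained.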

Bug 3 (P1): Notification errors silently swallowed

// ~L77081-77094
enqueueNotificationForParent(parentSessionID, operation) {
    const current = previous.catch(() => {}).then(operation);
    // ...
    current.finally(() => { ... }).catch(() => {});  // ← ALL errors swallowed
}

Any error from notifyParentSession() is caught and discarded. No retry, no logging at this level, no fallback.
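For comparison, a per-parent queue can keep operations serialized without discarding failures. This is a minimal sketch, assuming a hypothetical `NotificationQueue` class with an injectable `onError` hook; it is not the plugin's actual implementation.

```javascript
// Hypothetical sketch: serializes operations per parent and reports failures.
class NotificationQueue {
  constructor(onError = (id, error) => console.error("[queue]", id, error)) {
    this.tails = new Map(); // parentSessionID -> promise that never rejects
    this.onError = onError;
  }

  enqueue(parentSessionID, operation) {
    const previous = this.tails.get(parentSessionID) ?? Promise.resolve();
    // `previous` never rejects, so one failure cannot block later operations.
    const current = previous.then(operation);
    // Report the failure instead of discarding it.
    const tail = current.then(
      () => {},
      (error) => { this.onError(parentSessionID, error); }
    );
    this.tails.set(parentSessionID, tail);
    tail.then(() => {
      // Drop the entry once this was the last queued operation.
      if (this.tails.get(parentSessionID) === tail) this.tails.delete(parentSessionID);
    });
    return current; // the caller can still await and observe the rejection
  }
}
```

Returning `current` lets the caller decide whether to retry, while the `onError` hook guarantees the failure is at least logged.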

Bug 4 (P1): Only AbortedSessionError gets fallback handling

// ~L76861-76887
try {
    await this.client.session.promptAsync({ ... });
} catch (error) {
    if (isAbortedSessionError(error)) {
        this.queuePendingNotification(task.parentSessionID, notification);
    } else {
        log("[background-agent] Failed to send notification:", error);
        // ← NO RETRY, NO QUEUE — notification is lost
    }
}

Timeouts, network issues, and any other transient error besides AbortedSessionError result in a permanently lost notification.
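A generic retry wrapper with exponential backoff is one way to cover the non-abort branch. The sketch below is illustrative only: `sendWithRetry`, its option names, and the default values are assumptions, and `send` stands in for the promptAsync() call.

```javascript
// Hypothetical helper: retries `send` with exponential backoff, then rethrows.
async function sendWithRetry(send, options = {}) {
  const {
    attempts = 4,
    baseDelayMs = 250,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await send();
    } catch (error) {
      lastError = error;
      // Wait 250ms, 500ms, 1000ms, ... between attempts (none after the last).
      if (attempt < attempts - 1) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError; // the caller can queue the notification instead of losing it
}
```

If every attempt fails, the caller could fall back to queuePendingNotification() rather than only logging, so the notification survives until the next poll.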

The Stuck State Sequence

1. Background task completes → status="completed"
2. notifyParentSession() fails (timeout, transient error)
3. Error swallowed by catch(() => {})
4. hasRunningTasks() returns false → stop button gone
5. stopPolling() called → no recovery mechanism
6. Parent session waits forever for notification that will never come
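The sequence above can be reproduced in a toy model. This is not the plugin's code: `simulateStuckState`, the `parent` object, and the always-failing `notifyParentSession` stub are made up purely to show how the six steps combine.

```javascript
// Toy model of the stuck-state sequence; every name here is illustrative.
async function simulateStuckState() {
  const task = { status: "running" };
  const parent = { waiting: true };
  let polling = true;

  // Step 2: delivery fails with a transient, non-abort error.
  const notifyParentSession = async () => { throw new Error("timeout"); };

  task.status = "completed"; // Step 1: status flips before delivery
  await notifyParentSession()
    .then(() => { parent.waiting = false; })
    .catch(() => {}); // Step 3: error swallowed

  const hasRunningTasks = task.status === "running"; // Step 4: false
  if (!hasRunningTasks) polling = false; // Step 5: polling stops

  // Step 6: nothing is left that could ever clear parent.waiting.
  return { hasRunningTasks, polling, parentWaiting: parent.waiting };
}
```

The returned state has no running tasks and no active polling, yet the parent is still marked as waiting, which matches the symptom described in the bug description.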

Steps to Reproduce

This is intermittent and timing-dependent. Most reliably triggered when:

  1. Multiple background agents are running concurrently
  2. Parent session is temporarily busy (processing another notification or user input)
  3. A background task completes during this window
  4. promptAsync() to parent fails with a non-abort error

The user observes: spinning indicator, no stop button, "waiting for background agents" message that never clears.

Expected Behavior

  • Notification failures should be retried with backoff
  • Polling should continue while undelivered notifications exist
  • Task status should not change to "completed" until notification is confirmed delivered
  • OR: a separate "notified" flag should track delivery, and polling should check it

Proposed Fix

  1. Don't change task status until notification succeeds (or add a separate notified flag)
  2. Keep polling while pending notifications exist: stopPolling() should check both hasRunningTasks() AND hasPendingNotifications()
  3. Add retry with exponential backoff for all notification failures, not just AbortedSessionError
  4. Don't swallow errors in enqueueNotificationForParent — at minimum log them

Relationship to Other Issues

Environment

  • oh-my-opencode: 3.11.2 (latest npm)
  • OpenCode: 1.2.27
  • OS: macOS (Darwin 24.6.0)
  • Providers: Anthropic (Claude), OpenAI

Operating System

macOS
