🔍 Executive Summary
The gh-aw agent ecosystem is operating at a WHM score of 70/100 — a 3-point decline from April 20's 73 — across a fleet of 197 workflows, all on schema v3. The platform story today is one of remarkable forward momentum shadowed by two entrenched infrastructure failures. On the positive side: 50 commits and 49 merged PRs in 24 hours, near-total automation of the development pipeline, a successful multi-engine expansion (OpenCode/Crush + Codex engine driver architecture), and the Dead Code Removal Agent's first successful PR (#27567, removing 4 functions). The Auto-Triage agent reports 0% unlabeled issues — a clean-house milestone.
The two dominant failure stories from last week's briefing remain unresolved: the Codex engine 401 auth failure (#27127) is now on Day 4, with all OpenAI-backed workflows (AI Moderator, Duplicate Code Detector, Smoke Codex) completely dark. And 15 stale lock files have re-emerged after dropping to 0 on Apr 18–19, likely from the multi-engine expansion commits. A new user-reported issue (#27556) describes MCP `get_file_contents` calls taking 37–71s each, causing 40-minute workflow timeouts that weren't occurring last week — suggesting a potential MCP Gateway performance regression.
This analysis covers 46 discussions updated in the past 7 days (34 audits, 8 announcements, 4 dev), 500 issues in the 7-day window (68 open / 432 closed), and 30 workflow runs from the last 48 hours.
📊 Pattern Analysis
Positive Patterns
AI-native development cadence is fully established — The repository chronicle (Apr 20, #27386) and Team Evolution report (Apr 21, #27540) confirm that Copilot is responsible for 82% of commits (41/50 in 24h), with automated bots contributing another 14%. pelikhan is the primary human orchestrator, assigning tasks and reviewing results. This is no longer a human-assisted AI repo — it is an AI-driven repo with human oversight.
Multi-engine architectural pivot underway — Apr 21 shipped configurable engine driver scripts, experimental OpenCode/Crush support alongside Codex, and worked through `CODEX_HOME` environment variable collisions. The parallel security hardening stream (SEC-004 conformance, SHA pinning, firewall bumps to v0.25.26) signals that expansion is happening with intentional guardrails.
Issue throughput extremely high — 500 issues in 7 days: 432 closed (86.4% resolution rate), avg resolution time ~0.7 days. All 9 human-opened issues were resolved within the week. The automated lifecycle is functioning as a high-throughput triage queue.
Auto-triage in perfect health — 0% unlabeled issues as of Apr 21 (#27510), well below the 5% target threshold. First time this metric has been at zero since tracking began.
New analytical capabilities maturing — Since the Apr 3 DeepReport: Terminal Stylist, Docs Noob Tester, Claude Code User Docs Review, Schema Consistency Checker, Go Type Consistency (typist), Lockfile Statistics, Daily Firewall Report, DIFC Integrity Report, and Copilot Session Insights are all operational. The analytical surface area of the fleet has expanded dramatically.
Concerning Patterns
Codex 401 auth failure: P0, Day 4 with no resolution signal — Every Codex-engine workflow (AI Moderator, Duplicate Code Detector, Smoke Codex, Gemini Smoke) has been completely dark since April 18. The Agent Performance Report (#27491) assigns AI Moderator a Q:10 E:5 score — the lowest in the fleet — and explicitly recommends escalation. The issue (#27127) has been open for 4 days with no fix comment or assignee update visible in the data.
Stale lock file cycle is structural, not one-time — Stale lock files peaked at 19 (Apr 3) → 0 (Apr 18–19) → 15 (Apr 21). The pattern suggests that every significant batch of workflow source edits (multi-engine expansion, CLI consistency crusade) generates a new wave of stale locks. The Workflow Health Manager (#27563) has automated detection, but the recompile step remains manual.
MCP `get_file_contents` performance regression (user-reported) — Issue #27556 documents a workflow that was completing in ~15 minutes now timing out at 40 minutes, with `get_file_contents` calls taking 37–71s each. This was filed by an external user today and hasn't appeared in any agent-generated audit — a gap in observability coverage.
Documentation Unbloat: high cost, zero output — 58 turns, $2.46/run, 0 safe outputs. The observability kit has flagged this workflow `resource_heavy` + `partially_reducible` for multiple consecutive days, and Agent Performance Report (#27491) gives it Q:48 E:52. The workflow is consuming significant budget without producing any verifiable artifact.
Emerging Patterns
Agent Persona Explorer over-engineering — 95% data-gathering turns, `model_downgrade` flag, `poor_agentic_control`. The observability signals are consistent: this workflow is using an LLM to do work that shell commands could do. First observed Apr 20; confirmed Apr 21.
Design Decision Gate turn-count instability — Escalating from 4→8→10 turns across consecutive runs (Apr 20), now with a structural P1 filed (#27470). The workflow design assumes ADR path = 5 turns, but real execution requires 6+. Pattern is repeatable and deterministic, not random.
`list_code_scanning_alerts` context overflow — The MCP Structural Analysis (#27570) rates this tool 1/5 with a critical note: 145K chars, no size limit, reliably breaks agent context. Any security-audit workflow that uses this tool naively will silently corrupt its context. Not yet filed as a blocking issue.
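A minimal guard for this failure mode can be sketched in shell. The 20,000-char cap and the `guard_output` wrapper name are illustrative assumptions, not gh-aw configuration:

```shell
# Hypothetical sketch: cap a tool's stdout before it enters agent context.
# The 20000-char budget is assumed, not a gh-aw setting.
MAX_CHARS=20000
guard_output() {
  out=$(cat)
  len=${#out}
  if [ "$len" -gt "$MAX_CHARS" ]; then
    # Emit the head of the payload plus a marker the agent can see.
    printf '%s' "$out" | head -c "$MAX_CHARS"
    printf '\n[truncated: showed %s of %s chars]\n' "$MAX_CHARS" "$len"
  else
    printf '%s\n' "$out"
  fi
}
```

Piping a hypothetical `list_code_scanning_alerts` invocation through `guard_output` would keep a 145K-char response from flooding the context while telling the agent truncation occurred.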
📈 Trend Intelligence
| Metric | Apr 3 | Apr 17 | Apr 20 | Apr 21 |
|---|---|---|---|---|
| WHM Score | 72 | 75 | 73 | 70 |
| Stale Lock Files | 19 | 0 | 0 | 15 |
| Codex Engine Status | OK | OK | ❌ P0 | ❌ P0 Day 4 |
| GPU Runner T4 (node) | OK | OK | ❌ | ❌ |
| Safe Output Success Rate | 100% | 100% | 100% | 100% |
| Issue 7-Day Close Rate | — | — | 86.4% | 86.4% |
| Workflow Fleet Size | 241 | 196 | 197 | 197 |
WHM trend is declining (75→73→70), primarily driven by the stale locks (+15) and the Codex P0. If the Codex auth failure is resolved this week, the score should recover to ~76.
Token cost baseline established at ~$7–8.50/day (Apr 20: $7.48 audit; $8.54 observability window). Previously this was unmeasured. Three workflows account for the majority: Documentation Unbloat ($2.46), Copilot Prompt Clustering ($1.69), Sergo ($1.25).
Issue volume accelerating — Daily creation: Apr 15: 11, Apr 16: 99, Apr 17: 104, Apr 18: 88, Apr 19: 45, Apr 20: 92, Apr 21 (partial): 61. Weekday average ~90 issues/day, mostly automated. The tracker is a healthy pipeline, not a backlog.
🚨 Notable Findings
Exciting: Dead Code Removal Agent delivered its first PR — PR #27567 removed 4 dead functions from the codebase. This agent is one of the newest additions to the fleet and its first successful code-change PR is a meaningful milestone for autonomous code maintenance.
Exciting: Copilot 100% agent success rate (all-time streak) — The Copilot Session Insights report (#27568) confirms that every Copilot coding agent run in the historical record has succeeded (100% success rate), with the only exceptions being "still pending" states (Apr 7, Apr 18), not actual failures. This streak is remarkable and worth protecting.
Suspicious: MCP `get_file_contents` 37–71s latency (new user report) — Issue #27556 came from an external user and describes a performance degradation that wasn't observed in any agent audit. The timing (Apr 21, same day as multi-engine expansion commits) is suspicious. The firewall report (#27329) shows the `(unknown)` domain blocking category at 89% of all blocks — unresolved CONNECT tunnel requests — which could be related if the MCP gateway is hitting firewall-related hangs on some DNS lookups.
Anomaly: Hippo Memory store is empty — Despite the Hippo Memory agent running daily, it has accumulated 0 memories. The store requires conventional commit keywords (`fix:`, `revert:`) to auto-learn, but the AI-dominated commit history uses non-standard message formats. The agent is running but not learning — a silent effectiveness gap.
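The gap is easy to reproduce: the store's trigger amounts to keyword matching on commit subjects, roughly like this sketch (the exact keyword set and fall-through behavior are assumptions based on the report, not Hippo Memory's real implementation):

```shell
# Hypothetical sketch of the keyword trigger: only conventional-commit
# prefixes count as learnable events; free-form AI subjects fall through.
commit_kind() {
  case "$1" in
    fix:*|fix\(*)  echo fix ;;
    revert:*)      echo revert ;;
    *)             echo skip ;;   # AI-generated subjects land here
  esac
}
```

A subject like "Add multi-engine driver scripts" classifies as `skip`, which is why an AI-dominated history produces zero memories.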
Anomaly: DIFC filtering volume spike (174 events in 2 days) — The DIFC Integrity report (#27221) shows 174 filtered events in the Apr 18–19 window, concentrated in Sub-Issue Closer (67) and Dev workflow (73). Both workflows are scanning issues with `none:all` integrity tags. While this is expected behavior, the volume suggests these workflows could be tuned to pre-filter lower-integrity content before agent invocation.
🔮 Predictions and Recommendations
Codex P0 will continue blocking until manual credential rotation — There is no automated recovery path for a 401 auth failure on an `OPENAI_API_KEY`. This requires human action. Each day of delay costs the ecosystem 5+ workflow runs that produce no output. Recommend escalating to whoever manages the OpenAI API key in the org secrets.
Stale lock file wave is predictable — Any batch PR merging workflow source changes will generate stale locks. Recommend adding an automated post-merge compilation step or a PR check that fails if the `.md` is newer than its `.lock.yml`. The Workflow Health Manager can detect these but can't fix them — closing this gap would prevent the stale lock cycle from recurring.
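The check itself is small. A sketch of the proposed PR gate, assuming compiled locks sit next to their `.md` sources (the path layout is an assumption):

```shell
# Hypothetical sketch: list workflow sources edited after their compiled
# lock file. A non-empty result would fail the PR check.
stale_locks() {
  dir="$1"
  for md in "$dir"/*.md; do
    [ -e "$md" ] || continue
    lock="${md%.md}.lock.yml"
    # A missing or older lock means the source changed since last compile.
    if [ ! -e "$lock" ] || [ "$md" -nt "$lock" ]; then
      echo "stale: ${md##*/}"
    fi
  done
}
```

Wired into CI as `stale_locks .github/workflows | grep . && exit 1` (path hypothetical), this would have caught the 15-file wave at merge time.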
MCP `get_file_contents` latency issue will worsen — If the 37–71s call latency is real and reproducible (not a one-time incident), it will start appearing in the observability toolkit's `resource_heavy_for_domain` assessments for any workflow that reads files. Watch the Apr 22 observability report for a jump in flagged workflows.
New agent quality control needed — Agent Persona Explorer and Documentation Unbloat both exhibit the same anti-pattern: LLM doing data-gathering work that belongs in shell pre-steps. As the fleet expands to 200+ workflows, having a "new workflow quality gate" that checks for `partially_reducible > 50%` before merging would catch these inefficiencies at creation time rather than after weeks of wasteful runs.
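A sketch of what that gate could compute, given reducible and total turn counts (the 50% threshold comes from the text above; extracting the two counts from the observability report is an assumed pre-step):

```shell
# Hypothetical quality gate: fail when more than half of a workflow's turns
# are flagged as reducible to deterministic shell pre-steps.
reducibility_gate() {
  reducible=$1
  total=$2
  pct=$(( reducible * 100 / total ))
  if [ "$pct" -gt 50 ]; then
    echo "FAIL: ${pct}% of turns reducible"
    return 1
  fi
  echo "PASS: ${pct}% of turns reducible"
}
```

Agent Persona Explorer's reported 95% data-gathering turns would fail this gate at creation time instead of surfacing weeks later in audits.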
✅ Actionable Agentic Tasks (Quick Wins)
7 GitHub issues have been created from this analysis. All are labelled `automation`, `improvement`, `quick-win`:
| # | Task | Effort | Impact |
|---|---|---|---|
| 1 | Fix Design Decision Gate `max_turns: 5` → `10` | Fast (<30m) | Eliminates P1 recurring failure |
| 2 | Cap Documentation Unbloat at `max_turns: 30`, require ≥1 safe output | Medium (1–4h) | Save ~$1.50/run ($45/mo) |
| 3 | Recompile 15 stale lock files (batch `gh aw compile`) | Fast (<30m) | WHM score +5, correct workflow configs |
| 4 | Add Gemini to Quick Start prereqs + "Which engine?" guide | Medium (1–4h) | Reduces new-user friction for non-Copilot users |
| 5 | Downgrade Agent Persona Explorer to `claude-haiku-4-5` + pre-agent shell steps | Medium (1–4h) | Save ~$1.80/run, eliminate 95% wasted turns |
| 6 | Seed Hippo Memory store with `hippo init --scan --days 365` | Medium (1–4h) | Activates dormant institutional memory agent |
| 7 | Add pagination/truncation guard to `list_code_scanning_alerts` calls | Quick (<1h) | Unblocks all security-audit agents from using code scanning data |
📚 Source Attribution
Discussions analyzed (past 7 days, selected):
Issues referenced: #27127 (Codex P0), #27373 (GPU runner T4), #27556 (MCP latency), #27561 (MCP Gateway), #27563 (stale locks), #27595 (architecture violations), #27566 (APM skill form), #27470 (Design Decision Gate max_turns)
Workflow runs (48h): §24729745351 (this run); 30 runs analyzed from logs cache (Apr 19–21)
Repo-memory used: `/memory/deep-report/` — previous snapshots from 2026-04-03; all stale entries updated in this run.
Data window: 2026-04-14 – 2026-04-21 (7 days); full-refresh analysis (last analysis was 2026-04-03, 18 days prior)