feat: add triage-issues agent skill for Shield workflow by tudorpopams · Pull Request #36012 · microsoft/fluentui

tudorpopams · 2026-04-20T16:06:04Z

Summary

Adds a new agent skill (triage-issues) that walks the Needs: Triage :mag: queue on this repo, classifies each issue against the Shield triage decision tree, and recommends labels, assignees, and comments — applying them via gh only after the human approves.
Proactive repro validation with playwright-cli: during classification the skill flags bug reports as validation candidates (has a sandbox + headless-observable + not perf/env/AT-dependent) and proposes them to the user with one-line reasons. The user confirms (validate yes / validate all / a subset / skip validation), the skill captures a screenshot + DOM snapshot + console output, and feeds the evidence back into the recommendation table. Resolution: Can't Repro is surfaced as a candidate only — never auto-applied.
For feature requests that cite v8 behavior, the skill investigates v9's documented composition patterns (Field, react-motion-components-preview, useAnnounce, etc.) and defaults to Resolution: By Design when a v9 pattern already addresses the ask. This keeps the backlog honest instead of auto-labeling everything Needs: Backlog review.
Reference docs describe Shield: P1 and Partner Ask as signal-based decisions (critical-regression evidence, tracked-workstream context) rather than identity-based tiering. External community reports go through the same triage path as any other issue.
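The candidate rule in the second bullet (sandbox present, headless-observable, not perf/env/AT-dependent) can be sketched as a small predicate. This is a minimal illustration, not the logic in SKILL.md: the `BugReport` fields, the function name, and the reason strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class BugReport:
    """Hypothetical shape of a classified issue; field names are illustrative."""
    number: int
    is_bug: bool
    has_sandbox: bool          # StackBlitz / CodeSandbox / hosted Storybook link
    headless_observable: bool  # symptom shows up in a screenshot, DOM, or console
    perf_related: bool
    env_dependent: bool        # OS- or browser-specific behavior
    at_dependent: bool         # needs real assistive technology to observe


def validation_candidate(report: BugReport) -> Tuple[bool, str]:
    """Return (is_candidate, one-line reason) for the proposal table."""
    if not report.is_bug:
        return False, "not a bug report"
    if not report.has_sandbox:
        return False, "no sandbox to load"
    if report.perf_related:
        return False, "perf regression: headless timing is unreliable"
    if report.env_dependent:
        return False, "environment-specific: headless cannot reproduce it"
    if report.at_dependent:
        return False, "assistive-tech interaction: needs a real AT"
    if not report.headless_observable:
        return False, "symptom is not observable headlessly"
    return True, "sandbox provided and symptom is headless-observable"
```

Returning the reason alongside the boolean mirrors the "one-line reasons" the skill shows the user when it proposes the validation set.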

The workflow

Fetch — query the Needs: Triage :mag: queue oldest-first.
Classify — per issue, decide classification, product, priority, label/assignee recommendations, and validation_candidate.
Present — recommendation table + proposed validation set with per-issue reasons.
Validate (optional, when user confirms) — playwright-cli against the reporter's StackBlitz/CodeSandbox/hosted Storybook or a local Storybook spin-up. Produces repros / does_not_repro / cannot_determine verdicts. Non-candidates (feature requests, root-cause-included reports, perf regressions, browser-specific, a11y interactions) are filtered out upfront; explicit overrides still run but get a headless-limitations warning.
Approve — user says apply / skip / edit / ask for more validation.
Apply — gh issue edit / gh issue comment / gh issue close, one issue at a time with per-issue result.
Summarize — what got triaged, what needs human follow-up, what was skipped.
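The Fetch and Apply steps above might translate into `gh` invocations along these lines. This is a sketch under assumptions: the helper names, the `--limit` default, and the JSON field selection are illustrative and not taken from SKILL.md. The commands are only built as argument lists, never executed, and nothing is produced unless `approved` is true.

```python
from typing import List, Optional

REPO = "microsoft/fluentui"
TRIAGE_LABEL = "Needs: Triage :mag:"


def fetch_queue_cmd(limit: int = 30) -> List[str]:
    """Build the argv that lists the triage queue oldest-first."""
    return [
        "gh", "issue", "list",
        "--repo", REPO,
        "--label", TRIAGE_LABEL,
        "--search", "sort:created-asc",  # oldest issues first
        "--json", "number,title,labels,createdAt",
        "--limit", str(limit),
    ]


def apply_cmds(
    number: int,
    add_labels: List[str],
    remove_labels: List[str],
    comment: Optional[str],
    approved: bool,
) -> List[List[str]]:
    """Build per-issue mutation commands; return nothing without approval."""
    if not approved:
        return []  # approval gate: no mutation is even constructed
    cmds: List[List[str]] = []
    edit = ["gh", "issue", "edit", str(number), "--repo", REPO]
    base_len = len(edit)
    for label in add_labels:
        edit += ["--add-label", label]
    for label in remove_labels:
        edit += ["--remove-label", label]
    if len(edit) > base_len:  # only emit an edit if something changes
        cmds.append(edit)
    if comment:
        cmds.append(
            ["gh", "issue", "comment", str(number), "--repo", REPO, "--body", comment]
        )
    return cmds
```

Passing each list to `subprocess.run(cmd, check=True)` one issue at a time would give the per-issue result the Apply step describes.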

What's in the skill

SKILL.md — workflow, decision rules, gh commands, proactive-validation proposal, recommend-then-apply gate
references/shield-guidelines.md — distilled decision tree
references/triage-labels.md — label allow-list (validated against the live repo labels API)
references/partner-orgs.md — what Shield: P1 / Partner Ask mean and what they don't
evals/evals.json — trigger prompts
AGENTS.md — skill registry updated

Operational specifics that used to live in partner-orgs.md (the actual list of tracked workstreams and known reporter handles) were deliberately kept out of this PR and live in the triager's private Claude memory instead — community contributors shouldn't have to read a tiering list to understand the triage process.

Test plan

Skill runs end-to-end against the live Needs: Triage :mag: queue (9 issues at time of authoring; all correctly triaged — see recent activity on microsoft/fluentui)
gh issue edit / gh issue close with the recommended labels succeeds (label allow-list validated against gh api repos/microsoft/fluentui/labels)
Approval gate: skill does not mutate any issue without explicit user approval, including during validation
v9 investigation step: skill correctly identifies Resolution: By Design for v8→v9 feature asks where composition addresses the need
Run the proactive-validation flow on a fresh queue with at least one sandbox-backed bug and one perf-style bug; confirm the latter is correctly excluded as a non-candidate
Confirm playwright-cli install path (npm install -g @playwright/cli@0.1.1) still works when the tool isn't preinstalled, since validation depends on it
Second triager reviews the reference docs for tone and accuracy before this PR leaves draft
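The allow-list check in the test plan can be sketched as a set difference between the recommended labels and the names returned by `gh api repos/microsoft/fluentui/labels`. The JSON below is a trimmed, hypothetical stand-in for that response (the real payload has more fields and more labels), and the helper names are illustrative.

```python
import json
from typing import Iterable, List, Set

# Hypothetical, trimmed sample of the labels API response.
SAMPLE_LABELS_JSON = """
[
  {"name": "Needs: Triage :mag:"},
  {"name": "Resolution: By Design"},
  {"name": "Resolution: Can't Repro"},
  {"name": "Shield: P1"},
  {"name": "Partner Ask"}
]
"""


def label_names(api_json: str) -> Set[str]:
    """Extract label names from a labels API response body."""
    return {item["name"] for item in json.loads(api_json)}


def unknown_labels(recommended: Iterable[str], allowed: Set[str]) -> List[str]:
    """Return recommended labels that are not on the allow-list, sorted."""
    return sorted(set(recommended) - allowed)


allowed = label_names(SAMPLE_LABELS_JSON)
rejected = unknown_labels(["Resolution: By Design", "Needs: Made Up"], allowed)
```

Running this check before `gh issue edit` keeps a misspelled or hallucinated label from reaching the CLI call at all.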

🤖 Generated with Claude Code

Introduces a new agent skill that walks the `Needs: Triage 🔍` queue on microsoft/fluentui, classifies each issue against the Shield triage decision tree, and recommends labels, assignees, and comments before applying any changes via the `gh` CLI. The skill operates in recommend-then-apply mode: the LLM never mutates issues until the human has approved the batch.

For feature requests that cite v8 behavior, the skill is instructed to investigate v9's documented composition patterns (Field, react-motion-components-preview, useAnnounce, etc.) and default to `Resolution: By Design` when a v9 pattern already addresses the ask, which avoids backlog pollution.

Reference docs intentionally describe `Shield: P1` and `Partner Ask` as signal-based decisions (critical-regression evidence, tracked workstream context) rather than identity-based tiering, so external community reports go through the same triage path as any other issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-20T16:08:34Z

📊 Bundle size report

✅ No changes found

github-actions · 2026-04-20T16:09:34Z

Pull request demo site: URL

Adds a Step 3.5 to the triage-issues skill that lets the human ask the skill to validate specific issues' reproductions with playwright-cli before approving triage. Reuses the install pattern from the visual-test skill.

The validation pass visits the reporter's StackBlitz/CodeSandbox (or spins up local Storybook when no sandbox is provided), captures a screenshot + DOM snapshot + console output, and classifies the result as `repros`, `does_not_repro`, or `cannot_determine`. A `does_not_repro` result is surfaced as a `Resolution: Can't Repro` candidate only and is never auto-applied, so the human still decides based on the evidence.

Explicitly documents what validation is not for: feature requests, reports with a documented root cause + diff, perf regressions, OS-specific behavior, and assistive-tech interactions. Headless doesn't give reliable signal there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
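The verdict handling described in this commit can be sketched as follows. The three verdict strings come from the commit message; the `Verdict` enum, the `recommendation_row` helper, and the row keys are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import Dict, Optional, Union


class Verdict(Enum):
    REPROS = "repros"
    DOES_NOT_REPRO = "does_not_repro"
    CANNOT_DETERMINE = "cannot_determine"


def candidate_resolution(verdict: Verdict) -> Optional[str]:
    """Only a does_not_repro verdict suggests a resolution label."""
    if verdict is Verdict.DOES_NOT_REPRO:
        return "Resolution: Can't Repro"
    return None


def recommendation_row(number: int, verdict: Verdict) -> Dict[str, Union[int, str, bool, None]]:
    """Fold validation evidence into a table row; never marks it auto-applied."""
    return {
        "issue": number,
        "verdict": verdict.value,
        "candidate_label": candidate_resolution(verdict),
        "auto_apply": False,  # validation produces evidence, not mutations
    }
```

Keeping `auto_apply` hard-coded to `False` reflects the rule that the human decides from the evidence, whatever the verdict.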

Flips the validation flow so the skill takes the initiative: during Step 2 classification it now decides a `validation_candidate` boolean per issue, and Step 3 presents the proposed validation set with one-line reasons for each. The user confirms (yes / all / subset / skip) rather than having to think to ask.

Moves the "when to validate vs not" heuristic into Step 3 where the candidate decision is made, next to the examples the user will be looking at. Step 3.5 is reframed as the execution of a human-confirmed set, not an opt-in on user request.

Keeps the approval gate in Step 4 unchanged: validation produces evidence only, never a mutation. Users can still manually request additional validation after seeing the table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
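One way to interpret the user's confirmation (yes / all / subset / skip) is sketched below. The reply grammar here is an assumption: the skill is driven by an LLM reading free text, so this function only illustrates the intended outcomes, and `confirmed_set` is a hypothetical name.

```python
import re
from typing import List


def confirmed_set(reply: str, proposed: List[int]) -> List[int]:
    """Map a user reply onto the subset of proposed issue numbers to validate."""
    text = reply.strip().lower()
    if text in ("validate yes", "validate all", "yes", "all"):
        return list(proposed)
    if text.startswith("skip"):
        return []
    # Subset: keep only issue numbers that were actually proposed.
    picked = {int(n) for n in re.findall(r"\d+", text)}
    return [n for n in proposed if n in picked]
```

An unrecognized reply yields an empty list, so the default is to validate nothing rather than everything.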

Investigated a real Copilot session on microsoft#35874 that failed visual validation for three reasons: (1) it tried the workspace-wide Storybook (public-docsite-v9) and hit HMR restart loops + missing unstable package errors, (2) it used a project name alias that may not exist in older workspace snapshots, (3) it fell back to guessing ports because the dev target is unexpectedly declared with `cache: true`, which replays cached output and exits.

This commit:

- Forbids the workspace-wide Storybook for validation, explicitly. The per-component stories package is the only reliable path.
- Switches the primary command to `react-<component>-stories:storybook` (direct target on the stories project) with `--skip-nx-cache`, so the advice works even in workspace snapshots that predate the library-level `start` alias.
- Replaces the port-guessing loop with a proper detection pattern: find the storybook child PID (the nx wrapper often exits 0 after delegating) and read its listening socket via lsof.
- Adds a troubleshooting section mapping the three failure modes the Copilot session hit to their real causes.

The triage-issues validation step (which delegates to this skill) is updated to reinforce the per-component-only rule inline, so an agent that reads only the triage skill still gets the warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
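The port-detection pattern from this commit (find the Storybook child PID, then read its listening socket via lsof) might look like the sketch below. Several parts are assumptions: the `yarn nx run` prefix, the helper names, and the sample lsof output are illustrative, and only the target name and `--skip-nx-cache` come from the commit message. The parsing is shown against captured text so it can be checked without a running server.

```python
import re
from typing import List, Optional


def storybook_cmd(component: str) -> List[str]:
    """Per-component Storybook target, bypassing the nx cache."""
    return [
        "yarn", "nx", "run",  # assumed invocation prefix
        f"react-{component}-stories:storybook",
        "--skip-nx-cache",
    ]


def lsof_cmd(pid: int) -> List[str]:
    """List only the TCP sockets that the given PID is listening on."""
    return ["lsof", "-nP", "-a", "-p", str(pid), "-iTCP", "-sTCP:LISTEN"]


_LISTEN = re.compile(r":(\d+) \(LISTEN\)")


def listening_port(lsof_output: str) -> Optional[int]:
    """Return the first listening port found in lsof output, if any."""
    for line in lsof_output.splitlines():
        match = _LISTEN.search(line)
        if match:
            return int(match.group(1))
    return None


# Hypothetical lsof output for a Storybook child process.
SAMPLE = """COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
node    41234  dev  27u  IPv6 0xabc      0t0  TCP *:6006 (LISTEN)
"""
```

Reading the port from the child process rather than probing a fixed list avoids the guessing loop the Copilot session fell into when the cached target exited immediately.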

tudorpopams and others added 3 commits April 20, 2026 18:28

tudorpopams wants to merge 4 commits into microsoft:master from tudorpopams:feat/triage-issues-skill
