feat: add triage-issues agent skill for Shield workflow #36012

Draft
tudorpopams wants to merge 4 commits into microsoft:master from tudorpopams:feat/triage-issues-skill

@tudorpopams tudorpopams commented Apr 20, 2026

Summary

  • Adds a new agent skill (triage-issues) that walks the Needs: Triage :mag: queue on this repo, classifies each issue against the Shield triage decision tree, and recommends labels, assignees, and comments — applying them via gh only after the human approves.
  • Proactive repro validation with playwright-cli: during classification the skill flags bug reports as validation candidates (has a sandbox + headless-observable + not perf/env/AT-dependent) and proposes them to the user with one-line reasons. The user confirms (validate yes / validate all / a subset / skip validation), the skill captures a screenshot + DOM snapshot + console output, and feeds the evidence back into the recommendation table. Resolution: Can't Repro is surfaced as a candidate only — never auto-applied.
  • For feature requests that cite v8 behavior, the skill investigates v9's documented composition patterns (Field, react-motion-components-preview, useAnnounce, etc.) and defaults to Resolution: By Design when a v9 pattern already addresses the ask. This keeps the backlog honest instead of auto-labeling everything Needs: Backlog review.
  • Reference docs describe Shield: P1 and Partner Ask as signal-based decisions (critical-regression evidence, tracked-workstream context) rather than identity-based tiering. External community reports go through the same triage path as any other issue.
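
The candidate rule in the second bullet (has a sandbox, headless-observable, not perf/env/AT-dependent) can be read as a simple predicate. The sketch below is one possible encoding, not the skill's actual implementation, which is a prompt-driven workflow in SKILL.md. The `IssueSignals` fields and the `validation_candidate` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IssueSignals:
    """Hypothetical per-issue facts collected during classification."""
    is_bug: bool
    has_sandbox: bool                 # StackBlitz / CodeSandbox / hosted Storybook link
    headless_observable: bool         # visible in a screenshot, DOM snapshot, or console
    perf_related: bool = False
    environment_specific: bool = False
    assistive_tech_dependent: bool = False


def validation_candidate(s: IssueSignals) -> tuple[bool, str]:
    """Return (is_candidate, one-line reason) for the proposal table."""
    if not s.is_bug:
        return False, "not a bug report"
    if not s.has_sandbox:
        return False, "no sandbox to open"
    if not s.headless_observable:
        return False, "symptom is not observable headlessly"
    if s.perf_related:
        return False, "perf regression: headless timing is unreliable"
    if s.environment_specific:
        return False, "environment-specific behavior"
    if s.assistive_tech_dependent:
        return False, "depends on assistive technology"
    return True, "sandbox provided and symptom is headless-observable"
```

Returning the reason alongside the boolean mirrors the "one-line reasons" the skill shows the user before asking for confirmation.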

The workflow

  1. Fetch — query the Needs: Triage :mag: queue oldest-first.
  2. Classify — per issue, decide classification, product, priority, label/assignee recommendations, and validation_candidate.
  3. Present — recommendation table + proposed validation set with per-issue reasons.
  4. Validate (optional, when user confirms) — playwright-cli against the reporter's StackBlitz/CodeSandbox/hosted Storybook or a local Storybook spin-up. Produces repros / does_not_repro / cannot_determine verdicts. Non-candidates (feature requests, root-cause-included reports, perf regressions, browser-specific, a11y interactions) are filtered out upfront; explicit overrides still run but get a headless-limitations warning.
  5. Approve — user says apply / skip / edit / ask for more validation.
  6. Apply — gh issue edit / gh issue comment / gh issue close, one issue at a time with per-issue result.
  7. Summarize — what got triaged, what needs human follow-up, what was skipped.
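
Steps 5 and 6 amount to a recommend-then-apply gate: nothing is sent to `gh` for an issue the user has not approved. A minimal sketch of that gate follows, assuming a hypothetical `Recommendation` record and an injected `run` callable (so the example can be exercised without touching a real repository). The `gh issue edit`, `gh issue comment`, and `gh issue close` subcommands come from the workflow above; how the skill actually assembles its arguments is not shown in this PR.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Recommendation:
    """Hypothetical shape of one row in the recommendation table."""
    number: int
    add_labels: list[str] = field(default_factory=list)
    remove_labels: list[str] = field(default_factory=list)
    assignees: list[str] = field(default_factory=list)
    comment: Optional[str] = None
    close: bool = False


def gh_commands(rec: Recommendation) -> list[list[str]]:
    """Translate one recommendation into gh argument lists."""
    n = str(rec.number)
    edit = ["gh", "issue", "edit", n]
    for label in rec.add_labels:
        edit += ["--add-label", label]
    for label in rec.remove_labels:
        edit += ["--remove-label", label]
    for user in rec.assignees:
        edit += ["--add-assignee", user]
    cmds = [edit] if len(edit) > 4 else []  # skip a no-op edit
    if rec.comment:
        cmds.append(["gh", "issue", "comment", n, "--body", rec.comment])
    if rec.close:
        cmds.append(["gh", "issue", "close", n])
    return cmds


def apply_approved(
    recs: list[Recommendation],
    approved: set[int],
    run: Callable[[list[str]], int],
) -> dict[int, str]:
    """Apply only approved issues, one at a time, recording a per-issue result."""
    results: dict[int, str] = {}
    for rec in recs:
        if rec.number not in approved:
            results[rec.number] = "skipped"
            continue
        # all() stops at the first failing command for this issue.
        ok = all(run(cmd) == 0 for cmd in gh_commands(rec))
        results[rec.number] = "applied" if ok else "failed"
    return results
```

In real use `run` could wrap `subprocess.run(cmd).returncode`; passing it in keeps the gate itself free of side effects.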

What's in the skill

  • SKILL.md — workflow, decision rules, gh commands, proactive-validation proposal, recommend-then-apply gate
  • references/shield-guidelines.md — distilled decision tree
  • references/triage-labels.md — label allow-list (validated against the live repo labels API)
  • references/partner-orgs.md — what Shield: P1 / Partner Ask mean and what they don't
  • evals/evals.json — trigger prompts
  • AGENTS.md — skill registry updated

Operational specifics that used to live in partner-orgs.md (the actual list of tracked workstreams and known reporter handles) were deliberately kept out of this PR and live in the triager's private Claude memory instead — community contributors shouldn't have to read a tiering list to understand the triage process.

Test plan

  • Skill runs end-to-end against the live Needs: Triage :mag: queue (9 issues at time of authoring; all correctly triaged — see recent activity on microsoft/fluentui)
  • gh issue edit / gh issue close with the recommended labels succeeds (label allow-list validated against gh api repos/microsoft/fluentui/labels)
  • Approval gate: skill does not mutate any issue without explicit user approval, including during validation
  • v9 investigation step: skill correctly identifies Resolution: By Design for v8→v9 feature asks where composition addresses the need
  • Run the proactive-validation flow on a fresh queue with at least one sandbox-backed bug and one perf-style bug; confirm the latter is correctly excluded as a non-candidate
  • Confirm playwright-cli install path (npm install -g @playwright/cli@0.1.1) still works when the tool isn't preinstalled, since validation depends on it
  • Second triager reviews the reference docs for tone and accuracy before this PR leaves draft
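
The label allow-list check in the second item can be approximated offline. The sketch below assumes the output of `gh api repos/microsoft/fluentui/labels` has already been captured as a JSON string; the GitHub labels endpoint returns an array of objects that each carry a `name` field. The `unknown_labels` helper is hypothetical, and pagination is left to the caller (for example via `gh api --paginate`).

```python
import json


def unknown_labels(recommended: list[str], labels_api_json: str) -> list[str]:
    """Return recommended labels that do not exist in the repo's label list."""
    allowed = {entry["name"] for entry in json.loads(labels_api_json)}
    return [label for label in recommended if label not in allowed]
```

An empty result means every recommended label exists; anything else is worth fixing before the apply step.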

🤖 Generated with Claude Code

Introduces a new agent skill that walks the `Needs: Triage 🔍` queue
on microsoft/fluentui, classifies each issue against the Shield triage
decision tree, and recommends labels, assignees, and comments before
applying any changes via the `gh` CLI.

The skill operates in recommend-then-apply mode: the LLM never mutates
issues until the human has approved the batch. For feature requests
that cite v8 behavior, the skill is instructed to investigate v9's
documented composition patterns (Field, react-motion-components-preview,
useAnnounce, etc.) and default to `Resolution: By Design` when a v9
pattern already addresses the ask — avoiding backlog pollution.

Reference docs intentionally describe `Shield: P1` and `Partner Ask`
as signal-based decisions (critical-regression evidence, tracked
workstream context) rather than identity-based tiering, so external
community reports go through the same triage path as any other issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-actions Bot commented Apr 20, 2026

📊 Bundle size report

✅ No changes found

Pull request demo site: URL

tudorpopams and others added 3 commits April 20, 2026 18:28
Adds a Step 3.5 to the triage-issues skill that lets the human ask
the skill to validate specific issues' reproductions with playwright-cli
before approving triage. Reuses the install pattern from the
visual-test skill.

The validation pass visits the reporter's StackBlitz/CodeSandbox (or
spins up local Storybook when no sandbox is provided), captures a
screenshot + DOM snapshot + console output, and classifies the result
as `repros`, `does_not_repro`, or `cannot_determine`. A
`does_not_repro` result is surfaced as a `Resolution: Can't Repro`
candidate only — never auto-applied — so the human still decides based
on the evidence.
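
As an illustration of the "candidate only, never auto-applied" rule, the sketch below maps a verdict to an evidence row. The three verdict strings and the `Resolution: Can't Repro` label come from this commit message; the `evidence_row` shape and its field names are assumptions made for the example.

```python
from typing import Optional

VERDICTS = ("repros", "does_not_repro", "cannot_determine")


def resolution_candidate(verdict: str) -> Optional[str]:
    """Only does_not_repro suggests a label, and only as a suggestion."""
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    return "Resolution: Can't Repro" if verdict == "does_not_repro" else None


def evidence_row(issue: int, verdict: str, artifacts: list[str]) -> dict:
    """Build the row fed back into the recommendation table."""
    return {
        "issue": issue,
        "verdict": verdict,
        "artifacts": artifacts,  # e.g. screenshot, DOM snapshot, console log paths
        "candidate_label": resolution_candidate(verdict),
        "auto_apply": False,  # validation produces evidence, never a mutation
    }
```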

Explicitly documents what validation is not for: feature requests,
reports with a documented root cause + diff, perf regressions,
OS-specific behavior, and assistive-tech interactions — headless
doesn't give reliable signal there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Flips the validation flow so the skill takes the initiative: during
Step 2 classification it now decides a `validation_candidate` boolean
per issue, and Step 3 presents the proposed validation set with
one-line reasons for each. The user confirms (yes / all / subset /
skip) rather than having to think to ask.
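
One way to picture the confirmation step is as a small parser over the user's reply. This is a sketch under stated assumptions: the commit message does not define how `yes` differs from `all`, so both are treated here as "the whole proposed set"; a reply containing issue numbers selects that subset; anything unrecognized selects nothing. The function name is hypothetical, and explicit overrides for non-candidates are not modeled.

```python
import re


def select_validation_set(reply: str, proposed: list[int]) -> list[int]:
    """Map a confirmation reply to the issue numbers to validate."""
    text = reply.strip().lower()
    if text.startswith("skip"):
        return []
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    if numbers:
        # Subset: keep only numbers that were actually proposed.
        return [n for n in numbers if n in proposed]
    words = text.split()
    if "yes" in words or "all" in words:
        return list(proposed)
    return []  # unrecognized replies validate nothing
```

Defaulting to an empty set on an unclear reply keeps the step conservative: nothing runs unless the user clearly asked for it.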

Moves the "when to validate vs not" heuristic into Step 3 where the
candidate decision is made, next to the examples the user will be
looking at. Step 3.5 is reframed as the execution of a
human-confirmed set, not an opt-in on user request.

Keeps the approval gate in Step 4 unchanged — validation produces
evidence only, never a mutation. Users can still manually request
additional validation after seeing the table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Investigated a real Copilot session on microsoft#35874 that failed visual
validation for three reasons: (1) it tried the workspace-wide
Storybook (public-docsite-v9) and hit HMR restart loops + missing
unstable package errors, (2) it used a project name alias that may
not exist in older workspace snapshots, (3) it fell back to guessing
ports because the dev target is unexpectedly declared with
`cache: true` which replays cached output and exits.

This commit:

- Forbids the workspace-wide Storybook for validation, explicitly.
  The per-component stories package is the only reliable path.
- Switches the primary command to `react-<component>-stories:storybook`
  (direct target on the stories project) with `--skip-nx-cache`, so
  the advice works even in workspace snapshots that predate the
  library-level `start` alias.
- Replaces the port-guessing loop with a proper detection pattern:
  find the storybook child PID (the nx wrapper often exits 0 after
  delegating) and read its listening socket via lsof.
- Adds a troubleshooting section mapping the three failure modes the
  Copilot session hit to their real causes.

The triage-issues validation step (which delegates to this skill)
is updated to reinforce the per-component-only rule inline, so an
agent that reads only the triage skill still gets the warning.
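
The port-detection idea (read the Storybook child's listening socket via lsof instead of guessing ports) might look like the sketch below. It is not the skill's actual script: the helper names are hypothetical, the `react-<component>-stories:storybook` target format is taken from this commit message, and finding the child PID is left to the caller. The lsof flags used (`-a`, `-p`, `-iTCP`, `-sTCP:LISTEN`, `-P`, `-n`) are standard options.

```python
import re
import subprocess


def storybook_target(component: str) -> str:
    """Per-component stories target, e.g. react-button-stories:storybook."""
    return f"react-{component}-stories:storybook"


def parse_listening_ports(lsof_output: str) -> list[int]:
    """Extract unique listening TCP ports from `lsof -P -n` output."""
    ports: list[int] = []
    for line in lsof_output.splitlines():
        match = re.search(r":(\d+) \(LISTEN\)", line)
        if match:
            port = int(match.group(1))
            if port not in ports:
                ports.append(port)
    return ports


def listening_ports(pid: int) -> list[int]:
    """Ask lsof which TCP ports the given process is listening on."""
    result = subprocess.run(
        ["lsof", "-a", "-p", str(pid), "-iTCP", "-sTCP:LISTEN", "-P", "-n"],
        capture_output=True,
        text=True,
        check=False,  # lsof exits non-zero when nothing matches
    )
    return parse_listening_ports(result.stdout)
```

Separating the parser from the `lsof` call keeps the parsing logic checkable against captured output on machines where Storybook is not running.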

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>