feat: add triage-issues agent skill for Shield workflow #36012
Draft
tudorpopams wants to merge 4 commits into microsoft:master from
Introduces a new agent skill that walks the `Needs: Triage 🔍` queue on microsoft/fluentui, classifies each issue against the Shield triage decision tree, and recommends labels, assignees, and comments before applying any changes via the `gh` CLI. The skill operates in recommend-then-apply mode: the LLM never mutates issues until the human has approved the batch.

For feature requests that cite v8 behavior, the skill is instructed to investigate v9's documented composition patterns (Field, react-motion-components-preview, useAnnounce, etc.) and default to `Resolution: By Design` when a v9 pattern already addresses the ask, which avoids backlog pollution.

Reference docs intentionally describe `Shield: P1` and `Partner Ask` as signal-based decisions (critical-regression evidence, tracked workstream context) rather than identity-based tiering, so external community reports go through the same triage path as any other issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
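The recommend-then-apply gate described in this commit can be sketched roughly as follows. This is an illustrative sketch, not the skill's actual implementation: the `Recommendation` shape, the `build_gh_commands` helper, and the `approved` set are hypothetical names introduced here. Only the `gh issue edit` / `gh issue comment` / `gh issue close` subcommands and their `--add-label`, `--remove-label`, `--add-assignee`, `--body`, and `--repo` flags come from the `gh` CLI itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

REPO = "microsoft/fluentui"


@dataclass
class Recommendation:
    """Hypothetical shape for one row of the recommendation table."""
    issue: int
    add_labels: List[str] = field(default_factory=list)
    remove_labels: List[str] = field(default_factory=list)
    assignees: List[str] = field(default_factory=list)
    comment: Optional[str] = None
    close: bool = False


def build_gh_commands(rec: Recommendation, approved: Set[int]) -> List[List[str]]:
    """Return gh argv lists for one issue, or nothing if it was not approved."""
    if rec.issue not in approved:
        # The gate: no approval means no mutation commands are produced.
        return []

    num = str(rec.issue)
    commands: List[List[str]] = []

    edit = ["gh", "issue", "edit", num, "--repo", REPO]
    base_len = len(edit)
    for label in rec.add_labels:
        edit += ["--add-label", label]
    for label in rec.remove_labels:
        edit += ["--remove-label", label]
    for user in rec.assignees:
        edit += ["--add-assignee", user]
    if len(edit) > base_len:
        commands.append(edit)

    if rec.comment:
        commands.append(
            ["gh", "issue", "comment", num, "--repo", REPO, "--body", rec.comment]
        )
    if rec.close:
        commands.append(["gh", "issue", "close", num, "--repo", REPO])
    return commands
```

Building argv lists instead of running commands directly keeps the gate easy to inspect: the human can review exactly what would run, and a caller can execute the lists one issue at a time and report a per-issue result.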
📊 Bundle size report: ✅ No changes found
Pull request demo site: URL
Adds a Step 3.5 to the triage-issues skill that lets the human ask the skill to validate specific issues' reproductions with playwright-cli before approving triage. Reuses the install pattern from the visual-test skill.

The validation pass visits the reporter's StackBlitz/CodeSandbox (or spins up local Storybook when no sandbox is provided), captures a screenshot + DOM snapshot + console output, and classifies the result as `repros`, `does_not_repro`, or `cannot_determine`. A `does_not_repro` result is surfaced as a `Resolution: Can't Repro` candidate only, never auto-applied, so the human still decides based on the evidence.

Explicitly documents what validation is not for: feature requests, reports with a documented root cause + diff, perf regressions, OS-specific behavior, and assistive-tech interactions. Headless doesn't give reliable signal there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
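To make the verdict handling concrete, here is a minimal sketch of how a validation result could map to a recommendation. The three verdict strings and the `Resolution: Can't Repro` label come from the commit message; the `Evidence` shape, the `observed_bug` tri-state, and both function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    REPROS = "repros"
    DOES_NOT_REPRO = "does_not_repro"
    CANNOT_DETERMINE = "cannot_determine"


@dataclass
class Evidence:
    """Hypothetical bundle of what the validation pass captured."""
    screenshot_path: str
    dom_snapshot: str
    console_output: str
    # True: reported behavior was observed. False: the repro loaded and
    # behaved correctly. None: the page could not be evaluated.
    observed_bug: Optional[bool]


def classify(evidence: Evidence) -> Verdict:
    """Map the tri-state observation onto one of the three verdicts."""
    if evidence.observed_bug is None:
        return Verdict.CANNOT_DETERMINE
    return Verdict.REPROS if evidence.observed_bug else Verdict.DOES_NOT_REPRO


def resolution_candidate(verdict: Verdict) -> Optional[str]:
    """Suggest a label for the human to consider; this never applies it."""
    if verdict is Verdict.DOES_NOT_REPRO:
        return "Resolution: Can't Repro"
    return None
```

The function returns a suggestion string rather than performing any mutation, which mirrors the rule that a `does_not_repro` result is a candidate only.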
Flips the validation flow so the skill takes the initiative: during Step 2 classification it now decides a `validation_candidate` boolean per issue, and Step 3 presents the proposed validation set with one-line reasons for each. The user confirms (yes / all / subset / skip) rather than having to think to ask.

Moves the "when to validate vs not" heuristic into Step 3 where the candidate decision is made, next to the examples the user will be looking at. Step 3.5 is reframed as the execution of a human-confirmed set, not an opt-in on user request.

Keeps the approval gate in Step 4 unchanged: validation produces evidence only, never a mutation. Users can still manually request additional validation after seeing the table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
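The confirmation step could be resolved into a concrete validation set along these lines. This is a sketch under stated assumptions: the reply phrases `validate yes`, `validate all`, and `skip validation` appear in the PR description, but their exact semantics (proposed candidates versus every issue in the batch) and the subset syntax (`validate` followed by issue numbers) are guesses made for illustration, as is the `resolve_validation_set` name.

```python
import re
from typing import List


def resolve_validation_set(
    reply: str, candidates: List[int], all_issues: List[int]
) -> List[int]:
    """Turn the human's reply into the list of issue numbers to validate."""
    text = reply.strip().lower()
    if text == "skip validation":
        return []
    if text == "validate yes":
        # Assumed meaning: accept the proposed candidate set as-is.
        return list(candidates)
    if text == "validate all":
        # Assumed meaning: validate every issue in the batch.
        return list(all_issues)
    if text.startswith("validate"):
        # Assumed subset syntax: "validate 101, 103". Non-candidates are
        # allowed here, since users may request additional validation.
        wanted = {int(n) for n in re.findall(r"\d+", text)}
        subset = [n for n in all_issues if n in wanted]
        if subset:
            return subset
    raise ValueError(f"unrecognized validation reply: {reply!r}")
```

Raising on an unrecognized reply, instead of silently defaulting to "yes", keeps the human in control of what gets run.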
Investigated a real Copilot session on microsoft#35874 that failed visual validation for three reasons: (1) it tried the workspace-wide Storybook (public-docsite-v9) and hit HMR restart loops + missing unstable package errors, (2) it used a project name alias that may not exist in older workspace snapshots, (3) it fell back to guessing ports because the dev target is unexpectedly declared with `cache: true`, which replays cached output and exits.

This commit:
- Forbids the workspace-wide Storybook for validation, explicitly. The per-component stories package is the only reliable path.
- Switches the primary command to `react-<component>-stories:storybook` (direct target on the stories project) with `--skip-nx-cache`, so the advice works even in workspace snapshots that predate the library-level `start` alias.
- Replaces the port-guessing loop with a proper detection pattern: find the storybook child PID (the nx wrapper often exits 0 after delegating) and read its listening socket via lsof.
- Adds a troubleshooting section mapping the three failure modes the Copilot session hit to their real causes.

The triage-issues validation step (which delegates to this skill) is updated to reinforce the per-component-only rule inline, so an agent that reads only the triage skill still gets the warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
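The port-detection pattern mentioned above (read the Storybook process's listening socket via lsof instead of guessing ports) might look roughly like this. The exact command the skill uses is not shown in this PR, so the lsof invocation below is an assumption built from standard lsof flags (`-P` and `-n` for numeric output, `-a` to AND the filters, `-p` for the PID, `-iTCP -sTCP:LISTEN` for listening TCP sockets). The function names are hypothetical.

```python
import re
import subprocess
from typing import List

# lsof prints listening sockets with a NAME column such as "*:6006 (LISTEN)".
LISTEN_RE = re.compile(r":(\d+) \(LISTEN\)\s*$")


def parse_listening_ports(lsof_output: str) -> List[int]:
    """Extract unique listening ports from lsof text output, in order seen."""
    ports: List[int] = []
    for line in lsof_output.splitlines():
        match = LISTEN_RE.search(line)
        if match:
            port = int(match.group(1))
            if port not in ports:
                ports.append(port)
    return ports


def listening_ports(pid: int) -> List[int]:
    """Ask lsof which TCP ports the given process is listening on."""
    result = subprocess.run(
        ["lsof", "-Pan", "-p", str(pid), "-iTCP", "-sTCP:LISTEN"],
        capture_output=True,
        text=True,
        check=False,  # lsof exits non-zero when nothing matches
    )
    return parse_listening_ports(result.stdout)
```

Separating the parser from the subprocess call means the parsing logic can be checked against captured lsof output without a running Storybook. Finding the child PID itself (since the nx wrapper may already have exited) is left out of this sketch.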
Summary
- Adds a new agent skill (`triage-issues`) that walks the `Needs: Triage :mag:` queue on this repo, classifies each issue against the Shield triage decision tree, and recommends labels, assignees, and comments, applying them via `gh` only after the human approves.
- Once the human confirms the proposed validation set (`validate yes` / `validate all` / a subset / `skip validation`), the skill captures a screenshot + DOM snapshot + console output, and feeds the evidence back into the recommendation table. `Resolution: Can't Repro` is surfaced as a candidate only, never auto-applied.
- For feature requests that cite v8 behavior, the skill investigates v9's documented composition patterns (`Field`, `react-motion-components-preview`, `useAnnounce`, etc.) and defaults to `Resolution: By Design` when a v9 pattern already addresses the ask. This keeps the backlog honest instead of auto-labeling everything `Needs: Backlog review`.
- Reference docs describe `Shield: P1` and `Partner Ask` as signal-based decisions (critical-regression evidence, tracked-workstream context) rather than identity-based tiering. External community reports go through the same triage path as any other issue.

The workflow
- Walk the `Needs: Triage :mag:` queue oldest-first.
- Classify each issue and decide its `validation_candidate`.
- Validate the confirmed set, producing `repros` / `does_not_repro` / `cannot_determine` verdicts. Non-candidates (feature requests, root-cause-included reports, perf regressions, browser-specific, a11y interactions) are filtered out upfront; explicit overrides still run but get a headless-limitations warning.
- After approval, apply changes with `gh issue edit` / `gh issue comment` / `gh issue close`, one issue at a time with per-issue result.

What's in the skill
- `SKILL.md`: workflow, decision rules, `gh` commands, proactive-validation proposal, recommend-then-apply gate
- `references/shield-guidelines.md`: distilled decision tree
- `references/triage-labels.md`: label allow-list (validated against the live repo labels API)
- `references/partner-orgs.md`: what `Shield: P1` / `Partner Ask` mean and what they don't
- `evals/evals.json`: trigger prompts
- `AGENTS.md`: skill registry updated

Operational specifics that used to live in `partner-orgs.md` (the actual list of tracked workstreams and known reporter handles) were deliberately kept out of this PR and live in the triager's private Claude memory instead; community contributors shouldn't have to read a tiering list to understand the triage process.

Test plan
- Ran the skill against the `Needs: Triage :mag:` queue (9 issues at time of authoring; all correctly triaged; see recent activity on microsoft/fluentui)
- `gh issue edit` / `gh issue close` with the recommended labels succeeds (label allow-list validated against `gh api repos/microsoft/fluentui/labels`)
- The skill recommends `Resolution: By Design` for v8→v9 feature asks where composition addresses the need
- The playwright-cli install (`npm install -g @playwright/cli@0.1.1`) still works when the tool isn't preinstalled, since validation depends on it

🤖 Generated with Claude Code