Skip to content

Fix Claude analysis gating to handle co-failures correctly#25491

Merged
iangmaia merged 2 commits intotrunkfrom
iangmaia/fix-claude-analysis-gating
Apr 13, 2026
Merged

Fix Claude analysis gating to handle co-failures correctly#25491
iangmaia merged 2 commits intotrunkfrom
iangmaia/fix-claude-analysis-gating

Conversation

@iangmaia
Copy link
Copy Markdown
Contributor

@iangmaia iangmaia commented Apr 8, 2026

Description

Follow-up to #25477. Fixes a logic bug in upload-claude-analysis.sh where the script exited early on the first non-essential failure (e.g. Danger), causing Claude analysis to be silently skipped even when essential jobs also failed in the same build.

Changes:

  • Fix co-failure logic: Count non-essential failures first, then query the Buildkite REST API for total failed job count. Only skip Claude when non_essential_failures == total_failures.
  • Fail-safe: If the API call fails, Claude runs anyway (safe default).
  • Fix shebang: #!/bin/bash -eu#!/usr/bin/env bash + set -eu for portability.
  • Use Bash array: NON_ESSENTIAL_STEPS is now a proper array to avoid word splitting/glob issues.
  • YAML literal scalar: custom_prompt: >custom_prompt: | to preserve numbered list and paragraph structure.
  • Include timed_out state: The jq filter catches both failed and timed_out job states.

Test Steps

  1. Trigger a build where only Danger fails → Claude analysis should be skipped
  2. Trigger a build where Danger and a real test both fail → Claude analysis should run
  3. Trigger a build where only a real test fails → Claude analysis should run

  • I have considered if this change warrants user-facing release notes and have added them to RELEASE-NOTES.txt if necessary.

@iangmaia iangmaia added the Tooling Build, Release, and Validation Tools label Apr 8, 2026
@iangmaia iangmaia requested a review from mokagio April 8, 2026 11:58
@wpmobilebot
Copy link
Copy Markdown
Contributor

wpmobilebot commented Apr 8, 2026

App Icon📲 You can test the changes from this Pull Request in WordPress by scanning the QR code below to install the corresponding build.
App NameWordPress
ConfigurationRelease-Alpha
Build Number32005
VersionPR #25491
Bundle IDorg.wordpress.alpha
Commitc530df9
Installation URL166b11m92fe58
Automatticians: You can use our internal self-serve MC tool to give yourself access to those builds if needed.

@wpmobilebot
Copy link
Copy Markdown
Contributor

wpmobilebot commented Apr 8, 2026

App Icon📲 You can test the changes from this Pull Request in Jetpack by scanning the QR code below to install the corresponding build.
App NameJetpack
ConfigurationRelease-Alpha
Build Number32005
VersionPR #25491
Bundle IDcom.jetpack.alpha
Commitc530df9
Installation URL333jt4p48oo08
Automatticians: You can use our internal self-serve MC tool to give yourself access to those builds if needed.

@iangmaia iangmaia requested a review from twstokes April 8, 2026 15:41
@iangmaia iangmaia self-assigned this Apr 8, 2026
@iangmaia iangmaia added this to the 26.9 milestone Apr 9, 2026
@iangmaia
Copy link
Copy Markdown
Contributor Author

iangmaia commented Apr 9, 2026

@mokagio updated this PR as well to check for hard_failed on 683f6ca.

iangmaia and others added 2 commits April 10, 2026 19:19
The previous script exited early when any non-essential step (e.g. Danger)
failed, incorrectly skipping Claude analysis even when essential jobs also
failed in the same build.

Now counts non-essential failures first, then queries the Buildkite API for
the total failed job count. Claude is only skipped when all failures are
accounted for by non-essential steps. Fails safe: if the API call fails,
Claude runs anyway.

Also fixes shebang portability, uses a proper Bash array for step keys,
uses YAML literal scalar to preserve prompt formatting, and includes
timed_out jobs in the failure count.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two bugs fixed:
- buildkite-agent step get outcome returns "hard_failed", not "failed",
  so the non-essential check never matched
- When all steps passed, the script fell through to uploading Claude
  analysis instead of exiting early

Also restructures the script to query the API first, which simplifies
the flow and avoids a redundant early-exit branch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@iangmaia iangmaia force-pushed the iangmaia/fix-claude-analysis-gating branch from 683f6ca to c530df9 Compare April 10, 2026 17:19
Copy link
Copy Markdown
Contributor

@mokagio mokagio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iangmaia great job on this update across the apps.

Have you considered a next step where the script goes into CI toolkit with a default array of non-essential jobs comprising of Danger and SwiftLint, and the option for consumers to define a file in the repo (.buildkite/claude-analysis-non-essential-steps?) or an overriding env var in shared-pipeline-vars?

@iangmaia
Copy link
Copy Markdown
Contributor Author

Have you considered a next step where the script goes into CI toolkit with a default array of non-essential jobs comprising of Danger and SwiftLint, and the option for consumers to define a file in the repo (.buildkite/claude-analysis-non-essential-steps?) or an overriding env var in shared-pipeline-vars?

That's not a bad idea! Though the Claude Build Analysis is probably going to change to adopt Buildkite Model Providers or a different solution given the plugin has been deprecated. As is, it is still interesting and we can likely migrate to other setup taking the learnings we had so far, but I wouldn't invest in tweaking the current setup that much.

@iangmaia iangmaia added this pull request to the merge queue Apr 13, 2026
Merged via the queue into trunk with commit 68ff59f Apr 13, 2026
24 checks passed
@iangmaia iangmaia deleted the iangmaia/fix-claude-analysis-gating branch April 13, 2026 20:45
@mokagio
Copy link
Copy Markdown
Contributor

mokagio commented Apr 13, 2026

probably going to change to adopt Buildkite Model Providers or a different solution given the plugin has been deprecated

Good point, good point. I also wonder if this is actually useful for folks?

Do you have any feedback on it? But even then, AI analysis annotations might not be useful for devs but could be useful for us to get to a place where they can be useful? So definitely worth to keep investing on this, the question is how to measure usefulness/effectiveness.

@iangmaia
Copy link
Copy Markdown
Contributor Author

Good point, good point. I also wonder if this is actually useful for folks?

Do you have any feedback on it? But even then, AI analysis annotations might not be useful for devs but could be useful for us to get to a place where they can be useful? So definitely worth to keep investing on this, the question is how to measure usefulness/effectiveness.

Yep, I'm still seeing it as an experiment. I often check it, sometimes it has useful insights, but often it is just noisy and verbose, runs when it shouldn't, etc so these changes are important to make it more reliable and relevant. The next iterations (hopefully already in a custom and more compact template) should keep improving it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Tooling Build, Release, and Validation Tools

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants