Skip to content

Commit b55e978

Browse files
fix(pull-mode): do not commit helm stage when a chart failed this cycle
handleCharts in controllers/handlers_helm.go was committing the staged ConfigurationBundles even when walkChartsAndDeploy returned an error for one of the charts in the profile. Because staging had aborted before the failing chart was added to the in-memory staging manager, the commit produced a ConfigurationGroup missing the bundles for that chart. The applier-manager on the managed cluster then treats the missing bundles as "this release was removed from the profile" and uninstalls the chart, removing the previously deployed resources from the managed cluster. The user-visible effect is that any pull-mode reconcile which fails inside helm (helm values schema validation, template instantiation error, unreachable chart repo, invalid semantic version from a poisoned cache, etc.) does not just fail loudly: it silently tears down the deployment that was working moments earlier. Issue #1724 reported this for cert-manager when the values ConfigMap referenced by valuesFrom was updated with a values document that did not match the chart schema. The two sibling handlers already handle this correctly: * handlers_resources.go:151-155 returns on deployError before CommitStagedResourcesForDeployment. * handlers_kustomize.go:196-197 does the same. This change aligns handlers_helm.go with that pattern: if walkChartsAndDeploy returned a deployError, return it immediately instead of committing a partial stage. The previously committed ConfigurationGroup is left in place so the agent continues to run the last known good state until a clean reconcile can complete. updateClusterReportWithHelmReports is a no-op outside of DryRun sync mode, so moving the deployError check ahead of it does not affect non-DryRun reconciles. Staged-but-not-committed bundles are cleared by DiscardStagedResourcesForDeployment at the top of the next deployHelmCharts invocation (handlers_helm.go:159), so there is no leak of in-memory staged state. Refs: #1724
1 parent cc85fae commit b55e978

File tree

1 file changed

+15
-4
lines changed

1 file changed

+15
-4
lines changed

controllers/handlers_helm.go

Lines changed: 15 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -928,6 +928,21 @@ func handleCharts(ctx context.Context, clusterSummary *configv1beta1.ClusterSumm
928928
}
929929

930930
if isPullMode {
931+
// If any chart in this reconcile failed to stage (helm values schema
932+
// error, template instantiation failure, unreachable repo, etc.),
933+
// do not commit. Committing a partial staged set produces a
934+
// ConfigurationGroup with missing bundles for the failed charts,
935+
// which the agent in the managed cluster interprets as "these
936+
// releases were removed from the profile" and uninstalls them.
937+
// Returning early leaves the previously-committed ConfigurationGroup
938+
// in place so the agent continues to run the last known good state
939+
// until the reconcile can complete cleanly. This mirrors the
940+
// deployError-before-commit pattern already used in
941+
// handlers_resources.go and handlers_kustomize.go.
942+
if deployError != nil {
943+
return deployError
944+
}
945+
931946
err = commitStagedResourcesForDeployment(ctx, clusterSummary, configurationHash, mgmtResources, logger)
932947
if err != nil {
933948
return err
@@ -937,10 +952,6 @@ func handleCharts(ctx context.Context, clusterSummary *configv1beta1.ClusterSumm
937952
if err != nil {
938953
return err
939954
}
940-
941-
if deployError != nil {
942-
return deployError
943-
}
944955
} else {
945956
// First get the helm releases currently managed and uninstall all the ones
946957
// not referenced anymore. Only if this operation succeeds, removes all stale

0 commit comments

Comments
 (0)