Fixes #27004: enforce Kubernetes 63-char label limit in LabelBuilder with hash-preserving truncation#27181
Conversation
…reserving truncation
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
There was a problem hiding this comment.
Pull request overview
This PR updates the openmetadata-k8s-operator label generation to enforce Kubernetes’ 63-character label value limit by introducing hash-preserving truncation in LabelBuilder, and ensures pod selectors use the same sanitized values so pod lookup/cleanup continues to work.
Changes:
- Apply
sanitizeLabelValue()to OMJob name and pipeline name labels inbuildBaseLabels(). - Apply the same sanitization in
buildPodSelector()to keep selectors aligned with labels. - Extend unit tests around truncation/determinism/collision-resistance behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| openmetadata-k8s-operator/src/main/java/org/openmetadata/operator/util/LabelBuilder.java | Enforces 63-char label value limit via hash-suffix truncation and applies sanitization to labels/selectors. |
| openmetadata-k8s-operator/src/test/java/org/openmetadata/operator/unit/LabelBuilderTest.java | Adds/updates unit tests covering sanitization behavior, truncation, and selector/label consistency. |
| public static String sanitizeLabelValue(String value) { | ||
| if (value == null || value.isEmpty()) { | ||
| return ""; | ||
| } | ||
|
|
||
| // Replace invalid characters with hyphens and truncate to 63 chars | ||
| // Replace invalid characters with hyphens, collapse runs, strip leading/trailing hyphens | ||
| String sanitized = | ||
| value.replaceAll("[^a-zA-Z0-9\\-_.]", "-").replaceAll("-+", "-").replaceAll("^-|-$", ""); | ||
|
|
||
| return sanitized.length() > 63 ? sanitized.substring(0, 63) : sanitized; | ||
| if (sanitized.length() <= MAX_LABEL_LENGTH) { | ||
| return sanitized; | ||
| } | ||
|
|
||
| // Truncate to prefix length and strip trailing non-alphanumeric chars | ||
| String prefix = | ||
| sanitized.substring(0, MAX_LABEL_PREFIX_LENGTH).replaceAll("[^a-zA-Z0-9]+$", ""); | ||
| String hash = md5Hash(value).substring(0, 7); | ||
| return prefix + "-" + hash; |
There was a problem hiding this comment.
sanitizeLabelValue() docstring states outputs match the Kubernetes label value regex (must start/end with an alphanumeric), but the implementation only strips leading/trailing hyphens. Inputs like ".abc" or "abc" will return values that still start with '.'/'' and therefore violate the regex. For long values, prefix can also become empty (e.g., if the first 55 chars are non-alphanumeric), producing an output starting with '-' which is invalid. Consider stripping leading/trailing non-alphanumeric characters (not just '-') and adding a fallback when the sanitized/prefix becomes empty so the result always matches the K8s regex.
There was a problem hiding this comment.
Fixed in latest commit — now strips all leading/trailing non-alphanumeric
characters (not just hyphens) to ensure full K8s label regex compliance.
Added empty-string fallback to "omjob" for both sanitized value and prefix.
| // --- New tests for hash-preserving truncation (issue #27004) --- | ||
|
|
||
| @Test | ||
| void testSanitizeLabelValue_shortName_unchanged() { | ||
| String input = "my-pipeline-name"; | ||
| String result = LabelBuilder.sanitizeLabelValue(input); | ||
| assertEquals(input, result); | ||
| } | ||
|
|
||
| @Test | ||
| void testSanitizeLabelValue_exactly63Chars_unchanged() { | ||
| String input = "a".repeat(63); | ||
| String result = LabelBuilder.sanitizeLabelValue(input); | ||
| assertEquals(63, result.length()); | ||
| assertEquals(input, result); | ||
| } | ||
|
|
||
| @Test | ||
| void testSanitizeLabelValue_longName_truncatedTo63() { | ||
| String input = "om-job-data-mart-warehouse-production-metadata-jwef2ikn-ada040a6"; | ||
| String result = LabelBuilder.sanitizeLabelValue(input); | ||
| assertTrue( | ||
| result.length() <= 63, | ||
| "Label value must be <= 63 chars, got: " + result.length()); | ||
| } | ||
|
|
||
| @Test | ||
| void testSanitizeLabelValue_longName_endsWithAlphanumeric() { | ||
| String input = "om-job-data-mart-warehouse-production-metadata-jwef2ikn-ada040a6"; | ||
| String result = LabelBuilder.sanitizeLabelValue(input); | ||
| assertTrue( | ||
| result.matches(".*[a-zA-Z0-9]$"), | ||
| "Label value must end with alphanumeric, got: " + result); | ||
| } | ||
|
|
||
| @Test | ||
| void testSanitizeLabelValue_deterministic() { | ||
| String input = "om-job-data-mart-warehouse-production-metadata-jwef2ikn-ada040a6"; | ||
| String result1 = LabelBuilder.sanitizeLabelValue(input); | ||
| String result2 = LabelBuilder.sanitizeLabelValue(input); | ||
| assertEquals(result1, result2, "Same input must always produce same output"); | ||
| } | ||
|
|
||
| @Test | ||
| void testSanitizeLabelValue_differentLongNames_differentOutputs() { | ||
| String input1 = "om-job-data-mart-warehouse-production-metadata-pipeline-alpha-01"; | ||
| String input2 = "om-job-data-mart-warehouse-production-metadata-pipeline-beta-02"; | ||
| String result1 = LabelBuilder.sanitizeLabelValue(input1); | ||
| String result2 = LabelBuilder.sanitizeLabelValue(input2); | ||
| assertNotEquals( | ||
| result1, result2, "Different long names must not collide after truncation"); | ||
| } |
There was a problem hiding this comment.
Current tests cover length limits and trailing alphanumeric for a specific long name, but they don't assert the documented K8s label regex constraint that values must also start with an alphanumeric. Adding cases like ".starts-with-dot" / "_starts-with-underscore" (and a case where sanitization would otherwise produce an empty prefix) would prevent regressions and would have caught the current non-compliance in sanitizeLabelValue().
There was a problem hiding this comment.
Added two new tests covering inputs starting with '.' and '_' to prevent
regressions on leading non-alphanumeric characters.
…ests - Strip leading/trailing non-alphanumeric chars to ensure label values match K8s regex [a-zA-Z0-9]([a-zA-Z0-9._-]*[a-zA-Z0-9])? - Add fallback to 'omjob' when sanitized value or prefix is empty - Add tests for inputs starting with '.' and '_'
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
|
The Java checkstyle failed. Please run You can install the pre-commit hooks with |
🟡 Playwright Results — all passed (20 flaky)✅ 3598 passed · ❌ 0 failed · 🟡 20 flaky · ⏭️ 207 skipped
🟡 20 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |
|
The Java checkstyle failed. Please run You can install the pre-commit hooks with |
Code Review ✅ ApprovedEnforces Kubernetes 63-character label limit in LabelBuilder using hash-preserving truncation to fix #27004. No issues found. OptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
| // Kubernetes label value limits | ||
| private static final int MAX_LABEL_LENGTH = 63; | ||
| private static final int MAX_LABEL_PREFIX_LENGTH = 55; // 55 + "-" + 7 hash chars = 63 |
There was a problem hiding this comment.
MAX_LABEL_PREFIX_LENGTH is a derived value (max label length minus separator minus hash length). Keeping it hard-coded risks drift if the hash length/format changes later. Consider introducing a HASH_SUFFIX_LENGTH constant and computing MAX_LABEL_PREFIX_LENGTH from MAX_LABEL_LENGTH to make the truncation invariant self-maintaining.
| * Sanitize label value to be Kubernetes-compliant. | ||
| * | ||
| * <p>Kubernetes label values must be at most 63 characters and match | ||
| * [a-zA-Z0-9]([a-zA-Z0-9._-]*[a-zA-Z0-9])?. When truncation is needed, | ||
| * a 7-character MD5 hash suffix is appended to avoid collisions between | ||
| * similarly-named pipelines. |
There was a problem hiding this comment.
The Javadoc regex shown doesn’t reflect Kubernetes’ allowance for an empty label value (empty is valid; if non-empty it must match the regex). Since sanitizeLabelValue returns "" for null/empty input, consider updating the comment to clarify the empty-string case to avoid confusion for future callers.
| @Test | ||
| void testSanitizeLabelValue_nullInput_returnsEmpty() { | ||
| assertEquals("", LabelBuilder.sanitizeLabelValue(null)); | ||
| } | ||
|
|
||
| @Test | ||
| void testSanitizeLabelValue_emptyInput_returnsEmpty() { | ||
| assertEquals("", LabelBuilder.sanitizeLabelValue("")); | ||
| } |
There was a problem hiding this comment.
These null/empty input behaviors are already asserted in testSanitizeLabelValue() above. Duplicating them as separate tests adds noise without increasing coverage; consider removing the redundant tests or converting testSanitizeLabelValue() into a parameterized test that includes all cases in one place.
|
The Java checkstyle failed. Please run You can install the pre-commit hooks with |
|
Thank you @mohitjeswani01, but we will be closing this PR as we already had a submission here, please make sure that you are assigned an issue and that there is not an existing PR out for an issue before you submit something for it, we will reopen this PR if necessary |
Understood @PubChimps thanks alot for your review ! would love to dive in if this reopens by any chance. |
Description :
Fixes #27004
I worked on the
omjob-operatorlabel generation logic because long pipeline names causedKubernetes pod creation to fail with a 422 error — Kubernetes enforces a strict 63-character
limit on label values, but OMJob names were being used as label values without any length
enforcement.
Root Cause
The
LabelBuilder.buildBaseLabels()method was setting raw OMJob names as label values:LABEL_OMJOB_NAME→ set to rawomJob.getMetadata().getName()(no limit check)LABEL_APP_PIPELINE→ set to rawpipelineName(no limit check)A
sanitizeLabelValue()method already existed inLabelBuilder.javabut was never calledfor label values anywhere in the codebase.
Additionally,
buildPodSelector()used the same raw name — meaning even if labels were fixed,selectors would stop matching them, causing pod lookups to silently fail.
What I Changed
LabelBuilder.javaonly — surgical, focused fix:Upgraded
sanitizeLabelValue()to use hash-preserving truncation instead of plainprefix cut:
55-char prefix+-+7-char MD5 hash= always ≤ 63 charsFixed
buildBaseLabels()— wrapped both raw label values withsanitizeLabelValue()Fixed
buildPodSelector()— applies same sanitization so selectors always matchlabels (critical for
findPod()anddeletePods()to work correctly after truncation)Added constants
MAX_LABEL_LENGTH = 63andMAX_LABEL_PREFIX_LENGTH = 55Why hash-preserving truncation?
Plain prefix truncation (what other PRs use) causes collision risk — two pipelines named
om-job-data-mart-warehouse-production-alphaandom-job-data-mart-warehouse-production-betawould truncate to the same label value, causing the operator to select or delete the wrong pods.
The MD5 suffix makes each truncated value unique and deterministic.
How I tested
Added 10 new unit tests to
LabelBuilderTest.javacovering:buildBaseLabels()produces valid label values for long OMJob namesbuildPodSelector()value always matchesbuildBaseLabels()value (consistency)nulland empty string inputs handled safelyType of change:
Checklist:
Fixes #27004: enforce Kubernetes 63-char label limit in LabelBuilder with hash-preserving truncation