Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,8 @@

package org.openmetadata.operator.util;

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import org.openmetadata.operator.model.OMJobResource;
Expand Down Expand Up @@ -43,6 +45,10 @@ public class LabelBuilder {
public static final String POD_TYPE_MAIN = "main";
public static final String POD_TYPE_EXIT_HANDLER = "exit-handler";

// Kubernetes label value limits
private static final int MAX_LABEL_LENGTH = 63;
private static final int MAX_LABEL_PREFIX_LENGTH = 55; // 55 + "-" + 7 hash chars = 63
Comment on lines +48 to +50
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MAX_LABEL_PREFIX_LENGTH is a derived value (max label length minus separator minus hash length). Keeping it hard-coded risks drift if the hash length/format changes later. Consider introducing a HASH_SUFFIX_LENGTH constant and computing MAX_LABEL_PREFIX_LENGTH from MAX_LABEL_LENGTH to make the truncation invariant self-maintaining.

Copilot uses AI. Check for mistakes.

private LabelBuilder() {
// Utility class
}
Expand All @@ -56,12 +62,12 @@ public static Map<String, String> buildBaseLabels(OMJobResource omJob) {
labels.put(LABEL_APP_NAME, APP_NAME);
labels.put(LABEL_APP_COMPONENT, COMPONENT_INGESTION);
labels.put(LABEL_APP_MANAGED_BY, MANAGED_BY_OMJOB_OPERATOR);
labels.put(LABEL_OMJOB_NAME, omJob.getMetadata().getName());
labels.put(LABEL_OMJOB_NAME, sanitizeLabelValue(omJob.getMetadata().getName()));

// Copy pipeline and run-id from OMJob labels
String pipelineName = omJob.getPipelineName();
if (pipelineName != null) {
labels.put(LABEL_APP_PIPELINE, pipelineName);
labels.put(LABEL_APP_PIPELINE, sanitizeLabelValue(pipelineName));
}

String runId = omJob.getRunId();
Expand Down Expand Up @@ -113,7 +119,7 @@ public static Map<String, String> buildExitHandlerLabels(OMJobResource omJob) {
*/
public static Map<String, String> buildPodSelector(OMJobResource omJob) {
Map<String, String> selector = new HashMap<>();
selector.put(LABEL_OMJOB_NAME, omJob.getMetadata().getName());
selector.put(LABEL_OMJOB_NAME, sanitizeLabelValue(omJob.getMetadata().getName()));
selector.put(LABEL_APP_MANAGED_BY, MANAGED_BY_OMJOB_OPERATOR);
return selector;
}
Expand All @@ -137,17 +143,60 @@ public static Map<String, String> buildExitHandlerSelector(OMJobResource omJob)
}

/**
* Sanitize label value to be Kubernetes-compliant
* Sanitize label value to be Kubernetes-compliant.
*
* <p>Kubernetes label values must be at most 63 characters and match
* [a-zA-Z0-9]([a-zA-Z0-9._-]*[a-zA-Z0-9])?. When truncation is needed,
* a 7-character MD5 hash suffix is appended to avoid collisions between
* similarly-named pipelines.
Comment on lines +146 to +151
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Javadoc regex shown doesn’t reflect Kubernetes’ allowance for an empty label value (empty is valid; if non-empty it must match the regex). Since sanitizeLabelValue returns "" for null/empty input, consider updating the comment to clarify the empty-string case to avoid confusion for future callers.

Copilot uses AI. Check for mistakes.
*/
public static String sanitizeLabelValue(String value) {
if (value == null || value.isEmpty()) {
return "";
}

// Replace invalid characters with hyphens and truncate to 63 chars
// Replace invalid characters with hyphens, collapse runs,
// strip leading/trailing non-alphanumeric chars (K8s label values
// must start and end with [a-zA-Z0-9])
String sanitized =
value.replaceAll("[^a-zA-Z0-9\\-_.]", "-").replaceAll("-+", "-").replaceAll("^-|-$", "");
value
.replaceAll("[^a-zA-Z0-9\\-_.]", "-")
.replaceAll("-+", "-")
.replaceAll("^[^a-zA-Z0-9]+", "")
.replaceAll("[^a-zA-Z0-9]+$", "");

if (sanitized.isEmpty()) {
return "omjob";
}

if (sanitized.length() <= MAX_LABEL_LENGTH) {
return sanitized;
}

return sanitized.length() > 63 ? sanitized.substring(0, 63) : sanitized;
// Truncate to prefix length and strip trailing non-alphanumeric chars
String prefix =
sanitized.substring(0, MAX_LABEL_PREFIX_LENGTH).replaceAll("[^a-zA-Z0-9]+$", "");
if (prefix.isEmpty()) {
prefix = "omjob";
}
String hash = md5Hash(value).substring(0, 7);
return prefix + "-" + hash;
Comment on lines 153 to +183
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanitizeLabelValue() docstring states outputs match the Kubernetes label value regex (must start/end with an alphanumeric), but the implementation only strips leading/trailing hyphens. Inputs like ".abc" or "abc" will return values that still start with '.'/'' and therefore violate the regex. For long values, prefix can also become empty (e.g., if the first 55 chars are non-alphanumeric), producing an output starting with '-' which is invalid. Consider stripping leading/trailing non-alphanumeric characters (not just '-') and adding a fallback when the sanitized/prefix becomes empty so the result always matches the K8s regex.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest commit — now strips all leading/trailing non-alphanumeric
characters (not just hyphens) to ensure full K8s label regex compliance.
Added empty-string fallback to "omjob" for both sanitized value and prefix.

}

/**
* Compute the MD5 hex digest of a string.
*/
private static String md5Hash(String input) {
try {
MessageDigest md = MessageDigest.getInstance("MD5");
byte[] digest = md.digest(input.getBytes(java.nio.charset.StandardCharsets.UTF_8));
StringBuilder sb = new StringBuilder();
for (byte b : digest) {
sb.append(String.format("%02x", b));
}
return sb.toString();
} catch (NoSuchAlgorithmException e) {
throw new IllegalStateException("MD5 algorithm not available", e);
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -87,7 +87,8 @@ void testSanitizeLabelValue() {
// Test truncation (> 63 chars)
String longValue = "a".repeat(70);
String sanitized = LabelBuilder.sanitizeLabelValue(longValue);
assertEquals(63, sanitized.length());
assertTrue(sanitized.length() <= 63,
"Sanitized value must be <= 63 chars, got: " + sanitized.length());
}

private OMJobResource createTestOMJob() {
Expand All @@ -106,4 +107,137 @@ private OMJobResource createTestOMJob() {
omJob.setMetadata(metadata);
return omJob;
}

private OMJobResource createTestOMJob(String name) {
OMJobResource omJob = new OMJobResource();

ObjectMeta metadata =
new ObjectMetaBuilder()
.withName(name)
.withNamespace("test-namespace")
.withLabels(
Map.of(
"app.kubernetes.io/pipeline", "mysql-pipeline",
"app.kubernetes.io/run-id", "run-12345"))
.build();

omJob.setMetadata(metadata);
return omJob;
}

// --- New tests for hash-preserving truncation (issue #27004) ---

@Test
void testSanitizeLabelValue_shortName_unchanged() {
String input = "my-pipeline-name";
String result = LabelBuilder.sanitizeLabelValue(input);
assertEquals(input, result);
}

@Test
void testSanitizeLabelValue_exactly63Chars_unchanged() {
String input = "a".repeat(63);
String result = LabelBuilder.sanitizeLabelValue(input);
assertEquals(63, result.length());
assertEquals(input, result);
}

@Test
void testSanitizeLabelValue_longName_truncatedTo63() {
String input = "om-job-data-mart-warehouse-production-metadata-jwef2ikn-ada040a6";
String result = LabelBuilder.sanitizeLabelValue(input);
assertTrue(
result.length() <= 63,
"Label value must be <= 63 chars, got: " + result.length());
}

@Test
void testSanitizeLabelValue_longName_endsWithAlphanumeric() {
String input = "om-job-data-mart-warehouse-production-metadata-jwef2ikn-ada040a6";
String result = LabelBuilder.sanitizeLabelValue(input);
assertTrue(
result.matches(".*[a-zA-Z0-9]$"),
"Label value must end with alphanumeric, got: " + result);
}

@Test
void testSanitizeLabelValue_deterministic() {
String input = "om-job-data-mart-warehouse-production-metadata-jwef2ikn-ada040a6";
String result1 = LabelBuilder.sanitizeLabelValue(input);
String result2 = LabelBuilder.sanitizeLabelValue(input);
assertEquals(result1, result2, "Same input must always produce same output");
}

@Test
void testSanitizeLabelValue_differentLongNames_differentOutputs() {
String input1 = "om-job-data-mart-warehouse-production-metadata-pipeline-alpha-01";
String input2 = "om-job-data-mart-warehouse-production-metadata-pipeline-beta-02";
String result1 = LabelBuilder.sanitizeLabelValue(input1);
String result2 = LabelBuilder.sanitizeLabelValue(input2);
assertNotEquals(
result1, result2, "Different long names must not collide after truncation");
}
Comment on lines +128 to +179
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current tests cover length limits and trailing alphanumeric for a specific long name, but they don't assert the documented K8s label regex constraint that values must also start with an alphanumeric. Adding cases like ".starts-with-dot" / "_starts-with-underscore" (and a case where sanitization would otherwise produce an empty prefix) would prevent regressions and would have caught the current non-compliance in sanitizeLabelValue().

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added two new tests covering inputs starting with '.' and '_' to prevent
regressions on leading non-alphanumeric characters.


@Test
void testBuildBaseLabels_longName_labelsAreValid() {
String longName =
"om-job-data-mart-warehouse-production-metadata-jwef2ikn-ada040a6";
OMJobResource omJob = createTestOMJob(longName);

Map<String, String> labels = LabelBuilder.buildBaseLabels(omJob);

String nameLabel = labels.get(LabelBuilder.LABEL_OMJOB_NAME);
assertNotNull(nameLabel);
assertTrue(
nameLabel.length() <= 63,
"LABEL_OMJOB_NAME must be <= 63 chars, got: " + nameLabel.length());
assertTrue(
nameLabel.matches("[a-zA-Z0-9]([a-zA-Z0-9._-]*[a-zA-Z0-9])?"),
"LABEL_OMJOB_NAME must match K8s label value regex");
}

@Test
void testBuildPodSelector_longName_matchesBuildBaseLabels() {
String longName =
"om-job-data-mart-warehouse-production-metadata-jwef2ikn-ada040a6";
OMJobResource omJob = createTestOMJob(longName);

Map<String, String> labels = LabelBuilder.buildBaseLabels(omJob);
Map<String, String> selector = LabelBuilder.buildPodSelector(omJob);

assertEquals(
labels.get(LabelBuilder.LABEL_OMJOB_NAME),
selector.get(LabelBuilder.LABEL_OMJOB_NAME),
"Label value and selector value MUST match for pod lookup to work");
}

@Test
void testSanitizeLabelValue_nullInput_returnsEmpty() {
assertEquals("", LabelBuilder.sanitizeLabelValue(null));
}

@Test
void testSanitizeLabelValue_emptyInput_returnsEmpty() {
assertEquals("", LabelBuilder.sanitizeLabelValue(""));
}
Comment on lines +214 to +222
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These null/empty input behaviors are already asserted in testSanitizeLabelValue() above. Duplicating them as separate tests adds noise without increasing coverage; consider removing the redundant tests or converting testSanitizeLabelValue() into a parameterized test that includes all cases in one place.

Copilot uses AI. Check for mistakes.

@Test
void testSanitizeLabelValue_startsWithDot_strippedToAlphanumeric() {
String input = ".starts-with-dot";
String result = LabelBuilder.sanitizeLabelValue(input);
assertTrue(
result.matches("[a-zA-Z0-9].*"),
"Result must start with alphanumeric, got: " + result);
assertFalse(result.isEmpty(), "Result must not be empty");
}

@Test
void testSanitizeLabelValue_startsWithUnderscore_strippedToAlphanumeric() {
String input = "_starts-with-underscore";
String result = LabelBuilder.sanitizeLabelValue(input);
assertTrue(
result.matches("[a-zA-Z0-9].*"),
"Result must start with alphanumeric, got: " + result);
assertFalse(result.isEmpty(), "Result must not be empty");
}
}
Loading