fix(sampler): Respect randomizedSample flag at 100% percentage sampling (#26966)
Conversation
When profileSample is 100% with PERCENTAGE type, the sampler short-circuits and returns the raw dataset without any randomization, even when randomizedSample is True (the default). Split the combined condition so that:
- No profileSample set -> return raw dataset (no sampling configured)
- 100% PERCENTAGE + randomizedSample=False -> return raw dataset (optimization)
- 100% PERCENTAGE + randomizedSample=True -> go through the normal sampling path, which applies RandomNumFn/df.sample for proper row shuffling

Fixes open-metadata#21304
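The three-way split above can be sketched as follows. This is a minimal illustration, assuming simplified stand-ins for the real ingestion models (`SampleConfig` and `ProfileSampleType` here are not the actual classes):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class ProfileSampleType(Enum):
    PERCENTAGE = "PERCENTAGE"
    ROWS = "ROWS"


@dataclass
class SampleConfig:
    profileSample: Optional[Union[float, int]] = None
    profileSampleType: Optional[ProfileSampleType] = ProfileSampleType.PERCENTAGE
    randomizedSample: Optional[bool] = True


def should_use_raw_dataset(cfg: SampleConfig) -> bool:
    # Case 1: no sampling configured at all
    if not cfg.profileSample:
        return True
    # Case 2: 100% PERCENTAGE with randomization explicitly disabled
    if (
        cfg.profileSampleType == ProfileSampleType.PERCENTAGE
        and cfg.profileSample == 100
        and cfg.randomizedSample is False
    ):
        return True
    # Case 3: everything else goes through the sampling path
    return False
```

Note that only an explicit `False` takes the fast path; both `True` and `None` (unset) fall through to randomized sampling.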
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Let us know if you need any help!
Pull request overview
This PR aims to fix sampling behavior in ingestion samplers so that when profileSampleType=PERCENTAGE and profileSample=100, the randomizedSample flag is respected (i.e., the dataset should still be randomized when randomizedSample=True instead of short-circuiting to the raw dataset).
Changes:
- Split the “no sampling” vs “100% percentage” short-circuit checks in the SQLAlchemy sampler `get_dataset()`, gating the 100% fast-path on `randomizedSample=False`.
- Apply the same conditional split in the Pandas sampler `get_dataset()`.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| ingestion/src/metadata/sampler/sqlalchemy/sampler.py | Adjusts 100% percentage short-circuit logic to consider randomizedSample. |
| ingestion/src/metadata/sampler/pandas/sampler.py | Adjusts 100% percentage short-circuit logic to consider randomizedSample. |
```python
if (
    self.sample_config.profileSampleType == ProfileSampleType.PERCENTAGE
    and self.sample_config.profileSample == 100
    and not self.sample_config.randomizedSample
```
randomizedSample is typed as Optional[bool]; using not self.sample_config.randomizedSample will treat None the same as False and trigger the 100%-optimization path unexpectedly. To preserve the intended default behavior, consider checking explicitly for is False (or normalizing randomizedSample to a concrete bool when building SampleConfig).
```diff
-    and not self.sample_config.randomizedSample
+    and self.sample_config.randomizedSample is False
```
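The suggested `is False` check addresses a real `Optional[bool]` pitfall that is easy to demonstrate in isolation:

```python
from typing import Optional


def fast_path_with_not(flag: Optional[bool]) -> bool:
    # `not flag` is truthy for both False and None
    return not flag


def fast_path_with_is_false(flag: Optional[bool]) -> bool:
    # `is False` matches only an explicit False
    return flag is False


# None (the unset default) takes the fast path under `not`,
# but correctly falls through to randomized sampling under `is False`.
```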
```python
if (
    self.sample_config.profileSample == 100
    and self.sample_config.profileSampleType == ProfileSampleType.PERCENTAGE
    and not self.sample_config.randomizedSample
```
randomizedSample is Optional[bool]. The condition not self.sample_config.randomizedSample will treat None as disabled randomization and return the raw dataset at 100% PERCENTAGE, which can reintroduce deterministic sampling if configs ever provide null/unset here. Consider checking self.sample_config.randomizedSample is False (or coercing None to the desired default when building SampleConfig).
```diff
-    and not self.sample_config.randomizedSample
+    and self.sample_config.randomizedSample is False
```
```python
if not self.sample_config.profileSample:
    if self.partition_details:
        return self._partitioned_table()
    return self.raw_dataset
if (
    self.sample_config.profileSampleType == ProfileSampleType.PERCENTAGE
    and self.sample_config.profileSample == 100
    and not self.sample_config.randomizedSample
```
This PR changes behavior specifically for profileSampleType=PERCENTAGE with profileSample=100 based on randomizedSample. There are existing unit tests for both samplers, but none appear to cover the 100% PERCENTAGE edge case; adding tests for (a) 100% + randomizedSample=True producing shuffled/non-deterministic sample rows and (b) 100% + randomizedSample=False keeping the fast path would help prevent regressions.
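A sketch of what those regression tests could look like. `FakeSampler` is a hypothetical stand-in for illustration only; the real tests would exercise `SQASampler`/`DatalakeSampler` with mocks:

```python
import random
from typing import Optional


class FakeSampler:
    """Stand-in mimicking the three-way get_dataset() branch under review."""

    def __init__(self, rows, profile_sample=None, randomized: Optional[bool] = True):
        self.rows = rows
        self.profile_sample = profile_sample
        self.randomized = randomized

    def get_dataset(self, rng: random.Random):
        if not self.profile_sample:
            return self.rows  # no sampling configured
        if self.profile_sample == 100 and self.randomized is False:
            return self.rows  # 100% fast path, natural order preserved
        shuffled = list(self.rows)
        rng.shuffle(shuffled)  # randomized sampling path
        return shuffled


def test_100_pct_randomized_true_shuffles():
    rows = list(range(100))
    out = FakeSampler(rows, profile_sample=100, randomized=True).get_dataset(
        random.Random(7)
    )
    assert sorted(out) == rows and out != rows  # same rows, different order


def test_100_pct_randomized_false_keeps_fast_path():
    rows = list(range(100))
    out = FakeSampler(rows, profile_sample=100, randomized=False).get_dataset(
        random.Random(7)
    )
    assert out == rows  # natural order, no shuffle
```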
- Fix randomizedSample check from 'not' to 'is False' in both SQASampler and DatalakeSampler to correctly handle None (Optional[bool] default=True)
- Add unit tests verifying 100% PERCENTAGE behavior for randomizedSample values True, False, and None
…tion The get_dataset() fix ensures 100% PERCENTAGE + randomizedSample routes through get_sample_query() which produces a CTE with a random column. Now fetch_sample_data() detects that column and applies ORDER BY before LIMIT, so each call returns a different subset of rows. Also add real-DB integration tests using SQLite for the 100% PERCENTAGE edge case (True, False, None).
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
ingestion/src/metadata/sampler/sqlalchemy/sampler.py:213
- With this change, `profileSampleType=PERCENTAGE` and `profileSample=100` will route through `get_sample_query()` whenever `randomizedSample` is not explicitly `False` (default is `True`). `get_sample_query()` adds a RandomNumFn/modulo expression + extra CTE layers even though the effective sample is still 100% of rows, and `get_dataset()` is used broadly for profiler metrics and data quality runners (not just sample-data display). Consider keeping the full-table fast-path for metric computation and applying randomization only when extracting sample rows (e.g., handle this edge case inside `fetch_sample_data()` or via a dedicated flag/entrypoint for “sample data” vs “profiling dataset”).
```python
if not self.sample_config.profileSample:
    if self.partition_details:
        return self._partitioned_table()
    return self.raw_dataset
if (
    self.sample_config.profileSampleType == ProfileSampleType.PERCENTAGE
    and self.sample_config.profileSample == 100
    and self.sample_config.randomizedSample is False
):
    if self.partition_details:
        return self._partitioned_table()
    return self.raw_dataset
return self.get_sample_query(column=column)
```
```python
# Add new RandomNumFn column
ds = self.get_dataset()
ds_columns = inspect(ds).c
random_column = next(
    (col for col in ds_columns if col.name == RANDOM_LABEL), None
)
```
The comment # Add new RandomNumFn column is no longer accurate here: get_dataset() may return the raw dataset (no random column) or a sampling CTE depending on config. Updating the comment to reflect that we’re detecting an existing random column (and only ordering when present) would avoid confusion for future maintenance.
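The intended behavior, detect an existing random column and order by it before limiting, can be seen end to end with a tiny SQLite example. This is illustrative only: the PR builds the query with SQLAlchemy and a `RANDOM_LABEL` column, while this sketch uses raw SQL and SQLite's `RANDOM()`:

```python
import sqlite3

# Build a sampling CTE with a random column, then ORDER BY that column before
# LIMIT so the sampled rows differ between calls.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER)")
conn.executemany("INSERT INTO users VALUES (?)", [(i,) for i in range(1000)])

query = """
WITH sampled AS (
    SELECT id, RANDOM() AS random_col FROM users
)
SELECT id FROM sampled ORDER BY random_col LIMIT 5
"""
first = [row[0] for row in conn.execute(query)]
second = [row[0] for row in conn.execute(query)]
# With 1000 rows, two draws of 5 are overwhelmingly likely to differ.
```

Random-number syntax varies by dialect (`RANDOM()`, `RAND()`, etc.), which is exactly why the PR keys off the already-labeled random column rather than re-emitting a random expression in the outer query.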
```python
import pytest
```
pytest is imported but never used in this test module. Removing the unused import will keep the test file clean and avoid potential lint failures if/when test linting is enabled.
```diff
-import pytest
```
🛡️ TRIVY SCAN RESULT 🛡️
Target:
| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.12.7 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.13.4 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42003 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4.2 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42004 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4 |
| com.google.code.gson:gson | CVE-2022-25647 | 🚨 HIGH | 2.2.4 | 2.8.9 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.3.0 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.3.0 | 3.25.5, 4.27.5, 4.28.2 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.7.1 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.7.1 | 3.25.5, 4.27.5, 4.28.2 |
| com.nimbusds:nimbus-jose-jwt | CVE-2023-52428 | 🚨 HIGH | 9.8.1 | 9.37.2 |
| com.squareup.okhttp3:okhttp | CVE-2021-0341 | 🚨 HIGH | 3.12.12 | 4.9.2 |
| commons-beanutils:commons-beanutils | CVE-2025-48734 | 🚨 HIGH | 1.9.4 | 1.11.0 |
| commons-io:commons-io | CVE-2024-47554 | 🚨 HIGH | 2.8.0 | 2.14.0 |
| dnsjava:dnsjava | CVE-2024-25638 | 🚨 HIGH | 2.1.7 | 3.6.0 |
| io.airlift:aircompressor | CVE-2025-67721 | 🚨 HIGH | 0.27 | 2.0.3 |
| io.netty:netty-codec-http | CVE-2026-33870 | 🚨 HIGH | 4.1.96.Final | 4.1.132.Final, 4.2.10.Final |
| io.netty:netty-codec-http2 | CVE-2025-55163 | 🚨 HIGH | 4.1.96.Final | 4.2.4.Final, 4.1.124.Final |
| io.netty:netty-codec-http2 | CVE-2026-33871 | 🚨 HIGH | 4.1.96.Final | 4.1.132.Final, 4.2.11.Final |
| io.netty:netty-codec-http2 | GHSA-xpw8-rcwv-8f8p | 🚨 HIGH | 4.1.96.Final | 4.1.100.Final |
| io.netty:netty-handler | CVE-2025-24970 | 🚨 HIGH | 4.1.96.Final | 4.1.118.Final |
| net.minidev:json-smart | CVE-2021-31684 | 🚨 HIGH | 1.3.2 | 1.3.3, 2.4.4 |
| net.minidev:json-smart | CVE-2023-1370 | 🚨 HIGH | 1.3.2 | 2.4.9 |
| org.apache.avro:avro | CVE-2024-47561 | 🔥 CRITICAL | 1.7.7 | 1.11.4 |
| org.apache.avro:avro | CVE-2023-39410 | 🚨 HIGH | 1.7.7 | 1.11.3 |
| org.apache.derby:derby | CVE-2022-46337 | 🔥 CRITICAL | 10.14.2.0 | 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0 |
| org.apache.ivy:ivy | CVE-2022-46751 | 🚨 HIGH | 2.5.1 | 2.5.2 |
| org.apache.mesos:mesos | CVE-2018-1330 | 🚨 HIGH | 1.4.3 | 1.6.0 |
| org.apache.spark:spark-core_2.12 | CVE-2025-54920 | 🚨 HIGH | 3.5.6 | 3.5.7 |
| org.apache.thrift:libthrift | CVE-2019-0205 | 🚨 HIGH | 0.12.0 | 0.13.0 |
| org.apache.thrift:libthrift | CVE-2020-13949 | 🚨 HIGH | 0.12.0 | 0.14.0 |
| org.apache.zookeeper:zookeeper | CVE-2023-44981 | 🔥 CRITICAL | 3.6.3 | 3.7.2, 3.8.3, 3.9.1 |
| org.eclipse.jetty:jetty-server | CVE-2024-13009 | 🚨 HIGH | 9.4.56.v20240826 | 9.4.57.v20241219 |
| org.lz4:lz4-java | CVE-2025-12183 | 🚨 HIGH | 1.8.0 | 1.8.1 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: Node.js
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: Python
Vulnerabilities (24)
| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| Authlib | CVE-2026-27962 | 🔥 CRITICAL | 1.6.6 | 1.6.9 |
| Authlib | CVE-2026-28490 | 🚨 HIGH | 1.6.6 | 1.6.9 |
| Authlib | CVE-2026-28498 | 🚨 HIGH | 1.6.6 | 1.6.9 |
| Authlib | CVE-2026-28802 | 🚨 HIGH | 1.6.6 | 1.6.7 |
| PyJWT | CVE-2026-32597 | 🚨 HIGH | 2.11.0 | 2.12.0 |
| Werkzeug | CVE-2024-34069 | 🚨 HIGH | 2.2.3 | 3.0.3 |
| aiohttp | CVE-2025-69223 | 🚨 HIGH | 3.12.12 | 3.13.3 |
| apache-airflow | CVE-2026-26929 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow | CVE-2026-28779 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow | CVE-2026-30911 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow-providers-http | CVE-2025-69219 | 🚨 HIGH | 5.6.4 | 6.0.0 |
| cryptography | CVE-2026-26007 | 🚨 HIGH | 42.0.8 | 46.0.5 |
| jaraco.context | CVE-2026-23949 | 🚨 HIGH | 5.3.0 | 6.1.0 |
| jaraco.context | CVE-2026-23949 | 🚨 HIGH | 6.0.1 | 6.1.0 |
| protobuf | CVE-2026-0994 | 🚨 HIGH | 4.25.8 | 6.33.5, 5.29.6 |
| pyOpenSSL | CVE-2026-27459 | 🚨 HIGH | 24.1.0 | 26.0.0 |
| pyasn1 | CVE-2026-30922 | 🚨 HIGH | 0.6.2 | 0.6.3 |
| ray | CVE-2025-62593 | 🔥 CRITICAL | 2.47.1 | 2.52.0 |
| starlette | CVE-2025-62727 | 🚨 HIGH | 0.48.0 | 0.49.1 |
| tornado | CVE-2026-31958 | 🚨 HIGH | 6.5.4 | 6.5.5 |
| urllib3 | CVE-2025-66418 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2025-66471 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2026-21441 | 🚨 HIGH | 1.26.20 | 2.6.3 |
| wheel | CVE-2026-24049 | 🚨 HIGH | 0.45.1 | 0.46.2 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: usr/bin/docker
Vulnerabilities (2)
| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| stdlib | CVE-2025-68121 | 🔥 CRITICAL | v1.25.6 | 1.24.13, 1.25.7, 1.26.0-rc.3 |
| stdlib | CVE-2026-25679 | 🚨 HIGH | v1.25.6 | 1.25.8, 1.26.1 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: /etc/ssl/private/ssl-cert-snakeoil.key
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target:
| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| libpng-dev | CVE-2026-33416 | 🚨 HIGH | 1.6.39-2+deb12u3 | 1.6.39-2+deb12u4 |
| libpng-dev | CVE-2026-33636 | 🚨 HIGH | 1.6.39-2+deb12u3 | 1.6.39-2+deb12u4 |
| libpng16-16 | CVE-2026-33416 | 🚨 HIGH | 1.6.39-2+deb12u3 | 1.6.39-2+deb12u4 |
| libpng16-16 | CVE-2026-33636 | 🚨 HIGH | 1.6.39-2+deb12u3 | 1.6.39-2+deb12u4 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: Node.js
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: Python
Vulnerabilities (13)
| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| apache-airflow | CVE-2026-26929 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow | CVE-2026-28779 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow | CVE-2026-30911 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| cryptography | CVE-2026-26007 | 🚨 HIGH | 42.0.8 | 46.0.5 |
| jaraco.context | CVE-2026-23949 | 🚨 HIGH | 5.3.0 | 6.1.0 |
| jaraco.context | CVE-2026-23949 | 🚨 HIGH | 6.0.1 | 6.1.0 |
| pyOpenSSL | CVE-2026-27459 | 🚨 HIGH | 24.1.0 | 26.0.0 |
| starlette | CVE-2025-62727 | 🚨 HIGH | 0.48.0 | 0.49.1 |
| urllib3 | CVE-2025-66418 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2025-66471 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2026-21441 | 🚨 HIGH | 1.26.20 | 2.6.3 |
| wheel | CVE-2026-24049 | 🚨 HIGH | 0.45.1 | 0.46.2 |
| wheel | CVE-2026-24049 | 🚨 HIGH | 0.45.1 | 0.46.2 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: /etc/ssl/private/ssl-cert-snakeoil.key
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/extended_sample_data.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/lineage.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_data.json
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_data.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_data_aut.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_usage.json
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_usage.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_usage_aut.yaml
No Vulnerabilities Found
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
TeddyCr
left a comment
Thanks for the PR, could you take a look at the comment?
```python
)
query = client.query(*select_columns).select_from(ds)
if random_column is not None:
    query = query.order_by(random_column)
```
this should be moved to the sampling query itself, so that this method's responsibility is only to fetch the sample data. See comment here: https://github.com/open-metadata/OpenMetadata/pull/26966/changes#r3029273293
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (1)
ingestion/src/metadata/sampler/sqlalchemy/sampler.py:234
- `fetch_sample_data()` always applies `LIMIT` without an `ORDER BY`. Even if `get_dataset()` returns a sampled CTE with a random column, relying on ordering inside the CTE is not sufficient to guarantee randomized results across SQL dialects. Consider detecting the presence of the random column in `ds` and applying an outer `ORDER BY` on it before the `LIMIT` is applied.
```python
ds = self.get_dataset()
if not columns:
    sqa_columns = [col for col in inspect(ds).c if col.name != RANDOM_LABEL]
else:
    # we can't directly use columns as it is bound to self.raw_dataset and not the rnd table.
```
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.
Comments suppressed due to low confidence (1)
ingestion/src/metadata/sampler/sqlalchemy/sampler.py:233
- `fetch_sample_data()` applies `LIMIT` on the dataset returned by `get_dataset()` but does not apply an `ORDER BY` on the random column when the dataset is a sampling CTE. Relying on an `ORDER BY` inside the CTE (from `get_sample_query()`) is not guaranteed to affect the outer query's row order, so the `LIMIT` may still consistently return the first N rows for some dialects/optimizers.
To make the behavior deterministic across dialects, apply `ORDER BY <cte>.random` (or `ORDER BY RANDOM_LABEL`) in `fetch_sample_data()` when `RANDOM_LABEL` is present on `ds` and `randomizedSample` is enabled, before applying the `LIMIT`.
```python
ds = self.get_dataset()
if not columns:
    sqa_columns = [col for col in inspect(ds).c if col.name != RANDOM_LABEL]
else:
```
```diff
 fullyQualifiedName: FullyQualifiedEntityName
 profileSample: Optional[Union[float, int]] = None
 profileSampleType: Optional[ProfileSampleType] = None
 samplingMethodType: Optional[SamplingMethodType] = None
 sampleDataCount: Optional[int] = 100
-randomizedSample: Optional[bool] = True
+randomizedSample: Optional[bool] = False
```
This PR changes randomizedSample defaults from True to False in the ingestion Python models (BaseProfileConfig, TableConfig, SampleConfig). That’s a behavior change well beyond “respect the flag at 100% percentage sampling” and also conflicts with the PR description/issue context that states the default is True.
If the goal is only to fix the 100% edge case, these defaults should remain True and the logic should be adjusted to respect explicit False. If the goal is to change the product default, the PR title/description and upgrade notes should explicitly call that out (and ensure UI + server-side defaults remain consistent).
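One low-risk way to keep the documented `True` default while still handling `Optional[bool]` cleanly is to normalize `None` once at the boundary, so downstream checks never need to distinguish `None` from `True`. This is an illustration, not the PR's code:

```python
from typing import Optional


def normalize_randomized_sample(value: Optional[bool], default: bool = True) -> bool:
    """Collapse an Optional[bool] to a concrete bool: None means 'use default'."""
    return default if value is None else value
```

With this in place, short-circuit logic can safely use plain truthiness, because `None` never reaches it.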
```diff
 "randomizedSample": {
     "description": "Whether to randomize the sample data or not.",
     "type": "boolean",
-    "default": true
+    "default": false
 },
```
The JSON Schema default for randomizedSample is changed from true to false. This flips the default behavior for profiler sampling configuration and contradicts the PR description that calls out randomizedSample=True as the default.
If you’re only fixing the 100% sampling bug, keep the schema default as true and adjust the short-circuit logic accordingly. If you intend to change the default platform behavior, please update the PR description and ensure corresponding migrations/release notes cover the breaking behavior change.
```diff
 "randomizedSample": {
     "description": "Whether to randomize the sample data or not.",
     "type": "boolean",
-    "default": true
+    "default": false
 },
```
The schema default for tableProfilerConfig.randomizedSample is changed from true to false, which is a breaking behavioral change and not implied by the PR title (“Respect randomizedSample flag at 100% percentage sampling”). This also conflicts with the PR description stating the default is true.
Please either revert this default change (and keep the fix scoped to 100% sampling), or explicitly document this default flip and ensure it’s consistently applied across all config sources and migrations.
```sql
-- Set randomizedSample to false where it was true (old default behavior)
UPDATE ingestion_pipeline_entity
SET json = JSON_SET(json, '$.sourceConfig.config.randomizedSample', false)
WHERE JSON_EXTRACT(json, '$.sourceConfig.config.randomizedSample') = true
  AND pipelineType = 'profiler';

UPDATE table_entity
SET json = JSON_SET(json, '$.tableProfilerConfig.randomizedSample', false)
WHERE JSON_EXTRACT(json, '$.tableProfilerConfig.randomizedSample') = true;

UPDATE database_entity
SET json = JSON_SET(json, '$.databaseProfilerConfig.randomizedSample', false)
WHERE JSON_EXTRACT(json, '$.databaseProfilerConfig.randomizedSample') = true;

UPDATE database_schema_entity
SET json = JSON_SET(json, '$.databaseSchemaProfilerConfig.randomizedSample', false)
WHERE JSON_EXTRACT(json, '$.databaseSchemaProfilerConfig.randomizedSample') = true;
```
This migration flips any stored randomizedSample=true to false, which will override existing user choices and silently change runtime behavior. If you only want a new default, avoid rewriting explicit values; migrate only missing/null fields if needed.
If the intent is to globally disable randomization, please treat this as a breaking change and document it clearly instead of silently mutating config JSON.
```diff
--- Set randomizedSample to false where it was true (old default behavior)
-UPDATE ingestion_pipeline_entity
-SET json = JSON_SET(json, '$.sourceConfig.config.randomizedSample', false)
-WHERE JSON_EXTRACT(json, '$.sourceConfig.config.randomizedSample') = true
-  AND pipelineType = 'profiler';
-UPDATE table_entity
-SET json = JSON_SET(json, '$.tableProfilerConfig.randomizedSample', false)
-WHERE JSON_EXTRACT(json, '$.tableProfilerConfig.randomizedSample') = true;
-UPDATE database_entity
-SET json = JSON_SET(json, '$.databaseProfilerConfig.randomizedSample', false)
-WHERE JSON_EXTRACT(json, '$.databaseProfilerConfig.randomizedSample') = true;
-UPDATE database_schema_entity
-SET json = JSON_SET(json, '$.databaseSchemaProfilerConfig.randomizedSample', false)
-WHERE JSON_EXTRACT(json, '$.databaseSchemaProfilerConfig.randomizedSample') = true;
+-- Preserve existing randomizedSample values during migration.
+-- Any new default for randomizedSample must be applied only to missing/null
+-- fields in application logic or a separate, explicitly documented migration.
```
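The "backfill only missing/null fields" alternative can be prototyped on SQLite (the real migrations target MySQL/PostgreSQL JSON functions; the table and path names here follow the PR's migration scripts):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_entity (json TEXT)")
rows = [
    {"tableProfilerConfig": {"randomizedSample": True}},  # explicit user choice
    {"tableProfilerConfig": {}},                          # legacy row, field unset
]
conn.executemany(
    "INSERT INTO table_entity VALUES (?)", [(json.dumps(r),) for r in rows]
)
# Only rows where the field is missing get the new default; explicit values survive.
conn.execute(
    """
    UPDATE table_entity
    SET json = json_set(json, '$.tableProfilerConfig.randomizedSample', false)
    WHERE json_extract(json, '$.tableProfilerConfig.randomizedSample') IS NULL
    """
)
values = [
    row[0]
    for row in conn.execute(
        "SELECT json_extract(json, '$.tableProfilerConfig.randomizedSample')"
        " FROM table_entity"
    )
]
# values[0] == 1: explicit True preserved; values[1] == 0: unset row backfilled
```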
```python
def test_full_percentage_none_randomized_skips_sample_query(self, sampler_mock):
    """100% PERCENTAGE + randomizedSample=None should short-circuit
    (only explicit True enables randomization)."""
    with patch.object(SQASampler, "build_table_orm", return_value=User):
        sampler = SQASampler(
            service_connection_config=self.sqlite_conn,
            ometa_client=None,
            entity=None,
            sample_config=SampleConfig(
                profileSampleType=ProfileSampleType.PERCENTAGE,
                profileSample=100,
                randomizedSample=None,
            ),
```
These new tests assert that randomizedSample=None should short-circuit (i.e., behave like False). This conflicts with the PR description’s stated intent of handling Optional[bool] correctly by only skipping randomization when randomizedSample is False (so None follows the default behavior).
Either adjust the implementation to treat None as “use default” (and update these tests accordingly), or explicitly document and enforce the new “opt-in randomization” contract across schema defaults, models, and migration strategy.
```diff
 "randomizedSample": {
     "description": "Whether to randomize the sample data or not.",
     "type": "boolean",
-    "default": true
+    "default": false
 }
```
databaseProfilerConfig.randomizedSample default is changed from true to false, which flips default sampling behavior for database-level profiler configs. This is a breaking behavior change and is not reflected in the PR title/description (which states the default is true).
Please either revert this default flip (keeping the PR scoped to the 100% sampling bug) or explicitly document the behavior change and ensure all layers (schemas, ingestion models, UI defaults, migrations) are updated consistently.
```diff
 "randomizedSample": {
     "description": "Whether to randomize the sample data or not.",
     "type": "boolean",
-    "default": true
+    "default": false
 }
```
databaseSchemaProfilerConfig.randomizedSample default is changed from true to false, which flips default sampling behavior for schema-level profiler configs. This is a breaking behavior change and conflicts with the PR description’s claim that the default is true.
Please clarify whether the PR is intended to change defaults platform-wide; if not, revert this schema default change and keep the fix limited to respecting explicit randomizedSample at 100% PERCENTAGE.
Hi @TeddyCr! I've addressed all your requested changes. However, flipping the default from `True` to `False` surfaced three follow-up issues:

1. Unit tests: two test cases contain duplicate assertions. Proposed fix: remove the duplicated assertions.
2. TypeScript snapshot: the TS type-generation test is failing because changing the schema default alters the generated types. Proposed fix: regenerate and update the affected snapshot files.
3. SQL migration: a data integrity concern (not blocking CI, but important). The migration scripts for both MySQL and PostgreSQL currently have `WHERE JSON_EXTRACT(json, '$.randomizedSample') = true`. The intent is presumably to backfill legacy rows (rows created before this field existed) with the new default, but matching `= true` rewrites explicit user choices instead of only filling missing values. Proposed fix: change the condition to match only missing/null fields.

Planning to apply all three fixes. Let me know if you'd prefer a different approach for any of them, especially the migration; happy to drop it entirely if you think it's out of scope for this PR.
@RajdeepKushwaha5 can you address the test failures please
BigQuery sampling tests create SampleConfig without setting randomizedSample, which now defaults to False. Since ORDER BY is only added when randomizedSample is True, the expected query strings should not include ORDER BY. Also fix inaccurate docstring in test_sample.py.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (2)
ingestion/src/metadata/sampler/models.py:132
- `SampleConfig.randomizedSample` default is changed to `False`, which alters sampling behavior for any call sites that rely on `SampleConfig()` defaults (including when no per-entity config is found). If randomization is still meant to be the default, revert this to `True` and treat `None` as default behavior rather than as `False`.
```python
class SampleConfig(ConfigModel):
    """Profile Sample Config"""

    profileSample: Optional[Union[float, int]] = None
    profileSampleType: Optional[ProfileSampleType] = ProfileSampleType.PERCENTAGE
    samplingMethodType: Optional[SamplingMethodType] = None
    randomizedSample: Optional[bool] = False
```
ingestion/src/metadata/sampler/sqlalchemy/sampler.py:214
- The 100% PERCENTAGE short-circuit uses `randomizedSample is not True`, which treats `None` the same as `False`. This contradicts the PR description's stated `Optional[bool]` semantics (only skip when explicitly disabled). If `None` should behave like the default, use an explicit `is False` check for the short-circuit (and treat `None` as "randomize").
```python
if (
    self.sample_config.profileSampleType == ProfileSampleType.PERCENTAGE
    and self.sample_config.profileSample == 100
    and self.sample_config.randomizedSample is not True
):
    if self.partition_details:
        return self._partitioned_table()
    return self.raw_dataset
```
Increase fetch_sample_data loop from 10 to 20 iterations to further reduce the theoretical probability of a false failure in the randomized ordering test.
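The iteration-count tradeoff can be sanity-checked with a back-of-envelope model. This is an illustration under a simple independence assumption, not a statement about the actual test's distribution:

```python
# If each fetch returns one of `num_orderings` equally likely orderings, the
# chance that all `num_fetches` fetches coincide (a false failure of the
# non-determinism test) shrinks geometrically with the number of fetches.
def prob_all_identical(num_orderings: int, num_fetches: int) -> float:
    return (1 / num_orderings) ** (num_fetches - 1)


# Even with just two distinguishable orderings, moving from 10 to 20 fetches
# drops the false-failure probability from roughly 2e-3 to roughly 2e-6.
```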
Code Review 👍 Approved with suggestions (3 resolved / 4 findings)

Fixes the randomizedSample flag to be respected at 100% sampling by resolving duplicate assertions, integration test contradictions, and docstring mismatches. Consider refactoring the non-determinism test to reduce flakiness inherent in its probabilistic assertions.

💡 Edge Case: Non-determinism test is inherently flaky
📄 ingestion/tests/unit/observability/profiler/sqlalchemy/test_sample.py:451-454

✅ 3 resolved
- ✅ Quality: Duplicate assertions in test methods
- ✅ Bug: Integration test contradicts code: None case will fail
- ✅ Quality: Test docstring contradicts model default for randomizedSample

(Gitar)
Describe your changes:
Fixes #21304
When `profileSample` is set to 100% with `PERCENTAGE` type, both the SQLAlchemy and Pandas samplers short-circuit in `get_dataset()` and return the raw dataset directly, ignoring the `randomizedSample` flag entirely. This means that even when `randomizedSample=True` (the default), data is returned in its natural order instead of being shuffled, defeating the purpose of randomized sampling for AutoClassifier.

Root Cause:
The original condition combined the "no sample configured" and "100% percentage" checks into a single `or` clause, which always bypassed the sampling path at 100% regardless of the `randomizedSample` setting.

What changed:
- `get_dataset()` (both SQLAlchemy and Pandas samplers): split the 100% short-circuit condition to gate on `randomizedSample is False` (not `not randomizedSample`), so that `True` and `None` (the default) correctly fall through to the randomized sampling path.
- `fetch_sample_data()` (SQLAlchemy sampler): when `get_dataset()` returns a sampling CTE with a random column, `fetch_sample_data()` now detects it and applies `ORDER BY` on that column before `LIMIT`, so each call returns a different subset of rows instead of the same first N.
- `Optional[bool]` correctness: `randomizedSample` is typed as `Optional[bool]` with default `True`. Using `not` would treat `None` the same as `False`, incorrectly triggering the optimization path. We use `is False` to only skip randomization when explicitly disabled.

Key advantages of this approach:
- Avoids dialect-specific pitfalls (e.g., `ORDER BY RandomNumFn()`, which compiles to a constant `"0"` on Snowflake/Teradata)
- Correct `Optional[bool]` handling: `is False` instead of `not`, preserving backward compatibility when `randomizedSample` is unset (`None`)

How I tested:
- Unit tests covering `randomizedSample=True`, `False`, `None`, verifying `get_sample_query` is called/skipped correctly
- Unit tests for the `get_dataset()` short-circuit logic
- Tests covering the `is False` vs `not` behavior difference

Files changed:
- `ingestion/src/metadata/sampler/sqlalchemy/sampler.py`: `get_dataset()` condition fix + `fetch_sample_data()` ORDER BY
- `ingestion/src/metadata/sampler/pandas/sampler.py`: `get_dataset()` condition fix
- `ingestion/tests/unit/observability/profiler/sqlalchemy/test_sample.py`: 3 real-DB integration tests
- `ingestion/tests/unit/sampler/test_sampler_100_pct.py`: 6 mock-based unit tests (3 SQA + 3 Pandas)