Skip to content

Fix #22911 [1]: add support for more column datatypes#26979

Merged
keshavmohta09 merged 12 commits intomainfrom
fix-datatypes-mappings
Apr 7, 2026
Merged

Fix #22911 [1]: add support for more column datatypes#26979
keshavmohta09 merged 12 commits intomainfrom
fix-datatypes-mappings

Conversation

@SumanMaharana
Copy link
Copy Markdown
Contributor

@SumanMaharana SumanMaharana commented Apr 2, 2026

Describe your changes:

Partial Fixes #22911
This pull request expands and tests SQL type mappings for several database integrations, improving support for additional data types across ClickHouse, PostgreSQL, StarRocks, MSSQL, and Vertica. It also updates the utility function for MSSQL type registration and adds comprehensive unit tests for all new and modified type mappings.

Database type mapping expansions and fixes:

  • ClickHouse: Added support for geo types (Point, Ring, Polygon, MultiPolygon, LineString, MultiLineString) in ischema_names and verified their registration and resolution with new tests. [1] [2]
  • PostgreSQL: Registered the tid type as a String in the shared PostgreSQL ischema_names and added tests to ensure its presence and correct mapping. [1] [2]
  • StarRocks: Expanded _get_sqlalchemy_type to handle more integer, analytics, and semi-structured types, mapping them to appropriate SQLAlchemy types, and added parameterized tests to verify correct resolution. [1] [2] [3] [4]
  • Vertica: Added new type mappings for binary, long string, interval, and complex types, and introduced a dedicated test module to verify their registration and resolution. [1] [2]

Refactoring and utility improvements:

  • MSSQL: Changed update_mssql_ischema_names to mutate the dictionary in-place and return None (instead of returning the updated dict), updated all usages accordingly, and added tests to confirm correct behavior. [1] [2] [3] [4] [5]

Other improvements:

  • Minor test cleanup and import fixes

References: [1] [2] [3] [4] [5]

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Copilot AI review requested due to automatic review settings April 2, 2026 11:23
@SumanMaharana SumanMaharana requested a review from a team as a code owner April 2, 2026 11:23
@github-actions github-actions Bot added Ingestion safe to test Add this label to run secure Github workflows on PRs labels Apr 2, 2026
Comment thread ingestion/tests/unit/topology/database/test_mssql.py
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands SQLAlchemy type mappings used by the ingestion framework so additional database-specific column types are recognized (reducing SAWarning: Did not recognize type ...) across multiple connectors, and adds unit coverage for the new mappings.

Changes:

  • Extend type mapping registries for ClickHouse (geo types), PostgreSQL (tid), StarRocks (more numeric + semi-structured), and Vertica (binary/interval/complex).
  • Refactor update_mssql_ischema_names to be an in-place mutator (returning None) and update MSSQL/AzureSQL call sites accordingly.
  • Add/extend unit tests covering the new mappings and a small test import cleanup.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
ingestion/tests/unit/topology/pipeline/test_service_resolver.py Removes unused test imports (no behavior change).
ingestion/tests/unit/topology/database/test_vertica_type_mapping.py New unit tests validating Vertica ischema_names registration/resolution.
ingestion/tests/unit/topology/database/test_starrocks.py Adds parameterized unit tests for StarRocks _get_sqlalchemy_type mapping expansion.
ingestion/tests/unit/topology/database/test_postgres.py Adds tests asserting Postgres tid is registered and maps to String.
ingestion/tests/unit/topology/database/test_mssql.py Adds tests validating new in-place behavior of update_mssql_ischema_names.
ingestion/tests/unit/topology/database/test_clickhouse_utils.py Adds tests for ClickHouse geo type registration and resolution.
ingestion/src/metadata/utils/sqa_utils.py Changes update_mssql_ischema_names to return None and adds a type hint.
ingestion/src/metadata/ingestion/source/database/vertica/metadata.py Registers additional Vertica types (binary/long varchar/interval/complex) in ischema_names.
ingestion/src/metadata/ingestion/source/database/starrocks/metadata.py Expands StarRocks _get_sqlalchemy_type mapping for more types.
ingestion/src/metadata/ingestion/source/database/mssql/metadata.py Updates MSSQL dialect init to call update_mssql_ischema_names without assignment.
ingestion/src/metadata/ingestion/source/database/common_pg_mappings.py Registers Postgres tid in the shared ischema_names map.
ingestion/src/metadata/ingestion/source/database/clickhouse/utils.py Registers ClickHouse geo types in ischema_names.
ingestion/src/metadata/ingestion/source/database/azuresql/metadata.py Updates AzureSQL dialect init to call update_mssql_ischema_names without assignment.

Comment on lines +375 to +379
def test_does_not_overwrite_existing_entries(self):
sentinel = object()
target = {"existing_key": sentinel}
update_mssql_ischema_names(target)
assert target["existing_key"] is sentinel
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_does_not_overwrite_existing_entries doesn’t actually verify non-overwrite for keys that update_mssql_ischema_names updates (it uses a non-colliding key). Either change the test to pre-populate one of the keys being added (e.g., xml/geography) and assert it’s preserved, or rename the test to reflect what it checks. If the intent is truly “don’t overwrite”, the helper should use setdefault/conditional updates instead of dict.update().

Copilot uses AI. Check for mistakes.
}
# All values should be distinct from NullType
for name, t in types.items():
assert t is not sqltypes.NullType, f"{name} resolved to NullType"
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_geo_types_are_distinct claims each geo type should resolve to a different object, but the assertions only check the entries are not NullType and never verify distinctness across the mapped values. Either add an explicit uniqueness assertion (e.g., all mapped values are unique) or update the docstring/test name to match the current check.

Suggested change
assert t is not sqltypes.NullType, f"{name} resolved to NullType"
assert t is not sqltypes.NullType, f"{name} resolved to NullType"
# And all geo types should map to distinct objects
assert len(set(types.values())) == len(
types
), "Geo types do not resolve to distinct objects"

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 2, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.13)

Vulnerabilities (4)

Package Vulnerability ID Severity Installed Version Fixed Version
libpng-dev CVE-2026-33416 🚨 HIGH 1.6.39-2+deb12u3 1.6.39-2+deb12u4
libpng-dev CVE-2026-33636 🚨 HIGH 1.6.39-2+deb12u3 1.6.39-2+deb12u4
libpng16-16 CVE-2026-33416 🚨 HIGH 1.6.39-2+deb12u3 1.6.39-2+deb12u4
libpng16-16 CVE-2026-33636 🚨 HIGH 1.6.39-2+deb12u3 1.6.39-2+deb12u4

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (37)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.airlift:aircompressor CVE-2025-67721 🚨 HIGH 0.27 2.0.3
io.netty:netty-codec-http CVE-2026-33870 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.10.Final
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 CVE-2026-33871 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.11.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.spark:spark-core_2.12 CVE-2025-54920 🚨 HIGH 3.5.6 3.5.7
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (13)

Package Vulnerability ID Severity Installed Version Fixed Version
apache-airflow CVE-2026-26929 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-28779 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-30911 🚨 HIGH 3.1.7 3.1.8
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
pyOpenSSL CVE-2026-27459 🚨 HIGH 24.1.0 26.0.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data_aut.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage_aut.yaml

No Vulnerabilities Found

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 2, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.13)

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (37)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.airlift:aircompressor CVE-2025-67721 🚨 HIGH 0.27 2.0.3
io.netty:netty-codec-http CVE-2026-33870 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.10.Final
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 CVE-2026-33871 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.11.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.spark:spark-core_2.12 CVE-2025-54920 🚨 HIGH 3.5.6 3.5.7
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (27)

Package Vulnerability ID Severity Installed Version Fixed Version
Authlib CVE-2026-27962 🔥 CRITICAL 1.6.6 1.6.9
Authlib CVE-2026-28490 🚨 HIGH 1.6.6 1.6.9
Authlib CVE-2026-28498 🚨 HIGH 1.6.6 1.6.9
Authlib CVE-2026-28802 🚨 HIGH 1.6.6 1.6.7
PyJWT CVE-2026-32597 🚨 HIGH 2.11.0 2.12.0
Werkzeug CVE-2024-34069 🚨 HIGH 2.2.3 3.0.3
aiohttp CVE-2025-69223 🚨 HIGH 3.12.12 3.13.3
apache-airflow CVE-2026-26929 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-28779 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-30911 🚨 HIGH 3.1.7 3.1.8
apache-airflow-providers-http CVE-2025-69219 🚨 HIGH 5.6.4 6.0.0
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
litellm CVE-2026-35030 🔥 CRITICAL 1.81.6 1.83.0
litellm CVE-2026-35029 🚨 HIGH 1.81.6 1.83.0
protobuf CVE-2026-0994 🚨 HIGH 4.25.8 6.33.5, 5.29.6
pyOpenSSL CVE-2026-27459 🚨 HIGH 24.1.0 26.0.0
pyasn1 CVE-2026-30922 🚨 HIGH 0.6.2 0.6.3
ray CVE-2025-62593 🔥 CRITICAL 2.47.1 2.52.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
tornado CVE-2026-31958 🚨 HIGH 6.5.4 6.5.5
tornado CVE-2026-35536 🚨 HIGH 6.5.4 6.5.5
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: usr/bin/docker

Vulnerabilities (2)

Package Vulnerability ID Severity Installed Version Fixed Version
stdlib CVE-2025-68121 🔥 CRITICAL v1.25.6 1.24.13, 1.25.7, 1.26.0-rc.3
stdlib CVE-2026-25679 🚨 HIGH v1.25.6 1.25.8, 1.26.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 2, 2026

🟡 Playwright Results — all passed (19 flaky)

✅ 3599 passed · ❌ 0 failed · 🟡 19 flaky · ⏭️ 207 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 455 0 2 2
🟡 Shard 2 639 0 3 32
🟡 Shard 3 649 0 2 26
🟡 Shard 4 617 0 5 47
🟡 Shard 5 606 0 1 67
🟡 Shard 6 633 0 6 33
🟡 19 flaky test(s) (passed on retry)
  • Features/DataAssetRulesDisabled.spec.ts › Verify the Storage Service entity item action after rules disabled (shard 1, 1 retry)
  • Pages/Roles.spec.ts › Roles page should work properly (shard 1, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/DataQuality/BundleSuiteBulkOperations.spec.ts › Add test case to existing Bundle Suite (shard 2, 1 retry)
  • Features/DataQuality/BundleSuiteBulkOperations.spec.ts › Bulk selection operations (shard 2, 1 retry)
  • Features/Permissions/GlossaryPermissions.spec.ts › Team-based permissions work correctly (shard 3, 1 retry)
  • Flow/ExploreDiscovery.spec.ts › Should display deleted assets when showDeleted is checked and deleted is not present in queryFilter (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Container (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Verify domain tags and glossary terms (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Rename domain with deeply nested subdomains (3+ levels) verifies FQN propagation (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Rename domain with assets (tables, topics, dashboards) preserves associations (shard 4, 1 retry)
  • Pages/ExploreTree.spec.ts › Verify Database and Database Schema available in explore tree (shard 5, 1 retry)
  • Pages/Glossary.spec.ts › Create glossary, change language to Dutch, and delete glossary (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/ProfilerConfigurationPage.spec.ts › Non admin user (shard 6, 1 retry)
  • Pages/UserDetails.spec.ts › Create team with domain and verify visibility of inherited domain in user profile after team removal (shard 6, 1 retry)
  • Pages/Users.spec.ts › User should have only view permission for glossary and tags for Data Consumer (shard 6, 1 retry)
  • Pages/Users.spec.ts › Check permissions for Data Steward (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Copilot AI review requested due to automatic review settings April 3, 2026 09:08
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Comment thread ingestion/tests/unit/topology/database/test_starrocks.py Outdated
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

…ata.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
keshavmohta09
keshavmohta09 previously approved these changes Apr 6, 2026
Comment thread ingestion/tests/unit/topology/database/test_vertica_type_mapping.py Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Comment thread ingestion/tests/unit/topology/database/test_postgres.py
Comment thread ingestion/tests/unit/topology/database/test_vertica_type_mapping.py
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 7, 2026

Code Review ⚠️ Changes requested 2 resolved / 6 findings

Adds support for more column datatypes across multiple database sources, resolving MSSQL and Vertica test issues. However, three important security vulnerabilities remain open: MCP StdioTransport allows overriding sensitive environment variables, HttpTransport lacks URL validation enabling SSRF attacks, and McpExecution.create may throw a null pointer exception if serverId is null.

⚠️ Security: MCP StdioTransport allows overriding sensitive env vars

In StdioTransport.connect(), user-supplied environment variables can override sensitive variables like PATH, LD_PRELOAD, LD_LIBRARY_PATH, and DYLD_INSERT_LIBRARIES. The code detects this and logs a warning (line 122-124), but still proceeds with full_env.update(self.env) on line 125. Overriding PATH could redirect command resolution to a malicious binary, and LD_PRELOAD/LD_LIBRARY_PATH can inject shared libraries into the subprocess. Since MCP server configs may come from user-supplied configuration files (Claude Desktop, VS Code configs), this is a real attack surface.

Suggested fix
Block overrides of sensitive env vars instead of just warning:

    if overridden:
        logger.warning(
            f"MCP server '{self.command}' attempted to override "
            f"sensitive env vars: {overridden}. These will be ignored."
        )
        for key in overridden:
            del self.env[key]
    full_env.update(self.env)
⚠️ Security: HttpTransport lacks URL validation, enabling SSRF

The HttpTransport class accepts any URL and makes HTTP requests to {url}/mcp without validating the URL scheme or hostname. If the MCP connection configuration is user-controlled (e.g., via ingestion pipeline config), an attacker could supply URLs targeting internal services (e.g., http://169.254.169.254/latest/meta-data/ for cloud metadata, http://localhost:8585/ for internal APIs). The requests.Session.post() call will follow redirects by default, amplifying the risk.

Suggested fix
Add URL validation in connect():

    from urllib.parse import urlparse
    def connect(self) -> None:
        parsed = urlparse(self.url)
        if parsed.scheme not in ('http', 'https'):
            raise McpProtocolError(f"Unsupported URL scheme: {parsed.scheme}")
        if parsed.hostname in ('localhost', '127.0.0.1', '::1') or \
           parsed.hostname.startswith('169.254.'):
            raise McpProtocolError(f"URL targets restricted host: {parsed.hostname}")
        # ... rest of connect
⚠️ Bug: McpExecution.create may NPE if serverId is null

In McpExecutionResource.create() at line 173, mcpExecution.getServerId().toString() is called without a null check. If a client sends a POST request without a serverId field, this will throw a NullPointerException resulting in a 500 Internal Server Error instead of a proper 400 Bad Request. The @Valid annotation on the parameter only validates constraints defined on the McpExecution schema — if serverId is not marked as @NotNull in the schema, this is a reachable NPE.

Suggested fix
Add a null check before using serverId:

    if (mcpExecution.getServerId() == null) {
        return Response.status(Response.Status.BAD_REQUEST)
            .entity("serverId is required").build();
    }
    return create(mcpExecution, mcpExecution.getServerId().toString());
💡 Quality: Unused extension param in McpExecutionDAO.deleteAtTimestamp

The deleteAtTimestamp method in McpExecutionDAO (CollectionDAO.java) declares an @Bind("extension") String extension parameter, but the SQL query (DELETE FROM <table> WHERE serverId = :serverId AND timestamp = :timestamp) does not use :extension. The caller in McpExecutionRepository.deleteExecutionData() passes null for this parameter. This is dead code that creates confusion about the method's contract. Similarly, the insert and insertWithoutExtension methods have parameters (entityFQNHash, jsonSchema, extension) that don't appear in their SQL queries.

Suggested fix
Remove unused parameters from the DAO method signatures to match the actual SQL queries, or add the parameters to the SQL if they were intended to be used.
✅ 2 resolved
Quality: MSSQL overwrite test only checks non-colliding keys

📄 ingestion/tests/unit/topology/database/test_mssql.py:375-379
The test test_does_not_overwrite_existing_entries uses "existing_key" which never collides with any key added by update_mssql_ischema_names. Since dict.update() does overwrite colliding keys, this test doesn't actually verify the behavior its name implies. A more meaningful test would check that a pre-existing mapping for a key like "nvarchar" gets overwritten (if that's intended) or add a comment clarifying the intent is just to verify unrelated keys survive.

Bug: Vertica INTERVAL test asserts wrong type, will fail

📄 ingestion/tests/unit/topology/database/test_vertica_type_mapping.py:63 📄 ingestion/tests/unit/topology/database/test_vertica_type_mapping.py:69-72 📄 ingestion/src/metadata/ingestion/source/database/vertica/metadata.py:67
The Vertica metadata module maps "INTERVAL" via create_sqlalchemy_type("INTERVAL"), which creates a custom class inheriting from TypeEngine. However, the test at line 63 asserts entry is sqltypes.Interval, which is a completely different class. The identity check will always fail.

Either the mapping should use the standard SQLAlchemy type (like the binary/text mappings do), or the test expectation should match what create_sqlalchemy_type actually produces.

🤖 Prompt for agents
Code Review: Adds support for more column datatypes across multiple database sources, resolving MSSQL and Vertica test issues. However, three important security vulnerabilities remain open: MCP StdioTransport allows overriding sensitive environment variables, HttpTransport lacks URL validation enabling SSRF attacks, and McpExecution.create may throw a null pointer exception if serverId is null.

1. ⚠️ Security: MCP StdioTransport allows overriding sensitive env vars

   In `StdioTransport.connect()`, user-supplied environment variables can override sensitive variables like `PATH`, `LD_PRELOAD`, `LD_LIBRARY_PATH`, and `DYLD_INSERT_LIBRARIES`. The code detects this and logs a warning (line 122-124), but still proceeds with `full_env.update(self.env)` on line 125. Overriding `PATH` could redirect command resolution to a malicious binary, and `LD_PRELOAD`/`LD_LIBRARY_PATH` can inject shared libraries into the subprocess. Since MCP server configs may come from user-supplied configuration files (Claude Desktop, VS Code configs), this is a real attack surface.

   Suggested fix:
   Block overrides of sensitive env vars instead of just warning:
   
       if overridden:
           logger.warning(
               f"MCP server '{self.command}' attempted to override "
               f"sensitive env vars: {overridden}. These will be ignored."
           )
           for key in overridden:
               del self.env[key]
       full_env.update(self.env)

2. ⚠️ Security: HttpTransport lacks URL validation, enabling SSRF

   The `HttpTransport` class accepts any URL and makes HTTP requests to `{url}/mcp` without validating the URL scheme or hostname. If the MCP connection configuration is user-controlled (e.g., via ingestion pipeline config), an attacker could supply URLs targeting internal services (e.g., `http://169.254.169.254/latest/meta-data/` for cloud metadata, `http://localhost:8585/` for internal APIs). The `requests.Session.post()` call will follow redirects by default, amplifying the risk.

   Suggested fix:
   Add URL validation in connect():
   
       from urllib.parse import urlparse
       def connect(self) -> None:
           parsed = urlparse(self.url)
           if parsed.scheme not in ('http', 'https'):
               raise McpProtocolError(f"Unsupported URL scheme: {parsed.scheme}")
           if parsed.hostname in ('localhost', '127.0.0.1', '::1') or \
              parsed.hostname.startswith('169.254.'):
               raise McpProtocolError(f"URL targets restricted host: {parsed.hostname}")
           # ... rest of connect

3. ⚠️ Bug: McpExecution.create may NPE if serverId is null

   In `McpExecutionResource.create()` at line 173, `mcpExecution.getServerId().toString()` is called without a null check. If a client sends a POST request without a `serverId` field, this will throw a `NullPointerException` resulting in a 500 Internal Server Error instead of a proper 400 Bad Request. The `@Valid` annotation on the parameter only validates constraints defined on the `McpExecution` schema — if `serverId` is not marked as `@NotNull` in the schema, this is a reachable NPE.

   Suggested fix:
   Add a null check before using serverId:
   
       if (mcpExecution.getServerId() == null) {
           return Response.status(Response.Status.BAD_REQUEST)
               .entity("serverId is required").build();
       }
       return create(mcpExecution, mcpExecution.getServerId().toString());

4. 💡 Quality: Unused extension param in McpExecutionDAO.deleteAtTimestamp

   The `deleteAtTimestamp` method in `McpExecutionDAO` (CollectionDAO.java) declares an `@Bind("extension") String extension` parameter, but the SQL query (`DELETE FROM <table> WHERE serverId = :serverId AND timestamp = :timestamp`) does not use `:extension`. The caller in `McpExecutionRepository.deleteExecutionData()` passes `null` for this parameter. This is dead code that creates confusion about the method's contract. Similarly, the `insert` and `insertWithoutExtension` methods have parameters (`entityFQNHash`, `jsonSchema`, `extension`) that don't appear in their SQL queries.

   Suggested fix:
   Remove unused parameters from the DAO method signatures to match the actual SQL queries, or add the parameters to the SQL if they were intended to be used.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Apr 7, 2026

@keshavmohta09 keshavmohta09 merged commit d96a0b7 into main Apr 7, 2026
62 of 66 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Postgres metadata ingestion job inconsistency: it raises warnings during processing, but the overall result does't consider them

3 participants