Skip to content

Fixes #24798: Prevent avro parser from leaking file content in warning logs#27498

Open
rajattiwari-stack wants to merge 1 commit intoopen-metadata:mainfrom
rajattiwari-stack:fix/avro-parser-sensitive-data-logging-24798
Open

Fixes #24798: Prevent avro parser from leaking file content in warning logs#27498
rajattiwari-stack wants to merge 1 commit intoopen-metadata:mainfrom
rajattiwari-stack:fix/avro-parser-sensitive-data-logging-24798

Conversation

@rajattiwari-stack
Copy link
Copy Markdown

@rajattiwari-stack rajattiwari-stack commented Apr 17, 2026

Describe your changes:

Fixes #24798

When the avro schema parser fails to parse a file during S3 ingestion, the exception message includes the actual file content. This was being logged at WARNING level via logger.warning(f"Unable to parse the avro schema: {exc}"), which can leak sensitive data into log storage.

Changed warning logs to only include the exception type name instead of the full exception message. Full exception details remain available at DEBUG level through the existing traceback.format_exc() call.

Before:

WARNING - Unable to parse the avro schema: No "type" property: {"secret_key": "super_secret_value_12345"}

After:

WARNING - Unable to parse the avro schema: SchemaParseException

Added a unit test to verify warning-level logs do not contain any of the sensitive file content.

Type of change:

  • Bug fix

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have added a test that covers the exact scenario we are fixing.

@rajattiwari-stack rajattiwari-stack requested a review from a team as a code owner April 17, 2026 21:10
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Comment thread ingestion/tests/integration/s3/test_avro_schema_logging.py Outdated
@rajattiwari-stack rajattiwari-stack force-pushed the fix/avro-parser-sensitive-data-logging-24798 branch from 6b5d865 to 3fbe124 Compare April 17, 2026 21:18
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@gitar-bot
Copy link
Copy Markdown

gitar-bot bot commented Apr 17, 2026

Code Review ✅ Approved 1 resolved / 1 findings

Redacted sensitive file content from Avro parser warning logs and decoupled integration tests from MinIO dependency. No issues found.

✅ 1 resolved
Quality: Integration test unnecessarily requires MinIO for a pure function

📄 ingestion/tests/integration/s3/test_avro_schema_logging.py:22-36
The test test_avro_parse_failure_does_not_leak_file_content requires the minio and bucket_name fixtures (which spin up a MinIO container), but parse_avro_schema() is a pure function that takes a string argument. The MinIO upload/download round-trip on lines 26-36 is completely unnecessary — the same string (INVALID_AVRO_CONTENT) is passed in and read back out unchanged.

This adds:

  • A heavy Docker container dependency for a trivial unit test
  • Slower CI feedback (~seconds to start MinIO vs milliseconds for a unit test)
  • A flaky surface (network, Docker availability)

This should be a simple unit test under tests/unit/ instead.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@rajattiwari-stack
Copy link
Copy Markdown
Author

Ready for review. Addressed the gitar-bot feedback — moved the test to unit tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Avro parser logs actual file content when it fails to parse the file in S3 ingestion

1 participant