Avro parser logs actual file content when it fails to parse the file in S3 ingestion #24798

@Jezreel-Zamora-Paidy

Affected module
Ingestion Framework

Describe the bug
When running S3 ingestion with avro formatting, if the avro parser fails to parse a file, it logs the content of that file in the warning logs. This can leak sensitive data into the log storage.

The offending line of code is:

logger.warning(f"Unable to parse the avro schema: {exc}")

To Reproduce

To replicate the issue, run the following code:

import avro.schema as avroschema
import logging

logger = logging.getLogger()

try:
    avroschema.parse('{"test": "invalid_avro"}')
except Exception as exc:
    logger.warning(f"Unable to parse the avro schema: {exc}")

This results in the following warning message, which includes the actual data:

Unable to parse the avro schema: No "type" property: {'test': 'invalid_avro'}

Expected behavior
Do not log the raw exception message when the avro schema fails to parse. The message embeds the content being parsed, which may contain sensitive data.
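
One possible shape for this, as a minimal sketch rather than the actual OpenMetadata fix: log only the exception type and leave out the exception text. The names `safe_parse_error_message` and `log_parse_failure` are hypothetical helpers made up for illustration, and a plain `ValueError` stands in for the avro parse error so the snippet runs without the avro package installed.

```python
import logging

logger = logging.getLogger("avro_parser_sketch")  # hypothetical logger name


def safe_parse_error_message(exc: Exception) -> str:
    """Describe a parse failure without echoing str(exc), which may embed file content."""
    return (
        f"Unable to parse the avro schema ({type(exc).__name__}); "
        "exception details omitted to avoid logging file content"
    )


def log_parse_failure(exc: Exception) -> None:
    # Deliberately no exc_info=True: the rendered traceback ends with the
    # exception message, which would reintroduce the leaked data.
    logger.warning(safe_parse_error_message(exc))


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    try:
        # Stand-in for the avro parse failure shown in the reproduction above.
        raise ValueError("No \"type\" property: {'test': 'invalid_avro'}")
    except Exception as exc:
        log_parse_failure(exc)
```

The trade-off is that the warning becomes less useful for debugging. If more detail is needed, it could be emitted at debug level or behind an explicit opt-in setting, but that is a design choice for the maintainers.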
Version:

  • OS: [e.g. iOS]
  • Python version: 3.11.4
  • OpenMetadata version: 1.9.13
  • OpenMetadata Ingestion package version: [e.g. openmetadata-ingestion[docker]==1.9.13]

