Affected module
Ingestion Framework
Describe the bug
When running S3 ingestion with the Avro format, if the Avro parser fails to parse a file, the file's content is written to the warning logs. This can leak sensitive data into log storage.
The offending line of code is here:

```python
logger.warning(f"Unable to parse the avro schema: {exc}")
```
To Reproduce
To replicate the issue, run the following code:
```python
import logging

import avro.schema as avroschema

logger = logging.getLogger()

try:
    avroschema.parse('{"test": "invalid_avro"}')
except Exception as exc:
    logger.warning(f"Unable to parse the avro schema: {exc}")
```
This produces the following warning message, which includes the actual data:

```text
Unable to parse the avro schema: No "type" property: {'test': 'invalid_avro'}
```
Expected behavior
Do not log the raw exception message when the Avro schema fails to parse the JSON, since that message includes the parsed content, which may contain sensitive data.
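A minimal sketch of the expected behavior, assuming the fix is to log only the exception's class name instead of `str(exc)`. `parse_without_leaking` is a hypothetical helper, not existing OpenMetadata code; it takes the parse function as an argument so it can wrap `avroschema.parse` (for example `parse_without_leaking(avroschema.parse, raw_schema)`, where `raw_schema` is an illustrative name).

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def parse_without_leaking(parse: Callable[[str], Any], raw: str) -> Optional[Any]:
    """Hypothetical helper: run `parse` on `raw` without logging its content.

    On failure, only the exception class name is logged. str(exc) is never
    logged, because the avro library can embed the input in its messages.
    """
    try:
        return parse(raw)
    except Exception as exc:  # broad catch mirrors the original code
        logger.warning(
            "Unable to parse the avro schema (%s); schema content omitted from logs",
            type(exc).__name__,
        )
        return None
```

Keeping the exception class name preserves some diagnostic value, since operators can still tell broad failure types apart, without echoing the file content into log storage.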
Version:
- OS: [e.g. iOS]
- Python version: 3.11.4
- OpenMetadata version: 1.9.13
- OpenMetadata Ingestion package version: openmetadata-ingestion[docker]==1.9.13
Additional context
The offending line above is line 265 of OpenMetadata/ingestion/src/metadata/parsers/avro_parser.py at commit 9ef83fa.