feat: add structured logs to zenml-server and OTel instrumentation#4676
feat: add structured logs to zenml-server and OTel instrumentation#4676amitvikramraj wants to merge 22 commits intodevelopfrom
Conversation
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
| ) | ||
|
|
||
| logger = get_logger(__name__) | ||
| logger = structlog.get_logger(__name__) |
There was a problem hiding this comment.
We shouldn't do that. We should configure the logging module at root level, then any logger we instantiate with the standard logging.getLogger(__name__) instruction has the formatter changes in place.
There was a problem hiding this comment.
Something like that:
import logging
import structlog
shared = [
structlog.stdlib.add_log_level,
structlog.processors.TimeStamper(fmt="iso"),
]
formatter = structlog.stdlib.ProcessorFormatter(
foreign_pre_chain=shared,
processors=[
structlog.stdlib.ProcessorFormatter.remove_processors_meta,
structlog.processors.JSONRenderer(),
],
)
handler = logging.StreamHandler()
handler.setFormatter(formatter)
root = logging.getLogger()
root.handlers = [handler]
root.setLevel(logging.INFO)
logger = logging.getLogger(__name__)
logger.info("hello from stdlib logging")
That doesn't propagate changes regarding logging across our codebase but does the work we want (structured, JSON logs).
There was a problem hiding this comment.
But if we call the std logger, then we would not be able to pass extra context.
logger = logging.getLogger(__name__)
logger.info("hello from stdlib logging", ctx="something", another_ctx="something_else")
# ^^^this would not workThere was a problem hiding this comment.
You can't pass them this way sure, but you can use other structlog utility like bind_context_vars to do so. I imagine we wouldn't pass context vars this way either. A more sensible way to do it is:
with context_vars_set(): # -> Some context manager that sets context vars on the logger
do_something()
logger.info()
try:
do()
except:
logger.error()
finally:
logger.debug()
With this pattern all log calls pick up the context here, no reason to pass them explicitly.
There was a problem hiding this comment.
Here is how I would approach migration to structlog. This is up for discussion btw, don't want to enforce it as a pattern:
- I would keep on using the same log statements, keep on using python's native logger.
- I would use structlog to re-configure loggers (JSO formatter etc.)
- I would introduce a few utilities (context manager, fastapi middleware, etc.) that help me introduce context vars on a few, controlled places that inject context for all subsequent log records (e.g trace-id on the beginning of any API call).
What that gives us: We get JSONified logs, we get context that is necessary for debugging (e.g. track all log statements for a particular request) and we do it with limited changes. I am a bit sceptic if I want to introduce changes across the codebase to comply with structlog's logger.
The difference boils down here:
logger.info("hello from stdlib logging", ctx="something", another_ctx="something_else")
I disagree that we should do it. We would need to change hundreds of log statements and the benefit is minimal. If you have context specific to one log statement then just add it in the message as you would today. If not, if the context is shared do it with context manager and all log statements within that block will have the context variables:
with bind_context(ctx_var="test"):
logger.info("Hello")
do()
logger.warning("Oh oh")
Documentation Link Check Results❌ Absolute links check failed |
stefannica
left a comment
There was a problem hiding this comment.
I really love where this PR is going. You have successfully replaced the entire custom API request tracing mess with centralized structured logging. Some things to consider:
- test that the request trace logs are still generated and have all the necessary info. The context vars you used need to travel from async-io coroutines down to worker threads and the connection you made through structlog might fail to cover everything.
- is there a way to hide structlogs from the actual code that emits logs ? I want it to be an implementation detail rather than something that the regular python modules need to be aware of and import directly. The key lies in reusing the existing centralized logging module zenml.logging.
| CredentialsNotValid, | ||
| OAuthError, | ||
| ) | ||
| from zenml.logger import get_logger |
There was a problem hiding this comment.
Currently, all ZenML code uses the zenml.logger module and utilities. You can use that to your advantage: instead of replacing this with structlog in various modules, you should go to the source and update the zenml.logger functionality to use structlog instead when configured, which means that you automatically get structured logs everywhere instead of just a handful of modules.
Keep in mind that the same logger module is also used by client code, where you still want to keep unstructured logs as the default.
There was a problem hiding this comment.
Thanks for this, it's a good idea. Earlier, I purposely didn't touch it for the same reason, since it is being used on the client side as well.
There was a problem hiding this comment.
addressed this
| ) | ||
| return | ||
|
|
||
| service_name = os.environ.get("OTEL_SERVICE_NAME", "zenml-server") |
There was a problem hiding this comment.
it's not a good idea to have all these environment variables hidden inside the code here. it's best to add them to the ServerConfiguration in zenml.config.server.config where they are centralized, automatically set from environment and discover-able by users.
There was a problem hiding this comment.
addressed it!
| "tldextract~=5.1.0", | ||
| "itsdangerous~=2.2.0", | ||
| "croniter>=6.0.0", | ||
| "opentelemetry-distro>=0.59b0", |
There was a problem hiding this comment.
we already have OTEL dependencies in our client. Please check that they match version-wise.
There was a problem hiding this comment.
Could I get a link to where that dependency is? Is it the sdk dependency here at line 41 in toml file "opentelemetry-sdk==1.38.0" ?
087c7a7 to
dced3e4
Compare
🔍 Broken Links ReportSummary
Details
📂 Full file paths
|
…tion and reverted back the usage of ger_logger function in zen_server modules. And added comments for structlog settings
… logs appeared and added a function to sanitize log records for OTel compatibility in OTel Log Store
dced3e4 to
eed1b16
Compare
Changes I made
ConsoleFormatterwith structlog as aProcessorFormatterover Python's standard logging.zen_server/,zen_stores/) now emit structured events with keyword arguments instead of f-string messages.request_id,method,path, etc.) is automatically propagated via structlog contextvars — no manual interpolation needed.X-Request-IDin error responses to easily be able to filter logsWhy
structlogprovides parseable JSON output in the server and colored console output locally, with automatic context propagation.Before / After
Before:
After — console:
After — JSON (server default, queryable in Loki/Grafana):
{"event":"request.received","level":"debug","logger":"zenml.zen_server.middleware","method":"GET","path":"/api/v1/pipelines","request_id":"d5638eeb","client_ip":"172.217.26.123","timestamp":"2026-04-02T09:21:52Z"} {"event":"sql.session.started","level":"debug","logger":"zenml.zen_stores.sql_zen_store","caller":"SqlZenStore.list_runs","active_connections":0,"idle_connections":3,"request_id":"d5638eeb","timestamp":"2026-04-02T09:21:52Z"} {"event":"sql.session.completed","level":"debug","logger":"zenml.zen_stores.sql_zen_store","caller":"SqlZenStore.list_runs","duration_ms":12.34,"error":false,"request_id":"d5638eeb","timestamp":"2026-04-02T09:21:52Z"} {"event":"request.completed","level":"debug","logger":"zenml.zen_server.middleware","method":"GET","path":"/api/v1/pipelines","status_code":200,"duration_ms":"38.42ms","request_id":"d5638eeb","timestamp":"2026-04-02T09:21:52Z"}Steps to reproduce:
docker compose up --build— ZenML UI starts on port3001, Grafana UI runs on port3002TODO: Later remove the docker-compose.