bug: reserved __authenticated_user key in provider data can corrupt auth context and trigger 500s #5588

@shaun0927

System Info

llama-stack main @ 2e08be040be2cd15b528d23e39b58b286e02d379
Python 3.12.12
macOS (arm64)

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

X-LlamaStack-Provider-Data currently allows a caller to inject the reserved
__authenticated_user key into the same request-context dictionary that the
server uses for authenticated user state.

That creates two problems on current main:

  1. get_authenticated_user() can return a caller-controlled object instead of a
    validated User
  2. downstream code that expects a real User can crash with a 500

This is distinct from the already-fixed background-worker context leak in #5221 /
#5227. That issue was about ContextVar propagation across worker tasks. This
one happens earlier, when the request provider-data context is created for a
normal HTTP request.

A minimal reproduction path is:

  • start from current main
  • install ProviderDataMiddleware
  • send a request with the header
    X-LlamaStack-Provider-Data: {"__authenticated_user": {"principal": "attacker", "attributes": {"roles": ["admin"]}}}
  • call a route that reaches get_authenticated_user() or AuthorizedSqlStore
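
For convenience, the malicious header from the steps above can be built like this. This is only a sketch: the variable names are mine, and the target URL and route are left out because they depend on the local app under test.

```python
import json

# Payload copied from the reproduction steps: a caller-supplied value for the
# reserved "__authenticated_user" key.
forged_user = {"principal": "attacker", "attributes": {"roles": ["admin"]}}

# Header set to attach to any request that reaches get_authenticated_user()
# or AuthorizedSqlStore. The header name is the real one; the rest is illustrative.
headers = {
    "Content-Type": "application/json",
    "X-LlamaStack-Provider-Data": json.dumps({"__authenticated_user": forged_user}),
}

# What the server sees after decoding the header value.
decoded = json.loads(headers["X-LlamaStack-Provider-Data"])
```

Any HTTP client can send these headers; no credentials are needed for the key to reach the request context.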

Observed behavior:

  • the reserved key survives into the request context when no authenticated
    User is present
  • get_authenticated_user() returns a raw dict
  • a storage-backed route that calls AuthorizedSqlStore.insert() can return 500

Why I think this is a real bug and not just an unsupported input shape:

  • the project explicitly documents multi-tenant isolation as an intended
    production scenario
  • caller-controlled provider-data is otherwise treated with a default-deny /
    hardening mindset (for example, forwarded-header config rejects __-prefixed
    names and security-sensitive headers)
  • mixing caller-controlled provider-data with server-owned auth context in the
    same namespace breaks that boundary

Error logs

Local reproduction against current main produced:

AttributeError: 'dict' object has no attribute 'principal'

I also reproduced the bug through a real FastAPI app with ProviderDataMiddleware:

  • a debug route that returns get_authenticated_user() sees user_type == "dict"
  • a storage-backed POST route succeeds without the malicious header, but returns
    HTTP 500 with it

Expected behavior

Caller-controlled provider-data should not be able to set or override reserved
server-owned auth-context keys.

At minimum, __authenticated_user should be stripped or rejected before the
provider-data context is installed. As defense in depth,
get_authenticated_user() should also ignore non-User values.
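
A rough sketch of both mitigations, under the assumption that the context is a plain dict keyed by strings. `RESERVED_KEYS`, `sanitize_provider_data`, and `safe_authenticated_user` are hypothetical names for illustration, not existing llama-stack functions, and `User` is again a stand-in type.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Keys owned by the server; callers must never be able to set them.
RESERVED_KEYS = frozenset({"__authenticated_user"})


@dataclass
class User:
    # Stand-in for the validated server-side user type (assumed shape).
    principal: str
    attributes: dict = field(default_factory=dict)


def sanitize_provider_data(raw: dict) -> dict:
    """Drop reserved keys before the provider-data context is installed."""
    return {key: value for key, value in raw.items() if key not in RESERVED_KEYS}


def safe_authenticated_user(ctx: Optional[dict]) -> Optional[User]:
    """Defense in depth: treat anything that is not a real User as no user."""
    value: Any = (ctx or {}).get("__authenticated_user")
    return value if isinstance(value, User) else None


forged = {"__authenticated_user": {"principal": "attacker"}, "api_key": "abc"}
cleaned = sanitize_provider_data(forged)   # reserved key removed, rest kept
ignored = safe_authenticated_user(forged)  # forged dict is not a User
```

Stripping is the quieter option; rejecting the request with a 4xx would make the misuse visible to the caller. Either would keep the 500 from happening.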
