[Bug] mcp-optimizer fails to start on Podman — Dockerfile hardcodes host.docker.internal #1972

@value-added-korea

Bug Description

The mcp-optimizer container fails to connect to the Toolhive API when running under Podman on Linux. The server starts and uvicorn binds successfully, but initialization never completes. All tool discovery fails, the MCP proxy has nothing to forward to, and clients time out.
The root cause is Dockerfile:84, which bakes in host.docker.internal as the default TOOLHIVE_HOST. This hostname does not resolve under Podman. The container enters a retry loop (100 attempts, exponential backoff up to 60s), scanning ports 50000–50100 against an unresolvable hostname, then gives up.
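To give a sense of how long the container spins before giving up, here is a rough sketch of the retry schedule described above. The doubling factor and the name `backoff_delays` are assumptions; this issue only confirms 100 attempts, a 1s starting delay, and a 60s cap.

```python
from itertools import islice


def backoff_delays(initial=1.0, cap=60.0, factor=2.0):
    """Yield successive retry delays in seconds, never exceeding `cap`.

    Assumption: the delay doubles after each attempt. The multiplier
    actually used by mcp-optimizer is not stated in this issue.
    """
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


# 100 attempts, as reported in the logs
delays = list(islice(backoff_delays(), 100))
total_seconds = sum(delays)

print(delays[:8])     # the first few delays, showing where the cap kicks in
print(total_seconds)  # total time spent sleeping under these assumptions
```

Under these assumptions the cap is reached on the seventh attempt, and if a delay follows every attempt the container spends about 95 minutes sleeping before it stops retrying.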

Steps to Reproduce

Using Podman rootless on Linux

Start Toolhive API

thv serve --host 0.0.0.0 --port 50051

Run mcp-optimizer WITHOUT overriding TOOLHIVE_HOST

thv run \
  --name mcp-optimizer-default \
  --proxy-port 54000 \
  --host 0.0.0.0 \
  --transport streamable-http \
  --port 9900 \
  ghcr.io/stackloklabs/mcp-optimizer:0.2.7

Check logs

thv logs mcp-optimizer-default 2>&1 | tail -20

Expected Behavior

The mcp-optimizer connects to the Toolhive API, discovers workloads, and begins serving tool queries on the MCP transport.

Actual Behavior

DNS resolution for host.docker.internal fails on every port probe. Logs show:
MCP session error error='ConnectError: All connection attempts failed'
Failed to fetch tools from workload error='ConnectError: All connection attempts failed'
The server retries up to 100 times with exponential backoff (1s → 60s) and never connects, so the proxy on --proxy-port has nothing to forward to and clients time out.
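One way to confirm the DNS half of this diagnosis from inside the container is to ask the resolver directly. The sketch below is illustrative only: `first_resolvable` and the candidate list are hypothetical and not part of mcp-optimizer.

```python
import socket


def first_resolvable(candidates, resolve=socket.getaddrinfo):
    """Return the first hostname that resolves, or None if none do.

    `resolve` is injectable so the logic can be exercised without real DNS.
    """
    for host in candidates:
        try:
            resolve(host, None)
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
        return host
    return None


if __name__ == "__main__":
    # Names discussed in this issue; which one resolves depends on the runtime.
    names = ["host.docker.internal", "host.containers.internal"]
    print(first_resolvable(names))
```

Run inside the affected container, a result of `None` or `host.containers.internal` would be consistent with the failure described here.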

Priority

Medium

Environment

OS: Ubuntu 25.10
Linux 6.17.0-20-generic arch: x86_64
podman version 5.4.2

Additional Context

Root cause

Issue 1 — Hardcoded host.docker.internal (hard blocker)

File: Dockerfile:84

```dockerfile
ENV TOOLHIVE_HOST=host.docker.internal
```
host.docker.internal is a synthetic DNS name injected by Docker Desktop. Podman does not inject this name; it provides host.containers.internal instead, via the pasta/slirp4netns network stack.
The failure propagates through:

  • toolhive_client.py:295 — port-discovery probe: http://{host}:{port}/api/v1beta/version
  • toolhive_client.py:551 — workload list fetch: {self.base_url}/api/v1beta/workloads
  • toolhive_client.py:598-604 — workload URL rewriting produces bad URLs when the hostname is unresolvable
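To make the last bullet concrete: the rewriting step swaps a loopback host in a workload URL for the value of TOOLHIVE_HOST so the container can reach a port published on the host. The function below is a minimal sketch of that idea, not the code in toolhive_client.py; `rewrite_workload_url` and the set of loopback names are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed set of hosts that mean "the machine running Toolhive"
LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def rewrite_workload_url(url, toolhive_host):
    """Replace a loopback host in `url` with `toolhive_host`, keeping the port.

    URLs that do not point at a loopback host are returned unchanged.
    Any userinfo in the URL is dropped; this sketch does not handle it.
    """
    parts = urlsplit(url)
    if parts.hostname not in LOOPBACK_HOSTS:
        return url
    if parts.port is None:
        netloc = toolhive_host
    else:
        netloc = f"{toolhive_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(rewrite_workload_url("http://localhost:54000/mcp", "host.docker.internal"))
```

The rewritten URL is syntactically valid either way, which is why the failure only surfaces at connect time: with the baked-in default, every rewritten workload URL points at a name the container cannot resolve.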

Issue 2 — Missing EXPOSE directive
File: Dockerfile (no EXPOSE anywhere)
The mcp-optimizer listens on port 9900 (server.py:92) using streamable-http transport (server.py:644). Without an EXPOSE 9900 directive, Toolhive may not correctly wire the proxy to the container port, requiring users to manually specify --transport streamable-http --port 9900 on every thv run command.

Issue 3 — RUNNING_IN_DOCKER naming
File: Dockerfile:85, toolhive_client.py:30-38
```dockerfile
ENV RUNNING_IN_DOCKER=1
```
```python
def _is_running_in_docker() -> bool:
    return os.getenv("RUNNING_IN_DOCKER") == "1"
```
This flag guards container-networking logic (URL rewriting from localhost to the TOOLHIVE_HOST address) that applies equally to Podman and any OCI runtime. An operator deploying on Podman may reasonably strip Docker-named environment variables from the container spec, inadvertently disabling URL rewriting and breaking all tool calls with connection-refused errors.

Proposed fix

Dockerfile

```dockerfile
# Line 84: change default to a Podman-compatible hostname
# host.containers.internal works on all Podman versions
# and on Docker Desktop 4.x+ with host networking enabled
ENV TOOLHIVE_HOST=host.containers.internal

# Line 85: rename to a runtime-agnostic name
ENV RUNNING_IN_CONTAINER=1

# Add before CMD
EXPOSE 9900
```

toolhive_client.py
```python
import os  # needed if not already imported at the top of the file


# Lines 30-38: rename with a backwards-compatible fallback
def _is_running_in_container() -> bool:
    """Check if we're running inside a container (Docker, Podman, etc.).

    Falls back to RUNNING_IN_DOCKER for backwards compatibility with older images.
    """
    return (
        os.getenv("RUNNING_IN_CONTAINER") == "1"
        or os.getenv("RUNNING_IN_DOCKER") == "1"
    )
```
Update all callers of the old function name in the same file.

Workaround

Override TOOLHIVE_HOST at runtime:
```bash
thv run \
  --name mcp-optimizer-default \
  --proxy-port 54000 \
  --host 0.0.0.0 \
  --transport streamable-http \
  --port 9900 \
  ghcr.io/stackloklabs/mcp-optimizer:0.2.7 \
  -e TOOLHIVE_HOST=host.containers.internal \
  -e TOOLHIVE_PORT=50051 \
  -e ALLOWED_GROUPS=default
```
Note: --host 0.0.0.0 is also required on every thv run invocation when using Podman, including for peer MCP servers. See related Toolhive feature request regarding proxy bind address defaults.

Environment

Podman 4.x+ (rootless, Linux)
Network backend: netavark
mcp-optimizer image: ghcr.io/stackloklabs/mcp-optimizer:0.2.7
Toolhive network: toolhive-external (bridge, 10.89.0.0/24)
