Description
The `ChromaVectorStoreComponent` in `lfx/components/chroma/chroma.py` uses the deprecated `chromadb.Client(settings=Settings(chroma_server_host=...))` pattern to connect to a remote Chroma server. On chromadb >= 1.0, this call silently creates a local in-memory ephemeral database and ignores `chroma_server_host` entirely — no error, no warning.
Every query against the component succeeds with `dataframe: []` regardless of how much real data exists in the configured remote server.
Environment
- Langflow: 1.8.4
- chromadb: 1.5.5
- langchain-chroma: latest
- Remote ChromaDB: `host.docker.internal:8100` (75 documents in collection)
Reproduction
- Run a standalone ChromaDB server (e.g. `docker run -p 8100:8000 chromadb/chroma`)
- Ingest documents into a collection using `chromadb.HttpClient` + pre-computed embeddings
- In Langflow, create a flow with: Chat Input → Chroma DB node → Parser → Prompt Template → Language Model
- Configure the Chroma DB node with `chroma_server_host` and `chroma_server_http_port` pointing at the remote server, the correct collection name, and an Ollama/OpenAI embedding node matching the ingest model
- Send a query through the flow
Expected: Chroma DB node returns results from the remote collection.
Actual: Chroma DB node returns `dataframe: []` in ~95 ms — far too fast for a network round trip to have occurred. All downstream nodes process empty context.
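For reference, step 2 of the reproduction (ingest with pre-computed embeddings) can be sketched as below. The collection name, the `build_batch`/`ingest` helper names, and the embedding callback are illustrative, not taken from the actual ingest script:

```python
def build_batch(docs, embed_fn):
    """Pre-compute embeddings client-side so the server never needs an embedding function."""
    return {
        "ids": [f"doc-{i}" for i in range(len(docs))],
        "documents": list(docs),
        "embeddings": [embed_fn(d) for d in docs],
    }

def ingest(docs, embed_fn, host="host.docker.internal", port=8100):
    """Push documents into the standalone server from step 1 (requires chromadb)."""
    import chromadb  # local import keeps build_batch dependency-free

    client = chromadb.HttpClient(host=host, port=port)
    collection = client.get_or_create_collection("my_docs")  # name is illustrative
    collection.add(**build_batch(docs, embed_fn))
```

The key point is that `embed_fn` must call the same Ollama/OpenAI model configured on the Langflow embedding node, otherwise query-time vectors will not match ingest-time vectors.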
Root cause
In `lfx/components/chroma/chroma.py:build_vector_store()`:

```python
if self.chroma_server_host:
    chroma_settings = Settings(
        chroma_server_cors_allow_origins=self.chroma_server_cors_allow_origins or [],
        chroma_server_host=self.chroma_server_host,
        chroma_server_http_port=self.chroma_server_http_port or None,
        chroma_server_grpc_port=self.chroma_server_grpc_port or None,
        chroma_server_ssl_enabled=self.chroma_server_ssl_enabled,
    )
    client = Client(settings=chroma_settings)
```
On chromadb 1.0+, `Client(settings=Settings(chroma_server_host=...))` no longer connects to a remote server — it creates a local ephemeral in-memory database. The server-related settings fields are silently ignored.
Minimal reproducer (run inside the Langflow container)

```python
import chromadb
from chromadb.config import Settings

# This is what Langflow does — BROKEN on chromadb >= 1.0
settings = Settings(chroma_server_host="host.docker.internal", chroma_server_http_port=8100)
client_broken = chromadb.Client(settings=settings)
print(client_broken.list_collections())  # → [] (local ephemeral DB, ignores remote)

# This is the correct approach
client_working = chromadb.HttpClient(host="host.docker.internal", port=8100)
print(client_working.list_collections())  # → [<real collections with real data>]
```
Proposed fix
Replace the `Client(settings=...)` call with `HttpClient(host=..., port=..., ssl=...)`:

```python
from chromadb import HttpClient

client = None
if self.chroma_server_host:
    client = HttpClient(
        host=self.chroma_server_host,
        port=self.chroma_server_http_port or 8000,
        ssl=bool(self.chroma_server_ssl_enabled),
    )
```

The `Settings` import can then be removed entirely.
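If maintainers want to keep backward compatibility with chromadb < 1.0, the fix could be gated on the installed version. The helper below is a sketch: the function name is ours, and it assumes the behavior changed at the 1.0 major release (consistent with what we observed on 1.5.5):

```python
def needs_http_client(chromadb_version: str) -> bool:
    """True when Client(settings=Settings(chroma_server_host=...)) would be
    silently ignored, i.e. on chromadb >= 1.0."""
    major = int(chromadb_version.split(".", 1)[0])
    return major >= 1

# At the call site, something like (sketch):
#   import chromadb
#   if self.chroma_server_host and needs_http_client(chromadb.__version__):
#       client = HttpClient(...)
```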
Additional note for maintainers
When applying the fix, remember to regenerate lfx/_assets/component_index.json — that file contains a serialized copy of each component's source code under entries[*][1]["Chroma"]["template"]["code"]["value"], and Langflow's flow runner executes the code string from the JSON, not from the .py file. Patching only the .py file has no effect on actual flow execution — we verified this empirically by mounting a patched .py that worked perfectly via direct import but had zero impact on flow runs until the JSON was also updated.
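A regeneration helper along these lines worked for us. The top-level shape of `component_index.json` is inferred from the `entries[*][1]["Chroma"]["template"]["code"]["value"]` path above, so treat this as a sketch rather than the official regeneration path:

```python
import json

def patch_component_index(index_path, new_source, component="Chroma"):
    """Overwrite the serialized source for `component`; returns entries updated."""
    with open(index_path) as f:
        index = json.load(f)
    updated = 0
    for entry in index.get("entries", []):
        # entry[1] maps component names to their serialized definitions
        code = entry[1].get(component, {}).get("template", {}).get("code")
        if code is not None:
            code["value"] = new_source
            updated += 1
    with open(index_path, "w") as f:
        json.dump(index, f)
    return updated
```

Run it with the patched `chroma.py` source as `new_source`, then restart Langflow so the flow runner picks up the updated JSON.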
Current workaround
Volume-mount a patched `chroma.py` and a patched `component_index.json` into the container:

```yaml
# docker-compose.yml
langflow:
  volumes:
    - ./patches/chroma.py:/app/.venv/lib/python3.12/site-packages/lfx/components/chroma/chroma.py:ro
    - ./patches/component_index.json:/app/.venv/lib/python3.12/site-packages/lfx/_assets/component_index.json:ro
```