Environment
- opencode version: 1.1.27
- oh-my-opencode version: latest (installed via plugin)
- OS: Linux
- Local LLM: Ollama via Harbor (qwen2.5:14b, llama3.1:8b-instruct-q8_0)
Configuration
oh-my-opencode.json:
{
  "agents": {
    "oracle": {
      "model": "harbor-ollama/qwen2.5:14b"
    },
    "explore": {
      "model": "harbor-ollama/llama3.1:8b-instruct-q8_0"
    },
    "librarian": {
      "model": "harbor-ollama/llama3.1:8b-instruct-q8_0"
    }
  }
}
opencode.json provider:
"harbor-ollama": {
"npm": "@ai-sdk/openai-compatible",
"name": "Harbor/Ollama (overlord)",
"options": {
"baseURL": "http://overlord:33821/v1"
},
"models": {
"qwen2.5:14b": {
"name": "Qwen2.5 14B",
"tools": true,
"options": { "num_ctx": 16384 }
}
}
}
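One mitigation sketch, given that `options.num_ctx` is already forwarded per-model above: raise it past the ~25K-token prompts the subagents actually send (at the cost of additional VRAM on the Ollama host):

```json
"models": {
  "qwen2.5:14b": {
    "name": "Qwen2.5 14B",
    "tools": true,
    "options": { "num_ctx": 32768 }
  }
}
```

This only papers over the problem; the oversized prompt itself is the bug.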
Issue
Subagent calls (oracle, explore, librarian) send prompts of ~24-25K tokens to local models configured with 16K context windows. Ollama truncates the prompt, causing:
- Empty responses (oracle returns "(No text output)")
- Corrupted/garbage output when truncation breaks mid-instruction
Evidence
Ollama logs show consistent truncation warnings:
level=WARN msg="truncating input prompt" limit=16535 prompt=24775 keep=4 new=16535
level=WARN msg="truncating input prompt" limit=16535 prompt=25835 keep=4 new=16535
level=WARN msg="truncating input prompt" limit=16535 prompt=24357 keep=5 new=16535
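The overflow is easy to quantify from those warnings. A minimal sketch that parses the log format shown above and reports how many tokens were dropped:

```python
import re

# Field names match the Ollama warning format shown above
WARN_RE = re.compile(r'limit=(\d+) prompt=(\d+) keep=(\d+) new=(\d+)')

def truncation_report(log_line: str) -> dict:
    """Parse an Ollama 'truncating input prompt' warning line."""
    m = WARN_RE.search(log_line)
    if not m:
        raise ValueError("not a truncation warning")
    limit, prompt, keep, new = map(int, m.groups())
    return {"limit": limit, "prompt": prompt, "keep": keep,
            "dropped": prompt - new}

line = 'level=WARN msg="truncating input prompt" limit=16535 prompt=24775 keep=4 new=16535'
print(truncation_report(line))  # dropped=8240: a third of the prompt is lost
```

For the first log line above, 8,240 of 24,775 tokens are silently discarded, which explains both the empty and the garbage outputs.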
Analysis
The stock oracle system prompt from oh-my-opencode is only ~700 tokens. The 24K+ tokens appear to come from:
- The full Sisyphus main agent system prompt being passed to subagents
- Tool definitions
- Possibly conversation context
This works fine for API models (Claude, GPT, Gemini) with 128K+ context, but breaks local models.
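One quick way to sanity-check this hypothesis, assuming the assembled subagent prompt can be captured to a file, is a rough chars-per-token estimate:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose and code.
    return len(text) // 4

# e.g. a ~100 KB captured prompt estimates at ~25K tokens,
# in line with the prompt=24775 figure in the Ollama logs
print(estimate_tokens("x" * 100_000))  # 25000
```

If the captured prompt is ~100 KB, the orchestrator prompt and tool definitions, not the ~700-token oracle prompt, dominate the budget.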
Verification
- Direct Ollama API calls work perfectly:
  curl http://overlord:33821/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "qwen2.5:14b", "messages": [{"role": "user", "content": "What is 2+2?"}]}'
  # Returns: {"content": "4"}
- Other agents using the same model work (explore, librarian); they may have shorter prompts or different handling
- Oracle consistently fails with empty output
Expected Behavior
Subagents should:
- Use their own lightweight system prompts (e.g., the ~700 token oracle prompt)
- NOT include the full Sisyphus orchestrator prompt
- Or, at minimum, respect the model's context limit and truncate intelligently
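The "truncate intelligently" fallback could look roughly like this sketch (assuming OpenAI-style chat messages and the chars/4 heuristic; `fit_to_context` is a hypothetical helper, not an existing opencode API), which preserves the system prompt and newest turns while dropping the oldest history:

```python
def fit_to_context(messages, limit_tokens=16384, chars_per_token=4):
    """Drop oldest non-system messages until the estimated size fits the limit."""
    def est(msgs):
        return sum(len(m["content"]) for m in msgs) // chars_per_token
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and est(system + rest) > limit_tokens:
        rest.pop(0)  # drop the oldest turn first, keep the newest context
    return system + rest
```

Unlike Ollama's current behavior (keep=4 tokens from the head, chop the rest), this keeps whole messages, so no instruction is cut mid-sentence.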
Workaround
Currently using API models (google/gemini-2.5-pro) for oracle, which have sufficient context windows.
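For reference, the workaround amounts to this change in oh-my-opencode.json (assuming a google provider is already configured in opencode.json):

```json
"agents": {
  "oracle": { "model": "google/gemini-2.5-pro" }
}
```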
Related Issues