VertexAI provider: implement reasoning support via openai_chat_completions_with_reasoning #5448

@major

Description

🚀 Describe the new functionality needed

The remote::vertexai inference provider does not implement openai_chat_completions_with_reasoning(). When the Responses API is used with reasoning enabled (e.g., reasoning.effort = "high") and vertexai is the inference backend, the Responses layer catches the NotImplementedError and falls back to regular chat completion, silently discarding reasoning content.
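The fallback described above can be sketched as follows. This is an illustrative toy, not llama-stack source: the `Provider` class and `respond` function are hypothetical names standing in for the provider adapter and the Responses layer's dispatch logic.

```python
# Illustrative sketch (hypothetical names, not llama-stack source): how a
# Responses layer can silently fall back to plain chat completion when the
# provider does not implement the reasoning entry point.

class Provider:
    def openai_chat_completion(self, messages):
        # Regular completion path: no reasoning content is produced.
        return {"content": "plain answer", "reasoning_content": None}

    def openai_chat_completions_with_reasoning(self, messages, reasoning_effort):
        # Providers without reasoning support raise, triggering the fallback.
        raise NotImplementedError

def respond(provider, messages, reasoning_effort="high"):
    try:
        return provider.openai_chat_completions_with_reasoning(
            messages, reasoning_effort
        )
    except NotImplementedError:
        # Fallback: the reasoning request is silently dropped.
        return provider.openai_chat_completion(messages)
```

With this shape, a caller asking for reasoning.effort = "high" still gets a response, but reasoning_content comes back empty, which is exactly the silent degradation the issue describes.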

Gemini models already support thinking via ThinkingConfig, and the vertexai provider already extracts thinking parts into delta.reasoning_content on streaming chunks. The missing piece is the wrapper method that the Responses layer expects.
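A minimal sketch of what the missing wrapper could look like, assuming the effort level maps onto a Gemini thinking budget and that the existing completion path already surfaces thinking parts as reasoning_content. `VertexAIAdapter`, `_EFFORT_TO_BUDGET`, and the `thinking_budget` kwarg are illustrative assumptions, not the actual llama-stack class or parameter names.

```python
# Hedged sketch: the wrapper only needs to translate the Responses-layer
# effort level and delegate, since the provider already extracts thinking
# parts into delta.reasoning_content. All names below are assumed for
# illustration and do not come from the llama-stack codebase.

_EFFORT_TO_BUDGET = {"low": 1024, "medium": 8192, "high": 24576}  # assumed mapping

class VertexAIAdapter:
    def openai_chat_completion(self, messages, **kwargs):
        # Stand-in for the existing path, which already returns
        # reasoning_content extracted from Gemini thinking parts.
        return {"content": "answer", "reasoning_content": "thinking...", **kwargs}

    def openai_chat_completions_with_reasoning(
        self, messages, reasoning_effort="medium", **kwargs
    ):
        # Map the abstract effort level to a concrete thinking budget and
        # delegate to the existing completion path.
        budget = _EFFORT_TO_BUDGET.get(reasoning_effort, 8192)
        return self.openai_chat_completion(
            messages, thinking_budget=budget, **kwargs
        )
```

The point of the sketch is that no new extraction logic is needed: the wrapper is a thin translation layer over machinery the provider already has.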

💡 Why is this needed? What if we don't build it?

Without this, vertexai users cannot use the Responses API's reasoning features. The provider silently falls back to non-reasoning chat completion, which is confusing since Gemini models fully support thinking. All other major providers (openai, bedrock, ollama, vllm) already implement this method.

Other thoughts

Gemini's thinking API supports opaque thought_signature tokens for faithful reasoning replay across turns. The llama-stack Responses layer does not capture or propagate these, so multi-turn reasoning replay uses plain text with thought: True as a lossy approximation. This matches the approach other providers take with their own reasoning fields.
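The lossy replay described above could be sketched like this. The dict shape mirrors Gemini's content-part structure, and `replay_reasoning` is a hypothetical helper name, not an existing llama-stack or google-genai function.

```python
# Illustrative only: replaying prior-turn reasoning text as a Gemini
# "thought" part. Because no thought_signature is carried over, this is a
# lossy approximation and faithful reasoning replay is not guaranteed.

def replay_reasoning(reasoning_text):
    # Plain text flagged as a thought; the opaque signature token that
    # Gemini would use for faithful replay is not available here.
    return {"text": reasoning_text, "thought": True}
```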
