
Custom VLM presets stored as raw dicts, not deserialized into VlmConvertOptions #578

@kyle-leasecake

Description

Custom VLM presets defined via DOCLING_SERVE_CUSTOM_VLM_PRESETS (env var) or custom_vlm_presets (YAML config file) are stored as raw Python dicts and never deserialized into VlmConvertOptions Pydantic models. Downstream code then fails when trying to access attributes like .response_format on a dict.

Error

"error_message": "'dict' object has no attribute 'response_format'"
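The failure mode can be reproduced in isolation. This is a minimal sketch, not docling code: a dataclass stands in for the real VlmConvertOptions Pydantic model, and `raw` stands in for what the registry currently stores.

```python
from dataclasses import dataclass


@dataclass
class VlmConvertOptionsStandIn:
    """Stand-in for the real VlmConvertOptions Pydantic model."""
    response_format: str


# What the registry currently stores for custom presets: a raw dict.
raw = {"response_format": "doctags"}

# Downstream code expects attribute access, which fails on a dict:
try:
    raw.response_format  # type: ignore[attr-defined]
except AttributeError as exc:
    print(exc)  # 'dict' object has no attribute 'response_format'

# Once deserialized into a model instance, attribute access works:
opts = VlmConvertOptionsStandIn(**raw)
print(opts.response_format)  # doctags
```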

Version

docling-serve v1.16.1

Config

# via DOCLING_SERVE_CONFIG_FILE
enable_remote_services: true
default_vlm_preset: granite_remote

custom_vlm_presets:
  granite_remote:
    model_spec:
      name: Granite-Docling-Remote
      default_repo_id: ibm-granite/granite-docling-258M
      prompt: "Convert this page to docling."
      response_format: doctags
    engine_options:
      engine_type: api
      url: http://litellm:4000/v1/chat/completions
      params:
        model: ibm-granite/granite-docling-258M
        max_tokens: 4096
      timeout: 120.0
      concurrency: 4
    scale: 2.0

Request:

curl -X POST http://localhost:5001/v1/convert/file \
  -F "files=@test.pdf" \
  -F "pipeline=vlm" \
  -F "vlm_pipeline_preset=granite_remote"

Root Cause

In docling_jobkit/convert/manager.py, _build_preset_registries() stores custom preset values as raw dicts:

for preset_id, preset_options in self.config.custom_vlm_presets.items():
    self.vlm_preset_registry[preset_id] = {
        "source": "custom",
        "options": preset_options,  # raw dict, NOT VlmConvertOptions
    }

Built-in presets go through VlmConvertOptions.from_preset() which produces real Pydantic model instances. Custom presets skip this step.

Suggested Fix

Validate custom presets during registration:

for preset_id, preset_options in self.config.custom_vlm_presets.items():
    if isinstance(preset_options, dict):
        if "engine_options" in preset_options and isinstance(preset_options["engine_options"], dict):
            preset_options["engine_options"] = self._instantiate_engine_options(preset_options["engine_options"])
        preset_options = VlmConvertOptions.model_validate(preset_options)
    self.vlm_preset_registry[preset_id] = {
        "source": "custom",
        "options": preset_options,
    }
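The invariant the fix restores is that every registry entry's `options` is a model instance, regardless of source. A hedged sketch of the registration step using stand-in dataclasses (the real types are VlmConvertOptions and the engine-option models in docling-jobkit; names and fields here are illustrative):

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ApiEngineOptionsStandIn:
    """Stand-in for the real API engine-options model."""
    engine_type: str
    url: str
    params: dict = field(default_factory=dict)
    timeout: float = 60.0
    concurrency: int = 1


@dataclass
class VlmConvertOptionsStandIn:
    """Stand-in for the real VlmConvertOptions model."""
    model_spec: dict
    engine_options: ApiEngineOptionsStandIn
    scale: float = 1.0


def register_custom_presets(custom_presets: dict[str, Any]) -> dict[str, Any]:
    """Validate raw config dicts into model instances at registration time."""
    registry: dict[str, Any] = {}
    for preset_id, preset_options in custom_presets.items():
        if isinstance(preset_options, dict):
            engine = preset_options.get("engine_options")
            if isinstance(engine, dict):
                # Copy rather than mutate, so the config object stays untouched.
                preset_options = {
                    **preset_options,
                    "engine_options": ApiEngineOptionsStandIn(**engine),
                }
            preset_options = VlmConvertOptionsStandIn(**preset_options)
        registry[preset_id] = {"source": "custom", "options": preset_options}
    return registry
```

With this in place, lookup code can rely on `options.engine_options.timeout`, `options.scale`, etc. for custom presets exactly as it already does for built-in ones, and a malformed preset fails loudly at startup instead of deep in a conversion job.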

Workaround

Using vlm_pipeline_custom_config per-request works correctly — that path does run VlmConvertOptions.model_validate():

curl -X POST http://localhost:5001/v1/convert/file \
  -F "files=@test.pdf" \
  -F 'options={"pipeline":"vlm","to_formats":["md"],"vlm_pipeline_custom_config":{"model_spec":{"name":"Granite-Docling-Remote","default_repo_id":"ibm-granite/granite-docling-258M","prompt":"Convert this page to docling.","response_format":"doctags"},"engine_options":{"engine_type":"api","url":"http://litellm:4000/v1/chat/completions","params":{"model":"ibm-granite/granite-docling-258M","max_tokens":4096},"timeout":120.0,"concurrency":4},"scale":2.0}}'

This returns status: success with correct VLM-converted output.
