Summary
There is currently no way to configure the default OCR engine at deploy time. The ocr_engine parameter only exists at the request level, meaning every client must explicitly pass ocr_engine=tesseract (or whichever engine) on every request, or they silently get EasyOCR regardless of the server environment.
Expected behaviour
A new environment variable — DOCLING_SERVE_DEFAULT_OCR_ENGINE — should allow operators to set the OCR engine server-wide at deploy time, so that requests which do not specify an ocr_engine fall back to the configured default rather than always defaulting to EasyOCR.
DOCLING_SERVE_DEFAULT_OCR_ENGINE=tesseract
Motivation
- In many deployment environments (CPU-only, FIPS-compliant, lightweight containers), EasyOCR is undesirable or broken, while Tesseract is the preferred engine.
- The pattern already exists in the codebase for other pipeline components: DOCLING_SERVE_DEFAULT_TABLE_STRUCTURE_KIND and default layout kind are both configurable at the server level. OCR is a notable omission.
- Forcing every downstream client to pass ocr_engine on every request is not practical when docling-serve is used as infrastructure (e.g. behind Open WebUI or other integrations that don't expose this parameter).
Suggested implementation
Follow the same pattern as the existing default table/layout settings in docling_serve/settings.py — add a default_ocr_engine field to DoclingServeSettings with DOCLING_SERVE_DEFAULT_OCR_ENGINE as its env var, and apply it as the fallback when no ocr_engine is provided in the request options.
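To make the proposed fallback concrete, here is a minimal sketch of the resolution order: the request value first, then the environment variable, then the current EasyOCR default. It uses only the standard library. ServerSettings, resolve_ocr_engine and the "easyocr" identifier are assumptions made for illustration; they are not the actual DoclingServeSettings class or docling-serve's request handling.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

ENV_VAR = "DOCLING_SERVE_DEFAULT_OCR_ENGINE"
# Assumed identifier for today's implicit default engine.
BUILTIN_DEFAULT = "easyocr"


@dataclass(frozen=True)
class ServerSettings:
    """Hypothetical stand-in for DoclingServeSettings."""

    default_ocr_engine: str = BUILTIN_DEFAULT

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "ServerSettings":
        # Read the variable once at startup; an unset or empty value
        # keeps the built-in default.
        env = os.environ if env is None else env
        value = env.get(ENV_VAR, "").strip().lower()
        return cls(default_ocr_engine=value or BUILTIN_DEFAULT)


def resolve_ocr_engine(requested: Optional[str], settings: ServerSettings) -> str:
    """Return the request's engine if given, else the server-wide default."""
    return requested if requested else settings.default_ocr_engine
```

With this ordering, a client that does pass ocr_engine still overrides the server default, so existing integrations keep their current behaviour.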