Replies: 5 comments 5 replies
-
|
I have the same issue. I think one way to fix the problem is to convert .txt to .md and read it as .md via Docling. The same can be applied to any plain-text format, such as YAML. |
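A minimal sketch of that workaround, assuming the plain text is acceptable as Markdown unchanged (characters such as `#`, `*`, or `_` in the text may be interpreted as markup). The helper names `copy_txt_as_md` and `convert_txt_with_docling` are made up for illustration; only `DocumentConverter.convert` and `export_to_markdown` are Docling calls, and Docling is imported lazily so the copy step runs without it.

```python
import shutil
import tempfile
from pathlib import Path


def copy_txt_as_md(txt_path, out_dir=None):
    """Copy a plain-text file to a file with the same stem and a .md suffix.

    Hypothetical helper: the content is copied byte for byte, nothing is escaped.
    """
    src = Path(txt_path)
    # Use a fresh temporary directory unless the caller supplies one.
    target_dir = Path(out_dir) if out_dir is not None else Path(tempfile.mkdtemp())
    target_dir.mkdir(parents=True, exist_ok=True)
    md_path = target_dir / (src.stem + ".md")
    shutil.copyfile(src, md_path)
    return md_path


def convert_txt_with_docling(txt_path):
    """Run the .md copy through Docling and return its Markdown export."""
    # Imported here so copy_txt_as_md stays usable without Docling installed.
    from docling.document_converter import DocumentConverter

    md_path = copy_txt_as_md(txt_path)
    result = DocumentConverter().convert(md_path)
    return result.document.export_to_markdown()
```

Since a .txt file has no Markdown headings, the output will likely be mostly flat paragraphs; this only gets the file through the pipeline.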
-
|
This would be great for cases where a text file comes in. If it could simply be passed through to the output, that would also be great. |
-
|
Ok, so I'm evaluating Unstructured and Docling. Docling is supposed to have all this "ML" and cool "hi-res" tech to extract headings. But it can't do the simplest, most obvious case: text files from Gutenberg. What's the point of all the extra ML if it can't do text files? |
-
|
+1 |
-
|
Docling is designed to preserve and reconstruct a document’s structural information during the reading stage. However, plain TXT files do not contain layout information (such as font styles, page numbers, or spatial coordinates), which makes it impossible for Docling to support TXT files directly. If the goal is to obtain an output for TXT files that is comparable to what Docling produces when parsing PDF documents, an additional, manually designed “structure inference layer” must be introduced to align TXT content with Docling’s document model. In practice, this means implementing a Docling-like processing pipeline specifically tailored for TXT files, whose responsibility is to infer and construct document structure in a way that matches Docling’s output for layout-rich formats. |
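To make the idea of a "structure inference layer" concrete, here is a rough sketch that guesses headings in plain text and emits Markdown. The heuristics are assumptions chosen for illustration (a single-line block that is all caps, or that starts with "Chapter", "Part", or "Book" followed by a number), not anything Docling implements, and the names `looks_like_heading` and `infer_markdown` are hypothetical.

```python
import re

# Assumed pattern for numbered divisions such as "CHAPTER IV" or "Part 2".
CHAPTER_RE = re.compile(r"^(chapter|part|book)\s+[\divxlc]+\b", re.IGNORECASE)


def looks_like_heading(block):
    """Guess whether a blank-line-delimited block is a heading."""
    lines = block.splitlines()
    if len(lines) != 1:
        return False  # multi-line blocks are treated as paragraphs
    text = lines[0].strip()
    if not text or len(text) > 80:
        return False
    if CHAPTER_RE.match(text):
        return True
    letters = [c for c in text if c.isalpha()]
    # All-caps single lines that do not end like a sentence count as headings.
    return bool(letters) and all(c.isupper() for c in letters) and not text.endswith(".")


def infer_markdown(text):
    """Turn plain text into Markdown with inferred level-2 headings."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    out = []
    for block in blocks:
        if looks_like_heading(block):
            out.append("## " + block)
        else:
            # Re-join hard-wrapped lines into a single paragraph line.
            out.append(" ".join(line.strip() for line in block.splitlines()))
    return "\n\n".join(out) + "\n"
```

The resulting Markdown could then be handed to a Markdown-capable converter, so the inferred headings end up in the document model. Real text files will need more rules than this (lists, tables, front matter), which is the cost the comment above points to.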
-
Currently, Docling does not support .txt files as input, which limits its ability to handle plain text documents. Given that .txt files are widely used in various document processing workflows, adding support for them would make Docling even more versatile.