Export functions fail to generate <page_break> tags for non-consecutive (skipped) pages #472

## Description 

When parsing PDF documents, if specific pages fail to parse (e.g., due to exceptions caught in the pipeline) and are excluded from the `doc.pages` list, the export functions (`export_to_doctags`, `export_to_html`, `export_to_markdown`) do not generate page-break markers for these gaps. As a result, the page count implied by the exported output no longer matches the source document, which causes mismatches in downstream processing.
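To make the gap concrete, here is a minimal sketch that finds the page numbers missing between the first and last parsed page. It is plain Python and does not call docling; `missing_pages` is a hypothetical helper, and how the integer page numbers are obtained from `doc.pages` is left as an assumption.

```python
from typing import Iterable, List


def missing_pages(parsed_pages: Iterable[int]) -> List[int]:
    """Return page numbers absent between the first and last parsed page.

    Hypothetical helper: `parsed_pages` is assumed to hold the page
    numbers that survived parsing (for example, taken from doc.pages).
    """
    present = set(parsed_pages)
    if not present:
        return []
    first, last = min(present), max(present)
    # Every number in the closed range that was not parsed is a gap.
    return [n for n in range(first, last + 1) if n not in present]


print(missing_pages([1, 2, 4, 5]))  # [3]
```

Note that this only sees gaps between parsed pages. Failed pages at the very start or end of a document would need the total page count from the source file.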

## Steps to Reproduce
### Attachments
[test.pdf](https://github.com/user-attachments/files/24523289/test.pdf)
* `test.pdf` - A minimal reproduction file derived from a larger document. The content has been intentionally corrupted for confidentiality, but the parsing error still reproduces.
  * Structure: Consists of 4 pages (Pages 78, 79, 83, 84).
  * Behavior: Parsing succeeds for pages 78 and 84, but fails for pages 79 and 83. 

1.  Convert the attached `test.pdf` using `DocumentConverter`.
2.  Export using `export_to_doctags()`, `export_to_markdown(page_break_placeholder=str)`, and `export_to_html(split_page_view=True)`.
3.  Count the page-break markers in each output and compare the count with the number of pages in the source document.

**Reproduction Script:** 

```python
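from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

source = "test.pdf"  # path to the attached reproduction file
# Assumption: default options; the original report does not show how
# pipeline_options was configured.
pipeline_options = PdfPipelineOptions()

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options=pipeline_options,
        ),
    },
)

# Pages that fail to parse are missing from the resulting document.
doc = converter.convert(source).document

doctags_output = doc.export_to_doctags()
markdown_output = doc.export_to_markdown(page_break_placeholder="===PAGE_BREAK===")
html_output = doc.export_to_html(split_page_view=True)

print(doctags_output)
print(markdown_output)
print(html_output)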
``` 
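Step 3 can be done by hand, but a small string-based check makes the mismatch easier to see. The sketch below is plain Python, independent of docling; `count_markers` and `implied_page_count` are hypothetical helper names, and the sample string only mimics the shape of the Markdown export.

```python
def count_markers(exported: str, marker: str) -> int:
    """Count non-overlapping occurrences of a page-break marker."""
    if not marker:
        raise ValueError("marker must be a non-empty string")
    return exported.count(marker)


def implied_page_count(exported: str, marker: str) -> int:
    """N markers separate N + 1 pages, so the export implies N + 1 pages."""
    return count_markers(exported, marker) + 1


# Sample shaped like a Markdown export where page 3 was skipped.
sample_markdown = "\n".join(
    [
        "page 1 content",
        "===PAGE_BREAK===",
        "page 2 content",
        "===PAGE_BREAK===",
        "page 4 content",
        "===PAGE_BREAK===",
        "page 5 content",
    ]
)

print(implied_page_count(sample_markdown, "===PAGE_BREAK==="))  # 4
```

The same functions work on the DocTags output with `"<page_break>"` as the marker.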

## Expected Behavior
* **Export:** the export functions should generate page-break markers for **all pages**, including failed ones, so that page numbering remains consistent.
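As an illustration of the expected behavior only, the following sketch joins per-page text and emits one marker per page boundary, including the boundaries around a missing page. It is not docling's implementation; `join_with_gap_breaks` and the `page_texts` mapping (page number to text) are hypothetical.

```python
from typing import Dict


def join_with_gap_breaks(page_texts: Dict[int, str], marker: str) -> str:
    """Join pages in order, emitting a marker at every page boundary.

    A missing page contributes no text, but its boundary marker is still
    emitted, so the marker count reflects the full page range.
    """
    if not page_texts:
        return ""
    first, last = min(page_texts), max(page_texts)
    parts = []
    for number in range(first, last + 1):
        if number > first:
            parts.append(marker)  # boundary before every page but the first
        if number in page_texts:
            parts.append(page_texts[number])
    return "\n".join(parts)


pages = {1: "page 1", 2: "page 2", 4: "page 4", 5: "page 5"}
output = join_with_gap_breaks(pages, "===PAGE_BREAK===")
print(output.count("===PAGE_BREAK==="))  # 4
```

With page 3 missing, two markers appear back to back between pages 2 and 4, so a consumer that splits on the marker still sees five page slots.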

## Actual Behavior
**Missing page-break markers for failed pages:**
If pages 1, 2, 4, and 5 succeed but page 3 fails, no marker is emitted for page 3, so the output currently looks like this:

```xml
... content ...      1page
<page_break>
... content ...      2page
<page_break>
... content ...      4page
<page_break>
... content ...      5page
``` 

```markdown
... content ...      1page
===PAGE_BREAK===
... content ...      2page
===PAGE_BREAK===
... content ...      4page
===PAGE_BREAK===
... content ...      5page
```

```html
<td>
<div class="page">
... content ...      1page
</div>
</td> 

<td>
<div class="page">
... content ...      2page
</div>
</td> 

<td>
<div class="page">
... content ...      4page
</div>
</td> 

<td>
<div class="page">
... content ...      5page
</div>
</td>
```


## Environment

| Component | Version / Details |
| :--- | :--- |
| **docling version** | 2.31.1 |
| **docling-core version** | 2.31.0 |
| **Python** | 3.11 |
| **OS** | macOS |
