DocChunk expansion #542

@odelliab

When chunking documents for retrieval, chunk size is typically limited by the embedding model (for example, by its maximum input length).
However, post-retrieval tasks (such as re-ranking or generation) do not share this limit and actually benefit from larger context.
Take, for example, a long table that has been split across several chunks: a generator asked a question about the table is likely to give a better answer if it is given the whole table.

A naive solution is to number the chunks and expand a retrieved chunk by also retrieving its adjacent chunks. However, this does not take the document structure into account, so the expanded context may not be cohesive (e.g., text from different sections of the document).
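To make the problem with the naive approach concrete, here is a minimal sketch. The `Chunk` dataclass, its `section` field, and `expand_with_neighbors` are hypothetical names used only for illustration; they are not the library's `DocChunk` or any existing API.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    index: int    # position of the chunk in the document (hypothetical field)
    section: str  # heading the chunk belongs to (hypothetical field)
    text: str


def expand_with_neighbors(chunks, hit_index, window=1):
    """Naive expansion: return the hit plus `window` chunks on each side."""
    lo = max(0, hit_index - window)
    hi = min(len(chunks), hit_index + window + 1)
    return chunks[lo:hi]


chunks = [
    Chunk(0, "1 Intro", "Intro text."),
    Chunk(1, "2 Methods", "Table rows 1-50."),
    Chunk(2, "2 Methods", "Table rows 51-100."),
    Chunk(3, "3 Results", "Results text."),
]

# The retrieved chunk is the second half of the table (index 2).
expanded = expand_with_neighbors(chunks, hit_index=2)

# The window crosses a section boundary, so unrelated text is pulled in.
sections = {c.section for c in expanded}
```

In this toy document the expanded window contains the full table, but it also pulls in the first chunk of the next section, which is exactly the lack of cohesion described above.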

The requested feature is a method that, given a DocChunk, expands it into another DocChunk with cohesive content (e.g., a subsection, a table, ...).
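A rough sketch of what such an expansion could look like, assuming each chunk records an identifier of the structural unit (subsection, table, ...) it was cut from. The `Chunk` dataclass, the `unit_id` field, and `expand_to_unit` are assumptions made for this example and do not reflect the actual `DocChunk` schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    index: int    # position of the chunk in the document (hypothetical field)
    unit_id: str  # id of the enclosing structural unit (hypothetical field)
    text: str


def expand_to_unit(chunks, hit):
    """Merge every chunk that shares the hit's structural unit into one chunk."""
    members = sorted(
        (c for c in chunks if c.unit_id == hit.unit_id),
        key=lambda c: c.index,
    )
    return Chunk(
        index=members[0].index,
        unit_id=hit.unit_id,
        text="\n".join(c.text for c in members),
    )


chunks = [
    Chunk(0, "sec-1", "Intro text."),
    Chunk(1, "table-1", "Table rows 1-50."),
    Chunk(2, "table-1", "Table rows 51-100."),
    Chunk(3, "sec-3", "Results text."),
]

# Expanding the second half of the table yields the whole table and nothing else.
whole_table = expand_to_unit(chunks, chunks[2])
```

Unlike a fixed neighbor window, the boundary here follows the document structure, so the expanded chunk stops where the table (or subsection) ends.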

Labels

enhancement (New feature or request)
