When chunking documents for retrieval, there is typically a limit on chunk size (due to embedding model limitations).
However, post-retrieval tasks (such as re-ranking or generation) do not share this limit and actually benefit from larger context.
Take, for example, a long table that gets split into chunks: a generator asked to answer a question about the table is likely to give a better answer if it is given the whole table.
A naive solution is to number the chunks and expand a retrieved chunk by also retrieving its adjacent chunks. However, this approach does not take the document structure into account, so we may end up with content that is not cohesive (e.g. text from different sections of the document).
The requested feature is a method that, given a DocChunk, expands it into another DocChunk with cohesive content (e.g. a full subsection, table, ...).
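A minimal sketch of what structure-aware expansion could look like. This is not the real DocChunk API; the `DocChunk` dataclass, its `headings`/`index` fields, and `expand_chunk` are all hypothetical stand-ins, assuming each chunk carries its heading path and document order. The idea is to merge all chunks that share the same heading path instead of blindly taking adjacent chunk numbers:

```python
from dataclasses import dataclass

@dataclass
class DocChunk:
    # Hypothetical minimal stand-in for a chunk type, not the real docling DocChunk.
    text: str
    headings: tuple  # heading path of the enclosing section, e.g. ("Results",)
    index: int       # position of the chunk in document order

def expand_chunk(chunk: DocChunk, all_chunks: list) -> DocChunk:
    """Expand a retrieved chunk to a cohesive unit (here: its enclosing
    section) by merging every chunk with the same heading path, in order."""
    siblings = sorted(
        (c for c in all_chunks if c.headings == chunk.headings),
        key=lambda c: c.index,
    )
    return DocChunk(
        text=" ".join(c.text for c in siblings),
        headings=chunk.headings,
        index=siblings[0].index,
    )

chunks = [
    DocChunk("Rows 1-10 of the table.", ("Results",), 3),
    DocChunk("Unrelated text.", ("Methods",), 2),
    DocChunk("Rows 11-20 of the table.", ("Results",), 4),
]
expanded = expand_chunk(chunks[0], chunks)
# expanded.text == "Rows 1-10 of the table. Rows 11-20 of the table."
```

Note how the chunk from "Methods" is skipped even though it is adjacent by index, which is exactly where the naive numbering approach would leak in non-cohesive content.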