Let AI choose

Dynamic Text Chunking using LLMs

Establishing a high-quality RAG (retrieval-augmented generation) pipeline is essential for ensuring that the responses you get from your AI are coherent and accurate.

The first step, of course, is to create a vector database containing chunks of your text data. The standard approaches split on separators or use recursive text splitting with fixed, predetermined chunk sizes. These can be problematic when your documents vary widely in structure and length, as shown in the sketch below.
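To make the baseline concrete, here is a minimal sketch of fixed-size recursive splitting using LangChain's RecursiveCharacterTextSplitter. The chunk_size and chunk_overlap values are illustrative, not recommendations.

```python
# Standard approach: recursive text splitting with a fixed, predetermined
# chunk size and overlap, applied identically to every document.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # fixed chunk size in characters
    chunk_overlap=50,  # fixed overlap between consecutive chunks
)

text = open("document.txt").read()
chunks = splitter.split_text(text)  # list of chunk strings, ready to embed
```

Every document gets the same 500-character chunks regardless of whether it is dense prose, a table, or code, which is exactly the rigidity that causes problems with varied corpora.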

We propose an AI-in-the-loop framework that generates more contextual chunks: an LLM samples each input document and determines a suitable chunk size and overlap for it.
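The loop below is a rough sketch of the idea: sample the start of a document, ask an LLM to propose chunking parameters, then split with those values. The model name, prompt wording, JSON schema, and helper names are illustrative assumptions, not a fixed part of the framework.

```python
# Sketch of the AI-in-the-loop chunker: an LLM inspects a sample of each
# document and proposes chunk_size / chunk_overlap, which we then apply.
# Model, prompt, and schema below are illustrative assumptions.
import json

from openai import OpenAI
from langchain_text_splitters import RecursiveCharacterTextSplitter

client = OpenAI()

def propose_chunk_params(document: str, sample_chars: int = 2000) -> dict:
    """Ask the LLM to suggest chunking parameters from a sample of the text."""
    prompt = (
        "You will see the beginning of a document. Based on its structure "
        "(prose, tables, code, lists), suggest a character chunk_size "
        "(200-2000) and chunk_overlap (0-200) for RAG indexing. "
        'Reply with JSON only, e.g. {"chunk_size": 800, "chunk_overlap": 100}.'
        "\n\n" + document[:sample_chars]
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable model works; this choice is arbitrary
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # nudge the model toward valid JSON
    )
    return json.loads(resp.choices[0].message.content)

def chunk_document(document: str) -> list[str]:
    """Split one document using the parameters the LLM proposed for it."""
    params = propose_chunk_params(document)
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=params["chunk_size"],
        chunk_overlap=params["chunk_overlap"],
    )
    return splitter.split_text(document)
```

In practice you would likely cache the proposed parameters per document type rather than paying for an LLM call on every file, and validate the returned values before trusting them.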