Retrieval Augmented Generation (RAG) has emerged as an essential framework for improving the quality of LLM-generated responses. Without RAG, Large Language Models only have access to the knowledge contained in their training data. With RAG, LLMs can improve prediction quality by tapping into external data sources, building a prompt loaded with rich context and relevant knowledge. But is standard RAG enough of an improvement to rely on in production applications? Extensive evaluation shows that a standard RAG pipeline is not enough to avoid unexpected hallucinations, overlooked knowledge, and misunderstood context.
To combat these issues, we've developed the SuperStack: a collection of technologies, including Relevant Segment Extraction (RSE), AutoQuery, and AutoContext, that dramatically improves retrieval precision and output reliability. In this blog post, we will be homing in on RSE to better understand the need for variable-length context. We will also cover how RSE works and how to use it effectively when configuring your LLM.
What is RSE?
To understand RSE, we must first review the process of “chunking” text. In the context of building LLM-based applications, chunking is the process of breaking down large pieces of text, like entire documents, into smaller pieces of text, like complete sentences or paragraphs. If our chunks are too large or too small, the LLM may overlook relevant information or misinterpret context, reducing the accuracy of retrieval. This problem calls for a more flexible solution, which leads us to RSE.
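To make this concrete, here is a minimal sketch of naive fixed-size chunking in Python. This is a generic illustration only, not Superpowered's chunking logic, and the window size and overlap values are arbitrary assumptions.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into fixed-size, slightly overlapping character windows.

    Generic illustration of chunking; production pipelines typically split on
    sentence or paragraph boundaries and count tokens rather than characters.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap so text cut at a boundary appears in both chunks
    return chunks
```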
Relevant Segment Extraction (RSE) is an optional (but strongly recommended) post-processing step that takes clusters of relevant chunks and intelligently combines them into longer sections of text that we call segments. These segments provide better context to the LLM than any individual chunk can. For simple factual questions, like “What was George Washington’s birthday?”, the answer can usually be found in a single chunk. However, more complex questions may have an answer that spans multiple pages of text. The goal of RSE is to intelligently identify the section(s) of text that provide the most relevant information, without being constrained to fixed-length chunks.
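Conceptually, RSE looks at which consecutive chunks of a document are relevant to the query and stitches runs of neighboring relevant chunks back together into variable-length segments. The sketch below is a deliberately simplified stand-in for that idea, not Superpowered's actual algorithm; the threshold-based merging rule is an assumption for illustration.

```python
def extract_segments(chunks: list[str], scores: list[float], threshold: float = 0.5) -> list[str]:
    """Merge runs of consecutive relevant chunks into variable-length segments.

    Simplified illustration only: a real RSE implementation also weighs segment
    length, tolerates small gaps between relevant chunks, and ranks segments.
    """
    segments: list[str] = []
    current: list[str] = []
    for chunk, score in zip(chunks, scores):
        if score >= threshold:
            current.append(chunk)  # extend the segment while chunks stay relevant
        elif current:
            segments.append(" ".join(current))  # close the segment at the first irrelevant chunk
            current = []
    if current:
        segments.append(" ".join(current))
    return segments
```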
Illustrating the Benefit of RSE
Let’s check out an example in the Superpowered AI user interface to see RSE in action. By asking a question with and without RSE, we can gauge the impact of this post-processing step on response quality. My first step is to create a knowledge base, and for this example, I’ve chosen the Sourcegraph company handbook, which is publicly available. With RSE set to “No” in the LLM settings, I am ready to ask my first question about Sourcegraph to get a baseline of what RAG can produce without RSE.
Question: What are Sourcegraph’s core values?

The answer above seems somewhat generic, which gives me pause: is this really what is written in the Sourcegraph handbook? Let’s see what the correct answer is.

Hmm, the LLM’s response to our question was clearly a generalized one, lacking the specificity of the real text. Unfortunately, without RSE, the LLM could only see bits and pieces of this section of the document and couldn’t retrieve the exact answer from the handbook.
Let’s turn on RSE in our LLM configuration and see how the output changes.

We couldn’t have asked for a better response! With RSE, the LLM sees the entire Core Values section and provides the exact answer. This level of reliability in finding complete relevant knowledge from unstructured data is what makes the difference between a neat toy and a production-ready application. Let’s now discuss the configuration options that come with RSE.
Configuring RSE for Success
While the length of segments created will be determined in large part by the nature of the query, there is a configurable parameter called “segment_length” that you can use to bias the segments towards being shorter or longer. Here is a general guideline for what the average top segment length will be for each setting:
- Very short: 1-3 chunks
- Short: 2-6 chunks
- Medium: 4-10 chunks
- Long: 8-16 chunks
“Medium” is the default setting and should work well for most applications. For the Sourcegraph example above, we used the default “Medium” setting. If your use case primarily involves straightforward factual questions, you may get better results by selecting “Short” or even “Very short” segments. If your use case deals mainly with complex queries, opting for “Long” segments may yield better results. When determining your optimal configuration, it is best to experiment with a variety of real-world user inputs to identify the best fit for your use case.
RSE in the API:
In the query endpoint, you can simply pass `use_rse` as `true` or `false`.
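For example, a query request might look roughly like the sketch below. The endpoint path, authentication scheme, and surrounding payload fields are assumptions for illustration (only `use_rse` and `segment_length` come from this post), so check the REST API docs for the exact schema.

```python
import requests

# Hypothetical request shape: the URL, auth, and field names other than
# use_rse and segment_length are assumptions, not the documented API.
# See https://docs.superpowered.ai/api/rest/index.html for the real schema.
response = requests.post(
    "https://api.superpowered.ai/v1/knowledge_bases/query",
    auth=("API_KEY_ID", "API_KEY_SECRET"),
    json={
        "query": "What are Sourcegraph's core values?",
        "use_rse": True,             # enable Relevant Segment Extraction
        "segment_length": "medium",  # bias segments shorter or longer
    },
)
print(response.json())
```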
The chat endpoint has a few more moving pieces. You must first create a “chat thread” for each conversation, which holds that conversation’s chat history. Each chat thread has a set of default options, one of which is `use_rse` (which defaults to `true`). For the full list of default options, please visit our REST API docs: https://docs.superpowered.ai/api/rest/index.html
If you ever want to override the chat thread defaults, you can simply pass `use_rse` when calling the `/v1/chat/threads/{thread_id}/get_response` API endpoint or the `get_chat_response()` Python SDK function.
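A per-call override with the Python SDK might look roughly like this. Only the `get_chat_response()` function name and the `use_rse` option come from this post; the import path and the other arguments are assumptions, so consult the SDK documentation for exact usage.

```python
from superpowered import get_chat_response

# Hypothetical call shape: only get_chat_response() and use_rse are taken from
# this post; the import path and other arguments are assumptions.
response = get_chat_response(
    thread_id="YOUR_THREAD_ID",
    input="What are Sourcegraph's core values?",
    use_rse=True,  # override the chat thread's default just for this call
)
print(response)
```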
We encourage readers to explore the Superpowered AI platform further and leverage its capabilities in their own applications. If you have an LLM-based app in mind, you’ve come to the right place. Join us on Discord, where our founding team is ready to help you build your vision!