The “R” in RAG

Nowadays, the term “RAG” is fairly well understood. Or is it? Many know that it stands for retrieval-augmented generation, but recently I’ve encountered some confusion around the “R” (retrieval) aspect of RAG. I think that much of that confusion stems from the idea that there is a fixed way to implement RAG when working with large language models (LLMs). In this post, I’m going to try to clear up some of that confusion by introducing a few different types of retrieval techniques and what they bring to LLMs. I’ll cover:

  • The naïve RAG pattern
  • The differences in retrieval techniques for structured and unstructured data
  • The situations in which techniques used with unstructured data are best
  • The situations in which techniques used for structured data are best
  • The strength of combining techniques
  • Some emerging new RAG patterns

The Naïve RAG Pattern

I’m starting with the “naïve” approach because once we understand its challenges and opportunities, we can imagine how this pattern can evolve (and frankly has evolved) into more advanced versions. The naïve approach is mostly what comes to mind when one thinks about RAG and how it works: an agent or application receives input from a user or other entity, finds relevant information to satisfy that request, packages it up in a prompt to give to an LLM, and waits for the response that the LLM provides.
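
To make that flow a little more concrete, here is a minimal Python sketch of the naïve loop, assuming an in-memory list of documents and a generic LLM callable. The keyword-based retriever, the prompt template, and the `llm` parameter are all illustrative assumptions, not a reference implementation.

```python
def retrieve(request: str, knowledge_base: list[str], top_k: int = 3) -> list[str]:
    """Placeholder retrieval: rank documents by naive keyword overlap with the request."""
    request_terms = set(request.lower().split())
    ranked = sorted(
        knowledge_base,
        key=lambda doc: len(request_terms & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(request: str, context: list[str]) -> str:
    """Package the retrieved context and the original request into a single prompt."""
    joined = "\n\n".join(context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nRequest: {request}"

def answer(request: str, knowledge_base: list[str], llm) -> str:
    context = retrieve(request, knowledge_base)  # R: retrieval
    prompt = build_prompt(request, context)      # A: augmentation
    return llm(prompt)                           # G: generation (llm is any hypothetical callable)
```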

This means that there are categories of tasks that are essential for success:   

  • Identifying information that we want to make available
  • Preparing this information so we can find what we need
  • Figuring out what we need to send the LLM to get a relevant response

Conceptually these are very straightforward—for a human.    So, let’s walk through what each task truly means.  

Identifying Information That We Want to Make Available

Fundamentally, we should be able to tie this to the scope or goal of a project. Let’s use the example of a popular GenAI implementation such as a customer service chatbot. For such a chatbot to be successful, we would naturally need to consider the information that customer service representatives reference when they support customers. In practice, this is generally a mix of policies, procedures, emails, orders, customer information, other transactions, and potentially more. If these representatives also have to search the web or rely on other services and tools, those sources may be worth including as well. The scope of our chatbot’s capabilities will greatly depend on how comprehensive the information we decide to include is. If, for example, we only include shipping, return, or cancellation policies, we limit our application accordingly. This can also determine the ultimate value or ROI our chatbot provides.

So, if you only associate document-based information with RAG, how much are you limiting your application? I have been in architectural discussions in which the general understanding was that only documents are suitable candidates for RAG implementations. But imagine how powerful these applications could be if we also provided access to transactions as they happen, along with policies, interactions, external information, and so on. And what if we could leverage information from core systems, in real time? How much more value would that bring? However, not only does each of these data types call for different techniques and considerations, but each needs to be collected differently and requires its own type of preparation. For the rest of this post, I will focus on structured and unstructured data.

Preparing This Information So We Can Find What We Need

Based on what we have identified as potential information, we can see that we have emails, policies, and the like in document or unstructured form, and information in our core transactional systems, usually in tabular, relational, or some other structured form. So, what do we consider when preparing unstructured vs. structured information? Let’s take this from a conceptual perspective.

Preparing Unstructured Data

Emails, policies, procedures, and other forms of free-form text or documents make up what we commonly refer to as unstructured data. Statistics show that it accounts for most information in organizations. Taken at face value, this is overwhelming. Think about the operational challenges: how this information is secured, stored, shared, updated, and so on. These are challenges, but at least familiar ones. There are other considerations, though, when you need to leverage this information for an AI-based application. One fundamental challenge is determining what information is relevant to an unknown future request.

Search engines have actually been quite good at this. When you interact with a search engine, you can receive pages and pages of results, scan them, and decide which are in line with your expectations. With RAG, the responsibility of determining what to select and how much to use lies with the application or agent driving the interaction. The importance of this step should be pretty clear.

Today there are more advanced methods that go beyond the full-text and keyword-based search on which these engines relied. These semantic and relationship-based approaches help find information that has similar meaning and contextual relationships. In my blog Querying Minds Want to Know: Can a Data Fabric and RAG Clean up LLMs? – Part 3: Semantic Indexing of Enterprise Data, I discuss semantic indexing and its importance in semantic search and retrieval. Preparing unstructured data for semantic search also requires thought about how to separate the information into meaningful chunks. For instance, does an entire document provide enough context, or should we split it into sections or paragraphs? I won’t get into deep technical details here, but hopefully you have an idea of how this can affect retrieving the necessary information, as well as a sense of how many patterns might be available for retrieving the best possible collection of information.
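
To make the idea a bit more tangible without going deep, here is a rough Python sketch that splits documents into paragraph-level chunks, attaches a vector to each chunk, and ranks chunks by similarity to a request. The bag-of-words `embed` function is a toy stand-in so the sketch runs end to end; a real system would use a trained embedding model and a vector store.

```python
import math

def chunk_by_paragraph(document: str) -> list[str]:
    """Split a document into paragraph chunks; real systems often also cap chunk size."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]

def embed(text: str, vocabulary: list[str]) -> list[float]:
    """Toy bag-of-words vector; swap in a real embedding model in practice."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocabulary]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_index(documents: list[str]):
    """Chunk every document and store each chunk alongside its vector."""
    chunks = [c for doc in documents for c in chunk_by_paragraph(doc)]
    vocabulary = sorted({w for c in chunks for w in c.lower().split()})
    return vocabulary, [(c, embed(c, vocabulary)) for c in chunks]

def semantic_search(request: str, vocabulary: list[str], index, top_k: int = 3) -> list[str]:
    """Return the chunks whose vectors are closest to the request."""
    request_vec = embed(request, vocabulary)
    ranked = sorted(index, key=lambda item: cosine(request_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]
```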

The takeaway here is that to leverage unstructured data, you have to prepare or preprocess it so that you can find the relevant information for a request. You must also keep in mind that the goal of these preparation techniques is to yield information that is close in meaning or related to any given request.

Preparing Structured Data

When it comes to structured data, the amount of preparation needed can vary widely. If you access data directly from source systems, there is minimal preparation. If you move this information through data pipelines, ETL processes, and the like, the preparation time can grow quickly. For most organizations, structured data drives core processes and enables measurement of how those processes are performing. The preparation challenges center on how to unify information and make it meaningful to consumers, as well as how to make the process for leveraging it generalizable and repeatable.

You can prepare structured data just as you would prepare unstructured data. I provide insight into semantic understanding and unstructured data in my blog Querying Minds Want to Know: Can a Data Fabric and RAG Clean up LLMs? – Part 3: Semantic Indexing of Enterprise Data. When going this route, it is crucial to understand that retrieval is still based on semantic equivalence, and the same challenges around how much to collect apply. This is why it is very important to understand the technique used for collection and what it will retrieve.
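
If you do take this route, one simple and purely illustrative approach is to serialize each row into a sentence-like string using business-friendly column names, and then index those strings exactly like document chunks. The table, columns, and values below are hypothetical.

```python
def row_to_text(table_name: str, columns: list[str], row: tuple) -> str:
    """Turn a structured row into text that can be chunked and embedded like a document."""
    pairs = ", ".join(f"{col}: {val}" for col, val in zip(columns, row))
    return f"{table_name}: {pairs}"

# Hypothetical example row, purely for illustration.
order_columns = ["order id", "customer name", "order date", "return status"]
print(row_to_text("order", order_columns, (1042, "Acme Corp", "2024-05-17", "returned")))
# -> order: order id: 1042, customer name: Acme Corp, order date: 2024-05-17, return status: returned
```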

But with structured data you can also perform that retrieval through structured queries. Such queries are more analytic in nature, in contrast to the retrieval used for unstructured information. When a query is formed properly, the query itself determines what is collected and how complete that collection is. Think about how we leverage visualization tools and the types of outcomes we can achieve.

Now, you may be wondering how we determine queries for these unknown requests. We do this by having the LLM create the structured query based on the request and information about the structured data—metadata. LLMs are quite good at generating structured queries when they have good, business-friendly metadata. Since structured queries work against tables, it is important for columns to have business-friendly names and descriptions, as well as information on how tables relate to one another. The more you can simplify this information, the better the LLM will be at generating accurate queries.
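
As a hedged sketch of what such metadata might look like in practice, the snippet below packages business-friendly table and column descriptions into a prompt for the LLM to turn into SQL. The schema, the wording of the prompt, and the `llm` callable in the usage comment are all assumptions for illustration; a real catalog or data fabric would supply the metadata.

```python
# Hypothetical schema description with business-friendly names, descriptions, and relationships.
SCHEMA_METADATA = """
Table: orders -- one row per customer order
  order_id: unique identifier for the order
  customer_id: the customer who placed the order (relates to customers.customer_id)
  order_date: date the order was placed
  total_amount: order value in US dollars

Table: customers -- one row per customer
  customer_id: unique identifier for the customer
  customer_name: the customer's business-friendly name
"""

def build_sql_prompt(user_request: str) -> str:
    """Combine the request with schema metadata so the LLM can generate a structured query."""
    return (
        "You are given the following database schema:\n"
        f"{SCHEMA_METADATA}\n"
        "Write one SQL query that answers the request. Return only the SQL.\n\n"
        f"Request: {user_request}"
    )

# Usage with a hypothetical LLM callable:
# sql = llm(build_sql_prompt("Total order value per customer for the last quarter"))
```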

Now that we have insight into how to prepare information, and we’ve given some thought about the capabilities we want to enable in our AI implementations, we can see that how we prepare the data will greatly influence any limitations on our implementations.  It is also important to note that any decision made will require associated, scalable processes for the implementation to be successful.   I use an example of an account representative chatbot to discuss such processes in my blog series A Deep Dive into Harnessing Generative AI with Data Fabric and RAG.

Figuring Out What We Need to Send the LLM to Get a Relevant Response

You should now be able to see that what we send to our LLM will depend on what we have decided to include and how we can collect or “retrieve” it—the R in RAG.    What we include will dictate the preparation we need as well as how we can assess how well we did.

If we use unstructured data, then we can find relevant documents that are close in meaning or related to our request, and we know that we are responsible for deciding how, and how much, to collect. However, when it comes to assessing the quality of that data, we do not have a direct indicator, so to speak. This is difficult because we need to align the user’s intent, the intent in the documents we retrieved, and the final response. We also need to be able to assess whether our collection of information is relevant, sufficient, and/or complete. Requests that require a collection of facts or that are analytic in nature can both be at risk of insufficient data collection.
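
There is no single correct indicator here, but as one purely illustrative heuristic, you might flag a retrieval as potentially insufficient when the best similarity score is weak or the retrieved text is too thin to plausibly support an answer. The thresholds below are arbitrary assumptions.

```python
def looks_insufficient(similarity_scores: list[float], chunks: list[str],
                       min_score: float = 0.3, min_chars: int = 200) -> bool:
    """Rough check: weakly related or very thin retrievals may not support a good answer."""
    if not chunks:
        return True
    best_score = max(similarity_scores)
    total_chars = sum(len(chunk) for chunk in chunks)
    return best_score < min_score or total_chars < min_chars
```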

AI developers need to decide how to evaluate success, and this is not the simplest of tasks. In this area, we start to see more research on approaches that take us out of the naïve RAG pattern and into more complex patterns.

If we use structured data, we can see whether the LLM can generate a query and use it to retrieve relevant data. In this approach, the focus is on generating the correct query to align with the user’s intent. Processes that maintain good descriptions and metadata significantly assist in this endeavor. Other practices that simplify the collection of tables, relationships, and so on will also go a long way toward increasing the accuracy of the generated queries. One saving grace here is that the query will either execute or produce an error, so there is some level of assessment built into the process. Beyond that, the query itself, along with an explanation, can help give the user insight into what the query will produce; this is practical because SQL is ubiquitous and fairly readable by end users. These assessment tools stay close to the naïve pattern, and there is still the option of leveraging more advanced methods.
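
Here is a minimal sketch of that execute-or-error check, using SQLite from the Python standard library. The retry loop and the hypothetical `generate_sql` callable are assumptions about one reasonable way to feed errors back to the LLM, not a prescribed pattern.

```python
import sqlite3

def try_query(conn: sqlite3.Connection, sql: str):
    """Run a generated query; return (rows, None) on success or (None, error message) on failure."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)

def query_with_retry(conn: sqlite3.Connection, user_request: str, generate_sql, max_attempts: int = 2):
    """Ask the (hypothetical) LLM-backed generate_sql for a query, retrying with error feedback."""
    feedback = ""
    for _ in range(max_attempts):
        sql = generate_sql(user_request, feedback)
        rows, error = try_query(conn, sql)
        if error is None:
            return sql, rows
        feedback = f"The previous query failed with: {error}. Please correct it."
    raise RuntimeError("Could not produce a runnable query for this request.")
```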

One essential thing to point out is that when the SQL query is correct, it mitigates the concerns around retrieval completeness. In addition, SQL is very powerful: it can retrieve real-time information and faithfully produce a wide range of analytic results that can drive visualizations and explanations. Adding related unstructured data to assist with explanations and summarizations can make it even more powerful.

Either of these approaches will power a GenAI application. A general rule of thumb is that techniques leveraging unstructured data (semantic similarity) work well when providing fact-based information, whereas leveraging SQL with structured data works best with analytics-based requests. The takeaway here is that if we use both approaches, we can increase the likelihood of getting good responses, and we can make an application broader and more valuable.
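
As a hedged sketch of combining the two, the snippet below routes analytic-sounding requests to the SQL path and everything else to semantic retrieval. The keyword heuristic and the two pipeline callables are assumptions; in practice, an LLM or a small classifier often makes the routing decision.

```python
ANALYTIC_HINTS = ("how many", "total", "average", "sum", "trend", "per month", "top")

def route_request(request: str) -> str:
    """Very rough routing: analytic wording goes to SQL, the rest to semantic search."""
    lowered = request.lower()
    return "sql" if any(hint in lowered for hint in ANALYTIC_HINTS) else "semantic"

def handle(request: str, semantic_answer, sql_answer) -> str:
    """semantic_answer and sql_answer wrap the two retrieval pipelines described above."""
    if route_request(request) == "sql":
        return sql_answer(request)
    return semantic_answer(request)
```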

Advances in RAG Techniques

Today, the naïve RAG pattern is a great foundation for GenAI applications. A Survey on RAG patterns and QueryRAG are resources that describe working with unstructured and structured data, respectively. As companies gain experience with these applications, more advanced RAG patterns are emerging. Many involve leveraging both structured and unstructured data in concert with web searching and external API calls, as well as addressing shortcomings in collection and generation.

Advancements in RAG have led to several specialized patterns that enhance the precision and adaptability of AI systems across different sectors. These include:

  • RQ-RAG and Self-RAG, which seek to improve collection through self-reflection
  • Domain-Specific RAG, which integrates specialized knowledge for accuracy
  • GraphRAG, which uses knowledge graphs for enriched data connectivity
  • Agentic RAG and Modular RAG, which coordinate multiple AI agents for complex tasks
  • Adaptive RAG, which dynamically adjusts retrieval processes based on query complexity
  • Corrective RAG, which focuses on accuracy by verifying information against reliable sources

These patterns demonstrate the growing sophistication and potential impact of RAG technologies across a wide range of applications.

I hope that you found this post helpful in your quest to implement RAG in your organization.
