Retrieval augmented generation (RAG)
Introduction
Retrieval augmented generation (RAG) is a technique that combines the strengths of retrieval-based and generative AI models. In RAG, an AI system first retrieves information from a large dataset or knowledge base and then uses the retrieved material to generate a response or output. Essentially, the RAG model augments the generation process with additional context or information pulled from relevant, trustworthy sources. This allows the underlying AI model to generate answers by retrieving relevant information directly from trusted, current sources rather than relying solely on pre-trained data that may be outdated.

In a 2020 paper, Meta (then known as Facebook) introduced a RAG framework to give LLMs access to information beyond their training data. This allowed LLMs to draw on a specialized body of knowledge to answer questions more accurately. In RAG, the model responds to a question by consulting the content it has gathered, as opposed to trying to recall facts from memory.

As the name suggests, RAG has two phases: retrieval and content generation. In the retrieval phase, algorithms search for and retrieve snippets of information relevant to the user’s prompt or question. In an open-domain, consumer setting, those facts can come from indexed documents on the internet; in a closed-domain, enterprise setting, a narrower set of sources is typically used for added security and reliability.

In 2025, Miao et al. wrote that search is "...shifting from surface-level matching toward contextualized intent recognition, from vague semantics toward logic-driven dynamic retrieval, from passive toward active knowledge retrieval, and from simple aggregation toward coherent context construction. However, most RAG systems in the medical and nursing domains have not yet introduced reasoning methods, and those that have are still predominantly reliant on data‑driven associations without causal modeling."
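The two phases can be illustrated with a minimal sketch. The toy corpus, the keyword-overlap retriever, and the prompt template below are illustrative stand-ins, not components of any real RAG product; a production system would use an embedding-based retriever and pass the assembled prompt to an LLM.

```python
# Minimal sketch of RAG's two phases over an in-memory toy corpus.
# Phase 1 retrieves relevant snippets; phase 2 grounds generation by
# placing those snippets in the prompt sent to the model.

CORPUS = [
    "RAG combines retrieval with text generation.",
    "Vector databases store embeddings for similarity search.",
    "Librarians design metadata schemas for discovery systems.",
]

def retrieve(query, corpus, k=2):
    """Phase 1: score each document by simple word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, snippets):
    """Phase 2: ground the generator by prepending retrieved context."""
    context = "\n".join(f"- {s}" for s in snippets)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

question = "What does RAG combine?"
print(build_prompt(question, retrieve(question, CORPUS)))
```

Because the model is asked to answer from the supplied context rather than from memory, its response is anchored to the retrieved sources, which is the core idea behind RAG's accuracy gains.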
RAG in searching
AI-powered search applications often rely on RAG and typically center on embedding models and vector databases for text similarity-based information retrieval. Rather than relying on memorized information, as most AI models do, a RAG system retrieves trustworthy evidence chunks of “context” based on user queries and prompts, and then includes that context when prompting the AI model. Increasingly, AI developers are finding that purely vector-based search provides sub-optimal retrieval and are adopting techniques that combine traditional lexical and taxonomy-based search with vector search. Vector search involves calculating embeddings (numerical representations of words in a vector space) for both content and user queries, and using similarity metrics such as cosine similarity and dot product to find pieces of content that closely match a user’s query. Vector databases are often a significant part of the information retrieval layer of RAG systems.

Given library traditions in metadata design, search, and information retrieval, librarians can assist in developing search frameworks that feed data into AI systems, grounding them and mitigating hallucination. Metadata in search systems can be used to filter data, scoping searches to subsets of corpora, much like the faceted navigation pioneered by librarians in retrieval systems. Creating metadata to support this kind of filtering and navigation is well aligned with the librarian skill set. The ALA Core Competences and RUSA competencies emphasize collection development, which can be applied to designing and structuring the corpora used in RAG. These skills are valuable in building domain-specific and task-specific RAG systems, where the corpus must provide appropriate coverage of the use cases and questions users are likely to present. Many (if not all) AI-powered search tools, such as Elicit.com and Undermind.ai, use RAG techniques to deliver results.
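A small sketch of cosine-similarity vector search with a metadata filter follows. The 3-dimensional "embeddings" and subject tags are hand-made stand-ins for illustration; a real system would compute embeddings with a model and store them in a vector database, but the ranking and metadata-scoping logic would look much the same.

```python
# Sketch of vector search: rank content by cosine similarity to a
# query embedding, optionally scoped to a metadata subset, in the
# spirit of library faceted navigation.
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# content id -> toy embedding plus a metadata field for filtering
INDEX = {
    "doc_rag":      {"vec": [0.9, 0.1, 0.0], "subject": "ai"},
    "doc_metadata": {"vec": [0.1, 0.8, 0.2], "subject": "librarianship"},
    "doc_facets":   {"vec": [0.0, 0.1, 0.9], "subject": "librarianship"},
}

def vector_search(query_vec, index, k=1, subject=None):
    """Rank indexed content by similarity, optionally scoped by metadata."""
    candidates = {cid: e for cid, e in index.items()
                  if subject is None or e["subject"] == subject}
    ranked = sorted(candidates,
                    key=lambda cid: cosine(query_vec, candidates[cid]["vec"]),
                    reverse=True)
    return ranked[:k]

print(vector_search([0.8, 0.2, 0.1], INDEX))                           # ['doc_rag']
print(vector_search([0.8, 0.2, 0.1], INDEX, subject="librarianship"))  # ['doc_metadata']
```

The `subject` parameter shows how metadata can scope a vector search to a subset of the corpus before similarity ranking, which is the mechanism behind the metadata filtering described above.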
Challenges for librarians
Librarians have already discovered, by testing AI-powered search tools, that RAG is not a complete solution to the problem of hallucinations in AI. RAG improves the accuracy of large language models (LLMs) but does not eliminate all challenges. One limitation is that while RAG reduces the need to retrain models, it does not remove it entirely. Another is that LLMs struggle to recognize when they lack sufficient information to provide a reliable response: without specific training, models may generate answers even when they should indicate uncertainty, because they cannot assess their own knowledge limitations. Here are some of the other challenges:
RAG systems may retrieve factually correct but misleading sources, leading to errors in interpretation. An LLM may extract statements from a source without considering their context, resulting in incorrect conclusions. When faced with conflicting information, RAG models may struggle to determine which source is accurate. In the worst case, the model may combine details from multiple sources, producing responses that merge outdated and updated information in a misleading manner. According to the MIT Technology Review, these issues occur because RAG systems may misinterpret the data they retrieve.