Retrieval-Augmented Generation (RAG) and LLM Integration

2024 8th International Symposium on Innovative Approaches in Smart Technologies (ISAS) | 979-8-3315-4010-4/24/$31.00 ©2024 IEEE | DOI: 10.1109/ISAS64331.2024.10845308

Abstract— Advances in Natural Language Processing (NLP) have led to the emergence of complex structures such as Large Language Models (LLMs). LLMs are highly successful in understanding the subtleties of language and processing context because they are trained on large datasets. However, the difficulties encountered in Information Retrieval (IR) processes have created an awareness that these models are not sufficient on their own. Traditional IR methods have generally been insufficient at handling the complexity of natural language when responding to specific queries and retrieving appropriate information from documents or databases. Since this process is based only on keywords, it cannot fully capture the semantic meaning of the language. For this reason, it has become necessary to go beyond traditional IR methods toward more precise information generation based on context and meaning. In response to these requirements, the Retrieval-Augmented Generation (RAG) architecture has come to the fore. RAG offers the ability to create richer and contextually meaningful answers to user queries by integrating LLMs with information retrieval processes. This architecture allows the language model to access external information sources on demand; thus, it generates more accurate and contextual responses grounded in existing information. These features of RAG provide appropriate solutions to users' information needs by better capturing the complexity of natural language. This study emphasizes that integrating the RAG architecture with information retrieval systems and LLMs provides more sensitive and accurate solutions in information-intensive tasks, and that RAG's ability to retrieve information dynamically, building on what LLMs learn from large datasets, strengthens applications in the field of NLP.

Keywords— Retrieval-Augmented Generation, Large Language Models, Information Retrieval, Natural Language Processing

I. INTRODUCTION

In studies conducted on NLP, LLMs have demonstrated superior performance compared to other models. LLMs have achieved this success by being trained on large datasets with larger parameter counts. Nevertheless, the lack of resources, insufficient current data, and the limited number of datasets on which LLMs were trained may prevent the model's success from reaching the desired level. LLMs are constrained to generating text based on the datasets on which they were trained, while the quantity of data generated daily, both in the physical world and in the digital domain, is rapidly expanding. The inability of LLMs to adapt to growth and updates in data has an adverse effect on the success of the model. The content, quantity, and caliber of the data utilized for model training become less significant over time, leading to a decline in model performance. As a preliminary solution, it was proposed that the models be retrained with new data. Nevertheless, retraining the models with each new dataset is not practical.

The studies conducted by Gerard Salton in the 1970s constituted the foundation for contemporary IR methodologies. Information retrieval (IR) can be defined as the process of finding keywords within a given text. The process entails identifying the desired message and the sought information through a systematic examination of the text. Salton used TF-IDF to weight terms by their frequency in the text or document in which they appear [1]. This approach has informed the development of newer IR models, including Best Match 25 (BM25) and the Vector Space Model. The objective of IR models is to identify the document that is most relevant to a given query, and this is achieved through techniques such as pre-indexing documents and vectorizing both documents and queries. Despite the efficacy of IR models in identifying the most pertinent document on a given subject, they have not yet reached the desired level of success due to the intricate structure of natural language. It has been demonstrated that IR models are unable to meet the requisite demands in isolation.

The efficacy of generative language models in generating text has declined over time due to the static and limited nature of their datasets. In response to these challenges, the Facebook artificial intelligence research team unveiled an architectural framework, designated the RAG model, in 2020. The RAG model is based on integrating the text generation capabilities inherent to generative artificial intelligence models with the ability of IR models to identify the most pertinent text. The RAG architecture thus represents an approach that combines generative models with information retrieval models: the advantageous aspects of the two model families are combined for a common purpose.

The primary characteristic of the RAG architecture is its capacity to draw upon external sources of information in real time, extending beyond the confines of the dataset utilized during the text generation process. In this manner, the model is distinct from traditional large language models that are based on static datasets, as it is capable of generating texts supported by dynamic, up-to-date information. The RAG model employs IR systems to identify the most pertinent documents for a given query and then synthesizes these documents to generate accurate and comprehensive responses. This approach enables the model to obtain more dynamic and precise results by leveraging data that emerges after the training period. Consequently, the RAG model seeks to address the limitation of LLMs being dependent on a static database.
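As a concrete illustration of the TF-IDF weighting credited to Salton's line of work, the following minimal Python sketch scores a toy document collection against a query. The corpus, function name, and scoring simplifications are our own illustration (no smoothing, whitespace tokenization), not the formulation used by any particular system:

```python
import math
from collections import Counter

def tf_idf_scores(query, docs):
    """Score each document against the query with a simple TF-IDF sum.

    tf  = raw count of the term in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        for term in set(tokens):
            df[term] += 1
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = sum(
            tf[term] * math.log(n_docs / df[term])
            for term in query.lower().split()
            if term in df
        )
        scores.append(score)
    return scores

docs = [
    "retrieval models rank documents",
    "language models generate text",
    "retrieval augmented generation combines retrieval and generation",
]
print(tf_idf_scores("retrieval generation", docs))
```

Here the third document scores highest: it contains both query terms, and "generation" appears in only one document, so its idf weight is large. This is the intuition behind pre-indexing and term weighting that later models such as BM25 refine.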
Authorized licensed use limited to: BMS College of Engineering. Downloaded on February 25,2025 at 16:28:22 UTC from IEEE Xplore. Restrictions apply.
The RAG architecture has enabled LLMs to ingest not only the dataset they are trained on, but also larger and more current data sources, including the internet and archives. Large language models, such as GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers), have been supported by IR models, including the VSM (Vector Space Model) and DRM (Dense Retrieval Model). This has enabled them to produce more meaningful and accurate results. The RAG architecture has enhanced the capabilities of LLMs by integrating external information sources, and it is predominantly employed alongside complex language models. In particular, the RAG architecture is commonly employed in question-answering models to scan documents and identify the pertinent subject matter, thereby facilitating the generation of responses. In addition, the RAG architecture is employed in the generation of meaningful text, such as blog posts and news articles, due to the enhanced efficacy of generative language models when supported by information retrieval models. The RAG architecture is also employed in dialogic applications, such as chatbots, as it enables a continuous dialogue with the user and the scanning of the internet or large archives for up-to-date answers. As can be observed, the RAG architecture is a methodology employed primarily in applications that necessitate current and sophisticated information.

II. RELATED WORKS

NLP is a subfield of machine learning and deep learning that deals with the processing and interpretation of language. Transformer-based models (e.g., GPT, BERT) have directly influenced the development of architectures such as RAG. These models have formed the basis for systems that can understand and create human language [2].

Transformer-based models such as BERT and GPT can capture the meaning of language in more depth by being trained on large datasets. While BERT can effectively capture the context of words in text thanks to its bidirectional approach, models such as GPT have become more effective in text generation by using forward (unidirectional) language modeling. These models have strengthened the language generation and understanding capabilities that form the basis of systems such as RAG [3][4].

As the complexity of language models increases, the need for knowledge retrieval and access systems to improve the accuracy of models in knowledge-intensive tasks has also increased. Open-Domain Question Answering (ODQA) systems have been developed to meet this need. These systems scan large document collections and retrieve relevant documents to increase the accuracy of the answer created by the language model when answering a user question. In such systems, instead of directly generating knowledge, language models first retrieve the necessary documents and then create an answer using them. Important steps have been taken in the development of ODQA systems: first-generation systems, such as DrQA, access large knowledge bases such as Wikipedia, find text fragments that are relevant to a particular question, and create an answer using this information [5].

RAG-like architectures are based on information retrieval systems. These systems aim to retrieve the most relevant information from large datasets (documents, articles, web pages, etc.) based on a specific query. Traditional information retrieval systems (such as search engines) are based on keywords. However, with the development of artificial intelligence and deep learning methods, more sophisticated systems have emerged [6].

Lewis et al. (2020) introduced their pioneering work, the RAG architecture, to develop a model that improves the performance of LLMs using external data sources in knowledge-intensive natural language processing tasks. This work demonstrates how RAG creates more accurate results by accessing external databases without relying solely on the parameters of language models [7].

In this context, the RAG architecture emerges as a remarkable innovation in natural language processing. This architecture has the capacity to provide more contextual and meaningful answers to users by integrating LLMs and information retrieval processes. When the literature is examined, various studies have been conducted to understand the effectiveness and application areas of RAG.

The RAG architecture integrates LLMs with information retrieval systems. Before answering a question, these systems retrieve relevant documents and use this information to create answers. This provides a great advantage, especially in providing accurate and up-to-date information. RAG works with advanced models such as Google's T5 and offers more efficient information creation processes [8].

While RAG combines knowledge retrieval and generation, similar approaches have also been developed. For example, the FiD (Fusion-in-Decoder) architecture combines retrieved documents to create an answer, and RePAQ (Retrieval-enhanced Pretrained Autoregressive Query) offers a more compact structure, allowing faster knowledge retrieval and response generation [8][9].

In his study, Reimers (2019) developed sentence-level embedding techniques to make retrieval-based systems such as RAG work more efficiently. This method is frequently used in the retrieval phase of RAG [10].

The ColBERT system developed by Khattab and Zaharia (2020) has made the document retrieval process more efficient with a BERT-based late-interaction attention mechanism. This model has greatly contributed to the development of systems where retrieval and generative models work together, such as RAG [11].

Nogueira et al. (2020) studied pre-trained sequence-to-sequence models for ranking retrieved documents. This work plays a critical role in the ranking and evaluation processes of retrieved information in RAG systems [12].

Guu et al. (2020), analyzing the effects of retrieval augmentation on information retrieval and language modeling, provide important findings on real-time information integration [13].

In their work, Karpukhin et al. (2020) examine the dense passage retrieval methods that underlie the RAG architecture [14].

In their work, Xiong et al. (2021) examine the effects of transformer-based models on text ranking and discuss their relationship with the RAG architecture [15].
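The shift from keyword matching to embedding-based retrieval that runs through the works above (Sentence-BERT, dense passage retrieval, ColBERT) can be sketched with a toy example. The vectors below are hand-made stand-ins; in a real system `dense_retrieve` would operate on embeddings produced by a trained encoder, so this is purely an illustration of ranking by cosine similarity:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy "embeddings": in practice these come from an encoder such as
# Sentence-BERT or DPR; here they are invented for illustration only.
doc_vectors = {
    "doc_cats": [0.9, 0.1, 0.0],  # about cats
    "doc_rag":  [0.1, 0.8, 0.6],  # about retrieval-augmented generation
    "doc_llm":  [0.0, 0.7, 0.9],  # about language models
}

def dense_retrieve(query_vector, index, k=2):
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(index, key=lambda d: cosine(query_vector, index[d]),
                    reverse=True)
    return ranked[:k]

query = [0.05, 0.85, 0.5]  # pretend this encodes "how does RAG work?"
print(dense_retrieve(query, doc_vectors))
```

Because similarity is computed in the embedding space rather than over exact keywords, semantically related documents can rank highly even without lexical overlap with the query, which is the core advantage dense retrievers bring to the RAG retrieval phase.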
Gao et al. (2023) examine the development of the RAG paradigm, addressing the Naive RAG, Advanced RAG, and Modular RAG models, and analyzing in detail the three core components of RAG systems: retrieval, generation, and augmentation techniques. They also introduce current evaluation frameworks and benchmarks, highlighting the current challenges and future research areas of RAG [16].

In their study, Fan et al. (2024) systematically examine the ability of LLMs integrated with RAG to improve content quality by leveraging external knowledge sources. The research reviews the existing RAG and LLM literature from three main technical perspectives, evaluating the advantages offered by RAG to overcome the models' inherent knowledge constraints and ensure knowledge timeliness; it also discusses current challenges and potential directions for future research [17].

Salemi et al. (2024) propose a new method for evaluating RAG systems, called eRAG, introducing an approach where each retrieved document is individually used by the large language model. eRAG provides more accurate evaluations at the document level while offering higher correlation and significantly lower computational resource consumption compared to traditional methods [18].

There are a significant number of studies in the literature on the applications and impacts of the RAG architecture. These studies deeply examine the advantages and solutions provided by the integration of RAG with knowledge retrieval processes in natural language processing tasks. For example, the research conducted by Lewis et al. (2020) reveals how the RAG architecture creates more effective answers in knowledge-intensive NLP tasks [7].

Studies conducted on the RAG architecture in the literature reveal the potential and effectiveness of this system in information-intensive natural language processing tasks. RAG's ability to create more contextual and accurate answers by combining large language models with information retrieval processes is supported by various studies. Further development of this architecture will strengthen applications in information retrieval systems and natural language processing. The RAG architecture therefore stands out as an important innovation for accelerating information-based processes and increasing their accuracy, and research in this area will enable the development of more reliable and effective systems. Such studies are critical to understanding the interaction between information retrieval and language generation in RAG, and research conducted in this context plays an important role in the future development of the field.

III. LANGUAGE MODELS AND RAG ARCHITECTURE

The intense interest in NLP and the artificial intelligence ecosystem, which includes a large number of developers, has provided the basis for the spread of new technologies and architectures. Language models trained on large datasets stand out with their human-like abilities, such as creating text, editing, answering questions, summarizing, and translating, by learning the mathematical structure of the language.

Language models work by probabilistically modeling the distribution of words in sentences and their co-occurrence, and can thus predict the next word.

Language models are classified according to the dataset they are trained on and the parameter size used. While LLMs contain more than 100 million parameters, small language models (SLMs) contain fewer than 100 million parameters.

LLMs face difficulties in accessing accurate and up-to-date information during real-time use, even when they are trained on huge datasets. Language models can only create answers based on the dataset on which they are trained, which requires the training process to be run again after every data update. Considering the cost and duration of LLM training, this process is inefficient in terms of both time and money.

The training process of LLMs also raises serious concerns about sustainability and the environment. The equipment used in model training consists of devices with high energy requirements that operate with high energy consumption for long periods. This creates a significant environmental impact and causes excessive consumption of energy resources. The necessity of retraining the model with each new data update turns, over time, into a process that harms the environment and strains energy resources. Alternative and more sustainable solutions have therefore been sought for this problem.

The RAG architecture has become widespread as a flexible and robust approach to changes in the information in the dataset used during the IR processes of LLMs. An update to the data in the dataset provides access to up-to-date data in real time without the need to retrain the language model. The RAG architecture prevents the model from being dependent on fixed data through its IR layer and enables the creation of responses based on dynamic data. With these features, the RAG architecture is flexible and robust against changing data.

Fig. 1. RAG Architecture.

The RAG architecture consists of Retrieval (Document Search), Augmentation, and Generation stages. These stages are as follows.

• Retrieval (Document Search): This stage includes the process of finding and retrieving documents related to the question asked by the user from the dataset.

• Augmentation: This stage includes making sense of the returned documents with respect to the question, increasing the value of the data by adding additional data and documents, and updating the data. In this way, the answer is created from not only the trained domain but also current data.

• Generation: This stage covers the process of producing the final answer to the question asked by the user. Answers to the relevant question are created from data collected through different channels.
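The three stages of the RAG pipeline can be sketched end to end in Python. The retriever below uses naive keyword overlap and `generate` is a stand-in for an LLM call, so every function and corpus entry here is an illustrative assumption rather than a production implementation:

```python
def retrieve(question, corpus, k=2):
    """Retrieval (document search): rank corpus entries by keyword overlap."""
    q_terms = set(question.lower().replace("?", "").split())
    ranked = sorted(corpus,
                    key=lambda d: len(q_terms & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def augment(question, documents):
    """Augmentation: combine the question with the retrieved context."""
    context = "\n".join(f"- {d}" for d in documents)
    return f"Context:\n{context}\nQuestion: {question}"

def generate(prompt):
    """Generation: stand-in for the LLM call that produces the final answer."""
    return f"[answer produced from prompt of {len(prompt)} characters]"

corpus = [
    "RAG combines retrieval with generation.",
    "TF-IDF weights terms by frequency.",
    "Cats are popular pets.",
]
question = "How does RAG use retrieval?"
prompt = augment(question, retrieve(question, corpus))
print(generate(prompt))
```

The key design point illustrated here is that only the corpus needs to change when the underlying information changes; the generation step is untouched, which is exactly why RAG avoids retraining the language model after data updates.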
The RAG architecture is used as the basis for many architectures customized to different needs and domains. Its main variations are Standard RAG, Corrective RAG, Speculative RAG, Fusion RAG, Agentic RAG, Self RAG, Graph RAG, Modular RAG, and RadioRAG.

Standard RAG, the basis for the other variations of this architecture, is highly successful in question-and-answer systems and in summarizing large texts. Despite its widespread use, it may fail to retrieve data related to the user's question in the IR step, which may cause the answer created to be incorrect or insufficient.

Corrective RAG adds a verification layer to check and correct the accuracy of the generated answer after the IR and answer generation stages. In this way, in cases where the answer created by Standard RAG is incomplete or insufficient, it can create more successful answers because it includes a re-improvement and answer regeneration phase [19].

Speculative RAG offers a solution for providing correct answers in cases where the returned data is insufficient: the model drafts the answer based on the information in the returned data combined with other knowledge in the language model. However, the answer created in this way may still not be correct [20].

Fusion RAG is an architecture that creates a holistic response from data collected from different sources. In particular, it aims to create successful outputs by combining relevant data in cases where data from different sources contradict each other. However, when a large amount of data is contradictory, it becomes difficult to ensure accuracy in the response [21].

Agentic RAG enables the model to decide independently which type of data it needs. This adds decision-making ability to the model, allowing it to prioritize the most appropriate data when different types of data are available. However, an error or failure in the decision mechanism directly affects the answer to be created and may cause incorrect outputs [22].

Self RAG stands out with its ability to evaluate the model's own performance. The model contributes to its consistency with the dataset by evaluating the quality of the answer it creates while producing an answer to the relevant question. However, since this self-evaluation depends on the accuracy of the data, it will create wrong answers if the data is incorrect, incomplete, or insufficient [23].

Graph RAG enables understanding and organizing information through relationships by incorporating graph-based data structures into IR processes. It is suitable for areas that require relational understanding, such as biological research into the relationships between genes, proteins, and diseases. However, outdated, incorrect, or incomplete graphs affect the accuracy of the answer to be created; therefore, graph structures must be kept correct and up-to-date [24].

Modular RAG is an approach that optimizes all components separately and independently. Separating the system into modules makes it more flexible and customizable: improvements and fine-tuning can be applied to a single module. However, it can be difficult to ensure that different modules work seamlessly and are fully compatible with each other [25].

RadioRAG was developed to integrate real-time radiology information into LLMs. It was tested using a dataset called RadioQA and strengthened LLMs' ability to diagnose diseases with real-time radiological information. According to the tests conducted, some models increased diagnostic accuracy by up to 54%. The test results demonstrate the potential of RadioRAG to improve and change disease diagnosis processes [26].

Each variation of the RAG architecture has been shaped by different needs and challenges in different areas. Standard RAG is the basis for most architectures, while widely used variations such as Corrective RAG, Speculative RAG, Fusion RAG, Agentic RAG, Self RAG, Graph RAG, Modular RAG, and RadioRAG have been developed to suit particular requirements. As these models develop and prove their success, their developer base and usage areas will grow day by day.

IV. RESULT

The RAG architecture is an advanced solution that overcomes the current limitations of LLMs, offering significant advantages in information-intensive tasks. The shortcomings of LLMs, such as being unable to go beyond their training data or access external data sources, are effectively addressed by RAG's IR layer. Thanks to this architecture's ability to pull information from external data sources, more contextual, up-to-date, and highly accurate solutions can be created. Especially for applications with large and frequently updated datasets, RAG provides more efficient and flexible use of language models in NLP. RAG has also addressed concerns about the maintainability of LLMs: keeping datasets updated has become an alternative to the costly retraining process. Architectures like RAG are becoming widespread, combining the accessibility of IR with the generative strengths of language models and contributing to more sustainable, smarter, and more dynamic systems.

ACKNOWLEDGMENT

This research was made possible thanks to the support and infrastructure provided by Vakıf Participation Bank R&D Center. The valuable contributions of Vakıf Participation Bank R&D Center were effective in the successful completion of this study. We would like to express our sincere gratitude to Vakıf Participation Bank R&D Center for their support.

REFERENCES

[1] K. Spärck Jones, "IDF term weighting and IR research lessons," Journal of Documentation, vol. 60, no. 5, pp. 521-523, 2004.
[2] A. Vaswani et al., "Attention is all you need," Advances in Neural Information Processing Systems, 2017.
[3] J. Devlin et al., "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[4] J. Howard and S. Ruder, "Universal language model fine-tuning for text classification," arXiv preprint arXiv:1801.06146, 2018.
[5] D. Chen et al., "Reading Wikipedia to answer open-domain questions," arXiv preprint arXiv:1704.00051, 2017.
[6] C. D. Manning et al., "Introduction to Information Retrieval," 2008.
[7] P. Lewis et al., "Retrieval-augmented generation for knowledge-intensive NLP tasks," Advances in Neural Information Processing Systems, vol. 33, pp. 9459-9474, 2020.
[8] G. Izacard and E. Grave, "Leveraging passage retrieval with generative models for open domain question answering," arXiv preprint arXiv:2007.01282, 2020.
[9] P. Lewis et al., "PAQ: 65 million probably-asked questions and what you can do with them," Transactions of the Association for Computational Linguistics, vol. 9, pp. 1098-1115, 2021.
[10] N. Reimers, "Sentence-BERT: Sentence embeddings using Siamese BERT-networks," arXiv preprint arXiv:1908.10084, 2019.
[11] O. Khattab and M. Zaharia, "ColBERT: Efficient and effective passage search via contextualized late interaction over BERT," in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 39-48, July 2020.
[12] F. Souza et al., "BERTimbau: Pretrained BERT models for Brazilian Portuguese," in Intelligent Systems: 9th Brazilian Conference, Part I, Springer International Publishing, pp. 403-417, October 20-23, 2020.
[13] K. Guu et al., "Retrieval augmented language model pre-training," in International Conference on Machine Learning, PMLR, pp. 3929-3938, November 2020.
[14] V. Karpukhin et al., "Dense passage retrieval for open-domain question answering," arXiv preprint arXiv:2004.04906, 2020.
[15] A. Yates et al., "Pretrained transformers for text ranking: BERT and beyond," in Proceedings of the 14th ACM International Conference on Web Search and Data Mining, pp. 1154-1156, March 2021.
[16] Y. Gao et al., "Retrieval-augmented generation for large language models: A survey," arXiv preprint arXiv:2312.10997, 2023.
[17] W. Fan et al., "A survey on RAG meeting LLMs: Towards retrieval-augmented large language models," in Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 6491-6501, August 2024.
[18] A. Salemi and H. Zamani, "Evaluating retrieval quality in retrieval-augmented generation," in Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2395-2400, July 2024.
[19] S. Yan et al., "Corrective retrieval augmented generation," arXiv preprint arXiv:2401.15884, 2024.
[20] Z. Wang et al., "Speculative RAG: Enhancing retrieval augmented generation through drafting," arXiv preprint arXiv:2407.08223, 2024.
[21] Z. Rackauckas, "RAG-Fusion: A new take on retrieval-augmented generation," arXiv preprint arXiv:2402.03367, 2024.
[22] C. Ravuru et al., "Agentic retrieval-augmented generation for time series analysis," arXiv preprint arXiv:2408.14484, 2024.
[23] A. Asai et al., "Self-RAG: Learning to retrieve, generate, and critique through self-reflection," arXiv preprint arXiv:2310.11511, 2023.
[24] B. Peng et al., "Graph retrieval-augmented generation: A survey," arXiv preprint arXiv:2408.08921, 2024.
[25] Y. Gao et al., "Modular RAG: Transforming RAG systems into LEGO-like reconfigurable frameworks," arXiv preprint arXiv:2407.21059, 2024.
[26] S. Arasteh et al., "RadioRAG: Factual large language models for enhanced diagnostics in radiology using dynamic retrieval augmented generation," arXiv preprint arXiv:2407.15, 2024.