
From Natural Language to SQL: Review of

LLM-based Text-to-SQL Systems


Ali Mohammadjafari∗ , Anthony S. Maida† , Raju Gottumukkala†
∗ PhD Student, School of Computing and Informatics, University of Louisiana at Lafayette, LA, USA
† Associate Professor, School of Computing and Informatics, University of Louisiana at Lafayette, LA, USA
† Associate Professor, School of Mechanical Engineering, University of Louisiana at Lafayette, LA, USA
{Ali.mohammadjafari1, maida, raju}@louisiana.edu

arXiv:2410.01066v2 [cs.CL] 4 Feb 2025

Abstract—LLMs, when used with Retrieval Augmented Generation (RAG), are greatly improving the state of the art in translating natural language queries into structured and correct SQL. Unlike previous reviews, this survey provides a comprehensive study of the evolution of LLM-based text-to-SQL systems, from early rule-based models to advanced LLM approaches that use RAG systems. We discuss benchmarks, evaluation methods, and evaluation metrics. We also uniquely study the use of Graph RAG for better contextual accuracy and schema linking in these systems. Finally, we highlight key challenges, such as computational efficiency, model robustness, and data privacy, that point toward improvements of LLM-based text-to-SQL systems.

Index Terms—Text-to-SQL, Large Language Models, Database, Natural Language Processing, SQL Generation

I. INTRODUCTION

A. Overview of the Text-to-SQL Task

Organizations increasingly rely on relational databases to manage and analyze vast amounts of structured information. These databases are critical parts of many modern systems, from business intelligence to customer relationship management. As the volume of data increases, the need to query, extract, and make sense of this information also increases. However, querying databases often requires the use of Structured Query Language (SQL), which is a technical skill. There is a gap between users who need access to data and the specialized knowledge required to retrieve it [1]. Text-to-SQL parsing in natural language processing (NLP) bridges this gap. LLMs make the task of translating natural language (NL) queries given by a non-technical user into precise SQL easier. This ability is important when the complexity of data increases and makes manual data exploration impractical and inefficient [2]. For example, consider a relational database that contains information about gas stations in Louisiana, with columns such as GasStationID, PARISH, NEED, STATION_NAME, CITY, and WORKING_GAS_STATIONS_5_MILES, among others. Suppose we have an NL query: “Where can I find a gas station with power less than 2 miles from the University?” A text-to-SQL system would analyze this query, understand the user’s intent, and automatically generate the corresponding SQL query to extract the correct information from the database. In this case, the final SQL query might look something like:

SELECT STATION_NAME, location
FROM gas_stations
WHERE fuel_available = 'Yes'
  AND distance < 2
  AND ST_Distance_Sphere(Point(long, lat),
      Point(University_Long, University_Lat)) < 2;

This SQL query retrieves all gas stations within a 2-mile radius of the University that have power. The text-to-SQL system enables the user to extract this specific information without needing to write the SQL query, thus making complex data more accessible.

Stack Overflow’s developer survey shows that 51.52% of professional developers use SQL in their work. However, SQL is often a barrier for non-technical users because of its technical nature. This leaves a gap between the vast stores of data housed in relational databases and the ability of many users to access related data efficiently [3].

However, building reliable text-to-SQL systems is highly challenging. The complexity arises from several factors:
• 1. Ambiguity and Linguistic Complexity:
NL queries often include complex structures such as nested clauses, pronouns, or vague terms. A query might contain phrases like “all gas stations in the area,” where “area” could refer to a specific region not directly mentioned.
• 2. Schema Understanding:
To generate accurate SQL queries, text-to-SQL systems must understand the database schema. In realistic applications, database schemas are often complex and can vary greatly between domains, making it challenging to map the NL query correctly to the database schema [4].
• 3. Rare and Complex SQL Operations:
Some queries involve uncommon SQL operations, such as nested sub-queries, joins across multiple tables, or window functions. These operations can be difficult for text-to-SQL models to generate, especially if they are not frequently seen during training [3].
• 4. Cross-Domain Generalization:
Text-to-SQL models trained on one domain (e.g., customer service databases) may not generalize well to other domains (e.g., healthcare). Differences in schema structures, terminology, and query patterns make it difficult for models to perform consistently across a wide variety of databases [5].

Addressing these challenges is essential for the effectiveness of text-to-SQL systems. As databases become more complex, higher accuracy of text-to-SQL models is also needed.
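To make the example concrete, a query of this shape can be run against a toy database. This is an illustrative sketch only: the rows are invented, the `distance` column stands in for the precomputed distance to the University, and the `ST_Distance_Sphere` predicate is omitted because it is a MySQL spatial function that SQLite does not provide.

```python
import sqlite3

# Toy stand-in for the Louisiana gas-station database described above;
# the real schema (PARISH, NEED, WORKING_GAS_STATIONS_5_MILES, spatial
# coordinates) is richer, and the rows here are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gas_stations "
             "(STATION_NAME TEXT, location TEXT, fuel_available TEXT, distance REAL)")
conn.executemany("INSERT INTO gas_stations VALUES (?, ?, ?, ?)",
                 [("Campus Fuel", "Johnston St", "Yes", 1.2),
                  ("Bayou Gas", "Pinhook Rd", "Yes", 4.8),
                  ("Oak Stop", "University Ave", "No", 0.9)])

# The generated query, minus the ST_Distance_Sphere predicate:
sql = ("SELECT STATION_NAME, location FROM gas_stations "
       "WHERE fuel_available = 'Yes' AND distance < 2")
print(conn.execute(sql).fetchall())  # [('Campus Fuel', 'Johnston St')]
```

Only the station that both has fuel and lies within two miles is returned, which is exactly the filtering a non-technical user gets without writing SQL.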
B. Introducing Retrieval Augmented Generation as a Solution

As already stated, despite advances in existing text-to-SQL systems, they still face limitations in schema understanding, handling complex queries, and generalizing across domains. Retrieval Augmented Generation (RAG) has emerged as a promising framework to address these challenges. RAG systems combine two key components:
• A Retrieval Module that dynamically fetches relevant schema details, SQL query templates, or domain-specific knowledge from structured and/or unstructured sources like documents [6].
• A Generative Module that produces text-based output, such as SQL queries or direct answers, by adding the retrieved context into the generation process.

In this way, RAG systems, instead of relying on the general knowledge of LLMs and guessing the answers, use related external knowledge to generate corresponding answers. RAG facilitates text-to-SQL tasks in two ways.
• Increasing SQL generation means that RAG retrieves schema-specific information, query examples, or templates to improve SQL query generation for complex or ambiguous queries in natural language [7].
• Bypassing SQL generation means that instead of generating SQL, RAG can directly retrieve answers from data sources.

This dual capability makes RAG a powerful approach to address the limitations of text-to-SQL systems. By dynamically retrieving context and integrating it into the response generation process, RAG improves schema understanding, resolves linguistic ambiguities, and adapts to new domains without extensive retraining.

C. Contributions of this Survey

This survey provides a comprehensive review of text-to-SQL systems and explores how Retrieval-Augmented Generation (RAG) can address their limitations. The key contributions are:
• This survey presents a comprehensive review of text-to-SQL systems, including their challenges, datasets, benchmarks, and evaluation metrics.
• This survey identifies important limitations in existing text-to-SQL systems, such as schema understanding, linguistic ambiguity, domain generalization, and complex query handling.
• This survey explains how RAG complements and enhances text-to-SQL systems, addressing key challenges with dynamic retrieval capabilities.
• This survey introduces a taxonomy to classify text-to-SQL techniques and illustrates how RAG fits into this landscape.
• This survey explains how Graph RAG became the state of the art and can enhance text-to-SQL systems’ accuracy.

II. EVOLUTION OF TEXT-TO-SQL SYSTEMS IN THE LITERATURE

A. Evolutionary Progression

Figure 1 shows the historical evolution of text-to-SQL systems. Initially they relied on rules, but now they use deep neural networks and powerful large language models (LLMs) [3]. Each step in the evolution brought important innovations, making it easier for the newer models to understand and create SQL queries from everyday language.
• Rule-Based Approaches
Early text-to-SQL systems used rule-based approaches. These systems used manually crafted grammar rules and heuristics to translate NL queries into SQL commands. These methods were limited in handling complex queries and diverse schemas [9]. The fixed rules made it hard to deal with the variety of human language, often causing errors when faced with tricky or unexpected query structures. Also, these systems needed a lot of manual work to design and update the rules, which made it hard for them to adapt to new domains or database formats. Systems like LUNAR [10] and NaLIX [11] showed the potential of semantic parsing but required significant manual feature engineering, hindering scalability and adaptability [10], [11]. Deep learning approaches have addressed many of these limitations by introducing models capable of learning complex patterns and representations directly from data [12].
• Deep Learning Approaches
Deep learning greatly improved text-to-SQL systems by using large models to interpret NL and generate SQL queries. Models like Seq2SQL and SQLNet [13] used sequence-to-sequence architectures such as LSTMs and transformers [14]. These models introduced end-to-end, differentiable architectures that directly convert text to SQL, offering improvements over earlier systems and allowing the models to learn complex patterns and relationships within the data, leading to more accurate and efficient translation compared to earlier rule-based systems [13], [15], [16].
Transformer-based models like BERT [17] and TaBERT [18] enhanced understanding of both database schema and user intent. These models improved generalization to unseen databases by capturing dependencies between NL queries and underlying schemas. However, challenges such as handling nested queries, cross-domain generalization, and efficiently mapping ambiguous natural language to structured SQL still remained [19], [18].
• Pre-trained Language Models
Pre-trained Language Models (PLMs) shifted the approach from task-specific supervised learning to a more generalized pre-training method followed by fine-tuning. Models like BERT [17] (Bidirectional Encoder Representations from Transformers) and GPT [20] (Generative Pre-training Transformer) started this shift by using large-scale, unsupervised text datasets to pre-train models that could then be fine-tuned for specific tasks. The concept

Fig. 1. How text-to-SQL research has evolved over time, highlighting different implementation approaches. Each phase includes key techniques and notable works. The dates are approximate, based on when these key works were released, with a margin of error of about a year. The design is inspired by [3], [8].

of PLMs, grounded in transfer learning, allowed models to acquire a deep understanding of NL through extensive pre-training on vast datasets, and this understanding transferred to many tasks, like text generation, sentiment analysis, and question-answering [21], [22].
PLMs were better at capturing semantic relationships between user queries and database schema structures. PLMs such as TaBERT and BERT-SQL [23] enabled the integration of both the NL query and the database schema into a unified representation that improved the system context. These models addressed several challenges in text-to-SQL systems, such as handling complex queries with multiple joins, nested queries, and cross-domain generalization. However, PLMs still had limitations with regard to their need for domain-specific fine-tuning and the difficulty of understanding complex database schemas without additional schema linking mechanisms [24].
• Large Language Models
The advent of Large Language Models (LLMs) such as GPT-4 [25], Codex [26], and LLaMA [27] has revolutionized NL processing. LLMs are a major advance over earlier machine learning methods by virtue of their vast size and training on massive datasets to generate more accurate and comprehensive responses. These models show exceptional performance in tasks that require understanding and generating human-like text, often without needing additional fine-tuning [28].
LLMs capture more complex relationships between NL queries and structured database schemas than the earlier models. Unlike previous pre-trained language models, which require significant task-specific fine-tuning and schema linking, LLMs can handle zero-shot and few-shot scenarios more effectively because of their large-scale pre-training and reasoning capabilities [29]. Studies show that models like Codex [28] achieve high performance in generating SQL queries with minimal prompt engineering. However, challenges such as handling ambiguous queries and optimizing SQL statements for performance and correctness remain [28].
As seen in Figure 2, the architecture of LLM-based text-to-SQL systems can be broken down into several key phases: natural language understanding, schema comprehension, SQL generation, and SQL execution. Each step involves sophisticated techniques to ensure that user queries are accurately mapped to SQL, providing correct and meaningful results from the database.
• RAG Systems in Text-to-SQL
The evolution of text-to-SQL systems has seen significant advancements, with the integration of RAG marking another step forward. By combining retrieval mechanisms with large-scale generative models, RAG systems address some of the persistent limitations in text-to-SQL tasks, particularly in schema understanding, handling complex queries, and domain generalization.
– Dynamic Knowledge Retrieval: RAG systems utilize retrieval modules to fetch relevant schema information, table relationships, and example queries from structured or unstructured sources, such as relational databases or external documents. These retrieved elements are then integrated into the generative process, providing real-time context to improve SQL generation [30], [7].
– Enhanced Schema Linking: Unlike earlier models that relied heavily on pre-defined schema representations, RAG systems dynamically adapt to schema complexities. They align user queries with database schemas more effectively by retrieving schema-specific details during query processing, thus reducing errors caused by schema ambiguity [31], [32].
– Cross-Domain Generalization: Traditional text-to-SQL systems often struggle to generalize across different database schemas or domains. RAG systems mitigate this challenge by leveraging domain-specific retrieval mechanisms, enabling seamless generalization without extensive fine-tuning. This makes RAG systems particularly effective for zero-shot or few-shot scenarios [31].
• Graph RAG System
Graph RAG [33] offers a structured and hierarchical approach to Retrieval-Augmented Generation (RAG), making it a promising solution for text-to-SQL tasks. Unlike traditional semantic search methods that rely solely on text-based retrieval, Graph RAG constructs a knowledge graph from raw data, organizes schema elements into a hierarchical structure, and generates summaries for different database components. By leveraging these structured relationships, Graph RAG enhances schema understanding, improves retrieval accuracy, and enables more precise SQL query generation, making it particularly effective for complex database interactions.

B. LLM-based Text-to-SQL Architecture and RAG-Integrated Systems

• Natural Language Understanding:
In traditional LLM-based text-to-SQL systems, the process begins with user input in the form of NL queries. LLMs first preprocess these inputs to understand the user’s intent, identifying key components such as entities, conditions, and relations within the question.
• Schema Linking:
Once the NL query is parsed, the system moves to schema linking, where the LLM maps the parsed components of the query to the corresponding tables, columns, and relationships in the database schema. For example, “gas stations” is linked to a table named GasStations, and “power” is matched with a column named PowerAvailable. This phase ensures that the system can correctly interpret the query.
• SQL Generation:
After the query has been parsed and linked to the schema, the LLM generates an SQL query based on the established semantic relationships. This stage uses the model’s understanding of SQL syntax and database logic to form a structured query that reflects the user’s intent. The generated SQL is then validated and optimized for accuracy and performance.
• SQL Execution and Output:
The final SQL query is executed on the underlying database (such as SQLite or MySQL) to retrieve the requested information. The results of the query are returned either in raw format or, in some systems, converted back into NL for easier interpretation by the user. Figure 2 shows the flow from user input to the final SQL query. Each phase makes text-to-SQL systems more accessible for non-technical users.

RAG improves the traditional text-to-SQL architecture by integrating dynamic retrieval mechanisms with LLMs to address challenges like schema complexity, ambiguous queries, and domain generalization.
• 1. Improved Natural Language Understanding:
RAG-to-SQL systems dynamically retrieve relevant external knowledge: 1. Schema metadata, example queries, and documentation are fetched from a vector database based on the input NL query. 2. If the query is ambiguous, RAG systems retrieve clarifying context, such as domain-specific examples or schema descriptions, to refine understanding.
• 2. Schema Linking with Contextual Retrieval:
In RAG-to-SQL systems, RAG retrieves detailed schema information, including table-column relationships, foreign keys, and other data, which helps more precise linking of query components.
• 3. Advanced SQL Generation:
The retrieved schema details, query examples, and metadata are integrated into the prompt, guiding the LLM to generate SQL queries that are both correct and semantically aligned with the database.
• 4. Iterative SQL Execution and Error Feedback:
RAG-to-SQL systems incorporate a feedback loop to address execution errors. SQL execution errors are detected during query execution. These errors are used to retrieve additional context from the vector database. The SQL query is re-generated and re-executed until the errors are resolved.

Table I shows a high-level comparison between the LLM-based text-to-SQL and RAG-integrated system architectures, and Fig. 3 illustrates the high-level workflow of RAG-to-SQL.

III. BENCHMARKS AND EVALUATION METHODS

Evaluating LLM-based text-to-SQL systems is crucial for measuring how well they understand NL and generate accurate SQL queries. Researchers use a variety of datasets and benchmarks that test these models across different scenarios, both within specific domains and across multiple domains.

A. Types of Datasets used in Benchmarks

Text-to-SQL research has made rapid progress with the help of many benchmark datasets, each contributing unique challenges for model development. These datasets are categorized into four types based on their characteristics: cross-domain, knowledge-augmented, context-dependent, and robustness. In Table II, we categorize the most well-known datasets according to these criteria.
• 1. Cross-domain Datasets
Datasets like WikiSQL [35], Spider [36], and KaggleDBQA [44] focus on evaluating the generalization capabilities of models across multiple databases from different domains. These datasets test whether models can generate accurate SQL queries for databases they have not seen during training [36]. In RAG-to-SQL systems, RAG has better performance in cross-domain settings where schemas differ significantly across databases. Retrieval modules dynamically fetch schema details or examples from diverse domains, which helps to improve robust generalization.
• 2. Knowledge-Augmented Datasets
Datasets such as SQUALL [45] and BIRD [17] use external knowledge to improve the semantics of SQL generation. These datasets aim to enhance the model’s comprehension by augmenting the schema with additional context, allowing for more accurate and better SQL generation [17]. Spider-DK [5] adds domain knowledge requirements to the Spider dataset. RAG systems can retrieve external documents or unstructured knowledge to handle questions needing additional context.
• 3. Context-Dependent Datasets
Datasets like CoSQL [4] and SParC [38] emphasize the conversational nature of
Fig. 2. Illustrates the key stages of the traditional text-to-SQL process using Large Language Models (LLMs).

Fig. 3. Illustrates the High-Level Workflow of RAG-based Text-to-SQL System (RAG-TO-SQL).
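The workflow of Fig. 3, retrieval, prompt-augmented generation, execution, and error feedback, can be sketched in a few lines. This is an illustration of the pattern, not the authors’ implementation: `SCHEMA_DOCS`, the keyword-overlap `retrieve`, and the stubbed `generate_sql` are invented stand-ins (a real system would embed documents in a vector database and call an LLM at each round).

```python
import sqlite3

# Toy corpus standing in for a vector database of schema documentation.
SCHEMA_DOCS = [
    "table gas_stations: columns name, city, has_power (1 = power available)",
    "table parishes: columns parish, region",
]

def retrieve(text, k=1):
    # Keyword-overlap scoring in place of embedding similarity.
    score = lambda doc: len(set(text.lower().split()) & set(doc.lower().split()))
    return sorted(SCHEMA_DOCS, key=score, reverse=True)[:k]

def generate_sql(question, context, error=None):
    # Stand-in for an LLM call whose prompt would contain the question, the
    # retrieved context, and (on later rounds) the previous execution error.
    # It deliberately guesses a wrong column first so the loop is observable.
    if error and "power" in error:
        return "SELECT name FROM gas_stations WHERE has_power = 1"
    return "SELECT name FROM gas_stations WHERE power = 1"

def rag_to_sql(db, question, max_rounds=3):
    error = None
    for _ in range(max_rounds):
        context = retrieve(question + (" " + error if error else ""))
        sql = generate_sql(question, context, error)
        try:
            return db.execute(sql).fetchall()   # success: return rows
        except sqlite3.OperationalError as exc:
            error = str(exc)                    # feed the error back
    raise RuntimeError(f"gave up after {max_rounds} rounds: {error}")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE gas_stations (name TEXT, city TEXT, has_power INTEGER)")
db.executemany("INSERT INTO gas_stations VALUES (?, ?, ?)",
               [("A", "Lafayette", 1), ("B", "Scott", 0)])
print(rag_to_sql(db, "Which gas stations have power?"))  # [('A',)]
```

The first round fails with “no such column: power”; the error is folded into the retrieval query and the re-generated SQL succeeds on the second round, which is the feedback loop described for step 4 above.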

querying databases, where previous interactions influence current queries. These datasets challenge models to maintain context throughout multi-turn interactions, making them essential for developing systems that can handle complex, dialog-driven database queries [38].
• 4. Robustness Datasets
A dataset like ADVETA [46] tests the robustness of text-to-SQL systems by introducing adversarial table perturbations. This method tests whether models are capable of handling unexpected changes in database schema or table structure, thereby assessing their adaptability to real-world scenarios [46]. The Table II datasets push the boundaries of text-to-SQL research, providing challenges across various dimensions, from domain generalization to contextual understanding and system robustness. Retrieval modules fetch clarifying schema descriptions or mappings, and help LLMs resolve ambiguities introduced in the dataset.
• 5. Semantic Parsing-Oriented Datasets
These datasets are designed for evaluating the semantic parsing capabilities of models, where precise mappings from natural language to SQL are tested [41].
• 6. Multi-Lingual or Cross-Lingual Datasets
These test model performance across multiple languages, requiring the system to map non-English queries to SQL. Retrieval modules in RAG systems can fetch schema mappings or query examples in the specific language of the input, enhancing multilingual performance [37].
• 7. Conversational or Interactive Datasets
These datasets are designed for conversational settings, where context from previous turns must be considered [4]. CoSQL extends Spider for conversational text-to-SQL tasks, and SParC [38] is a dataset for conversational SQL queries with dependencies between turns. RAG systems can retrieve prior turns’ context dynamically, ensuring continuity across dialogue turns.

B. Evaluation Metrics Used in Benchmarks

Benchmarks for text-to-SQL systems use metrics that capture both the correctness and efficiency of an SQL query. These metrics test that systems not only produce accurate SQL queries, but also perform efficiently in realistic database
TABLE I
COMPARISON BETWEEN LLM-BASED TEXT-TO-SQL AND RAG-INTEGRATED SYSTEM ARCHITECTURES

Phase | Traditional LLM-Based Text-to-SQL | RAG-Integrated Text-to-SQL
Natural Language Understanding | LLM parses user intent based on pre-trained knowledge. | RAG retrieves schema descriptions or prior queries to enhance the LLM’s understanding.
Schema Linking | Relies on the LLM’s schema understanding and training data. | Dynamically retrieves schema relationships and metadata to improve linking accuracy.
SQL Generation | Generates SQL based on in-context learning or fine-tuning. | Leverages retrieved examples/templates and supports iterative refinement via feedback loops.
SQL Execution | Executes SQL and retrieves raw results. | Refines SQL based on execution errors or retrieves answers directly from sources.

Fig. 4. Taxonomy of research approaches in LLM-based text-to-SQL. The format is adapted from [34].

environments. Figure 5 shows the evaluation metrics, which fall into two categories: content matching-based metrics and execution-based metrics.
1) Content Matching-based Metrics: Content matching-based metrics focus on how closely the structure of the generated SQL query matches the gold (or reference) query. This type of evaluation is needed for ensuring that the model follows the correct SQL syntax and structure, even if it might not always produce the most optimized SQL query.
• 1. Component Matching (CM):
This metric evaluates each component of the SQL query (such as SELECT, FROM, WHERE) individually. Even if the components appear in a different order than in the gold query, they are considered correct as long as they correspond to the expected components. This allows for flexibility in the query structure while ensuring the essential parts of the SQL query are present and accurate [36].
• 2. Exact Matching (EM):
Exact matching is stricter, requiring the generated SQL query to match the gold query exactly in terms of both structure and order. Every element, including the sequence of components, must be identical to the gold query. The disadvantage of this metric is that it can penalize queries that are functionally correct but structured differently [36].
2) Execution-based Metrics: Execution-based metrics focus on the actual performance of the generated SQL query when run on a database. These metrics assess whether the queries not only follow the correct structure but also return the correct results and run efficiently in realistic scenarios.
• 1. Execution Accuracy (EX):
Checks whether the generated SQL query, when executed on the database, returns the same result as the gold query. Execution accuracy focuses on the correctness of the result, regardless of how the query is structured [36].
• 2. Valid Efficiency Score (VES):
The Valid Efficiency Score measures the computational efficiency of the generated SQL query compared to the gold query. While a query might return the correct result, it could still be inefficient, requiring unnecessary computational resources. VES penalizes queries that introduce extra complexity, such as redundant sub-queries or unnecessary joins, even if the results match [4].

C. Methods

LLM-based text-to-SQL systems fall into two main classes of methods: in-context learning and fine-tuning (FT). Each class uses different strategies for training the model to generate SQL queries from NL inputs. These methods rely on using the model’s pre-trained knowledge through prompts or fine-tuning with added task-specific data to improve query generation.
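The difference between execution accuracy and exact matching can be illustrated with a simplified sketch. This is not the official benchmark harness (real evaluators such as Spider’s parse SQL into components rather than comparing normalized strings), and the schema and queries are invented for the demonstration.

```python
import sqlite3

def execution_accuracy(db, predicted_sql, gold_sql):
    # EX: compare result sets (order-insensitive) of predicted vs. gold query.
    pred = sorted(db.execute(predicted_sql).fetchall())
    gold = sorted(db.execute(gold_sql).fetchall())
    return pred == gold

def exact_match(predicted_sql, gold_sql):
    # EM (simplified): identity after whitespace/case normalization.
    # Real EM implementations compare parsed SQL components instead.
    norm = lambda s: " ".join(s.lower().split())
    return norm(predicted_sql) == norm(gold_sql)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE gas_stations (name TEXT, city TEXT, has_power INTEGER)")
db.executemany("INSERT INTO gas_stations VALUES (?, ?, ?)",
               [("A", "Lafayette", 1), ("B", "Lafayette", 0), ("C", "Scott", 1)])

gold = "SELECT name FROM gas_stations WHERE has_power = 1"
pred = "SELECT name FROM gas_stations WHERE has_power = 1 ORDER BY name"

print(execution_accuracy(db, pred, gold))  # True: same rows returned
print(exact_match(pred, gold))             # False: EM penalizes the extra clause
```

The predicted query is functionally correct (EX passes) yet fails EM because of the added ORDER BY, which is precisely the weakness of exact matching noted above.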
TABLE II
COMPARISON OF DATASET TYPES IN TEXT-TO-SQL: KEY CHARACTERISTICS, STRENGTHS, CHALLENGES, AND EXAMPLES [35], [36], [4], [37], [17], [38], [5], [39], [40], [41], [42], [43], [7].

Cross-Domain
  Focus: Tests generalization across diverse database schemas and domains.
  Relation to RAG-to-SQL: RAG dynamically retrieves schema-specific mappings and examples, enabling better handling of unseen schemas.
  Strengths: 1. Evaluates generalization across diverse schemas. 2. Most common benchmark for real-world adaptability.
  Weaknesses: 1. Limited focus on schema-specific nuances. 2. May oversimplify domain-specific challenges.
  Challenges posed to models: 1. Handling unseen schemas without extensive fine-tuning. 2. Mapping ambiguous queries to schema elements.
  Examples: Spider, WikiSQL.

Knowledge-Augmented
  Focus: Requires external domain knowledge or context beyond the schema itself.
  Relation to RAG-to-SQL: RAG retrieves unstructured knowledge (e.g., FAQs, documents) to provide additional context for SQL generation.
  Strengths: 1. Tests integration of external domain knowledge. 2. Simulates real-world domain-specific challenges.
  Weaknesses: 1. Requires external knowledge sources, increasing complexity. 2. High reliance on retrieval accuracy.
  Challenges posed to models: 1. Linking domain-specific knowledge to schema elements. 2. Avoiding hallucinations during SQL generation.
  Examples: Spider-DK, BIRD.

Context-Dependent
  Focus: Multi-turn queries requiring understanding of previous turns in a conversation.
  Relation to RAG-to-SQL: RAG retrieves prior turns’ context or intermediate query states to ensure continuity across dialogue.
  Strengths: 1. Evaluates continuity and coherence across dialogue turns. 2. Tests memory and contextual adaptation.
  Weaknesses: 1. Limited datasets available for conversational SQL. 2. Requires additional context tracking and retrieval logic.
  Challenges posed to models: 1. Handling ambiguity in multi-turn interactions. 2. Efficiently retrieving prior turn information.
  Examples: CoSQL, SParC.

Robustness
  Focus: Evaluates performance under adversarial or ambiguous inputs.
  Relation to RAG-to-SQL: RAG retrieves schema clarifications, synonyms, or disambiguation examples to handle perturbations.
  Strengths: 1. Highlights system resilience to perturbations. 2. Tests error handling under noisy input conditions.
  Weaknesses: 1. May overemphasize edge cases that are rare in real-world applications. 2. Often lacks domain realism.
  Challenges posed to models: 1. Resolving synonym ambiguities. 2. Handling minimal or incomplete schema descriptions.
  Examples: Spider-Syn, Spider-Realistic.

Semantic Parsing-Oriented
  Focus: Tests baseline NL-to-SQL translation accuracy and schema linking.
  Relation to RAG-to-SQL: RAG can retrieve schema metadata or query examples to improve schema linking and parsing accuracy.
  Strengths: 1. Focuses on core NL-to-SQL translation abilities. 2. Serves as a foundation for all other categories.
  Weaknesses: 1. Limited to simple queries and fixed schemas. 2. Doesn’t test domain generalization or robustness.
  Challenges posed to models: 1. Achieving precise semantic parsing for varied linguistic structures. 2. Avoiding overfitting to simple datasets.
  Examples: GeoQuery, ATIS, WikiSQL (single-turn queries).

Multi-Lingual or Cross-Lingual
  Focus: Evaluates performance across multiple languages.
  Relation to RAG-to-SQL: RAG retrieves schema metadata and query examples in the same language as the input, improving multi-lingual adaptation.
  Strengths: 1. Tests adaptability across languages. 2. Highlights multi-lingual and cross-lingual generalization capabilities.
  Weaknesses: 1. Limited availability of high-quality multi-lingual datasets. 2. Heavily language-dependent retrieval models.
  Challenges posed to models: 1. Mapping language-specific terms to schema elements. 2. Handling multi-language inconsistencies in schema linking.
  Examples: mSpider, DuSQL.

Real-World Application
  Focus: Benchmarks derived from real-world industry use cases.
  Relation to RAG-to-SQL: RAG retrieves domain-specific schema relationships or external documentation to support domain-specific SQL generation.
  Strengths: 1. Evaluates systems on realistic, domain-specific queries. 2. High practical relevance for real-world applications.
  Weaknesses: 1. Often lacks diversity in schema structure. 2. Domain-specific datasets may be inaccessible or proprietary.
  Challenges posed to models: 1. Integrating domain-specific external knowledge. 2. Handling highly specific schemas with unique structures.
  Examples: Financial SQL, Nibiru, ClinicalDB.

Conversational or Interactive
  Focus: Designed for interactive dialogue settings with evolving context.
  Relation to RAG-to-SQL: RAG retrieves historical interaction logs or previous conversational turns to ensure continuity and context-awareness in SQL queries.
  Strengths: 1. Simulates real-world conversational settings. 2. Evaluates dynamic context management in multi-turn queries.
  Weaknesses: 1. High dependency on accurate retrieval of prior interactions. 2. Computationally intensive for large conversations.
  Challenges posed to models: 1. Managing long conversational histories. 2. Handling ambiguous or implicit references across turns.
  Examples: CoSQL, SParC.

Fig. 5. Four main evaluation metrics falling into two categories.
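The benchmark categories above are scored chiefly with execution accuracy (EX) and exact matching (EM). As a minimal sketch of how the two metrics differ — the toy schema and queries below are illustrative assumptions, not drawn from any benchmark, and real EM implementations typically compare normalized SQL components rather than raw strings:

```python
import sqlite3

def execution_accuracy(gold_sql: str, pred_sql: str, conn) -> bool:
    """EX: the prediction is correct if executing both queries
    yields the same result set (order-insensitive here)."""
    gold = conn.execute(gold_sql).fetchall()
    pred = conn.execute(pred_sql).fetchall()
    return sorted(gold) == sorted(pred)

def exact_match(gold_sql: str, pred_sql: str) -> bool:
    """EM (simplified): whitespace- and case-normalized string comparison."""
    norm = lambda q: " ".join(q.lower().split())
    return norm(gold_sql) == norm(pred_sql)

# Toy in-memory database, assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [("a", "hr", 50), ("b", "it", 70), ("c", "it", 60)])

gold = "SELECT name FROM emp WHERE dept = 'it'"
pred = "SELECT name FROM emp WHERE dept = 'it' ORDER BY salary"

print(exact_match(gold, pred))               # differently written, so EM fails
print(execution_accuracy(gold, pred, conn))  # same rows returned, so EX passes
```

EX tolerates syntactic variation as long as the result sets agree, which is why most recent systems report it as their primary metric.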

1) In-context Learning: In in-context learning, the model generates the SQL query based on a given context without any updates to the model parameters. Instead, the model is guided through carefully constructed prompts. In-context learning includes several categories that optimize how queries are generated and improve accuracy. Mathematically, the SQL query generation task is described as

Y = f (Q, S, I; θ),   (1)

where Y is the generated SQL query, Q is the NL question set, S represents the database schema, I is the intermediate reasoning, and θ denotes the parameters of the pre-trained model. In in-context learning, this formula highlights that the model's output is determined by the input information (Q, S, I); the pre-trained knowledge embedded in θ remains fixed during the process. The model's performance therefore depends on how effectively the input prompt is engineered to guide the model towards generating accurate SQL queries. Table IV classifies the most well-known methods into five categories, each of which is described below.

• Zero-Shot and Few-Shot Learning:
In zero-shot learning, the model generates SQL queries without any prior exposure to similar examples; it relies on its pre-trained knowledge to interpret the input and produce SQL queries. For example, C3's zero-shot prompting of ChatGPT for text-to-SQL tasks requires no fine-tuning [2]. Zero-shot learning is most effective when the LLM has been pre-trained on a vast corpus that includes SQL-related content. In few-shot learning, by contrast, the model is given a few examples of input-output pairs to guide its generation of new SQL queries. An example is FinSQL, where models are given a small set of SQL queries and asked to generalize from these
TABLE III
Comparison of the state of the art in RAG-based text-to-SQL systems vs. some new traditional LLM-based text-to-SQL systems

Method Name | LLM Used | Database Used | Metrics | Category | Sub-category | Novelty | Weakness | Reference
Chat2Data | Not specified | Not specified | Execution accuracy | RAG-based | Knowledge-Enhanced Retrieval, Decomposition | Uses vector databases to enhance prompt generation | Lacks LLM specification and detailed dataset usage | [31]
Sample-aware Prompting and Dynamic Revision Chain | GPT-3.5 | Not specified | Exact matching (EM), Execution accuracy | RAG-based | Dynamic Retrieval, Context-Aware Retrieval | Dynamic revision chain for fine-grained feedback | Increased computational cost | [47]
RAG-based Text-to-SQL | GPT-4o-mini | PostgreSQL | Exact matching (EM), Execution accuracy | RAG-based | Knowledge-Enhanced Retrieval | Direct text-to-SQL query generation | Struggles with database normalization principles | [48]
RAG-enhanced LLMs | GPT-3.5, GPT-4, Llama2 | Not specified | Exact matching (EM), Efficiency | RAG-based | Knowledge-Enhanced Retrieval | Evaluation across LLMs with and without RAG integration | Increased query generation times | [49]
TAG | Not specified | Custom datasets | Exact match, Accuracy | RAG-based | Knowledge-Enhanced Retrieval, Schema-Augmented Prompting | Unified approach combining RAG and Text2SQL paradigms | High computational costs for iterative data synthesis | [50]
Context-Aware Generation | Not specified | OutSystems data | Execution accuracy | RAG-based | Dynamic Retrieval, Context-Aware Retrieval | Handling large context sizes by reducing irrelevant data | Increased inference time despite improved accuracy | [51]
CRUSH4SQL | GPT-3 | SPIDER, BIRD, SocialDB | Recall, Accuracy | RAG-based | Schema-Augmented Prompting, Context-Aware Retrieval | Leverages hallucination as a mechanism for high-recall schema subsetting | Increased complexity and resource requirements for schema extraction | [52]
FATO-SQL | Medium-scale LLMs | Petrochemical data | Accuracy | RAG-based | Context-Aware Retrieval, Dynamic Retrieval | Four-stage process with two rounds of LLM calls to improve SQL generation | Scalability is limited by medium-scale LLM parameters | [53]
Distyl-SQL | Proprietary LLM | Not specified | Execution accuracy | RAG-based | Hierarchical CTE, Knowledge Management | Incorporates pre-processing for hierarchical decomposition of SQL generation | High latency and needs adaptive knowledge enhancement | [54]
TARGET | Not specified | Various | Accuracy, Recall | RAG-based | Table Retrieval | Benchmarking retrieval effectiveness for generative tasks over tabular data | High variability in retriever performance across datasets | [55]
XRICL | Codex | SQLite | Component Matching (CM), Execution accuracy | In-context | Zero-shot, Prompt Optimization | Translation as Chain-of-Thought prompt for non-English Text-to-SQL | Limited cross-lingual retrieval exemplars availability | [56]
RSL-SQL | GPT-4o, DeepSeek | Spider, BIRD | Execution accuracy | Fine-tuning | Enhanced Architecture, Data Augmentation | Bidirectional schema linking to mitigate schema pruning risks | Increased complexity and risks from multi-turn strategies | [57]
TCSR-SQL | Not specified | Custom benchmark set | Execution accuracy, Exact-set-match | In-context | Reasoning Enhancement, Schema-Augmented Prompting | Self-retrieval and multi-round generation-execution-revision process | High costs for fuzzy testing with large datasets | [58]
E-SQL | Not specified | BIRD | Execution accuracy | In-context | Question Decomposition, Execution Refinement | Direct schema linking and candidate predicate generation | Diminishing returns when used with advanced LLMs | [59]
DataGpt-SQL | DataGpt-SQL-7B | Spider | Accuracy (EX, TS) | Fine-tuning | Data Augmentation, Decomposition | Uses preference datasets and self-refine mechanisms to mitigate LLM risks | Requires fine-tuning on large datasets, potentially increasing costs | [60]
SEA-SQL | GPT-3.5 | Spider, BIRD | Execution accuracy | In-context | Reasoning Enhancement, Adaptive Refinement | Adaptive bias elimination and dynamic execution adjustment | Limited efficiency when using complex schema structures | [61]
TA-SQL | GPT-4 | BIRD, Spider | Execution accuracy | In-context | Task Alignment, Schema Linking, Logical Synthesis | Task alignment to mitigate hallucinations, providing robustness in SQL generation | Dependence on task alignment makes implementation more complex | [62]
Interactive-T2S | GPT-4, ChatGPT | BIRD, Spider-Dev | Execution accuracy | In-context | Multi-Turn Interaction, Prompt Optimization | Stepwise generation with four tools for proactive information retrieval | Lack of scalability for wide tables | [63]
SQLfuse | Open-source LLMs | Spider | Execution accuracy | Fine-tuning | Enhanced Architecture, SQL Critic | SQL generation with a critic module for continuous improvement | Complex module integration requires high system resources | [64]
Knowledge-to-SQL | DELLM | BIRD, Spider | Execution accuracy | Fine-tuning | Data Expert, Schema Augmentation | Uses tailored Data Expert LLM for knowledge enhancement in SQL generation | Knowledge generation specialized for specific domains only | [65]
examples [43].

• Decomposition:
This technique breaks complex queries into simpler sub-queries. Decomposition methods can involve dividing a challenging NL question into multiple SQL queries that are easier to generate. Decomposition improves the model's ability to handle multi-step or nested SQL queries [77].

• Prompt Optimization:
This involves refining the input prompts to achieve better SQL generation. By providing more structured or sample prompts, the model is better guided in generating the appropriate SQL queries. Techniques like prompt design and prompt calibration fall under this category, ensuring the prompts are constructed to maximize the model's understanding. Methods like ACT-SQL use prompt optimization to enhance SQL generation by structuring the prompts more efficiently and linking them directly to the database schema [66].

• Reasoning Enhancement:
These methods, such as Chain of Thought (CoT) and Tree of Thoughts (ToT), guide the model to think step-by-step. These approaches enable the model to solve complex queries by generating intermediate reasoning steps before arriving at the final SQL output [76], [75].

• Execution Refinement:
These methods involve iterative improvements to the SQL query. The model generates an initial SQL query, executes it, and then refines it based on the results. This method ensures that the generated query is optimized
TABLE IV
State-of-the-art methods used in in-context learning for LLM-based text-to-SQL. C1: Zero-shot and few-shot learning, C2: Decomposition, C3: Prompt Optimization, C4: Reasoning Enhancement, C5: Execution Refinement [66], [67], [68], [69], [65], [70], [71], [43], [72], [73], [2], [74], [75], [76], [77]. This table was compiled based on information from previous work [3].

Methods | Applied LLM | Database | Metrics | Methods Categories | Released
ACT-SQL | GPT4 | Spider, BIRD | EX, EM, etc. | Prompt Optimization, Reasoning Enhancement | Oct-23
Schema-free | GPT4 | BIRD | EX | Reasoning Enhancement, Execution Refinement | Aug-24
MetaSQL | GPT4 | BIRD | EX, EM | Decomposition | Mar-24
SQL-CRAFT | GPT4 | Spider, BIRD | EX | Reasoning Enhancement, Execution Refinement | Feb-24
FUXI | GPT4 | BIRD | EX | Reasoning Enhancement, Execution Refinement | Feb-24
DELLM | GPT4 | Spider, BIRD | EX, VES | Prompt Optimization, Execution Refinement | Feb-24
SGU-SQL | GPT4 | Spider, BIRD | EX, EM | Decomposition | Feb-24
FinSQL | LLamA2 | BULL | EX | Zero-shot and few-shot learning | Jan-24
DAIL-SQL | GPT Family | Spider, BIRD | EX, EM, VES | Prompt Optimization | Nov-23
FastRAT | XLM-RoBERTa | Spider, CSpider | EX, EM | Execution Refinement | Nov-23
CatSQL | BERT | Spider, WikiSQL | EX | Zero-shot and few-shot learning, Execution Refinement | Nov-23
C3 | GPT Family | Spider | EX | Zero-shot and few-shot learning, Prompt Optimization, Execution Refinement | Jul-23
CoT | GPT4 | Spider, BIRD | EX, VES | Reasoning Enhancement | May-23
ToT | GPT4 | OTHER | OTHER | Reasoning Enhancement, Execution Refinement | May-23
DIN-SQL | CodeX, GPT-4 | Spider, BIRD | EX, EM | Decomposition, Execution Refinement | Apr-23
TABLE V
State-of-the-art methods used in fine-tuning (FT) for LLM-based text-to-SQL. C1: Pre-trained, C2: Decomposition, C3: Data Augmented, C4: Enhanced Architecture [78], [64], [79], [80], [81], [82]. This table was compiled based on information from previous work [3].

Methods | LLM | Database | Metrics | Methods Categories | Released Time
SQL-GEN | LLamA3 | BIRD | EX | Pre-trained, Data Augmented | Aug-24
SQLfuse | LLamA2-70B | Spider | EX | Pre-trained | Jul-24
CLMMs | Deepseek | Spider | EX | Enhanced Architecture | Mar-24
CodeS | StarCoder | Spider, BIRD | EX, VES | Data Augmented | Feb-24
DTS-SQL | Mistral | Spider, Spider-SYN | EX, EM | Decomposition | Feb-24
Symbol-LLM | CodeLLaMA | Spider | EM | Data Augmented | Nov-23
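The fine-tuning methods in Table V all consume a task-specific dataset of question-SQL pairs serialized into a supervised format. A minimal sketch of that preparation step; the record field names and template are assumptions for illustration, not any framework's required format:

```python
import json

def to_finetune_records(pairs, schema):
    """Serialize (NL question, gold SQL) pairs into prompt/completion
    records, a common shape for a supervised fine-tuning dataset."""
    records = []
    for question, sql in pairs:
        records.append({
            "prompt": f"Schema: {schema}\nQuestion: {question}\nSQL:",
            "completion": " " + sql,  # leading space: a common tokenizer convention
        })
    return records

# Illustrative pair in the style of a Spider-like example (assumed).
pairs = [("How many singers are there?", "SELECT count(*) FROM singer")]
records = to_finetune_records(pairs, "singer(id, name, age)")
print(json.dumps(records[0], indent=2))
```

Training on such records is what updates θ to θ′; the same records, minus the completions, double as evaluation prompts.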

for performance and correctness, minimizing errors in execution [73].

2) Fine-Tuning: Fine-tuning involves refining the model's internal parameters, θ, using task-specific datasets. Unlike in-context learning methods, where the model's parameters remain fixed and prompts are the primary mechanism of control, fine-tuning updates the model's parameters based on examples from the target task. This process allows the model to become more specialized in tasks like SQL query generation, improving its ability to translate NL questions into accurate SQL queries over time.

Mathematically, the outcome of this process is represented as a function g:

θ′ = g(θ, D),   (2)

where θ is the pre-trained model's parameters, D is the task-specific dataset (pairs of questions and SQL queries), and θ′ represents the updated parameters after fine-tuning [79].

In fine-tuning, the model learns patterns specific to SQL generation, such as understanding database schema and query syntax. This helps it perform better at generating SQL queries by modifying its internal parameters based on task data, making the model more specialized and accurate for the SQL task. The new parameters θ′ improve the model's ability to generalize across different databases and queries. A number of studies exploring improved fine-tuning methods have been released; Table V shows well-designed fine-tuning methods by category.

• Pre-trained Methods: Pre-trained methods form the backbone of fine-tuning for text-to-SQL systems by leveraging the general knowledge embedded in LLMs, such as GPT, LLaMA, and T5. These models, trained on diverse textual data, are adapted for SQL query generation by fine-tuning with task-specific data. The fine-tuning process enhances their ability to interpret NL and accurately
map it to SQL commands across different domains and database schemas [83]. Examples like SQL-GEN show how pre-trained models are fine-tuned with synthetic data for dialect-specific SQL generation, while systems like RESDSQL [83] fine-tune LLMs on datasets like Spider for complex query handling [78].

• Fine-Tuning Decomposition: Fine-tuning decomposition methods aim to enhance the performance of LLMs on text-to-SQL tasks by breaking the complex process of query generation into smaller, manageable sub-tasks. The main idea is to address each sub-task individually, thereby allowing the model to better focus and fine-tune its parameters for the specific challenges of text-to-SQL generation. By decomposing the task into stages like schema linking and query formation, the model can be trained to handle these distinct processes more effectively than if it were trained on the entire query generation task all at once [80]. The typical fine-tuning decomposition process involves:
– Task Segmentation: breaking down the text-to-SQL conversion into smaller tasks like schema linking and SQL query generation.
– Sequential Fine-Tuning: training the model on these sub-tasks in sequence or in parallel so that each sub-task is learned optimally.

• Data Augmented Methods: The performance of the model is particularly affected by the quality of the training labels during fine-tuning. Inadequate labeling can be counterproductive, and optimal results are often not achieved. Conversely, when effective augmentation or high-quality data is present, fine-tuning is likely to yield better results than even the best fine-tuning strategies applied to low-quality or raw data. In text-to-SQL and other problems, data-augmented fine-tuning has progressed greatly, as more efforts now aim at improving data quality rather than the architecture. As an example, Symbol-LLM has developed an injection and an infusion phase with a focus on improving data quality during instruction tuning [82], [81].

• Enhanced Architecture: The generative pre-trained transformer (GPT) framework employs a decoder-only transformer architecture combined with standard auto-regressive decoding for text generation [3]. However, recent research on the efficiency of large language models (LLMs) has highlighted a shared challenge: when generating long sequences in the auto-regressive paradigm, the attention mechanism increases latency. This issue is pronounced in LLM-based text-to-SQL systems, where generating SQL queries is slower than traditional language modeling, posing a challenge for developing efficient, localized NL interfaces to databases (NLIDB). To address this, consistency large language models (CLLMs) have been developed with an enhanced model architecture, providing a solution to reduce latency and speed up SQL query generation [79].

3) RAG-based Text-to-SQL Systems: RAG-based text-to-SQL systems integrate dynamic retrieval abilities with generative models to improve SQL query generation [31]. These systems can be grouped into five categories: Dynamic Retrieval, Knowledge-Enhanced Retrieval, Schema-Augmented Prompting, Context-Aware Retrieval, and Robustness Enhancement.

• Dynamic Retrieval:
These systems dynamically fetch schema-related information, such as metadata, table descriptions, or previously used queries, to provide relevant context for SQL generation [51]. They improve adaptability in zero-shot or few-shot scenarios [47]. However, computational overhead due to repeated retrieval queries can reduce efficiency in real-time applications.

• Knowledge-Enhanced Retrieval:
Methods in this category integrate domain-specific unstructured knowledge with schema-based retrieval to improve SQL generation [50]. These systems bridge the gap between schema and query understanding by using external knowledge sources [48], and they are useful for handling domain-specific terminology and queries with incomplete schema information. However, they are reliant on the quality and availability of domain-specific knowledge, which may not always be accessible.

• Schema-Augmented Prompting:
These methods use retrieved schema information to construct precise prompts for LLMs [52]. They improve accuracy for complex SQL, such as queries requiring multi-table joins or nested operations [50]. However, prompt construction may become verbose, running into token limitations and inefficiencies in LLM inference.

• Context-Aware Retrieval:
Context-aware retrieval systems focus on retrieving relevant context across multi-turn conversations or interactive sessions to generate accurate SQL queries [53]. They perform well at handling ambiguities in follow-up queries and maintain continuity across dialogue turns [51]. However, context tracking and retrieval can become expensive as conversations grow longer.

• Robustness Enhancement:
Robustness systems retrieve alternate schema interpretations or use synonym mappings to handle ambiguity and adversarial challenges in SQL generation. These systems increase resilience to noisy inputs and ambiguous schema mappings. However, if irrelevant data is included, the likelihood of generating inaccurate SQL increases.

4) Novelty and Advantages of RAG-based Systems: RAG-based text-to-SQL systems introduce several novel features that distinguish them from in-context learning and fine-tuning-based methods:

• 1. Dynamic Contextualization:
Unlike static fine-tuned models, RAG-based systems can dynamically adapt to new schemas or domains by
TABLE VI
Comparison of RAG-based, in-context learning, and fine-tuning methods for LLM-based text-to-SQL

Aspect | RAG-Based | In-Context Learning | Fine-Tuning
Generalization | Excels in zero-shot and few-shot settings with dynamic retrieval. | Limited without extensive prompt engineering. | Domain-specific; requires retraining for new schemas.
Domain Adaptability | High, due to integration of domain-specific knowledge. | Moderate, depends on prompt design. | Low, requires new datasets for domain adaptation.
Efficiency | Slower due to retrieval latency. | Faster but limited by prompt complexity. | Faster inference but requires time-intensive fine-tuning.
Implementation Complexity | High, due to integration of retrieval mechanisms. | Moderate, requires careful prompt design. | High, involves pretraining and schema-specific tuning.

retrieving relevant context.

• 2. Enhanced Generalization:
The retrieval mechanisms allow these systems to excel in zero-shot or few-shot scenarios, making them more flexible than traditional fine-tuning approaches.

• 3. Multi-Domain Support:
By integrating domain-specific knowledge with schema retrieval, RAG-based systems outperform in-context learning models in specialized applications.

• 4. Iterative Refinement:
The feedback loops for SQL generation and execution improve accuracy over time, a feature that is absent in most fine-tuning and in-context learning methods.

5) Weaknesses of RAG-Based Systems: While RAG-based systems offer advantages, they also come with certain weaknesses compared to other approaches:

• 1. Computational Overhead:
Retrieval mechanisms increase latency, making these systems less efficient for real-time applications.

• 2. Dependency on Retrieval Quality:
The effectiveness of RAG-based systems is reliant on the quality and relevance of retrieved data, so poor retrieval can degrade performance.

• 3. Scalability Issues:
For databases with large schemas or noisy knowledge sources, the retrieval process can become a bottleneck.

In Table VI, this study summarizes a comparison of RAG-based, in-context learning, and fine-tuning methods for LLM-based text-to-SQL.

IV. GRAPH RAG IN TEXT-TO-SQL SYSTEMS, A PROMISING SOLUTION

Graph RAG is an advanced framework that integrates graph-based knowledge representation with retrieval-augmented generation techniques [33]. Unlike traditional RAG systems that retrieve isolated textual data, Graph RAG builds a structured knowledge graph derived from the source documents. This graph organizes entities, relationships, and contextually relevant responses [84].

A. Novelty of Using Graph RAG

Graph RAG introduces a new approach to retrieval-augmented generation by incorporating graph-based structures to organize and retrieve knowledge. This method addresses the limitations of traditional RAG systems by emphasizing how data items connect to one another through graph representations [33].

One of its most innovative aspects is its use of modularity detection algorithms, such as Leiden [85], to partition knowledge graphs into manageable subgraphs. This modular approach facilitates efficient summarization and parallel processing, which is valuable for handling large-scale datasets. Additionally, Graph RAG employs advanced mechanisms like triple graph construction, where entities, their attributes, and relationships are linked to credible sources and controlled vocabularies [84], [33].

This approach ensures that generated SQL queries are not only accurate but also grounded in verifiable evidence. Another novelty is its retrieval framework, which combines top-down and bottom-up retrieval strategies to balance context awareness with retrieval efficiency. By integrating these advanced graph-based techniques, Graph RAG redefines how retrieval and reasoning are performed, setting a new standard for RAG systems [33], [86].

B. Why is RAG a Promising Solution for Current LLM-Based Text-to-SQL Limitations?

Graph RAG addresses many of the persistent challenges faced by current LLM-based Text-to-SQL systems [33]. One of its core strengths is its ability to understand and model database schemas through graph-based relationships [87]. Unlike traditional approaches that rely on flat schema descriptions (straightforward representations of a database schema, typically listing the names of tables, columns, and their basic attributes in a linear or tabular format), Graph RAG captures relationships between tables, columns, and entities, enabling precise and efficient schema linking [84]. This capability is critical for tackling the complexity of modern databases, particularly in cross-domain scenarios. Moreover, Graph RAG's structured graph connections and retrieval mechanisms improve its ability to resolve
ambiguities in queries, such as synonym discrepancies or incomplete schema descriptions [88]. Its modular design allows it to generalize across diverse domains by adapting graph construction processes to specific requirements [86]. Furthermore, by synthesizing information across graph communities, Graph RAG can handle complex, multi-faceted queries that often confuse traditional systems. Also, the pre-indexed graph structures reduce computational overhead during retrieval, improving efficiency without compromising accuracy. These features position Graph RAG as a robust and adaptable framework for advancing Text-to-SQL systems beyond their current limitations.

V. CONCLUSION

Graph RAG systems are a new paradigm in Retrieval-Augmented Generation, in which graph-based structures enhance the generation of SQL queries from natural language. Unlike traditional RAG methods, Graph RAG integrates knowledge and schema information into interlinked graphs, allowing for accurate schema understanding, multi-hop reasoning, and ambiguity resolution. The structured nature of this approach significantly alleviates many limitations that most current LLM-based Text-to-SQL systems suffer from, such as handling complex queries, schema ambiguity, and domain generalization.

Graph RAG systems have made significant progress both in scalability and efficiency by leveraging modularity, community detection, and graph traversal techniques. Graph RAG improves SQL query accuracy by combining strengths in dynamic retrieval, schema augmentation, and robust reasoning, and it also ensures adaptability across diverse domains and datasets. As a novel approach that combines domain-specific knowledge with complex query generation, it holds great promise in pushing the state of the art for Text-to-SQL systems.

While Graph RAG has shown great promise, there are a number of further areas of exploration and development:

Dynamic Schema Adaptation:
Future work should investigate how to support real-time updates to graph structures as database schemas evolve. This is crucial for applications in dynamic environments where schema changes are frequent, such as enterprise data lakes or multi-tenant systems.

Integration with Conversational Agents:
Graph RAG can be used to make Text-to-SQL solutions more user-friendly for non-technical users through the integration of conversational AI. This integration would allow users to interface with databases in natural language, with the system dynamically updating based on follow-up questions and context shifts.

Optimization of Graph Construction:
Current graph construction techniques can be resource-intensive, particularly for large-scale databases. Developing lightweight algorithms for efficient graph partitioning, modularity detection, and summarization will improve the scalability of Graph RAG systems, making them more practical for real-world applications.

Cross-lingual Support of Graph RAG:
Graph RAG can be extended to support multi-lingual or cross-lingual queries, making it effective for a wide range of global use cases. By incorporating language-specific knowledge and translation mechanisms into the graph, Graph RAG systems can facilitate SQL generation across diverse linguistic contexts.

Real-time Data Handling:
Challenges related to data freshness and accuracy can be overcome by real-time graph updates combined with strong retrieval mechanisms. This is particularly important for time-critical applications such as financial reporting or live analytics, which require query results to reflect the most up-to-date data.

Improved Explainability:
Future versions of Graph RAG should improve the explainability of their query generation processes. These systems can only earn trust and ensure transparency in their outputs by providing users with clear visualizations of graph traversal and reasoning steps.

In summary, Graph RAG promises unprecedented robustness and scalability in the increasingly complex setting of modern databases and dynamic interactions. It addresses crucial limitations that current systems must resolve, and the enhancements proposed above would fully equip Graph RAG to transform natural language interaction into structured query generation.

VI. LIMITATION OF THE STATE OF THE ART

Despite the advances summarized above, a number of challenges remain. These are summarized below.

A. Scalability and Computational Efficiency

Enhancing LLM-based text-to-SQL systems for large and complex databases without losing computational efficiency is an important challenge. The processing and generation cost of SQL queries remains high, especially with longer sequences and larger datasets. Future solutions will likely focus on model optimizations, more efficient retrieval and storage mechanisms, and specialized indexing techniques to streamline query generation.

B. Dynamic Adaptation to Schema Changes

Most current systems are inefficient in adapting to dynamic, evolving databases without full retraining. Considering that realistic databases will very often experience schema changes and data expansion, the lack of effective techniques, such as incremental learning and flexible architectures, holds back seamless updating of the LLMs and Knowledge Graphs (KGs), potentially leading to reduced query accuracy over time, most importantly in changing environments.

C. Contextual Accuracy and Disambiguation

Many LLM-based text-to-SQL systems face challenges in handling complex and ambiguous queries where context is not explicitly given. Improving contextual accuracy requires research into how LLMs use structured information from KGs. Enhancing semantic links between user queries and the
database schema is needed, and more advanced semantic pars- [11] L. Hammami, A. Paglialonga, G. Pruneri, M. Torresani, M. Sant,
ing and disambiguation techniques will help resolve ambiguity. C. Bono, E. G. Caiani, and P. Baili, “Automated classification of
cancer morphology from italian pathology reports using natural language
processing techniques: A rule-based approach,” Journal of Biomedical
Informatics, vol. 116, p. 103712, 2021.
D. Balancing Retrieval-Augmented Generation (RAG) and
[12] F. El Boujddaini, A. Laguidi, and Y. Mejdoub, “A survey on text-to-sql
Fine-Tuning parsing: From rule-based foundations to large language models,” in In-
ternational Conference on Connected Objects and Artificial Intelligence.
While fine-tuning models for specific domains improves Springer, 2024, pp. 266–272.
performance, RAG offers a way to dynamically incorporate [13] G. Katsogiannis-Meimarakis and G. Koutrika, “A deep dive into deep
context with less extensive model retraining. The balance learning approaches for text-to-sql systems,” in Proceedings of the 2021
International Conference on Management of Data, 2021, pp. 2846–2851.
between RAG and fine-tuning is an area to be explored, with [14] F. S. Banitaba, S. Aygun, and M. H. Najafi, “Late breaking results:
potential future systems leveraging the strengths of both ap- Fortifying neural networks: Safeguarding against adversarial attacks with
proaches to minimize training time while maintaining context- stochastic computing,” arXiv preprint arXiv:2407.04861, 2024.
sensitive query generation. [15] A. Kumar, P. Nagarkar, P. Nalhe, and S. Vijayakumar, “Deep learning
driven natural languages text to sql query conversion: a survey,” arXiv
E. Ethics, Data Privacy, and Interpretability preprint arXiv:2208.04415, 2022.
[16] P. Naghshnejad, G. T. Marchan, T. Olayiwola, J. Romagnoli, and
The application of LLMs in critical domains like healthcare, R. Kumar, “Graph-based modeling and molecular dynamics for ion
finance, and education raises ethical concerns regarding data activity coefficient prediction in polymeric ion-exchange membranes,”
2024.
privacy and model interpretability. Such systems must be [17] J. Li, B. Hui, G. Qu, J. Yang, B. Li, B. Li, B. Wang, B. Qin, R. Geng,
transparent, reliable, and respectful of user privacy. Future N. Huo et al., “Can llm already serve as a database interface? a big bench
work needs to establish clear explainability protocols, safe data for large-scale database grounded text-to-sqls,” Advances in Neural
Information Processing Systems, vol. 36, 2024.
handling practices, and transparent AI procedures to build trust [18] G. Katsogiannis-Meimarakis and G. Koutrika, “A survey on deep
in LLM-based text-to-SQL systems. learning approaches for text-to-sql,” The VLDB Journal, vol. 32, no. 4,
pp. 905–936, 2023.
F. Human-in-the-Loop and Interactive Querying

Integration with human feedback is a key future direction. Human-in-the-loop mechanisms will help users refine and correct generated queries interactively, enhancing model accuracy and transparency. Improved interactivity will not only help build user trust but also provide enhanced learning and error-correction opportunities during SQL generation.
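A minimal Python sketch of such a refinement cycle is shown below. Here, scripted "user corrections" stand in for interactive feedback, and an in-memory SQLite database stands in for the target system; the table, queries, and loop structure are illustrative assumptions, not a prescribed design.

```python
import sqlite3

# Illustrative human-in-the-loop sketch: execute a candidate query, surface
# the error to the user, and retry with the user's corrected query.

def run_candidate(conn, sql):
    """Execute a candidate query; return (rows, None) or (None, error)."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)

def refine_loop(conn, candidates):
    """Try each candidate in turn, as a user would after seeing each error."""
    history = []
    for sql in candidates:
        rows, error = run_candidate(conn, sql)
        history.append((sql, error))
        if error is None:
            return rows, history
    return None, history

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Ada", 120.0), ("Bob", 90.0)])

# The first candidate references a wrong column; the scripted "user"
# supplies a corrected version after seeing the error.
candidates = [
    "SELECT AVG(pay) FROM employees",      # fails: no column named pay
    "SELECT AVG(salary) FROM employees",   # corrected after feedback
]
rows, history = refine_loop(conn, candidates)
print(rows)          # [(105.0,)]
print(len(history))  # 2
```

The execution history doubles as an audit trail, supporting both transparency for the user and error-correction data for the system.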