0% found this document useful (0 votes)

115 views24 pages

LLM Paper 1707247828

The document provides a comprehensive survey of techniques for applying large language models (LLMs) to graph data. It categorizes scenarios where LLMs can be used on graphs into three types: pure graphs, text-attributed graphs, and text-paired graphs. It then discusses three techniques for utilizing LLMs on graphs: LLM as predictor, LLM as aligner, and LLM as encoder. The survey compares the advantages and disadvantages of different approaches and discusses applications and available benchmark datasets and code. It concludes with potential future research directions in the emerging field of applying LLMs to graph-structured data.

Uploaded by

Sandeep Patil

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

115 views24 pages

LLM Paper 1707247828

Uploaded by

Sandeep Patil

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 24

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO.

8, AUGUST 2021 1

Large Language Models on Graphs: A

Comprehensive Survey
Bowen Jin*, Gang Liu*, Chi Han*, Meng Jiang, Heng Ji, Jiawei Han

Graph-Text Relationship
arXiv:2312.02783v2 [cs.CL] 1 Feb 2024

Abstract—Large language models (LLMs), such as GPT4 and LLaMA,

are creating significant advancements in natural language processing, Protein Molecule Academic Social
Tra c
due to their strong text encoding/decoding ability and newly found Networks Graphs Graphs Networks Networks
emergent capability (e.g., reasoning). While LLMs are mainly designed to HV “Benzene is toxic”
L
T PE
process pure texts, there are many real-world scenarios where text data E
K H
is associated with rich structure information in the form of graphs (e.g., O “Water is less toxic”
“Myoglobin holds H
academic networks, and e-commerce networks) or scenarios where oxygen in muscles.”

graph data are paired with rich textual information (e.g., molecules
with descriptions). Besides, although LLMs have shown their pure text- Pure Graphs Text-Paired Graphs Text-Attributed Graphs
based reasoning ability, it is underexplored whether such ability can be
generalized to graphs (i.e., graph-based reasoning). In this paper, we
provide a systematic review of scenarios and techniques related to large
language models on graphs. We first summarize potential scenarios of
adopting LLMs on graphs into three categories, namely pure graphs,
text-attributed graphs, and text-paired graphs. We then discuss detailed
techniques for utilizing LLMs on graphs, including LLM as Predictor,
LLM as Encoder, and LLM as Aligner, and compare the advantages LLM LLM GNN LLM GNN
and disadvantages of different schools of models. Furthermore, we
discuss the real-world applications of such methods and summarize “text” GNN “text” “text” LLM
open-source codes and benchmark datasets. Finally, we conclude
“text”
with potential future research directions in this fast-growing field. The
related source can be found at https://github.com/PeterGriffinJin/ LLM as Predictor LLM as Aligner LLM as Encoder
Awesome-Language-Model-on-Graphs.
ffi
Large Language Models’ Roles
Index Terms—Large Language Models, Graph Neural Networks, Natural
Language Processing, Graph Representation Learning Fig. 1. According to the relationship between graph and text, we catego-
rize three LLM on graph scenarios. Depending on the role of LLM, we
summarize three LLM-on-graph techniques. “LLM as Predictor” is where
1 I NTRODUCTION LLMs are responsible for predicting the final answer. “LLM as Aligner”
will align the inputs-output pairs with those of GNNs. “LLM as Encoder”

L ARGE language models (LLMs) (e.g., BERT [23], T5

[29], LLaMA [119]) which represents a direction of
ever-increasing models’ sizes pre-trained on larger corpora,
refers to using LLMs to encode and obtain feature vectors.

emergent ability [5], exposing a strong potential for Artificial

have demonstrated powerful capabilities in solving natural General Intelligence (AGI).
language processing (NLP) tasks, including question answer- While LLMs are extensively applied to process pure texts,
ing [1], text generation [2] and document understanding there is an increasing number of applications where the text
[3]. There are no clear and static thresholds regarding the data are associated with structure information which are
model sizes. Early LLMs (e.g., BERT [23], RoBERTa [24]) represented in the form of graphs. As presented in Fig. 1, in
adopt an encoder-only architecture and show capabilities academic networks, papers (with title and description) and
in text representation learning [4] and natural language authors (with profile text), are interconnected with author-
understanding [3]. In recent years, more focus has been given ship relationships. Understanding both the author/paper’s
to larger decoder-only architectures [119] or encoder-decoder text information and author-paper structure information
architectures [29]. As the model size scales up, such LLMs on such graphs can contribute to advanced author/paper
have also shown reasoning ability and even more advanced modeling and accurate recommendations for collaboration;
In the scientific domain, molecules are represented as graphs
and are often paired with text that describes their basic
• * The first three authors contributed equally to this work.
• Bowen Jin, Chi Han, Heng Ji, Jiawei Han: University of Illinois at Urbana- properties (e.g., mass and weight). Joint modeling of both the
Champaign. {bowenj4, chihan3, hengji, hanj}@illinois.edu molecule structure (graph) and the associated rich knowledge
• Gang Liu, Meng Jiang: University of Notre Dame. {gliu7, (text) is important for deeper molecule understanding. Since
mjiang2@}@nd.edu
LLMs are mainly proposed for modeling texts that lie in a
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 2

sequential fashion, those scenarios mentioned above pose TABLE 1

new challenges on how to enable LLMs to encode the Notations of Concepts.
structure information on graphs. In addition, since LLMs Notations Descriptions
have demonstrated their superb text-based reasoning ability, |·| The length of a set.
it is promising to explore whether they have the potential [A, B] The concatenation of A and B.
to address fundamental graph reasoning problems on pure ∥ Concatenate operation.
graphs. These graph reasoning tasks include inferring con- G A graph.
V The set of nodes in a graph.
nectivity [6], shortest path [7], subgraph matching [8], and v A node v ∈ V .
logical rule induction [18]. E The set of edges in a graph.
Recently, there has been an increasing interest [9] in e An edge e ∈ E .
extending LLMs for graph-based applications (summarized Gv The ego graph associated with v in G .
N (v) The neighbors of a node v .
in Fig. 1). According to the relationship between graph M A meta-path or a meta-graph.
and text presented in Fig. 1, the application scenarios can The nodes which are reachable from
NM (v)
be categorized into pure graphs, text-attributed graphs node v with meta-path or meta-graph M .
(nodes/edges are associated with texts), and text-paired D The text set.
s∈S The text token in a text sentence S .
graphs. Depending on the role of LLMs and their interaction dvi The text associated with the node vi .
with graph neural networks (GNNs), the LLM on graphs deij The text associated with the edge eij .
techniques can be classified into treating LLMs as the final dG The text associated with the graph G .
predictor (LLM as Predictor), treating LLMs as the feature n The number of nodes, n = |V |.
b The dimension of a node hidden state.
encoder for GNNs (LLM as Encoder), and align LLMs with
xvi ∈ Rd The initial feature vector of the node vi .
GNNs (LLM as Aligner). Hv ∈ Rn×b The node hidden feature matrix.
There are a limited number of existing surveys exploring hvi ∈ Rb The hidden representation of node vi .
the intersection between LLMs and graphs. Related to deep hG ∈ Rb The hidden representation of a graph G .
learning on graphs, Wu et al. [19] gives a comprehensive hdv ∈ Rb The representation of text dv .
overview of graph neural networks (GNNs) with detailed Hdv ∈ R|dv |×b The hidden states of tokens in dv .
W, Θ, w, θ Learnable model parameters.
illustrations on recurrent graph neural networks, convo-
LLM(·) Large Language model.
lutional graph neural networks, graph autoencoders, and GNN(·) Graph neural network.
spatial-temporal graph neural networks. Liu et al. [20] dis-
cuss pretrained foundation models on graphs, including their Organization of Survey. The rest of this survey is
backbone architectures, pretraining methods, and adaptation organized as follows. Section 2 introduces the background of
techniques. Pan et al. [21] review the connection between LLMs and GNNs, lists commonly used notations, and defines
LLMs and knowledge graphs (KGs) especially on how KGs related concepts. Section 3 categorizes graph scenarios where
can enhance LLMs training and inference, and how LLMs LLMs can be adopted and summarizes LLMs on graph
can facilitate KG construction and reasoning. In summary, techniques. Section 4-6 provides a detailed illustration of
existing surveys either focus more on GNNs rather than LLM methodologies for different graph scenarios. Section
LLMs or fail to provide a systematic perspective on their 7 delivers available datasets, opensource codebases, and a
applications in various graph scenarios as in Fig. 1. Our paper collection of applications across various domains. Section
provides a comprehensive review of the LLMs on graphs, 8 introduces some potential future directions. Section 9
for broader researchers from diverse backgrounds besides summarizes the paper.
the computer science and machine learning community who 2 D EFINITIONS & BACKGROUND
want to enter this rapidly developing field.
2.1 Definitions
Our Contributions. The notable contributions of our
paper are summarized as follows: We provide definitions of various types of graphs and
introduce the notations (as shown in Table 1) in this section.
• Categorization of Graph Scenarios. We systemat- Definition 1 (Graph): A graph can be defined as G = (V, E).
ically summarize the graph scenarios where lan- Here V signifies the set of nodes, while E denotes the set
guage models can be adopted into: pure graphs, text- of edges. A specific node can be represented by vi ∈ V , and
attributed graphs, and text-paired graphs. an edge directed from node vj to vi can be expressed as
• Systematic Review of Techniques. We provide the eij = (vi , vj ) ∈ E . The set of nodes adjacent to a particular
most comprehensive overview of language models node v is articulated as N (v) = {u ∈ V|(v, u) ∈ E}.
on graph techniques. For different graph scenarios, A graph containing a node type set A and an edge type
we summarize the representative models, provide set R, where |A| + |R| > 2, is called a heterogeneous graph.
detailed illustrations of each of them, and make A heterogeneous graph is also associated with a node type
necessary comparisons. mapping function ϕ : V → A and an edge type mapping
• Abundant Resources. We collect abundant resources function ψ : E → R.
on language models on graphs, including benchmark Definition 2 (Graph with node-level textual information): A
datasets, open-source codebases, and practical appli- graph with node-level textual information can be denoted as
cations. G = (V, E, D), where V , E and D are node set, edge set, and
• Future Directions. We delve into the foundational text set, respectively. Each vi ∈ V is associated with some
principles of language models on graphs and propose textual information dvi ∈ D. For instance, in an academic
six prospective avenues for future exploration. citation network, one can interpret v ∈ V as the scholarly
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 3

articles, e ∈ E as the citation links between them, and d ∈ D Simple but powerful, subsequent models like GPT-3 [26],
as the textual content of these articles. A graph with node- GPT-4 [118], LLaMA [119], LLaMA2 [119], Mistral 7B [120],
level textual information is also called a text-attributed graph and T5 [29] show impressive emergent capabilities such
[31], a text-rich graph [62], or a textual graph [72]. as few-shot learning, chain-of-thought reasoning, and pro-
Definition 3 (Graph with edge-level textual information): A gramming. Efforts have been made to combine language
graph with node-level textual information can be denoted models with other modalities such as vision [96], [121] and
as G = (V, E, D), where V , E and D are node set, edge set, biochemical structures [47], [122], [123]. We will discuss its
and text set, respectively. Each eij ∈ E is associated with combination with graphs in this paper.
some textual information deij ∈ D. For example, in a social We would like to point out that the word “large” in
network, one can interpret v ∈ V as the users, e ∈ E as LLM is not associated with a clear and static threshold
the interaction between the users, and d ∈ D as the textual to divide language models. “Large” actually refers to a
content of the messages sent between the users. direction in which language models are inevitably evolving,
Definition 4 (Graph with graph-level textual information): It and larger foundational models tend to possess significantly
can be denoted as the pair (G, dG ), where G = (V, E). V more representation and generalization power. Hence, we
and E are node set and edge set. dG is the text set paired define LLMs to encompass both medium-scale PLMs, such as
to the graph G . For instance, in a molecular graph G , v ∈ V BERT, and large-scale LMs, like GPT-4, as suggested by [21].
denotes an atom, e ∈ E represents the strong attractive Graph Neural Networks & Graph Transformers. In real-
forces or chemical bonds that hold molecules together, and world scenarios, not all the data are sequential like text,
dG represents the textual description of the molecule. We many data lies in a more complex non-Euclidean structure,
note that texts may also be associated with subgraph-level i.e., graphs. GNN is proposed as a deep-learning architecture
concepts and then paired with the entire graph. for graph data. Primary GNNs including GCN [84], Graph-
SAGE [85] and, GAT [86] are designed for solving node-
2.2 Background level tasks. They mainly adopt a propagation-aggregation
(Large) Language Models. Language Models (LMs), or paradigm to obtain node representations:
language modeling, is an area in the field of natural language
a(l−1) (l)
h(l−1) , h(l−1)

processing (NLP) on understanding and generation from text vi vj = PROP vi vj , ∀vj ∈ N (vi ) ; (4)
distributions. In recent years, large language models (LLMs)

h(l)
vi = AGG
(l)
h(l−1)
vi , {a(l−1)
vi vj |vj ∈ N (vi )} . (5)
have demonstrated impressive capabilities in tasks such
as machine translation, text summarization, and question Later works such as GIN [189] explore GNNs for solving
answering [26], [43], [112]–[115], [195]. graph-level tasks. They obtain graph representations by
Language models have evolved significantly over time. adopting a READOUT function on node representations:
BERT [23] marks significant progress in language modeling
and representation. BERT models the conditional probability hG = READOUT({hvi |vi ∈ G}). (6)
of a word given its bidirectional context, also named masked The READOUT functions include mean pooling, max pool-
language modeling (MLM) objective: ing, and so on. Subsequent work on GNN tackles the issues
of over-smoothing [139], over-squashing [140], interpretabil-
 
X
ES∼D  log p(si |s1 , . . . , si−1 , si+1 , . . . , sNS ) , (1) ity [145], and bias [143]. While message-passing-based GNNs
si ∈S have demonstrated advanced structure encoding capability,
researchers are exploring further enhancing its expressive-
where S is a sentence sampled from the corpus D, si is
ness with Transformers (i.e., graph Transformers). Graph
the i-th word in the sentence, and NS is the length of the
Transformers utilize a global multi-head attention mecha-
sentence. BERT utilizes the Transformer architecture with
nism to expand the receptive field of each graph encoding
attention mechanisms as the core building block. In the
layer [141]. They integrate the inductive biases of graphs
vanilla Transformer, the attention mechanism is defined as:
into the model by positional encoding, structural encoding,
QK T

Attention(Q, K, V ) = softmax √ V, (2) the combination of message-passing layers with attention
dk layers [142], or improving the efficiency of attention on large
where Q, K, V ∈ RNS ×dk are the query, key, and value graphs [144]. Graph Transformers have been proven as the
vectors for each word in the sentence, respectively. Following state-of-the-art solution for many pure graph problems.
BERT, other masked language models are proposed, such Language Models vs. Graph Transformers. Modern lan-
as RoBERTa [24], ALBERT [116], and ELECTRA [117], with guage models and graph Transformers both use Transformers
similar architectures and objectives of text representation. [93] as the base model architecture. This makes the two
Although the original Transformer paper [93] was experi- concepts hard to distinguish, especially when the language
mented on machine translation, it was not until the release of models are adopted on graph applications. In this paper,
GPT-2 [115] that language generation (aka. causal language “Transformers” typically refers to Transformer language
modeling) became impactful on downstream tasks. Causal models for simplicity. Here, we provide three points to
language modeling is the task of predicting the next word help distinguish them: 1) Tokens (word token vs. node
given the previous words in a sentence. The objective of token): Transformers take a token sequence as inputs. For
causal language modeling is defined as: language models, the tokens are word tokens; while for graph
Transformers, the tokens are node tokens. In those cases
 
X
ES∼D  log p(si |s1 , . . . , si−1 ) . (3) where tokens include both word tokens and node tokens if
si ∈S the backbone Transformers is pretrained on text corpus (e.g.,
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 4

Zero-Shot [124]–[126], [128], [131],

Direct Answering Few-Shot [124], [125], [128], [131], GraphLLM [42],
Role Prompting [126], Format Explanation [126],

CoT [124]–[126], [128], [131], [132],

Pure Graphs LLM as Predictor
Heuristic Reasoning Self-Consistency [124], BaG [124], [131],
RoG [129], StructGPT [130], ToG [132]

Algorithmic Reasoning Algorithmic Prompting [124], Graph-ToolFormer [127]

Rule-based InstructGLM [46], GraphText [65], [82],

Graph as
Sequence
GNN-based GNP [41], GraphGPT [45], DGTL [76], METERN [75]

LLM as Predictor GreaseLM [67], DRAGON [81], GraphFormers [72],

Graph-Empowered LLM
Patton [31], Heterformer [73], Edgeformers [74],
Categorization of LLM on Graph

SPECTER [51], SciNCL [52], Touchup-G [54],

Graph-Aware LLM Finetuning
TwHIN-BERT [56], MICoL [59], E2EG [60]

One-step TextGNN [77], AdsGNN [78], GNN-LM [66]

Text-Rich Optimization
Graphs Two-step GIANT [58], LM-GNN [68], SimTeG [35], GaLM [80]
LLM as Encoder
Data Augmentation LLM-GNN [64], TAPE [70], ENG [71]

Knowledge Distillation AdsGNN [78], GraD [69]

Prediction Alignment LTRN [57], GLEM [62]

LLM as Aligner
Latent Space Alignment ConGrat [53], GRENADE [55], G2P2 [63], THLM [33]

CatBERTa [159] , LLaMA-Mol [160] , LLM4Mol [163] ,

RT [164] , MolReGPT [165] , ChatMol [166] ,
Graph as Sequence MolXPT [169] , LLM-ICL [168] , Text+Chem T5 [171] ,
MolT5 [123] , KV-PLM [175] , Chemformer [156] ,
LLM as Predictor MFBERT [176] , Galatica [178] , SMILES-BERT [179]

ReLM [157] , Prot2Text [161] , GIMLET [47] ,

Text-Paired Graph-Empowered LLM
Text2Mol [122]
Graphs
MolCA [167] , GIT-Mol [158] , MolFM [162] ,
LLM as Aligner Latent Space Alignment CLAMP [170] , MoMu-v2 [173] , MoleculeSTM [172] ,
MoMu [174]
Fig. 2. A taxonomy of LLM on graph scenarios and techniques with representative examples.

BERT [23] and LLaMA [119]), we will call it a “language theory problems) or serve as knowledge sources to enhance
model”. 2) Positional Encoding (sequence vs. graph): language the large language models (alleviate hallucination).
models typically adopt the absolute or relative positional Text-Attributed Graphs refers to graphs where nodes or
encoding considering the position of the word token in the edges are associated with semantically rich text informa-
sequence, while graph Transformers adopt shortest path tion. They are also called text-rich networks [31], textual
distance [141], random walk distance, the eigenvalues of the graphs [72] or textual-edge networks [74]. Examples include
graph Laplacian [142] to consider the distance of nodes in academic networks, e-commerce networks, social networks,
the graph. 3) Goal (text vs. graph): The language models and legal case networks. On these graphs, researchers are
are originally proposed for text encoding and generation; interested in learning representations for nodes or edges with
while graph Transformers are proposed for node encoding both textual and structure information [72] [74].
or graph encoding. In those cases where texts are served Text-Paired Graphs have textual descriptions defined for the
as nodes/edges on the graph if the backbone Transformers entire graph structure. For example, graphs like molecules
is pretrained on text corpus, we will call it a “language may be paired with captions or textual features. While the
model”. graph structure significantly contributes to molecular prop-
erties, text descriptions can complement our understanding
3 C ATEGORIZATION AND F RAMEWORK of molecules. The graph scenarios can be found in Fig. 1.
In this section, we first introduce our categorization of graph 3.2 Categorization of LLMs on Graph Techniques
scenarios where language models can be adopted. Then According to the roles of LLMs and what are the final
we discuss the categorization of LLM on graph techniques. components for solving graph-related problems, we classify
Finally, we summarize the training & inference framework LLM on graph techniques into three main categories:
for language models on graphs. LLM as Predictor. This category of methods serves LLM
as the final component to output representations or predic-
3.1 Categorization of Graph Scenarios with LLMs. tions. It can be enhanced with GNNs and can be classified
Pure Graphs without Textual Information are graphs with depending on how the graph information is injected into
no text information or no semantically rich text information. LLM: 1) Graph as Sequence: This type of method makes no
Examples include traffic graphs and power transmission changes to the LLM architecture, but makes it be aware
graphs. Those graphs often serve as context to test the graph of graph structure by taking a “graph token sequence” as
reasoning ability of large language models (solve graph input. The “graph token sequence” can be natural language
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 5

descriptions for a graph or hidden representations outputted and findings. Table 4 in the Appendix lists a categoriza-
by graph encoders. 2) Graph-Empowered LLM: This type of tion of these efforts. Usually, input graphs are serialized
method modifies the architecture of the LLM base model as part of the input sequence, either by verbalizing the
(i.e., Transformers) and enables it to conduct joint text and graph structure [124]–[126], [128]–[132] or by encoding the
graph encoding inside their architecture. 3) Graph-Aware LLM graph structure into implicit feature sequences [42]. The
Finetuning: This type of method makes no changes to the studied reasoning problems range from simpler ones like
input of the LLMs or LLM architectures, but only fine-tunes connectivity, shortest paths, and cycle detection to harder
the LLMs with supervision from the graph. ones like maximum flow and Hamiltonian pathfinding (an
LLM as Encoder. This method is mostly utilized for graphs NP-complete problem). A comprehensive list of the studied
where nodes or edges are associated with text information problems is listed in Appendix Table 5. Note that we only
(solving node-level or edge-level tasks). GNNs are the final list representative problems here. This table does not include
components and we adopt LLM as the initial text encoder. more domain-specific problems, such as the spatial-temporal
To be specific, LLMs are first utilized to encode the text reasoning problems in [128].
associated with the nodes/edges. The outputted feature
vectors by LLMs then serve as input embeddings for GNNs 4.1 Direct Answering
for graph structure encoding. The output embeddings from Although graph-based reasoning problems usually involve
the GNNs are adopted as final node/edge representations complex computation, researchers still attempt to let lan-
for downstream tasks. However, these methods suffer from guage models directly generate answers from the serialized
convergence issues, sparse data issues, and inefficient issues, input graphs as a starting point or a baseline, partially
where we summarize solutions from optimization, data because of the simplicity of the approach and partially in
augmentation, and knowledge distillation perspectives. awe of other emergent abilities of LLMs. Although various
LLM as Aligner. This category of methods adopts LLMs attempts have been made to optimize how graphs are
as text-encoding components and aligns them with GNNs presented in the input sequence, which we will discuss in the
which serve as graph structure encoding components. LLMs following sections, bounded by the finite sequence length
and GNNs are adopted together as the final components for and computational operations, there is a fundamental limita-
task solving. To be specific, the alignment between LLMs and tion of this approach to solving complex reasoning problems
GNNs can be categorized into 1) Prediction Alignment where such as NP-complete ones. Unsurprisingly, most studies find
the generated pseudo labels from one modality are utilized that LLMs possess preliminary graph understanding ability,
for training on the other modality in an iterative learning but the performance is less satisfactory on more complex
fashion and 2) Latent Space Alignment where contrastive problems or larger graphs [42], [124]–[126], [128], [131] where
learning is adopted to align text embeddings generated by reasoning is necessary.
LLMs and graph embeddings generated by GNNs. Plainly Verbalizing Graphs. Verbalizing the graph structure
In the following sections, we will follow our categoriza- in natural language is the most straightforward way of
tion in Section 3 and discuss detailed methodologies for each representing graphs. Representative approaches include
graph scenario. describing the edge and adjacency lists, widely studied
in [124], [125], [128], [131]. For example, for a triangle graph
4 P URE G RAPHS with three nodes, the edge list can be written as “[(0, 1), (1, 2),
Problems on pure graphs provide a fundamental motivation (2, 0)]”, which means node 0 is connected to node 1, node 1
for why and how LLMs are introduced into graph-related is connected to node 2, node 2 is connected to node 0. It can
reasoning problems. Investigated thoroughly in graph theory, also be written in natural language such as “There is an edge
pure graphs serve as a universal representation format for a between node 0 and node 1, an edge between node 1 and node 2,
wide range of classical algorithmic problems in all perspec- and an edge between node 2 and node 0.” On the other hand, we
tives in computer science. Many graph-based concepts, such can describe the adjacency list from the nodes’ perspective.
as shortest paths, particular sub-graphs, and flow networks, For example, for the same triangle graph, the adjacency list
have strong connections with real-world applications [133]– can be written as “Node 0 is connected to node 1 and node 2.
[135], [193]. Therefore, pure graph-based reasoning is vital Node 1 is connected to node 0 and node 2. Node 2 is connected to
in providing theoretical solutions and insights for reasoning node 0 and node 1.” On these inputs, one can prompt LLMs to
problems grounded in real-world applications. answer questions either in zero-shot or few-shot (in-context
Nevertheless, many reasoning tasks require a computa- learning) settings, the former of which is to directly ask
tion capacity beyond traditional GNNs. GNNs are typically questions given the graph structure, while the latter is to ask
designed to carry out a bounded number of operations given questions about the graph structure after providing a few
a graph size. In contrast, graph reasoning problems can examples of questions and answers. [124]–[126] do confirm
require up to indefinite complexity depending on the task’s that LLMs can answer easier questions such as connectivity,
nature. On the other hand, LLMs demonstrate excellent neighbor identification, and graph size counting but fail
emergent reasoning ability [48], [112], [113] recently. This to answer more complex questions such as cycle detection
is partially due to their autoregressive mechanism, which and Hamiltonian pathfinding. Their results also reveal that
enables computing indefinite sequences of intermediate steps providing more examples in the few-shot setting increases
with careful prompting or training [48], [49]. the performance, especially on easier problems, although it
The following subsections discuss the attempts to in- is still not satisfactory.
corporate LLMs into pure graph reasoning problems. We Paraphrasing Graphs. The verbalized graphs can be lengthy,
will also discuss the corresponding challenges, limitations, unstructured, and complicated to read, even for humans,
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 6

so they might not be the best input format for LLMs to summarize the key nodes, edges, or sub-graphs and perform
infer the answers. To this end, researchers also attempt to reasoning.
paraphrase the graph structure into more natural or concise Searching on Graphs. This kind of reasoning is related to
sentences. [126] find that by prompting LLMs to generate a the search algorithms on graphs, such as breadth-first search
format explanation of the raw graph inputs for itself (Format- (BFS) and depth-first search (DFS) Although not universally
Explanation) or to pretend to play a role in a natural task applicable, BFS and DFS are the most intuitive and effective
(Role Prompting), the performance on some problems can be ways to solve some graph reasoning problems. Numer-
improved but not systematically. [131] explores the effect of ous explorations have been made to simulate searching-
grounding the pure graph in a real-world scenario, such as based reasoning, especially on knowledge-graph question
social networks, friendship graphs, or co-authorship graphs. answering. This approach enjoys the advantage of providing
In such graphs, nodes are described as people, and edges are interpretable evidence besides the answer. Reasoning-on-
relationships between people. Results indicate that encoding Graphs (RoG) [129] is a representative approach that prompts
in real-world scenarios can improve the performance on LLMs to generate several relation paths as plans, which are
some problems, but still not consistently. then retrieved from the knowledge graph (KG) and used
Encoding Graphs Into Implicit Feature Sequences. Finally, as evidence to answer the questions. Another approach is
researchers also attempt to encode the graph structure into to iteratively retrieve and reason on the subgraphs from
implicit feature sequences as part of the input sequence [42]. KG [130], [132], simulating a dynamic searching process. At
Unlike the previous verbalizing approaches, this usually each step, the LLMs retrieve neighbors of the current nodes
involves training a graph encoder to encode the graph and then decide to answer the question or continue the next
structure into a sequence of features and fine-tuning the search step. These methods address the scalability challenge
LLMs to adapt to the new input format. [42] demonstrates when knowledge from multiple graphs is available.
drastic performance improvement on problems including
4.3 Algorithmic Reasoning
substructure counting, maximum triplet sum, shortest path,
and bipartite matching, indicating that fine-tuning LLMs has The previous two approaches are heuristic, which means
great fitting power on a specific task distribution. that the reasoning process accords with human intuition
but is not guaranteed to lead to the correct answer. In
4.2 Heuristic Reasoning contrast, these problems are usually solved by algorithms
Direct mapping to the output leverages the LLMs’ powerful in computer science. Therefore, researchers also attempt to
representation power to “guess” the answers. Still, it does not let LLMs perform algorithmic reasoning on graphs. [124]
fully utilize the LLMs’ impressive emergent reasoning ability, proposed “Algorithmic Prompting”, which prompts the LLMs
which is essential for solving complex reasoning problems. to recall the algorithms that are relevant to the questions
To this end, attempts have been made to let LLMs perform and then perform reasoning step by step according to the
heuristic reasoning on graphs. This approach encourages algorithms. Their results, however, do not show consistent
LLMs to perform a series of intermediate reasoning steps improvement over the heuristic reasoning approach. A more
that might heuristically lead to the correct answer, which direct approach, Graph-ToolFormer [127], lets LLMs generate
resembles a path-finding reasoning schema [203]. API calls as explicit reasoning steps. These API calls are then
Reasoning Step by Step. Encouraged by the success of chain- executed externally to acquire answers on an external graph.
of-thought (CoT) reasoning [48], [113], researchers also at- This approach is suitable for converting tasks grounded in
tempt to let LLMs perform reasoning step by step on graphs. real tasks into pure graph reasoning problems, demonstrating
Chain-of-thought encourages LLMs to roll out a sequence of efficacy on various applications such as knowledge graphs,
reasoning steps to solve a problem, similar to how humans social networks, and recommendation systems.
solve problems. Zero-shot CoT is a similar approach that 4.4 Discussion
does not require any examples. These techniques are studied
The above approaches are not mutually exclusive, and they
in [42], [124]–[126], [128], [131], [132]. Results indicate that
can be combined to achieve better performance, for example,
CoT-style reasoning can improve the performance on simpler
by prompting language models for heuristics in algorithmic
problems, such as cycle detection and shortest path detection.
searching. Moreover, heuristic reasoning can also conduct
Still, the improvement is inconsistent or diminishes on more
direct answering, while algorithmic reasoning contains the
complex problems, such as Hamiltonian path finding and
capacity of heuristic reasoning as a special case. Researchers
topological sorting.
are advised to select the most suitable approach for a specific
Retrieving Subgraphs as Evidence. Many graph reasoning
problem.
problems, such as node degree counting and neighborhood
detection, only involve reasoning on a subgraph of the 5 T EXT-ATTRIBUTED G RAPHS .
whole graph. Such properties allow researchers to let LLMs Text-attributed graphs exist ubiquitously in the real world,
retrieve the subgraphs as evidence and perform reasoning e.g., academic networks, and legal case networks. Learning
on the subgraphs. Build-a-Graph prompting [124] encour- on such networks requires the model to encode both the
ages LLMs to reconstruct the relevant graph structures textual information associated with the nodes/edges and
to the questions and then perform reasoning on them. the structure information lying inside the input graph.
This method demonstrates promising results on problems Depending on the role of LLM, existing works can be
except for Hamiltonian pathfinding, a notoriously tricky categorized into three types: LLM as Predictor, LLM as
problem requiring reasoning on the whole graph. Another Encoder, and LLM as Aligner. We summarize all surveyed
approach, Context-Summarization [126], encourages LLMs to methods in Appendix Table 6.
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 7

5.1 LLM as Predictor The strength of these methods is that they can capture
These methods serve the language model as the main model the hidden representations of useful structure information
architecture to capture both the text information and graph with a strong graph encoder, while the challenge is how
structure information. They can be categorized into three to fill the gap between graph modality and text modality.
types: Graph as Sequence methods, Graph-Empowered LLMs, GNP [41] adopts a similar philosophy from LLaVA [91],
and Graph-Aware LLM finetuning methods, depending on how where they utilize GNN to generate graph tokens and then
structure information in graphs is injected into language project the graph tokens into the text token space with
models (input vs. architecture vs. loss). In the Graph as Se- learnable projection matrices. The projected graph tokens are
quence methods, graphs are converted into sequences that can concatenated with text tokens and fed into the language
be understood by language models together with texts from model. GraphGPT [45] further proposes to train a text-
the inputs. In the Graph-Empowered LLMs methods, people grounded GNN for the projection with a text encoder and
modify the architecture of Transformers (which is the base contrastive learning. DGTL [76] introduces disentangled
architecture for LLMs) to enable it to encode text and graph graph learning, serves graph representations as positional
structure simultaneously. In the Graph-Aware LLM finetuning encoding, and adds them to the text sequence. METERN
methods, LLM is fine-tuned with graph structure supervision [75] adds learnable relation embeddings to node textual
and can generate graph-contextualized representations. sequences for text-based multiplex representation learning
on graphs [92].
5.1.1 Graph as Sequence.
5.1.2 Graph-Empowered LLMs.
In these methods, the graph information is mainly encoded
In these methods, researchers design advanced LLM archi-
into the LLM from the “input” side. The ego-graphs associ-
tecture (i.e., Graph-Empowered LLMs) which can conduct
ated with nodes/edges are serialized into a sequence HGv
joint text and graph encoding inside their model architecture.
which can be fed into the LLM together with the texts dv :
Transformers [93] serve as the base model for nowadays pre-
HGv = Graph2Seq(Gv ), (7) trained LMs [23] and LLMs [36]. However, they are designed
for natural language (sequence) encoding and do not take
hv = LLM([HGv , dv ]). (8)
non-sequential structure information into consideration. To
Depending on the choice of Graph2Seq(·) function, the this end, Graph-Empowered LLMs are proposed. They have
methods can be further categorized into rule-based methods a shared philosophy of introducing virtual structure tokens
and GNN-based methods. The illustration of the categories HGv inside each Transformer layer:
can be found in Fig. 3. f(l) = [H (l) , H (l) ]
Hdv Gv dv (10)
Rule-based: Linearizing Graphs into Text Sequence with
Rules. These methods design rules to describe the structure where HGv can be learnable embeddings or output from
with natural language and adopt a text prompt template graph encoders. Then the original multi-head attention
as Graph2Seq(·). For example, given an ego-graph Gvi of (MHA) in Transformers is modified into an asymmetric MHA
the paper node vi connecting to author nodes vj and vk to take the structure tokens into consideration:
and venue nodes vt and vs , HGvi = Graph2Seq(Gvi ) = “The MHAasy (Hdv , H
(l)
f ) = ∥U (l)
u=1 headu (Hdv , Hdv ),
f (l) (l)
dv
centor paper node is vi . Its author neighbor nodes are vj and (l) f(l)⊤
!
vk and its venue neighbor nodes are vt and vs ”. This is the (l) f(l) Qu Ku eu(l) ,
where headu (Hdv , Hdv ) = softmax p ·V
most straightforward and easiest way (without introducing d/U
(l) (l) f(l) (l) f(l) W (l) .
extra model parameters) to encode graph structures into Q(l) f(l)
u = Hdv WQ,u , Ku = Hdv WK,u , Vu
e (l) = H
dv V,u
language models. Along this line, InstructGLM [46] designs (11)
templates to describe local ego-graph structure (maximum
With the asymmetric MHA mechanism, the node encoding
3-hop connection) for each node and conduct instruction
process of the (l + 1)-th layer will be:
tuning for node classification and link prediction. GraphText
′
[65] further proposes a syntax tree-based method to transfer f(l) = Normalize(H (l) + MHAasy (H
H f(l) , H (l) )),
dv dv dv dv
structure into text sequence. Researchers [82] also study when (l+1) ′ ′ (12)
Hdv f(l) + MLP(H
= Normalize(H f(l) )).
and why the linearized structure information on graphs can dv dv
improve the performance of LLM on node classification and Along this line of work, GreaseLM [67] proposes to have a
find that the structure information is beneficial when the language encoding component and a graph encoding compo-
textual information associated with the node is scarce (in nent in each layer. These two components interact through a
this case, the structure information can provide auxiliary modality-fusion layer (MInt layer), where a special structure
information gain). token is added to the text Transformer input, and a special
GNN-based: Encoding Graphs into Special Tokens with node is added to the graph encoding layer. DRAGON [81]
GNNs. Different from rule-based methods which use natural further proposes strategies to pretrain GreaseLM with unsu-
language prompts to linearize graphs into sequences, GNN- pervised signals. GraphFormers [72] are designed for node
based methods adopt graph encoder models (i.e., GNN) to representation learning on homogeneous text-attributed
encode the ego-graph associated with nodes into special networks where the current layer [CLS] token hidden states
token representations which are concatenated with the pure of neighboring documents are aggregated and added as a
text information into the language model: new token on the current layer center node text encoding.
Patton [31] proposes to pretrain GraphFormers with two
HGv = Graph2Seq(Gv ) = GraphEnc(Gv ). (9) novel strategies: network-contextualized masked language
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 8

…
…
Graphs
Center: 𝑣! ; 1-hop neighbors: 𝑣" , 𝑣# ;... …

What is the category of the node? Graphs

CNC(C)C1=CC=C2C(=C1)OCO2. … What is the category of the node/graph?
Graphs What is the category of the graph? What is the category of the node/graph?
word sequence hidden state sequence

(a) Rule-based Graph as Sequence (b) GNN-based Graph as Sequence (c) Graph-Empowered LM
Fig. 3. The illustration of various LLM as Predictor methods, including (a) Rule-based Graph As Sequence, (b) GNN-based Graph As Sequence, (c)
Graph-Empowered LLMs.

modeling and masked node prediction. Heterformer [73] fine-tuning the language model. A summarization of the two-
introduces virtual neighbor tokens for text-rich neighbors tower graph-centric LLM fine-tuning objectives can be found
and textless neighbors which are concatenated with the in Appendix Table 7.
original text tokens and fed into each Transformer layer. There are other methods using the one-tower pipeline,
Edgeformers [74] are proposed for representation learning where node pairs are concatenated and encoded together:
on textual-edge networks where edges are associated with
hvi ,vj = LLMθ (dvi , dvj ), min f (hvi ,vj ). (14)
rich textual information. When conducting edge encoding, θ
virtual node tokens will be concatenated onto the original LinkBERT [30] proposes a document relation prediction
edge text tokens for joint encoding. objective (an extension of next sentence prediction in BERT
[23]) which aims to classify the relation of two node text pairs
5.1.3 Graph-Aware LLM finetuning. from contiguous, random, and linked. MICoL [59] explores
In these methods, the graph information is mainly injected predicting the node pairs’ binary meta-path or meta-graph
into the LLM by “fine-tuning on graphs”. Researchers indicated relation with the one-tower language model.
assume that the structure of graphs can provide hints on 5.1.4 Discussion
what documents are “semantically similar” to what other Although the community is making good progress, there are
documents. For example, papers citing each other in an still some open questions to be solved.
academic graph can be of similar topics. These methods Graph as Code Sequence. Existing graphs as sequence
adopt vanilla language models that take text as input (e.g., methods are mainly rule-based or GNN-based. The former
BERT [23] and SciBERT [25]) as the base model and fine-tune relies on natural language to describe the graphs which is
them with structure signals on the graph [51]. After that, not natural for structure data, while the latter has a GNN
the LLMs will learn node/edge representations that capture component that needs to be trained. A more promising way
the graph homophily from the text perspective. This is the is to obtain a structure-aware sequence for graphs that can
simplest way to utilize LLMs on graphs. However, during support zero-shot inference. A potential solution is to adopt
encoding, the model itself can only consider text. codes (that can capture structures) to describe the graphs
Most methods adopt the two-tower encoding and training and utilize code LLMs [22].
pipeline, where the representation of each node is obtained Advanced Graph-Empowered LLM techniques. Graph-
separately and the model is optimized as follows: empowered LLM is a promising direction to achieve foun-
dational models for graphs. However, existing works are far
hvi = LLMθ (dvi ), min f (hvi , {hv+ }, {hv− }). (13)
θ i i from enough: 1) Task. Existing methods are mainly designed
for representation learning (with encoder-only LLMs) which
Here vi+ represents the positive nodes to vi , vi− represents the
are hard to adopt for generation tasks. A potential solution
negative nodes to vi and f (·) denotes the pairwise training
is to design Graph-Empowered LLMs with decoder-only or
objective. Different methods have different strategies for vi+
encoder-decoder LLMs as the base architecture. 2) Pretraining.
and vi− with different training objectives f (·). SPECTER [51]
Pretraining is important to enable LLMs with contextualized
constructs the positive text/node pairs with the citation
data understanding capability, which can be generalized
relation, explores random negatives and structure hard
to other tasks. However, existing works mainly focus on
negatives, and fine-tunes SciBERT [25] with the triplet
pretraining LLMs on homogeneous text-attributed networks.
loss. SciNCL [52] extends SPECTER by introducing more
Future studies are needed to explore LLM pretraining in
advanced positive and negative sampling methods based on
more diverse real-world scenarios including heterogeneous
embeddings trained on graphs. Touchup-G [54] proposes the
text-attributed networks [73], dynamic text-attributed net-
measurement of feature homophily on graphs and brings
works [128], and textual-edge networks [74].
up a binary cross-entropy fine-tuning objective. TwHIN-
BERT [56] mines positive node pairs with off-the-shelf 5.2 LLM as Encoder
heterogeneous information network embeddings and trains LLMs extract textual features to serve as initial node feature
the model with a contrastive social loss. MICoL [59] discovers vectors for GNNs, which then generate node/edge repre-
semantically positive node pairs with meta-path [90] and sentations and make predictions. These methods typically
adopts the InfoNCE objective. E2EG [60] utilizes a similar adopt an LLM-GNN cascaded architecture to obtain the final
philosophy from GIANT [58] and adds a neighbor prediction representation hvi for node vi :
objective apart from the downstream task objective. WalkLM
[61] conducts random walks for structure linearization before xvi = LLM(dvi ) hvi = GNN(Xv , G). (15)
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 9

🔥 🔥 🔥 🔥 🔥

🔥Trainable
🔥 🔥 🔥 🔥 🔥 🔥
❄ Fixed LLM
step1 step2 ❄ Teacher Student
(a) One-step Training (b) Two-step Training (c) Augmenta7on (d) Dis7lla7on
Fig. 4. The illustration of various techniques related to LLM as Encoder, including (a) One-step Training, (b) Two-step Training, (c) Data Augmentation,
and (d) Knowledge Distillation.

Here xvi is the feature vector that captures the textual in NLP [83], [89]. LLM-GNN [64] proposes to conduct
information dvi associated with vi . The final representation zero-shot node classification on text-attributed networks by
hvi will contain both textual information and structure labeling a few nodes and using the pseudo labels to fine-
information of vi and can be used for downstream tasks. tune GNNs. TAPE [70] presents a method that uses LLM to
In the following sections, we will discuss the optimization, generate prediction text and explanation text, which serve as
augmentation, and distillation of such models. The figures augmented text data compared with the original text data.
for these techniques can be found in Fig. 4. A following medium-scale language model is adopted to
5.2.1 Optimization encode the texts and output features for augmented texts
and original text respectively before feeding into GNNs. ENG
One-step training refers to training the LLM and GNN
[71] brings forward the idea of generating labeled nodes for
together in the cascaded architecture for the downstream
each category, adding edges between labeled nodes and other
tasks. TextGNN [77] explores GCN [84], GraphSAGE [85],
nodes, and conducting semi-supervised GNN learning for
GAT [86] as the base GNN architecture, adds skip connection
node classification.
between LLM output and GNN output, and optimizes the
whole architecture for sponsored search task. AdsGNN 5.2.3 Knowledge Distillation
[78] further extends TextGNN by proposing edge-level LLM-GNN cascaded pipeline is capable of capturing both
information aggregation. GNN-LM [66] adds GNN layers text information and structure information. However, the
to enable the vanilla language model to reference similar pipeline suffers from time complexity issues during inference,
contexts in the corpus for language modeling. Joint training since GNNs need to conduct neighbor sampling and LLMs
LLMs and GNNs in a cascaded pipeline is convenient but need to encode the text associated with both the center
may suffer from efficiency [68] (only support sampling a few node and its neighbors. A straightforward solution is to
one-hop neighbors regarding memory complexity) and local serve the LLM-GNN cascade pipeline as the teacher model
minimal [35] (LLM underfits the data) issues. and distill it into an LLM as the student model. In this
Two-step training means first adapting LLMs to the graph, case, during inference, the model (which is a pure LLM)
and then finetuning the whole LLM-GNN cascaded pipeline. only needs to encode the text on the center node and avoid
GIANT [58] proposes to conduct neighborhood prediction time-consuming neighbor sampling. AdsGNN [78] proposes
with the use of XR-Transformers [79] and results in an LLM an L2-loss to force the outputs of the student model to
that can output better feature vectors than bag-of-words preserve topology after the teacher model is trained. GraD
and vanilla BERT [23] embedding for node classification. [69] introduces three strategies including the distillation
LM-GNN [68] introduces graph-aware pre-fine-tuning to objective and task objective to optimize the teacher model
warm up the LLM on the given graph before fine-tuning and distill its capability to the student model.
the whole LLM-GNN pipeline and demonstrating significant 5.2.4 Discussion
performance gain. SimTeG [35] finds that the simple frame-
Given that GNNs are demonstrated as powerful models in
work of first training the LLMs on the downstream task and
encoding graphs, “LLMs as encoders” seems to be the most
then fixing the LLMs and training the GNNs can result in
straightforward way to utilize LLMs on graphs. However,
outstanding performance. They further find that using the
there are still open questions.
efficient fine-tuning method, e.g., LoRA [40] to tune the LLM
Limited Task: Go Beyond Representation Learning. Current
can alleviate overfitting issues. GaLM [80] explores ways
“LLMs as encoders” methods or LLM-GNN cascaded architec-
to pretrain the LLM-GNN cascaded architecture. The two-
tures are mainly focusing on representation learning, given
step strategy can effectively alleviate the insufficient training
the single embedding propagation-aggregation mechanism
of the LLM which contributes to higher text representation
of GNNs, which prevents it from being adopted to generation
quality but is more computationally expensive and time-
tasks (e.g., node/text generation). A potential solution to
consuming than the one-step training strategy.
this challenge can be to conduct GNN encoding for LLM-
5.2.2 Data Augmentation generated token-level representations and to design proper
With its demonstrated zero-shot capability [43], LLMs can be decoders that can perform generation based on the LLM-
used for data augmentation to generate additional text data GNN cascaded model outputs.
for the LLM-GNN cascaded architecture. The philosophy Low Efficiency: Advanced Knowledge Distillation. The
of using LLM to generate pseudo data is widely explored LLM-GNN cascaded pipeline suffers from time complexity
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 10

issues since the model needs to conduct neighbor sampling pseudo-label by GNN contras8ve learning

and then embedding encoding for each neighboring node.

Although there are methods that explore distilling the text

learned LLM-GNN model into an LLM model for fast graph

inference, they are far from enough given that the inference
of LLM itself is time-consuming. A potential solution is to pseudo-label by LLM graph text

distill the model into a much smaller LM or even an MLP. (a) LLM-GNN Predic0on Alignment (b) LLM-GNN Latent Space Alignment
Similar methods [87] have been proven effective in GNN to Fig. 5. The illustration of LLM as Aligner methods, including (a) LLM-GNN
MLP distillation and are worth exploring for the LLM-GNN Prediction Alignment and (b) LLM-GNN Latent Space Alignment.
cascaded pipeline as well.
KL-divergence-based neighbor-level knowledge alignment:
5.3 LLM as Aligner minimize the neighborhood similarity distribution calculated
These methods contain an LLM component for text encoding between LLM and GNN. G2P2 [63] further extends node-
and a GNN component for structure encoding. These two text contrastive learning by adding text-summary interaction
components are served equally and trained iteratively or and node-summary interaction. Then, they introduce using
parallelly. LLMs and GNNs can mutually enhance each other label texts in the text modality for zero-shot classification,
since the LLMs can provide textual signals to GNNs, while and using soft prompts for few-show classification. THLM
the GNNs can deliver structure information to LLMs. Accord- [33] proposes to pretrain the language model by contrastive
ing to how the LLM and the GNN interact, these methods can learning with a heterogeneous GNN on heterogeneous text-
be further categorized into: LLM-GNN Prediction Alignment attributed networks. The pretrained LLM can be fine-tuned
and LLM-GNN Latent Space Alignment. The illustration of on downstream tasks.
these two categories of methods can be found in Fig. 5. 5.3.3 Discussion.
5.3.1 LLM-GNN Prediction Alignment In “LLMs as Aligners” methods, most research is adopt-
This refers to training the LLM with the text data on a ing shallow GNNs (e.g., GCN, GAT, with thousands of
graph and training the GNN with the structure data on a parameters) to be the graph encoders that are aligned
graph iteratively. LLM will generate labels for nodes from the with LLMs through iterative training (i.e., prediction align-
text perspective and serve them as pseudo-labels for GNN ment) or contrastive training (i.e., latent space alignment).
training, while GNN will generate labels for nodes from the Although LLMs (with millions or billions of parameters)
structure perspective and serve them as pseudo-labels for have strong expressive capability, the shallow GNNs (with
LLM training. By this design, these two modality encoders limited representative capability) can constrain the mutual
can learn from each other and contribute to a final joint text learning effectiveness between LLMs and GNNs. A potential
and graph encoding. In this direction, LTRN [57] proposes solution is to adopt GNNs which can be scaled up [88].
a novel GNN architecture with personalized PageRank Furthermore, deeper research to explore what is the best
[94] and attention mechanism for structure encoding while model size combination for LLMs and GNNs in such “LLMs
adopting BERT [23] as the language model. The pseudo labels as Aligners” LLM-GNN mutual enhancement framework is
generated by LLM and GNN are merged for the next iteration very important.
of training. GLEM [62] formulates the iterative training
process into a pseudo-likelihood variational framework, 6 T EXT-PAIRED G RAPHS
where the E-step is to optimize LLM and the M-step is to Graphs are prevalent data objects in scientific disciplines
train the GNN. such as cheminformatics [183], [194], [200], material infor-
5.3.2 LLM-GNN Latent Space Alignment matics [181], bioinformatics [201], and computer vision [147].
Within these diverse fields, graphs frequently come paired
It denotes connecting text encoding (LLM) and structure
with critical graph-level text information. For instance,
encoding (GNN) with cross-modality contrastive learning:
molecular graphs in cheminformatics are annotated with text
hdvi = LLM(dvi ), hvi = GNN(Gv ), (16) properties such as toxicity, water solubility, and permeability
Sim(hdvi , hvi ) properties [181], [183]. Research on such graphs (scientific
l(hdvi , hvi ) = P , (17) discovery) could be accelerated by the text information
j̸=i Sim(hdvi , hvj ) and the adoption of LLMs. In this section, we review the
X 1
L= (l(hdvi , hvi ) + l(hvi , hdvi )) (18) application of LLMs on graph-captioned graphs with a
v ∈G
2|G| focus on molecular graphs. According to the technique
i
categorization in Section 3.2, we begin by investigating
A similar philosophy is widely used in vision-language methods that utilize LLMs as Predictor. Then, we discuss
joint modality learning [96]. Along this line of approaches, methods that align GNNs with LLMs. We summarize all
ConGrat [53] adopts GAT [86] as the graph encoder and surveyed methods in Appendix Table 8.
tries MPNet [34] as the language model encoder. They
have expanded the original InfoNCE loss by incorporating 6.1 LLM as Predictor
graph-specific elements. These elements pertain to the most In this subsection, we review how to conduct “LLM as
likely second, third, and subsequent choices regarding the Predictor” for graph-level tasks. Existing methods can be
nodes from which a text originates and the texts that categorized into Graph as Sequence (treat graph data as
a node generates. In addition to the node-level multi- sequence input) and Graph-Empowered LLMs (design model
modality contrastive objective, GRENADE [55] proposes architecture to encode graphs).
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 11

6.1.1 Graph as Sequence descriptions from a molecule) and text-based molecular

For text-paired graphs, we have three steps to utilize existing generation (where a molecular graph structure is generated
LLM for graph inputs. Step 1: Linearize graphs into sequence from a natural description). Specifically, MolT5 [123] is
with rule-based methods. Step 2: Tokenize the linearized se- developed based on the T5 [29], suitable for these two tasks.
quence. Step 3: Train/Finetune different LLMs (e.g., Encoder- It formulates molecule-text translation as a multilingual
only, Encoder-Decoder, Decoder-only) for specific tasks. We problem and initializes the model using the T5 checkpoint.
will discuss each step as follows. The model was pre-trained on two monolingual corpora:
the Colossal Clean Crawled Corpus (C4) [29] for the natural
Step 1: Rule-based Graph Linearization. Rule-based lin-
language modality and one million SMILES [156] for the
earization converts molecular graphs into text sequences
molecule modality. Text+Chem T5 [171] extends the input
that can be processed by LLMs. To achieve this, researchers
and output domains to include both SMILES and texts,
develop specifications based on human expertise in the form
unlocking LLMs for more generation functions such as
of line notations [148]. For example, the Simplified Molecular-
text or reaction generation. ChatMol [166] exploits the
Input Line-Entry System (SMILES) [148] records the symbols
interactive capabilities of LLMs and proposes designing
of nodes encountered during a depth-first traversal of
molecule structures through multi-turn dialogs with T5.
a molecular graph. The International Chemical Identifier
(InChI) [149] encodes molecular structures into unique string Decoder-only LLMs. Decoder-only architectures have been
texts with more hierarchical information. Canonicalization adopted for recent LLMs due to their advanced generation
algorithms produce unique SMILES for each molecule, ability. MolGPT [177] and MolXPT [169] are GPT-style models
often referred to as canonical SMILES. However, there are used for molecule classification and generation. Specifically,
more than one SMILES corresponding to a single molecule MolGPT [177] focuses on conditional molecule generation
and SMILES sometimes represent invalid molecules; LLMs tasks using scaffolds, while MolXPT [169] formulates the
learned from these linearized sequences can easily generate classification task as a question-answering problem with yes
invalid molecules (e.g., incorrect ring closure symbols and or no responses. RT [164] adopts XLNet [27] and focuses
unmatched parentheses) due to syntactical errors. To this end, on molecular regression tasks. It frames the regression as a
DeepSMILES [150] is proposed. It can alleviate this issue in conditional sequence modeling problem. Galactica [178] is
most cases but does not guarantee 100% robustness. The a set of LLMs with a maximum of 120 billion parameters,
linearized string could still violate basic physical constraints. which is pretrained on two million compounds from Pub-
To fully address this problem, SELFIES [151] is introduced Chem [183]. Therefore, Galactica could understand molecular
which consistently yields valid molecular graphs. graph structures through SMILES. With instruction tuning
data and domain knowledge, researchers also adapt general-
Step 2: Tokenization. These approaches for linearized domain LLMs such as LLaMA to recognize molecular graph
sequences are typically language-independent. They operate structures and solve molecule tasks [160]. Recent studies
at both character level [167], [178] and substring level [162], also explore the in-context learning capabilities of LLMs
[169], [173]–[176], based on SentencePiece or BPE [155]. on graphs. LLM-ICL [168] assesses the performance of
Additionally, RT [164] proposes a tokenization approach that LLMs across eight tasks in the molecular domain, ranging
facilitates handling regression tasks within LM Transformers. from property classification to molecule-text translation.
Step 3: Encoding the Linearized Graph with LLMs. Encoder- MolReGPT [165] proposes a method to retrieve molecules
only LLMs. Earlier LLMs like SciBERT [25] and BioBERT [180] with similar structures and descriptions to improve in-
are trained on scientific literature to understand natural context learning. LLM4Mol [163] utilizes the summarization
language descriptions related to molecules but are not capa- capability of LLMs as a feature extractor and combines it
ble of comprehending molecular graph structures. To this with a smaller, tunable LLM for specific prediction tasks.
end, SMILES-BERT [179] and MFBERT [176] are proposed 6.1.2 Graph-Empowered LLMs
for molecular graph classification with linearized SMILES Different from the methods that adopt the original LLM
strings. Since scientific natural language descriptions contain architecture (i.e., Transformers) and input the graphs as
human expertise which can serve as a supplement for sequences to LLMs, graph-empowered LLMs attempt to
molecular graph structures, recent advances emphasize joint design LLM architectures that can conduct joint encoding
understanding of them [159], [175]: The linearized graph of text and graph structures. Some works modify the posi-
sequence is concatenated with the raw natural language data tional encoding of Transformers. For instance, GIMLET [47]
and then input into the LLMs. Specifically, KV-PLM [175] is treats nodes in a graph as tokens. It uses one Transformer
built based on BERT [23] to understand the molecular struc- to manage both the graph structure and text sequence
ture in a biomedical context. CatBERTa [159], as developed [v1 , v2 , . . . , v|V| , s|V|+1 , . . . , s|V|+|dG | ], where v ∈ V is a node
from RoBERTa [24], specializes in the prediction of catalyst and s ∈ dG is a token in the text associated with G . This
properties for molecular graphs. sequence cannot reflect graph structure. Therefore, a new
Encoder-Decoder LLMs. Encoder-only LLMs may lack the position encoding (PE) is used to jointly encode graph
capability for generation tasks. In this section, we discuss structures and text sequences. It defines the relative distance
LLMs with encoder-decoder architectures. For example, between tokens i and j as follows:
Chemformer [156] uses a similar architecture as BART [28]. 
The representation from the encoder can be used for property i − j
 if i, j ∈ dG ,
GSD(i, j) + Meanek ∈SP(i,j) xek if i, j ∈ V ,
prediction tasks, and the whole encoder-decoder architecture PE(i, j) = −∞ if i ∈ V, j ∈ dG ,

can be optimized for molecule generation. Others focus 0

if i ∈ dG , j ∈ V .
on molecule captioning (which involves generating textual (19)
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 12

GSD is the graph shortest distance between two nodes, electrons, hybridization state, aromaticity, and presence in a
and Meank∈SP(i,j) represents the mean pooling of the edge ring. Bond features encompass the bond’s type (e.g., single,
features xek along the shortest path SP(i, j) between nodes double, or triple), the bond’s stereochemistry (e.g., E/Z
i and j . GIMLET [47] adapts bi-directional attention for node or cis/trans), and whether the bond is conjugated [188].
tokens and enables texts to selectively attend to nodes. These Each feature provides specific information about atomic
designs render the Transformer’s submodule, which handles properties and structure, crucial for molecular modeling and
the graph part, equivalent to a Graph Transformer [141]. cheminformatics. One may directly vectorize the molecular
Cross-attention is also used to interact representations graph structure into binary vectors [186] and then apply
between graphs and texts. Given the graph hidden state hG , parameterized Multilayer Perceptrons (MLPs) on the top
its node-level hidden state Hv and text hidden state HdG , of these vectors to get the graph representation. These
Text2Mol [122] implemented interaction between representa- vectorization approaches are based on human-defined rules
tions in the hidden layers of encoders, while Prot2Text [161] and vary, such as MACCS, ECFP, and CDK fingerprints [186].
implemented this interaction within the layers of between
These rules take inputs of a molecule and output a vector
WQ HdG ·(WK Hv )T consisting of 0/1 bits. Each bit denotes a specific type of
encoder and decoder HdG = softmax √
d
·
k substructure related to functional groups that could be used
WV Hv , where WQ , WK , WV are trainable parameters for various property predictions. Fingerprints consider atoms
that transform the query modality (e.g., sequences) and and structures, but they cannot automatically learn from the
the key/value modality (e.g., graphs) into the attention graph structure. GNNs could serve as automatic feature
space. Furthermore, Prot2Text [161] utilizes two trainable extractors to replace or enhance fingerprints. Some specific
parameter matrices W1 and W2 to integrate the graph methods are explored in Section 6.1.2, while the other graph
representation into the sequence representation HdG = prior such as the eigenvectors of a graph Laplacian and the
HdG + 1|dG | hG W1 W2 . random walk prior could also be used [142].
LLM Outputs for Prediction. LMs like KV-PLM [175],
6.1.3 Discussion SMILES-BERT [179], MFBERT [176], and Chemformer [156]
LLM Inputs with Sequence Prior. The first challenge is that the use a prediction head on the output vector of the last layer.
progress in advanced linearization methods has not progressed in These models are finetuned with standard classification and
tandem with the development of LLMs. Emerging around 2020, regression losses but may not fully utilize all the parameters
linearization methods for molecular graphs like SELFIES and advantages of the complete architecture. In contrast,
offer significant grammatical advantages, yet advanced LMs models like RT [164], MolXPT [169], and Text+Chem T5 [171]
and LLMs from graph machine learning and language model frame prediction as a text generation task. These models are
communities might not fully utilize these, as these encoded trained with either masked language modeling or autoregres-
results are not part of pretraining corpora prior to their sive targets, which requires a meticulous design of the context
proposal. Consequently, recent studies [168] indicate that words in the text [164]. Specifically, domain knowledge
LLMs, such as GPT-3.5/4, may be less adept at using SELFIES instructions may be necessary to activate the in-context
compared to SMILES. Therefore, the performance of LM-only learning ability of LLMs, thereby making them domain
and LLM-only methods may be limited by the expressiveness experts [168]. For example, a possible template could be
of older linearization methods, as there is no way to optimize divided into four parts: {General Description}{Task-Specific
these hard-coded rules during the learning pipeline of LLMs. Description}{Question-Answer Examples}{Test Question}.
However, the second challenge remains as the inductive bias of LLM Outputs for Reasoning. Since string representations
graphs may be broken by linearization. Rule-based linearization of molecular graphs usually carry new and in-depth domain
methods introduce inductive biases for sequence modeling, knowledge, which is beyond the knowledge of LLMs, recent
thereby breaking the permutation invariance assumption work [146], [157], [165] also attempts to utilize the reasoning
inherent in molecular graphs. It may reduce task difficulty ability of LLMs, instead of using them as a knowledge source
by introducing sequence order to reduce the search space. for predicting the property of molecular graphs. ReLM [157]
However, it does not mean model generalization. Specifically, utilizes GNNs to suggest top-k candidates, which were then
there could be multiple string-based representations for a used to construct multiple-choice answers for in-context
single graph from single or different approaches. Numerous learning. ChemCrow [146] designs the LLMs as the chemical
studies [152]–[154] have shown that training on different agent to implement various chemical tools. It avoided direct
string-based views of the same molecule can improve the inference in an expertise-intensive domain.
sequential model’s performance, as these data augmentation
6.2 LLM as Aligner
approaches manage to retain the permutation-invariance
nature of graphs. These advantages are also achievable with 6.2.1 Latent Space Alignment
a permutation-invariant GNN, potentially simplifying the One may directly align the latent spaces of the GNN and LLM
model by reducing the need for complex, string-based data through contrastive learning and predictive regularization.
augmentation design. Typically, a graph representation from a GNN can be read
LLM Inputs with Graph Prior. Rule-based linearization may out by summarizing all node-level representations, and a
be considered less expressive and generalizable compared sequence representation can be obtained from the [CLS]
to the direct graph representation with rich node features, token. We first use two projection heads, which are usually
edge features, and the adjacency matrix [187]. Various atomic MLPs, to map the separate representation vectors from the
features include atomic number, chirality, degree, formal GNN and LLM into a unified space as hG and hdG , and
charge, number of hydrogen atoms, number of radical then align them within this space. Specifically, MoMu [174]
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 13

and MoMu-v2 [173] retrieve two sentences from the corpus dimensions. The scale of GNNs may be a bottleneck in learn-
for each molecular graph. During training, graph data ing semantic meaningful representation and there is a risk of
augmentation was applied to molecular graphs, creating two over-reliance on one modality, neglecting the other. Therefore,
augmented views. Consequently, there are four pairs of G for future large-scale GNN designs comparable to LLMs,
and dG . For each pair, the contrastive loss for space alignment scaling up the dimension size and adding deeper layers, may
exp(cos(hG ,hdG )/τ ) be considered. Besides, Transformer encoders [142] may also
is as ℓMoMu = − log P where τ is the
d˜G ̸=dG exp cos hG ,hd˜
G
/τ improve the expressive power of deep GNNs.
temperature hyper-parameter and d˜G denotes the sequence Generation Decoder with GNNs. GNNs are often not used
not paired to the graph G . MoleculeSTM [172] also applies as decoders for graph generation. The prevalent decoders
contrastive learning to minimize the representation distance are mostly text-based, generating linearized graph structures
between a molecular graph G and its corresponding texts dG , such as SMILES. These methods may be sensitive to the
while maximizing the distance between the molecule and sequence order in the linearized graph. Generative diffusion
unrelated descriptions. MoleculeSTM [172] randomly sam- models [202] on graphs could be utilized in future work to
ples negative graphs or texts to construct negative pairs of design generators with GNNs.
˜ and (G̃, d). Similarly, MolFM [162] and GIT-Mol [158]
(G, d)
implement contrastive loss with mutual information and
7 A PPLICATIONS
negative sampling. These two methods also use cross-entropy 7.1 Datasets, Splitting and Evaluation
to regularize the unified space with the assumption that We summarize the datasets for three scenarios (namely pure
randomly permuted graph and text inputs are predictable if graphs, text-attributed graphs, and text-paired graphs) and
they originate from the same molecule. show them in Table 5, Table 2, and Table 3 respectively.
However, the aforementioned methods cannot leverage
7.1.1 Pure Graphs
task labels. Given a classification label y , CLAMP [170]
learns to map active molecules (y = 1) so that they In Table 5, we summarize the pure graph reasoning prob-
align with the corresponding assay description for each lems discussed in Section 4. Many problems are shared or
molecular graph G : ℓCLAMP = y log σ τ −1 hTG hdG revisited in different datasets due to their commonality. NL-

+
(1 − y) log 1 − σ τ −1 hTG hdG . CLAMP [170] requires la- Graph [124], LLMtoGraph [125] and GUC [126] study a set of
bels to encourage that active molecules and their cor- standard graph reasoning problems, including connectivity,
responding text descriptions are clustered together in shortest path, and graph diameter. GraphQA [131] bench-
the latent space. To advance the alignment between two marks a similar set of problems but additionally describes the
modalities, MolCA [167] trains the Query Transformer (Q- graphs in real-world scenarios to study the effect of graph
Former) [190] for molecule-text projecting and contrastive grounding. LLM4DyG [128] focuses on reasoning tasks on
alignment. Q-former initializes Nq learnable query tokens temporally evolving graphs. Accuracy is the most common
Nq
{qk }k=1 . These query tokens are updated with self-attention evaluation metric as they are primarily formulated as graph
and interact with the output of GNNs through cross- question-answering tasks.
attention to obtain the k -th queried molecular represen- 7.1.2 Text-Attributed Graphs
tation vector (hG )k := Q-Former(qk ). The query tokens We summarize the famous datasets for evaluating models
share the same self-attention modules with the texts, but on text-attributed graphs in Table 2. The datasets are mostly
use different MLPs, allowing the Q-Former to be used from the academic, e-commerce, book, social media, and
for obtaining the representation of text sequence hdG := Wikipedia domains. The popular tasks to evaluate models
Q-Former([CLS]). Then we have ℓMolCA = −ℓg2t − ℓt2g , on those datasets include node classification, link prediction,
exp(maxk cos((hG )k ,hdG )/τ )
where ℓg2t = log P , and edge classification, regression, and recommendation. The
exp maxk cos (hG )k ,hd˜ /τ
d˜G ̸=dG G evaluation metrics for node/edge classification include
exp(maxk cos(hdG ,(hG )k )/τ )
ℓt2g = log . Accuracy, Macro-F1, and Micro-F1. For link prediction and
exp(maxk cos(hdG ,(hG̃ )k )/τ )
P
G̸̃=G recommendation evaluation, Mean Reciprocal Rank (MRR),
6.2.2 Discussion Normalized Discounted Cumulative Gain (NDCG), and Hit
Ratio (Hit) usually serve as metrics. While evaluating model
Larger-Scale GNNs. GNNs integrate atomic and graph struc-
performance on regression tasks, people tend to adopt mean
tural features for molecular representation learning [145].
absolute errors (MAE) or root mean square error (RMSE).
Specifically, Text2Mol [122] utilizes the GCN [84] as its graph
encoder and extracts unique identifiers for node features 7.1.3 Text-Paired Graphs
based on Morgan fingerprints [186]. MoMu [174], MoMu- Table 3 shows text-paired graph datasets (including text-
v2 [173], MolFM [162], GIT-Mol [158], and MolCA [167] available and graph-only datasets). For Data Splitting, options
prefer GIN [189] as the backbone, as GIN has been proven include random splitting, source-based splitting, activity
to be as expressive and powerful as the Weisfeiler-Lehman cliffs and scaffolds [196], and data balancing [143]. Graph
graph isomorphism test. As described in Section 2.2, there classification usually adopts AUC [188] as the metrics,
has been notable progress in making GNNs deeper, more while regression uses MAE, RMSE, and R2 [145]. For text
generalizable, and more powerful since the proposal of the generation evaluation, people tend to use the Bilingual
GCN [84] in 2016 and the GIN [189] in 2018. However, most Evaluation Understudy (BLEU) score; while for molecule
reviewed works [158], [162], [167], [173], [174] are developed generation evaluation, heuristic evaluation methods (based
using the GIN [189] as a proof of concept for their approaches. on factors including validity, novelty, and uniqueness) are
These pretrained GINs feature five layers and 300 hidden adopted. However, it is worth noted that BLEU score is
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 14

TABLE 2
Data collection in Section 5 for text-attributed graphs. Task: “NC”, “UAP”, “LP”, “Rec”, “EC”, “RG” denote node classification, user
activity prediction, link prediction, recommendation, edge classification, and regression task.
Text. Data Year Task # Nodes # Edges Domain Source & Notes
ogb-arxiv 2020.5 NC 169,343 1,166,243 Academic OGB [188]
ogb-products 2020.5 NC 2,449,029 61,859,140 E-commerce OGB [188]
ogb-papers110M 2020.5 NC 111,059,956 1,615,685,872 Academic OGB [188]
ogb-citation2 2020.5 LP 2,927,963 30,561,187 Academic OGB [188]
Cora 2000 NC 2,708 5,429 Academic [10]
Citeseer 1998 NC 3,312 4,732 Academic [11]
DBLP 2023.1 NC, LP 5,259,858 36,630,661 Academic www.aminer.org/citation
Node

MAG 2020 NC, LP, Rec RG ∼ 10M ∼ 50M Academic multiple domains [12] [13]
Goodreads-books 2018 NC, LP ∼ 2M ∼ 20M Books multiple domains [14]
Amazon-items 2018 NC, LP, Rec ∼ 15.5M ∼ 100M E-commerce multiple domains [15]
SciDocs 2020 NC, UAP, LP, Rec - - Academic [51]
PubMed 2020 NC 19,717 44,338 Academic [16]
Wikidata5M 2021 LP ∼ 4M ∼ 20M Wikipedia [17]
Twitter 2023 NC, LP 176,279 2,373,956 Social [53]
Goodreads-reviews 2018 EC, LP ∼ 3M ∼ 100M Books multiple domains [14]
Edge

Amazon-reviews 2018 EC, LP ∼ 15.5M ∼ 200M E-commerce multiple domains [15]

Stackoverflow 2023 EC, LP 129,322 281,657 Social [74]

TABLE 3
Data collection in Section 6 for text-captioned graphs. “PT”, “FT”, “Cap.”, “GC”, “Retr.’, and “Gen.” refer to pretraining, finetuning, caption, graph
classification, retrieval, and graph generation, respectively. The superscript for the size denotes # graph-text pairs1 , # graphs2 , # assays3 .
Data Date Task Size Source & Notes
ChEMBL-2023 [185] 2023 Various 2.4M2 ,20.3M3 Drug-like
PubChem [183] 2019 Various 96M2 ,237M3 Biomedical
PC324K [167] 2023 PT, Cap., 324K1 PubChem [183]
MolXPT-PT [169] 2023 PT 30M2 PubChem [183], PubMed, ChEBI [182]
ChE-bio [47] 2023 PT 365K2 ChEMBL [184]
ChE-phy [47] 2023 PT 365K2 ChEMBL [184]
ChE ZS [47] 2023 GC 91K2 ChEMBL [184]
PC223M [170] 2023 PT, Retr. 223M1 ,2M2 ,20K3 PubChem [183]
PCSTM [172] 2022 PT 281K1 PubChem [183]
PCdes [183] 2022 FT, Cap, Retr. 15K1 PubChem [183]
ChEBI-20 [122] 2021 FT., Retr., Gen., Cap. 33K1 PubChem [183], ChEBI [182]

efficient but less accurate, while heuristic evaluation methods 7.3 Practical applications
are problematic subject to unintended modes, such as the
7.3.1 Scientific Discovery
superfluous addition of carbon atoms in [197].
Virtual Screening. It aims to search a library of unlabeled
7.2 Open-source Implementations
molecules to identify useful structures for a given task.
HuggingFace. HF Transformers1 is the most popular Python Machine learning models could automatically screen out
library for Transformers-based language models. Besides, it trivial candidates to accelerate this process. However, train-
also provides two additional packages: Datasets2 for easily ing accurate models is not easy since labeled molecules are
accessing and sharing datasets and Evaluate3 for easily limited in size and imbalanced in distribution [143]. There are
evaluating machine learning models and datasets. many efforts to improve GNNs against data sparsity [143],
Fairseq. Fairseq4 is another open-source Python library for [145], [192]. However, it is difficult for a model to generalize
Transformers-based language models. and understand in-depth domain knowledge that it has never
PyTorch Geometric. PyG5 is an open-source Python library been trained on. Texts could be complementary knowledge
for graph machine learning. It packages more than 60 types sources. Discovering task-related content from massive
of GNN, aggregation, and pooling layers. scientific papers and using them as instructions has great
Deep Graph Library. DGL6 is another open-source Python potential to design accurate GNNs in virtual screening [47].
library for graph machine learning. Molecular Generation. Molecular generation and optimiza-
RDKit. RDKit7 is one of the most popular open-source tion is one fundamental goal for drug and material discovery.
cheminformatics software programs that facilitates various Scientific hypotheses of molecules [199], can be represented
operations and visualizations for molecular graphs. It offers in the joint space of GNNs and LLMs. Then, one may search
many useful APIs, such as the linearization implementation in the latent space for a better hypothesis that aligns with
for molecular graphs, to convert them into easily stored the text description (human requirements) and adheres to
SMILES and to convert these SMILES back into graphs. structural constraints like chemical validity. Chemical space
1. https://huggingface.co/docs/transformers/index has been found to contain more than 1060 molecules [198],
2. https://huggingface.co/docs/datasets/index which is beyond the capacity of exploration in wet lab exper-
3. https://huggingface.co/docs/evaluate/index
iments. Generating constrained candidates within relevant
4. https://github.com/facebookresearch/fairseq
5. https://pytorch-geometric.readthedocs.io/en/latest/index.html subspaces is a challenge [202] and promising, especially
6. https://www.dgl.ai/ when incorporating textual conditions.
7. https://www.rdkit.org/docs/ Synthesis Planning. Synthesis designs start from available
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 15

molecules and involve planning a sequence of steps that Education Domain. In the education domain, we can
can finally produce a desired chemical compound through construct a graph with coursework as nodes and their
a series of reactions [199]. This procedure includes a se- relations as edges. The model learned on such a graph can be
quence of reactant molecules and reaction conditions. Both utilized for knowledge tracing [136] and student performance
graphs and texts play important roles in this process. For prediction [137].
example, graphs may represent the fundamental structure of
molecules, while texts may describe the reaction conditions, 8 F UTURE DIRECTIONS
additives, and solvents. LLMs can assist in the planning by Better Benchmark Datasets. Most pure graph benchmarks
suggesting possible synthesis paths directly or by serving as evaluate LLMs’ reasoning ability on homogeneous graphs
agents to operate on existing planning tools [146]. but do not include evaluations on heterogeneous or spatial-
7.3.2 Computational Social Science temporal graphs. For text-attributed graphs, as summarized
in Table 2, most benchmark datasets are from academic
In computational social science, researchers are interested
domains and e-commerce domains. However, in the real
in modeling the behavior of people/users and discovering
world, text-attributed graphs are ubiquitous across multiple
new knowledge that can be utilized to forecast the future.
domains (e.g., legal and health). More diverse datasets are
The behaviors of users and interactions between users can
needed to comprehensively evaluate LLMs on real-world
be modeled as graphs, where the nodes are associated with
scenarios. For text-paired graphs, as summarized in Table 3,
rich text information (e.g., user profile, messages, emails). We
there is a lack of comprehensive datasets covering various
will show two example scenarios below.
machine learning tasks in chemistry. Although a massive
E-commerce. In E-commerce platforms, there are many
number of scientific papers are available, preprocessing
interactions (e.g., purchase, view) between users and prod-
them into a ready-to-use format and pairing them with
ucts. For example, users can view or purchase products.
specific molecular graph data points of interest remains
In addition, the users, products, and their interactions are
a cumbersome and challenging task. Besides, we could
associated with rich text information. For instance, products
investigate graph-text pairs in 3D space, where each molecule
have titles/descriptions and users can leave a review of
may be associated with atomic coordinates [138].
products. In this case, we can construct a graph [102]
Broader Task Space with LLMs. More comprehensive stud-
where nodes are users and products, while edges are their
ies on the performance of LLMs for graph tasks hold promise
interactions. Both nodes and edges are associated with text.
for the future. While LLMs as encoder approaches have been
It is important to utilize both the text information and the
explored for text-attributed graphs, their application to text-
graph structure information (user behavior) to model users
captioned molecular graphs remains underexplored. Promis-
and items and solve complex downstream tasks (e.g., item
ing directions include using LLMs for data augmentation
recommendation [106], bundle recommendation [107], and
and knowledge distillation to design domain-specific GNNs
product understanding [108]).
for various text-paired graph tasks. Furthermore, although
Social Media. In social media platforms, there are many
graph generation has been approached in text-paired graphs,
users and they interact with each other through messages,
it remains an open problem for text-attributed graphs (i.e.,
emails, and so on. In this case, we can build a graph where
how to conduct joint text and graph structure generation)
nodes are users and edges are the interaction between users.
Multi-Modal Foundation Models. One open question is,
There will be text associated with nodes (e.g., user profile)
“Should we use one foundation model to unify different
and edges (e.g., messages). Interesting research questions
modalities, and how?” The modalities can include texts,
will be how to do joint text and graph structure modeling
graphs, and even images. For instance, molecules can be
to deeply understand the users for friend recommendation
represented as graphs, described as texts, and photographed
[109], user analysis [110], community detection [111], and
as images; products can be treated as nodes in a graph,
personalized response generation [97], [98].
associated with a title/description, and combined with an
7.3.3 Specific Domains image. Designing a model that can conduct joint encoding
In many specific domains, text data are interconnected and for all modalities will be useful but challenging. Furthermore,
lie in the format of graphs. The structure information on the there has always been tension between building a unified
graphs can be utilized to better understand the text unit and foundational model and customizing model architectures
contribute to advanced problem-solving. for different domains. It is thus intriguing to ask whether a
Academic Domain. In the academic domain, graphs [12] unified architecture will suit different data types, or if tailor-
are constructed with papers as nodes and their relations ing model designs according to domains will be necessary.
(e.g., citation, authorship, etc) as edges. The representation Correctly answering this question can save economic and
learned for papers on such graphs can be utilized for paper intellectual resources from unnecessary attempts and also
recommendation [103], paper classification [104], and author shed light on a deeper understanding of graph-related tasks.
identification [105]. Efficienct LLMs on Graphs. While LLMs have shown
Legal Domain. In the legal domain, opinions given by a strong capability to learn on graphs, they suffer from
the judges always contain references to opinions given for inefficiency in graph linearization and model optimization.
previous cases. In such scenarios, people can construct a On one hand, as discussed in Section 5.1.1 and 6.1.1, many
graph [99] based on the citation relations between opinions. methods rely on transferring graphs into sequences that can
The representations learned on such a graph with both be inputted into LLMs. However, the length of the transferred
text and structure information can be utilized for clause sequence will increase significantly as the size of the graph
classification [100] and opinion recommendation [101]. increases. This poses challenges since LLMs always have a
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 16

maximum sequence input length and a long input sequence [4] Reimers, N. and Gurevych, I., “Sentence-BERT: Sentence Embed-
will lead to higher time and memory complexity. On the other dings using Siamese BERT-Networks,” in EMNLP, 2019.
[5] Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S.,
hand, optimizing LLMs itself is computationally expensive. Yogatama, D., Bosma, M., Zhou, D., Metzler, D. and Chi, E.H.,
Although some general efficient tuning methods such as “Emergent Abilities of Large Language Models,” in TMLR, 2022.
LoRA are proposed, there is a lack of discussion on graph- [6] Nagamochi, H. and Ibaraki, T., “Algorithmic aspects of graph
aware LLM efficient tuning methods. connectivity,” in Cambridge University Press, 2018.
[7] Goldberg, A.V. and Harrelson, C., “Computing the shortest path: A
Generalizable and Robust LLMs on Graphs. Another
search meets graph theory,” in SODA (Vol. 5, pp. 156-165), 2005.
interesting direction is to explore the generalizability and [8] Sun, Z., Wang, H., Wang, H., Shao, B. and Li, J., “Efficient subgraph
robustness of LLMs on graphs. Generalizability refers to matching on billion node graphs,” in arXiv preprint arXiv:1205.6691,
having the ability to transfer the knowledge learned from 2012.
one domain graph to another; while robustness denotes [9] Chen, Z., Mao, H., Li, H., Jin, W., Wen, H., Wei, X., ... & Tang, J.
(2023). Exploring the potential of large language models (llms) in
having consistent prediction regarding obfuscations and learning on graphs. arXiv preprint arXiv:2307.03393.
attacks. Although LLMs have demonstrated their strong [10] McCallum, A.K., Nigam, K., Rennie, J. and Seymore, K., “Automat-
generalizability in processing text, they still suffer from ing the construction of internet portals with machine learning,” in
Information Retrieval, 3, pp.127-163, 2000.
robustness and hallucination issues, which are to be solved
[11] Giles, C.L., Bollacker, K.D. and Lawrence, S., “CiteSeer: An au-
for graph data modeling as well. tomatic citation indexing system,” in Proceedings of the third ACM
LLM as Dynamic Agents on Graphs. Although LLMs have conference on Digital libraries (pp. 89-98), 1998.
shown their advanced capability in generating text, one- [12] Wang, K., Shen, Z., Huang, C., Wu, C.H., Dong, Y. and Kanakia,
A., “Microsoft academic graph: When experts are not enough,” in
pass generation of LLMs suffers from hallucination and
Quantitative Science Studies, 1(1), pp.396-413, 2020.
misinformation issues due to the lack of accurate parametric [13] Zhang, Y., Jin, B., Zhu, Q., Meng, Y. and Han, J., “The Effect of
knowledge. Simply augmenting retrieved knowledge in Metadata on Scientific Literature Tagging: A Cross-Field Cross-
context is also bottlenecked by the capacity of the retriever. Model Study,” in WWW, 2023.
In many real-world scenarios, graphs such as academic [14] Wan, M. and McAuley, J., “Item recommendation on monotonic
behavior chains,” in Proceedings of the 12th ACM conference on
networks, and Wikipedia are dynamically looked up by recommender systems, 2018.
humans for knowledge-guided reasoning. Simulating such [15] Ni, J., Li, J. and McAuley, J., “Justifying recommendations using
a role of dynamic agents can help LLMs more accurately re- distantly-labeled reviews and fine-grained aspects,” in EMNLP-
IJCNLP, 2019.
trieve relevant information via multi-hop reasoning, thereby
[16] Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B. and Eliassi-
correcting their answers and alleviating hallucinations. Rad, T., “Collective classification in network data,” in AI magazine,
29(3), pp.93-93, 2008.
9 C ONCLUSION [17] Wang, X., Gao, T., Zhu, Z., Zhang, Z., Liu, Z., Li, J. and Tang,
In this paper, we provide a comprehensive review of large J., “KEPLER: A unified model for knowledge embedding and pre-
language models on graphs. We first categorize graph trained language representation,” in TACL, 2021.
[18] Liu, L., Du, B., Ji, H., Zhai, C. and Tong, H., “Neural-answering
scenarios where LMs can be adopted and summarize the logical queries on knowledge graphs,” in KDD., 2021.
large language models on graph techniques. We then provide [19] Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., & Philip, S. Y.,
a thorough review, analysis, and comparison of methods “A comprehensive survey on graph neural networks,” in IEEE
within each scenario. Furthermore, we summarize available transactions on neural networks and learning systems, 32(1), 4-24, 2020.
[20] Liu, J., Yang, C., Lu, Z., Chen, J., Li, Y., Zhang, M., Bai, T., Fang, Y.,
datasets, open-source codebases, and multiple applications. Sun, L., Yu, P.S. and Shi, C., “Towards Graph Foundation Models: A
Finally, we suggest future directions for large language Survey and Beyond,” in arXiv preprint arXiv:2310.11829, 2023.
models on graphs. [21] Pan, S., Luo, L., Wang, Y., Chen, C., Wang, J. and Wu, X., “Unifying
Large Language Models and Knowledge Graphs: A Roadmap,” in
ACKNOWLEDGMENTS arXiv preprint arXiv:2306.08302, 2023.
[22] Wang, Y., Le, H., Gotmare, A.D., Bui, N.D., Li, J. and Hoi, S.C.,
This work was supported in part by US DARPA KAIROS “Codet5+: Open code large language models for code understanding
Program No. FA8750-19-2-1004 and INCAS Program No. and generation.,” in arXiv preprint arXiv:2305.07922, 2023.
HR001121C0165, National Science Foundation IIS-19-56151, [23] Devlin, J., Chang, M.W., Lee, K. and Toutanova, K., “Bert: Pre-
training of deep bidirectional transformers for language understand-
and the Molecule Maker Lab Institute: An AI Research
ing,” in NAACL, 2019.
Institutes program supported by NSF under Award No. [24] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis,
2019897, and the Institute for Geospatial Understanding M., Zettlemoyer, L. and Stoyanov, V., “Roberta: A robustly optimized
through an Integrative Discovery Environment (I-GUIDE) bert pretraining approach,” in arXiv preprint arXiv:1907.11692, 2019.
[25] Beltagy, I., Lo, K. and Cohan, A., “SciBERT: A pretrained language
by NSF under Award No. 2118329. Any opinions, findings,
model for scientific text,” in arXiv preprint arXiv:1903.10676, 2019.
and conclusions or recommendations expressed herein are [26] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal,
those of the authors and do not necessarily represent the P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., and Agarwal,
views, either expressed or implied, of DARPA or the U.S. “Language models are few-shot learners,” in NeurIPS, 2020.
Government. [27] Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R.R. and Le,
Q.V., “Xlnet: Generalized autoregressive pretraining for language
understanding,” in NeurIPS, 2019.
R EFERENCES [28] Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed,
[1] Yang, W., Xie, Y., Lin, A., Li, X., Tan, L., Xiong, K., Li, M. and Lin, J., A., Levy, O., Stoyanov, V. and Zettlemoyer, L., “Bart: Denoising
“End-to-end open-domain question answering with bertserini,” in sequence-to-sequence pre-training for natural language generation,
NAACL, 2019. translation, and comprehension,” in ACL, 2020.
[2] Liu, Y. and Lapata, M., “Text Summarization with Pretrained [29] Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena,
Encoders,” in EMNLP, 2019. M., Zhou, Y., Li, W. and Liu, P.J., “Exploring the limits of transfer
[3] Wang, A., Singh, A., Michael, J., Hill, F., Levy, O. and Bowman, S.R., learning with a unified text-to-text transformer,” in JMLR, 2020.
“GLUE: A Multi-Task Benchmark and Analysis Platform for Natural [30] Yasunaga, M., Leskovec, J. and Liang, P., “LinkBERT: Pretraining
Language Understanding,” in ICLR, 2018. Language Models with Document Links,” in ACL, 2022.
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 17

[31] Jin, B., Zhang, W., Zhang, Y., Meng, Y., Zhang, X., Zhu, Q. and Han, [55] Li, Y., Ding, K. and Lee, K., “GRENADE: Graph-Centric Lan-
J., “Patton: Language Model Pretraining on Text-Rich Networks,” in guage Model for Self-Supervised Representation Learning on Text-
ACL, 2023. Attributed Graphs,” in EMNLP., 2023.
[32] Zhang, X., Malkov, Y., Florez, O., Park, S., McWilliams, B., Han, J. [56] Zhang, X., Malkov, Y., Florez, O., Park, S., McWilliams, B., Han, J.
and El-Kishky, A., “TwHIN-BERT: a socially-enriched pre-trained and El-Kishky, A., “TwHIN-BERT: A Socially-Enriched Pre-trained
language model for multilingual Tweet representations,” in KDD, Language Model for Multilingual Tweet Representations at Twitter,”
2023. in KDD., 2023.
[33] Zou, T., Yu, L., Huang, Y., Sun, L. and Du, B., “Pretraining Language [57] Zhang, X., Zhang, C., Dong, X.L., Shang, J. and Han, J., “Minimally-
Models with Text-Attributed Heterogeneous Graphs,” in arXiv supervised structure-rich text categorization via learning on text-rich
preprint arXiv:2310.12580, 2023. networks,” in WWW., 2021.
[34] Song, K., Tan, X., Qin, T., Lu, J. and Liu, T.Y., “Mpnet: Masked and [58] Chien, E., Chang, W.C., Hsieh, C.J., Yu, H.F., Zhang, J., Milenkovic,
permuted pre-training for language understanding,” in NeurIPs., O., and Dhillon, I.S., “Node feature extraction by self-supervised
2020. multi-scale neighborhood prediction,” in ICLR., 2022.
[35] Duan, K., Liu, Q., Chua, T.S., Yan, S., Ooi, W.T., Xie, Q. and He, J., [59] Zhang, Y., Shen, Z., Wu, C.H., Xie, B., Hao, J., Wang, Y.Y., Wang, K.
“Simteg: A frustratingly simple approach improves textual graph and Han, J., “Metadata-induced contrastive learning for zero-shot
learning,” in arXiv preprint arXiv:2308.02565., 2023. multi-label text classification,” in WWW., 2022.
[36] Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., [60] Dinh, T.A., Boef, J.D., Cornelisse, J. and Groth, P., “E2EG: End-
Fischer, F., Gasser, U., Groh, G., Günnemann, S., Hüllermeier, E. and to-End Node Classification Using Graph Topology and Text-based
Krusche, S., “ChatGPT for good? On opportunities and challenges Node Attributes,” in arXiv preprint arXiv:2208.04609., 2022.
of large language models for education,” in Learning and individual [61] Tan, Y., Zhou, Z., Lv, H., Liu, W. and Yang, C., “Walklm: A
differences, 103., 2023. uniform language model fine-tuning framework for attributed graph
[37] Lester, B., Al-Rfou, R. and Constant, N., “The power of scale for embedding,” in NeurIPs., 2023.
parameter-efficient prompt tuning,” in EMNLP, 2021. [62] Zhao, J., Qu, M., Li, C., Yan, H., Liu, Q., Li, R., Xie, X. and Tang,
[38] Li, X.L. and Liang, P., “Prefix-tuning: Optimizing continuous J., “Learning on large-scale text-attributed graphs via variational
prompts for generation,” in ACL, 2021. inference,” in ICLR., 2023.
[39] Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Larous- [63] Wen, Z. and Fang, Y., “Augmenting Low-Resource Text Classifica-
silhe, Q., Gesmundo, A., Attariyan, M. and Gelly, S., “Parameter- tion with Graph-Grounded Pre-training and Prompting,” in SIGIR.,
efficient transfer learning for NLP,” in ICML, 2019. 2023.
[40] Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, [64] Chen, Z., Mao, H., Wen, H., Han, H., Jin, W., Zhang, H., Liu, H.
L. and Chen, W., “Lora: Low-rank adaptation of large language and Tang, J., “Label-free Node Classification on Graphs with Large
models,” in ICLR, 2022. Language Models (LLMS),” in arXiv preprint arXiv:2310.04668., 2023.
[65] Zhao, J., Zhuo, L., Shen, Y., Qu, M., Liu, K., Bronstein, M., Zhu, Z.
[41] Tian, Y., Song, H., Wang, Z., Wang, H., Hu, Z., Wang, F., Chawla,
and Tang, J., “Graphtext: Graph reasoning in text space,” in arXiv
N.V. and Xu, P., “Graph Neural Prompting with Large Language
preprint arXiv:2310.01089., 2023.
Models,” in arXiv preprint arXiv:2309.15427., 2023.
[66] Meng, Y., Zong, S., Li, X., Sun, X., Zhang, T., Wu, F. and Li, J.,
[42] Chai, Z., Zhang, T., Wu, L., Han, K., Hu, X., Huang, X. and Yang, Y.,
“Gnn-lm: Language modeling based on global contexts via gnn,” in
“GraphLLM: Boosting Graph Reasoning Ability of Large Language
ICLR., 2022.
Model,” in arXiv preprint arXiv:2310.05845., 2023.
[67] Zhang, X., Bosselut, A., Yasunaga, M., Ren, H., Liang, P., Manning,
[43] Wei, J., Bosma, M., Zhao, V.Y., Guu, K., Yu, A.W., Lester, B., Du, N.,
C.D. and Leskovec, J., “Greaselm: Graph reasoning enhanced
Dai, A.M. and Le, Q.V., “Finetuned language models are zero-shot
language models for question answering,” in ICLR., 2022.
learners,” in ICLR., 2022.
[68] Ioannidis, V.N., Song, X., Zheng, D., Zhang, H., Ma, J., Xu, Y., Zeng,
[44] Sanh, V., Webson, A., Raffel, C., Bach, S.H., Sutawika, L., Alyafeai, B., Chilimbi, T. and Karypis, G., “Efficient and effective training of
Z., Chaffin, A., Stiegler, A., Scao, T.L., Raja, A. and Dey, M., language and graph neural network models,” in AAAI, 2023.
“Multitask prompted training enables zero-shot task generalization,”
[69] Mavromatis, C., Ioannidis, V.N., Wang, S., Zheng, D., Adeshina, S.,
in ICLR., 2022.
Ma, J., Zhao, H., Faloutsos, C. and Karypis, G., “Train Your Own
[45] Tang, J., Yang, Y., Wei, W., Shi, L., Su, L., Cheng, S., Yin, D. GNN Teacher: Graph-Aware Distillation on Textual Graphs,” in
and Huang, C., “GraphGPT: Graph Instruction Tuning for Large PKDD, 2023.
Language Models,” in arXiv preprint arXiv:2310.13023., 2023. [70] He, X., Bresson, X., Laurent, T. and Hooi, B., “Explanations as
[46] Ye, R., Zhang, C., Wang, R., Xu, S. and Zhang, Y., “Natural language Features: LLM-Based Features for Text-Attributed Graphs,” in arXiv
is all a graph needs,” in arXiv preprint arXiv:2308.07134., 2023. preprint arXiv:2305.19523., 2023.
[47] Zhao, H., Liu, S., Ma, C., Xu, H., Fu, J., Deng, Z.H., Kong, L. and Liu, [71] Yu, J., Ren, Y., Gong, C., Tan, J., Li, X. and Zhang, X., “Empower Text-
Q., “GIMLET: A Unified Graph-Text Model for Instruction-Based Attributed Graphs Learning with Large Language Models (LLMs),”
Molecule Zero-Shot Learning,” in bioRxiv, pp.2023-05., 2023. in arXiv preprint arXiv:2310.09872., 2023.
[48] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, [72] Yang, J., Liu, Z., Xiao, S., Li, C., Lian, D., Agrawal, S., Singh, A.,
Q.V. and Zhou, D., “Chain-of-thought prompting elicits reasoning Sun, G. and Xie, X., “GraphFormers: GNN-nested transformers for
in large language models,” in NeurIPs., 2022. representation learning on textual graph,” in NeurIPs., 2021.
[49] Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T.L., Cao, Y. and [73] Jin, B., Zhang, Y., Zhu, Q. and Han, J., “Heterformer: Transformer-
Narasimhan, K., “Tree of thoughts: Deliberate problem solving with based deep node representation learning on heterogeneous text-rich
large language models,” in arXiv preprint arXiv:2305.10601., 2023. networks,” in KDD., 2023.
[50] Besta, M., Blach, N., Kubicek, A., Gerstenberger, R., Gianinazzi, L., [74] Jin, B., Zhang, Y., Meng, Y. and Han, J., “Edgeformers: Graph-
Gajda, J., Lehmann, T., Podstawski, M., Niewiadomski, H., Nyczyk, Empowered Transformers for Representation Learning on Textual-
P. and Hoefler, T., “Graph of thoughts: Solving elaborate problems Edge Networks,” in ICLR., 2023.
with large language models,” in arXiv preprint arXiv:2308.09687., [75] Jin, B., Zhang, W., Zhang, Y., Meng, Y., Zhao, H. and Han, J.,
2023. “Learning Multiplex Embeddings on Text-rich Networks with One
[51] Cohan, A., Feldman, S., Beltagy, I., Downey, D. and Weld, D.S., Text Encoder,” in arXiv preprint arXiv:2310.06684., 2023.
“Specter: Document-level representation learning using citation- [76] Qin, Y., Wang, X., Zhang, Z. and Zhu, W., “Disentangled Represen-
informed transformers,” in ACL., 2020. tation Learning with Large Language Models for Text-Attributed
[52] Ostendorff, M., Rethmeier, N., Augenstein, I., Gipp, B. and Rehm, Graphs,” in arXiv preprint arXiv:2310.18152., 2023.
G., “Neighborhood contrastive learning for scientific document [77] Zhu, J., Cui, Y., Liu, Y., Sun, H., Li, X., Pelger, M., Yang, T., Zhang,
representations with citation embeddings,” in EMNLP., 2022. L., Zhang, R. and Zhao, H., “Textgnn: Improving text encoder via
[53] Brannon, W., Fulay, S., Jiang, H., Kang, W., Roy, B., Kabbara, J. and graph neural network in sponsored search,” in WWW., 2021.
Roy, D., “ConGraT: Self-Supervised Contrastive Pretraining for Joint [78] Li, C., Pang, B., Liu, Y., Sun, H., Liu, Z., Xie, X., Yang, T., Cui,
Graph and Text Embeddings,” in arXiv preprint arXiv:2305.14321., Y., Zhang, L. and Zhang, Q., “Adsgnn: Behavior-graph augmented
2023. relevance modeling in sponsored search,” in SIGIR., 2021.
[54] Zhu, J., Song, X., Ioannidis, V.N., Koutra, D. and Faloutsos, C., [79] Zhang, J., Chang, W.C., Yu, H.F. and Dhillon, I., “Fast multi-
“TouchUp-G: Improving Feature Representation through Graph- resolution transformer fine-tuning for extreme multi-label text
Centric Finetuning,” in arXiv preprint arXiv:2309.13885., 2023. classification,” in NeurIPs., 2021.
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 18

[80] Xie, H., Zheng, D., Ma, J., Zhang, H., Ioannidis, V.N., Song, X., Ping, [109] Chen, L., Xie, Y., Zheng, Z., Zheng, H. and Xie, J., “Friend recom-
Q., Wang, S., Yang, C., Xu, Y. and Zeng, B., “Graph-Aware Language mendation based on multi-social graph convolutional network,” in
Model Pre-Training on a Large Graph Corpus Can Help Multiple IEEE Access, 8, pp.43618-43629, 2020.
Graph Applications,” in KDD., 2023. [110] Wang, G., Zhang, X., Tang, S., Zheng, H. and Zhao, B.Y., “Unsu-
[81] Yasunaga, M., Bosselut, A., Ren, H., Zhang, X., Manning, pervised clickstream clustering for user behavior analysis,” in CHI,
C.D., Liang, P.S. and Leskovec, J., “Deep bidirectional language- 2016.
knowledge graph pretraining,” in NeurIPs., 2022. [111] Shchur, O. and Günnemann, S., “Overlapping community detec-
[82] Huang, J., Zhang, X., Mei, Q. and Ma, J., “CAN LLMS EF- tion with graph neural networks,” in arXiv:1909.12201., 2019.
FECTIVELY LEVERAGE GRAPH STRUCTURAL INFORMATION: [112] Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S.,
WHEN AND WHY,” in arXiv preprint arXiv:2309.16595.., 2023. Yogatama, D., Bosma, M., Zhou, D., Metzler, D. and Chi, E.H., 2022.
[83] Jin, X., Vinzamuri, B., Venkatapathy, S., Ji, H. and Natarajan, P., ”Emergent Abilities of Large Language Models” in Transactions on
“Adversarial Robustness for Large Language NER models using Machine Learning Research, 2022.
Disentanglement and Word Attributions,” in EMNLP., 2023. [113] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y. and Iwasawa, Y., 2022.
[84] Kipf, T.N. and Welling, M., “Semi-supervised classification with ”Large language models are zero-shot reasoners” in NeurIPS.
graph convolutional networks,” in ICLR., 2017. [114] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E.,
[85] Hamilton, W., Ying, Z. and Leskovec, J., “Inductive representation Le, Q.V. and Zhou, D., 2022. ”Chain-of-thought prompting elicits
learning on large graphs,” in NeurIPs., 2017. reasoning in large language models” in NeurIPS.
[86] Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P. and [115] Radford, A., 2019. ”Language Models are Unsupervised Multitask
Bengio, Y., “Graph attention networks,” in ICLR., 2018. Learners” in OpenAI Blog, 2019.
[87] Zhang, S., Liu, Y., Sun, Y. and Shah, N., “Graph-less Neural [116] Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P. and Soricut,
Networks: Teaching Old MLPs New Tricks Via Distillation,” in R., 2019, September. ”ALBERT: A Lite BERT for Self-supervised
ICLR., 2022. Learning of Language Representations” in ICLR.
[88] Liu, M., Gao, H. and Ji, S., “Towards deeper graph neural networks,” [117] Clark, K., Luong, M.T., Le, Q.V. and Manning, C.D., 2019, Septem-
in KDD., 2020. ber. ”ELECTRA: Pre-training Text Encoders as Discriminators Rather
[89] Meng, Y., Huang, J., Zhang, Y. and Han, J., “Generating training Than Generators” in ICLR.
data with language models: Towards zero-shot language under- [118] Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz,
standing,” in NeurIPS., 2022. E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S. and Nori, H.,
[90] Sun, Y., Han, J., Yan, X., Yu, P.S. and Wu, T., “Pathsim: Meta 2023. ”Sparks of artificial general intelligence: Early experiments
path-based top-k similarity search in heterogeneous information with gpt-4” in arXiv preprint arXiv:2303.12712.
networks,” in VLDB., 2011.
[119] Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei,
[91] Liu, H., Li, C., Wu, Q. and Lee, Y.J., “Visual instruction tuning,” in Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S. and Bikel, D.,
NeurIPs., 2023. 2023. ”Llama 2: Open foundation and fine-tuned chat models” in
[92] Park, C., Kim, D., Han, J. and Yu, H., “Unsupervised attributed arXiv preprint arXiv:2307.09288.
multiplex network embedding,” in AAAI., 2020.
[120] Jiang, A.Q., Sablayrolles, A., Mensch, A., Bamford, C., Chap-
[93] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez,
lot, D.S., Casas, D.D.L., Bressand, F., Lengyel, G., Lample, G.,
A.N., Kaiser, Ł. and Polosukhin, I., “Attention is all you need,” in
Saulnier, L. and Lavaud, L.R., 2023. ”Mistral 7B” in arXiv preprint
NeurIPs., 2017.
arXiv:2310.06825.
[94] Haveliwala, T.H., “Topic-sensitive pagerank,” in WWW., 2002.
[121] Alayrac, J.B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson,
[95] Oord, A.V.D., Li, Y. and Vinyals, O., “Representation learning with
Y., Lenc, K., Mensch, A., Millican, K., Reynolds, M. and Ring, R.,
contrastive predictive coding,” in arXiv:1807.03748., 2018.
2022. ”Flamingo: a visual language model for few-shot learning” in
[96] Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agar- NeurIPS.
wal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J. and Krueger,
[122] Edwards, C., Zhai, C. and Ji, H., 2021. ”Text2mol: Cross-modal
G., “Learning transferable visual models from natural language
molecule retrieval with natural language queries” in EMNLP.
supervision,” in ICML., 2021.
[97] Sun, C., Li, J., Fung, Y.R., Chan, H.P., Abdelzaher, T., Zhai, C. and Ji, [123] Edwards, C., Lai, T., Ros, K., Honke, G., Cho, K. and Ji, H., 2022,
H., “Decoding the silent majority: Inducing belief augmented social December. ”Translation between Molecules and Natural Language”
graph with large language model for response forecasting,” in arXiv in EMNLP.
preprint arXiv:2310.13297., 2023. [124] Wang, H., Feng, S., He, T., Tan, Z., Han, X. and Tsvetkov, Y., ”Can
[98] Sun, C., Li, J., Chan, H.P., Zhai, C. and Ji, H., “Measuring the Effect Language Models Solve Graph Problems in Natural Language?” in
of Influential Messages on Varying Personas,” in ACL., 2023. arXiv preprint arXiv:2305.10037., 2023.
[99] Whalen, R., “Legal networks: The promises and challenges of legal [125] Liu, C. and Wu, B., 2023. ”Evaluating large language models on
network analysis,” in Mich. St. L. Rev.., 2016. graphs: Performance insights and comparative analysis” in arXiv
[100] Friedrich, A. and Palmer, A. and Pinkal, M., “Situation entity preprint arXiv:2308.11224, 2023.
types: automatic classification of clause-level aspect,” in ACL., 2016. [126] Guo, J., Du, L. and Liu, H.. ”GPT4Graph: Can Large Language
[101] Guha, N., Nyarko, J., Ho, D.E., Ré, C., Chilton, A., Narayana, Models Understand Graph Structured Data? An Empirical Evalua-
A., Chohlas-Wood, A., Peters, A., Waldon, B., Rockmore, D.N. and tion and Benchmarking” in arXiv preprint arXiv:2305.15066, 2023.
Zambrano, D., “Legalbench: A collaboratively built benchmark for [127] Zhang, J., 2023. ”Graph-ToolFormer: To Empower LLMs with
measuring legal reasoning in large language models,” in arXiv Graph Reasoning Ability via Prompt Augmented by ChatGPT” in
preprint arXiv:2308.11462., 2023. arXiv preprint arXiv:2304.11116, 2023.
[102] Lin, Y., Wang, H., Chen, J., Wang, T., Liu, Y., Ji, H., Liu, Y. [128] Zhang, Z., Wang, X., Zhang, Z., Li, H., Qin, Y., Wu, S. and Zhu,
and Natarajan, P., “Personalized entity resolution with dynamic W.. ”LLM4DyG: Can Large Language Models Solve Problems on
heterogeneous knowledge graph representations,” in arXiv preprint Dynamic Graphs?” in arXiv preprint arXiv:2310.17110, 2023.
arXiv:2104.02667, 2021. [129] Luo, L., Li, Y.F., Haffari, G. and Pan, S., 2023. ”Reasoning on
[103] Bai, X., Wang, M., Lee, I., Yang, Z., Kong, X. and Xia, F., “Scientific graphs: Faithful and interpretable large language model reasoning”
paper recommendation: A survey,” in Ieee Access, 2019. in arXiv preprint arXiv:2310.01061, 2023.
[104] Chowdhury, S. and Schoen, M.P., “Research paper classification [130] Jiang, J., Zhou, K., Dong, Z., Ye, K., Zhao, W.X. and Wen, J.R..
using supervised machine learning techniques,” in Intermountain ”Structgpt: A general framework for large language model to reason
Engineering, Technology and Computing, 2020. over structured data” in arXiv preprint arXiv:2305.09645, 2023.
[105] Madigan, D., Genkin, A., Lewis, D.D., Argamon, S., Fradkin, D. [131] Fatemi, B., Halcrow, J. and Perozzi, B.. ”Talk like a graph: Encoding
and Ye, L., “Author identification on the large scale,” in CSNA, 2005. graphs for large language models” in arXiv:2310.04560, 2023.
[106] He, X., Deng, K., Wang, X., Li, Y., Zhang, Y. and Wang, M., [132] Sun, J., Xu, C., Tang, L., Wang, S., Lin, C., Gong, Y., Shum, H.Y.
“Lightgcn: Simplifying and powering graph convolution network and Guo, J.. ”Think-on-graph: Deep and responsible reasoning of
for recommendation,” in SIGIR, 2020. large language model with knowledge graph” in arXiv preprint
[107] Chang, J., Gao, C., He, X., Jin, D. and Li, Y., “Bundle recommenda- arXiv:2307.07697, 2023.
tion with graph convolutional networks,” in SIGIR, 2020. [133] Danny Z. Chen.. ”Developing algorithms and software for geo-
[108] Xu, H., Liu, B., Shu, L. and Yu, P., “Open-world learning and metric path planning problems” in ACM Comput. Surv. 28, 4es (Dec.
application to product classification,” in WWW, 2019. 1996), 18–es. https://doi.org/10.1145/242224.242246, 1996.
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 19

[134] Iqbal A., Hossain Md., Ebna A., ”Airline Scheduling with Max [159] Ock J, Guntuboina C, Farimani AB. Catalyst Property Prediction
Flow algorithm” in IJCA, 2018. with CatBERTa: Unveiling Feature Exploration Strategies through
[135] Li Jiang, Xiaoning Zang, Ibrahim I.Y. Alghoul, Xiang Fang, Junfeng Large Language Models. arXiv preprint arXiv:2309.00563, 2023.
Dong, Changyong Liang. ”Scheduling the covering delivery problem [160] Fang Y, Liang X, Zhang N, Liu K, Huang R, Chen Z, Fan X, Chen
in last mile delivery” in Expert Systems with Applications, 2022. H., Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset
[136] Nakagawa, H., Iwasawa, Y. and Matsuo, Y., ”Graph-based knowl- for Large Language Models. arXiv preprint arXiv:2306.08018, 2023.
edge tracing: modeling student proficiency using graph neural [161] Abdine H, Chatzianastasis M, Bouyioukos C, Vazirgiannis M.,
network” in WI, 2019. Prot2Text: Multimodal Protein’s Function Generation with GNNs
[137] Li, H., Wei, H., Wang, Y., Song, Y. and Qu, H.. ”Peer-inspired and Transformers, arXiv preprint arXiv:2307.14367, 2023.
student performance prediction in interactive online question pools [162] Luo Y, Yang K, Hong M, Liu X, Nie Z., MolFM: A Multimodal
with graph neural network” in CIKM, 2020. Molecular Foundation Model, arXiv preprint arXiv:2307.09484, 2023.
[138] Zhang, X., Wang, L., Helwig, J., Luo, Y., Fu, C., Xie, Y., ... & Ji, S. [163] Qian, C., Tang, H., Yang, Z., Liang, H., & Liu, Y., Can large
(2023). Artificial intelligence for science in quantum, atomistic, and language models empower molecular property prediction? arXiv
continuum systems. arXiv preprint arXiv:2307.08423. preprint arXiv:2307.07443, 2023
[139] Rusch, T. K., Bronstein, M. M., & Mishra, S. (2023). A sur- [164] Born, J., & Manica, M., Regression Transformer enables concur-
vey on oversmoothing in graph neural networks. arXiv preprint rent sequence regression and generation for molecular language
arXiv:2303.10993. modelling. Nature Machine Intelligence, 5(4), 432-444, 2023.
[140] Topping, J., Di Giovanni, F., Chamberlain, B. P., Dong, X., & Bron- [165] Li J, Liu Y, Fan W, Wei XY, Liu H, Tang J, Li Q., Empowering
stein, M. M. (2021). Understanding over-squashing and bottlenecks Molecule Discovery for Molecule-Caption Translation with Large
on graphs via curvature. arXiv preprint arXiv:2111.14522. Language Models: A ChatGPT Perspective. arXiv, 2023.
[141] Ying, C., Cai, T., Luo, S., Zheng, S., Ke, G., He, D., ... & Liu, [166] Zeng, Z., Yin, B., Wang, S., Liu, J., Yang, C., Yao, H., ... & Liu,
T. Y. (2021). Do transformers really perform badly for graph Z., Interactive Molecular Discovery with Natural Language. arXiv,
representation?. NeurIPS, 34, 28877-28888. 2023.
[142] Rampášek, L., Galkin, M., Dwivedi, V. P., Luu, A. T., Wolf, G., [167] Liu Z, Li S, Luo Y, Fei H, Cao Y, Kawaguchi K, Wang X, Chua TS.,
& Beaini, D. (2022). Recipe for a general, powerful, scalable graph MolCA: Molecular Graph-Language Modeling with Cross-Modal
transformer. NeurIPS, 35, 14501-14515. Projector and Uni-Modal Adapter, in EMNLP, 2023.
[143] Liu, G., Zhao, T., Inae, E., Luo, T., & Jiang, M. (2023). [168] Guo T, Guo K, Liang Z, Guo Z, Chawla NV, Wiest O, Zhang X.
Semi-Supervised Graph Imbalanced Regression. arXiv preprint What indeed can GPT models do in chemistry? A comprehensive
arXiv:2305.12087. benchmark on eight tasks. in NeurIPS, 2023.
[169] Liu Z, Zhang W, Xia Y, Wu L, Xie S, Qin T, Zhang M, Liu TY.,
[144] Wu Q, Zhao W, Li Z, Wipf DP, Yan J. Nodeformer: A scalable
MolXPT: Wrapping Molecules with Text for Generative Pre-training,
graph structure learning transformer for node classification. NeurIPS.
in ACL, 2023.
2022 Dec 6;35:27387-401.
[170] Seidl, P., Vall, A., Hochreiter, S., & Klambauer, G., Enhancing
[145] Liu, G., Zhao, T., Xu, J., Luo, T., & Jiang, M., Graph rationalization
activity prediction models in drug discovery with the ability to
with environment-based augmentations, In ACM SIGKDD, 2022.
understand human language, in ICML, 2023.
[146] Bran, A. M., Cox, S., White, A. D., & Schwaller, P., ChemCrow:
[171] Christofidellis, D., Giannone, G., Born, J., Winther, O., Laino, T.,
Augmenting large-language models with chemistry tools, arXiv
& Manica, M., Unifying molecular and textual representations via
preprint arXiv:2304.05376, 2023.
multi-task language modelling, in ICML, 2023.
[147] Riesen, K., & Bunke, H., IAM graph database repository for [172] Liu, S., Nie, W., Wang, C., Lu, J., Qiao, Z., Liu, L., ... & Anandku-
graph based pattern recognition and machine learning. In Structural, mar, A. Multi-modal molecule structure-text model for text-based
Syntactic, and Statistical Pattern Recognition: Joint IAPR International retrieval and editing, Nature Machine Intelligence, 2023.
Workshop.
[173] Lacombe, R., Gaut, A., He, J., Lüdeke, D., & Pistunova, K., Extract-
[148] Weininger, D., SMILES, a chemical language and information ing Molecular Properties from Natural Language with Multimodal
system. 1. Introduction to methodology and encoding rules. Journal Contrastive Learning, ICML Workshop on Computational Biology, 2023.
of chemical information and computer sciences, 28(1), 31-36, 1988
[174] Su, B., Du, D., Yang, Z., Zhou, Y., Li, J., Rao, A., ... & Wen, J. R.,
[149] Heller S, McNaught A, Stein S, Tchekhovskoi D, Pletnev I. InChI- A molecular multimodal foundation model associating molecule
the worldwide chemical structure identifier standard. Journal of graphs with natural language, arXiv preprint arXiv:2209.05481. 2022.
cheminformatics. 2013 Dec;5(1):1-9. [175] Zeng, Z., Yao, Y., Liu, Z., & Sun, M., A deep-learning system
[150] O’Boyle, N., & Dalke, A., DeepSMILES: an adaptation of SMILES bridging molecule structure and biomedical text with comprehen-
for use in machine-learning of chemical structures, 2018. sion comparable to human professionals, Nature communications.
[151] Krenn, M., Häse, F., Nigam, A., Friederich, P., & Aspuru-Guzik, A., [176] Iwayama, M., Wu, S., Liu, C., & Yoshida, R., Functional Output
Self-referencing embedded strings (SELFIES): A 100% robust molec- Regression for Machine Learning in Materials Science. Journal of
ular string representation. Machine Learning: Science and Technology. Chemical Information and Modeling, 62(20), 4837-4851, 2022.
[152] Bjerrum, E. J. (2017). SMILES enumeration as data augmenta- [177] Bagal V, Aggarwal R, Vinod PK, Priyakumar UD. MolGPT:
tion for neural network modeling of molecules. arXiv preprint molecular generation using a transformer-decoder model. Journal of
arXiv:1703.07076. Chemical Information and Modeling. 2021 Oct 25;62(9):2064-76.
[153] Arús-Pous, J., Johansson, S. V., Prykhodko, O., Bjerrum, E. J., [178] Taylor, R., Kardas, M., Cucurull, G., Scialom, T., Hartshorn, A.,
Tyrchan, C., Reymond, J. L., ... & Engkvist, O. (2019). Randomized Saravia, E., ... & Stojnic, R., Galactica: A large language model for
SMILES strings improve the quality of molecular generative models. science. arXiv, 2022.
Journal of cheminformatics, 11(1), 1-13. [179] Wang, S., Guo, Y., Wang, Y., Sun, H., & Huang, J., Smiles-bert: large
[154] Tetko IV, Karpov P, Bruno E, Kimber TB, Godin G. Augmentation scale unsupervised pre-training for molecular property prediction.
is what you need!. InInternational Conference on Artificial Neural In BCB, 2019.
Networks 2019 Sep 9 (pp. 831-835). Cham: Springer International [180] Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., & Kang, J.,
Publishing. BioBERT: a pre-trained biomedical language representation model
[155] Kudo, T., & Richardson, J., Sentencepiece: A simple and language for biomedical text mining. Bioinformatics, 36(4), 1234-1240, 2020.
independent subword tokenizer and detokenizer for neural text [181] Ma, R., & Luo, T. (2020). PI1M: a benchmark database for polymer
processing, in EMNLP, 2018. informatics. Journal of Chemical Information and Modeling.
[156] Irwin, R., Dimitriadis, S., He, J., & Bjerrum, E. J. (2022). Chem- [182] Hastings, J., Owen, G., Dekker, A., Ennis, M., Kale, N., Muthukr-
former: a pre-trained transformer for computational chemistry. ishnan, V., ... & Steinbeck, C., ChEBI in 2016: Improved services and
Machine Learning: Science and Technology, 3(1), 015022. an expanding collection of metabolites. Nucleic acids research.
[157] Shi, Y., Zhang, A., Zhang, E., Liu, Z., & Wang, X., ReLM: [183] Kim, S., Chen, J., Cheng, T., Gindulyte, A., He, J., He, S., ... &
Leveraging Language Models for Enhanced Chemical Reaction Bolton, E. E., PubChem 2019 update: improved access to chemical
Prediction, in EMNLP, 2023. data, Nucleic acids research, 47(D1), D1102-D1109, 2019.
[158] Liu P, Ren Y, Ren Z., Git-mol: A multi-modal large language model [184] Gaulton, A., Bellis, L. J., Bento, A. P., Chambers, J., Davies, M.,
for molecular science with graph, image, and text, arXiv preprint Hersey, A., ... & Overington, J. P., ChEMBL: a large-scale bioactivity
arXiv:2308.06911, 2023 database for drug discovery. Nucleic acids research.
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 20

[185] Zdrazil B, Felix E, Hunter F, Manners EJ, Blackshaw J, Corbett A PPENDIX

S, de Veij M, Ioannidis H, Lopez DM, Mosquera JF, Magarinos MP.
The ChEMBL Database in 2023: a drug discovery platform spanning .1 Training & Inference Framework with LLMs
multiple bioactivity data types and time periods. Nucleic Acids There are two typical training and inference paradigms
Research. 2023 Nov 2:gkad1004.
[186] Mellor, C. L., Robinson, R. M., Benigni, R., Ebbrell, D., Enoch, S.
to apply language models on graphs: 1) Pretraining-then-
J., Firman, J. W., ... & Cronin, M. T. D. (2019). Molecular fingerprint- finetuning: typically adopted for medium-scale large lan-
derived similarity measures for toxicological read-across: Recom- guage models; and 2) Pretraining-then-prompting: typically
mendations for optimal use. Regulatory Toxicology and Pharmacology. adopted for large-scale large language models.
[187] Krenn, M., Ai, Q., Barthel, S., Carson, N., Frei, A., Frey, N. C., ...
& Aspuru-Guzik, A. (2022). SELFIES and the future of molecular Pretraining denotes training the language model with unsu-
string representations. Patterns, 3(10). pervised objectives to initialize them with language under-
[188] Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., ... & standing and inference ability for downstream tasks. Typical
Leskovec, J., Open graph benchmark: Datasets for machine learning
on graphs. In NeurIPS, 2020.
pretraining objectives for pure text include masked language
[189] Xu, K., Hu, W., Leskovec, J., & Jegelka, S. How powerful are graph modeling [23], auto-regressive causal language modeling
neural networks? In ICLR, 2019. [26], corruption-reconstruction language modeling [28] and
[190] Li, J., Li, D., Savarese, S., & Hoi, S., Blip-2: Bootstrapping language- text-to-text transfer modeling [29]. When extended in the
image pre-training with frozen image encoders and large language
models. arXiv preprint arXiv:2301.12597. graph domain, language model pretraining strategies include
[191] Zang, C., & Wang, F. Moflow: an invertible flow model for document relation prediction [30], network-contextualized
generating molecular graphs. In ACM SIGKDD, 2020. masked language modeling [31], contrastive social prediction
[192] Liu, G., Inae, E., Zhao, T., Xu, J., Luo, T., & Jiang, M. (2023). Data-
Centric Learning from Unlabeled Graphs with Diffusion Model.
[32] and context graph prediction [33].
arXiv preprint arXiv:2303.10108. Finetuning refers to the process of training the language
[193] Wang, Y., Lipka, N., Rossi, R. A., Siu, A., Zhang, R., & Derr, T. model with labeled data for the downstream tasks. Language
Knowledge graph prompting for multi-document question answer- model fine-tuning methodology can be further categorized
ing. AAAI, 2024.
[194] Guo, Z., Yu, W., Zhang, C., Jiang, M. and Chawla, N.V., GraSeq: into fully fine-tuning, efficient fine-tuning, and instruction
graph and sequence fusion learning for molecular property predic- tuning.
tion. CIKM, 2020.
[195] Yu, W., Zhu, C., Qin, L., Zhang, Z., Zhao, T., & Jiang, M. • Full Finetuning means updating all the parameters
Diversifying content generation for commonsense reasoning with inside the language model. It is the most commonly
mixture of knowledge graph experts. ACL findings, 2022. used fine-tuning method that fully stimulates the
[196] Deng, J., Yang, Z., Wang, H., Ojima, I., Samaras, D., & Wang, F.
(2023). A systematic study of key elements underlying molecular language model’s potential for downstream tasks, but
property prediction. Nature Communications, 14(1), 6395. can suffer from heavy computational overload [36]
[197] Renz, P., Van Rompaey, D., Wegner, J. K., Hochreiter, S., & and result in overfitting issues [35].
Klambauer, G. (2019). On failure modes in molecule generation
and optimization. Drug Discovery Today: Technologies, 32, 55-63.
• Efficient Finetuning refers to only fine-tuning a
[198] Reymond, J. L. (2015). The chemical space project. Accounts of subset of parameters inside the language model.
Chemical Research, 48(3), 722-730. Efficient tuning methods for pure text include prompt
[199] Wang, H., Fu, T., Du, Y., Gao, W., Huang, K., Liu, Z., ... & Zitnik, tuning [37], prefix tuning [38], adapter [39] and LoRA
M. (2023). Scientific discovery in the age of artificial intelligence.
Nature, 620(7972), 47-60. [40]. Efficient language model fine-tuning methods
[200] Wang, H., Li, W., Jin, X., Cho, K., Ji, H., Han, J., & Burke, M. D. particularly designed for graph data include graph
Chemical-reaction-aware molecule representation learning. arXiv, neural prompt [41] and graph-enhanced prefix [42].
2021.
[201] Lai, T. M., Zhai, C., & Ji, H. (2023). KEBLM: Knowledge-Enhanced
• Instruction Tuning denotes fine-tuning language
Biomedical Language Models. Journal of Biomedical Informatics. model with downstream task instructions [43] [44] to
[202] Liu G, Xu J, Luo T, Jiang M. Inverse Molecular Design with Multi- encourage model generalization to unseen tasks in
Conditional Diffusion Guidance. arXiv, 2024. inference. It is an orthogonal concept with full fine-
[203] Li, M., Li, S., Wang, Z., Huang, L., Cho, K., Ji, H., Han, J. and
Voss, C. The future is not one-dimensional: Complex event schema tuning and efficient fine-tuning, in other words, one
induction by graph modeling for event prediction. arXiv preprint can choose both full fine-tuning and efficient fine-
arXiv:2104.06344. tuning for instruction tuning. Instruction tuning is
adopted in the graph domain for node classification
[45], link prediction [46], and graph-level tasks [47].
Prompting is a technique to apply language model for
downstream task solving without updating the model param-
eters. One needs to formulate the test samples into natural
language sequences and ask the language model to directly
conduct inference based on the in-context demonstrations.
This is a technique particularly popular for large-scale au-
toregressive language models. Apart from direct prompting,
following-up works propose chain-of-thought prompting
[48], tree-of-thought prompting [49], and graph-of-thought
prompting [50].
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 21

TABLE 4
A collection of LLM reasoning methods on pure graph discussed in Section 4. We do not include the backbone models used in these methods
studied in the original papers, as these methods generally apply to any LLMs. The “Papers” column lists the papers that study the specific methods.

Method Graph Format or Encoding Reasoning Process Reasoning Category Papers

Zero-Shot Verbalized edge or adjacency Directly answering. Direct Answering [124]–[126], [128],
list. [131]
Role Prompting Verbalized edge or adjacency Directly answering by designating a specific role to the Direct Answering [126]
list. LLM.
Format Explanation Verbalized edge or adjacency Encouraging the LLM to explain the input graph format Direct Answering [126]
list. first.
GraphLLM Prefix tokens encoded by a Directly answering. Direct Answering [42]
graph encoder.
Few-Shot (In-Context Verbalized edge or adjacency Directly answering by following the examples. Direct Answering [124], [125], [128],
Learning) lists preceded with a few [131]
demonstrative examples.
Chain-of-Thought Verbalized edge or adjacency Reasoning through a series of intermediate reasoning steps Heuristic Reasoning [124]–[126], [128],
lists preceded with a few in the generation following the examples. [131], [132]
demonstrative examples.
Self-Consistency Verbalized edge or adjacency Reasoning through a series of intermediate reasoning Heuristic Reasoning [124]
lists preceded with a few steps in generation, and then selecting the most consistent
demonstrative examples. answer.
Build-a-Graph Verbalized edge or adjacency Reconstructing the graph in output, and then reasoning on Heuristic Reasoning [124], [131]
list. the graph.
Context-Summarization Verbalized edge or adjacency Directly answering by first summarizing the key elements Heuristic Reasoning [126]
list. in the graph.
Reasoning-on-Graph Retrieved paths from external First, plan the reasoning process in the form of paths to be Heuristic Reasoning [129]
graphs. retrieved and then infer on the retrieved paths.
Iterative Reading-then- Retrived neighboring edges Iteratively retrieving neighboring edges or nodes from ex- Heuristic Reasoning [130], [132]
Reasoning or nodes from external ternal graphs and inferring from the retrieved information.
graphs.
Algorithmic Reasoning Verbalized edge or adjacency Simulating the reasoning process of a relevant algorithm in Algorithmic Reasoning [124]
list. a generation.
Calling APIs External Knowledge Base. Generate the reasoning process as (probably nested) API Algorithmic Reasoning [127], [132]
calls to be executed externally on the knowledge base.
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 22

TABLE 5
A collection of pure graph reasoning problems studied in Section 4. G = (V, E) denotes a graph with vertices V and edges E . v and e denote
individual vertices and edges, respectively. The “Papers” column lists the papers that study the problem using LLMs. The “Complexity” column lists
the time complexity of standard algorithms for the problem, ignoring more advanced but complex algorithms that are not comparable to LLMs’
reasoning processes.

Problem Definition Applications Typical Complexity Papers

Connectivity Given a graph G and two nodes u and v , tell Relationship Detection, Link O(|E|) or O(V 2 ) [124], [125]
if they are connected by a path. Prediction
Neighbor Detection Given a graph G and a node v , find the nodes Recommendation, O(min(|E|, |V |)) [126]
connected to v . Knowledge QA
Node Degree Given a graph G and a node v , find the Entity Popularity, Importance O(min(|E|, |V |)) [125], [126]
number of edges connected to v . Ranking
Attribute Retrieval Given a graph G with node-level informa- Recommendation, Node Clas- O(1) [126]
tion and a node v , return the attribute of v . sification, Node QA
Graph Size Given a graph G , find the number of nodes Graph-level Classification O(|V | + |E|) [126]
and edges.
Cycle Detection Given a graph G , tell if it contains a cycle. Loop Elimination, Program O(|V |) [124]
Loop Detection
Diameter Given a graph G , find the diameter of G . Graph-level Classification, O(|V |3 ) or O(|V |2 log |V | + [126]
Clustering |V ||E|)
Topological Sort Given a directed acyclic graph G , find a Timeline Generation, Depen- O(|V | + |E|) [124]
topological ordering of its vertices so that dency Parsing, Scheduling
for every edge (u, v), u comes before v in
the ordering.
Wedge or Triangle Detection Given a graph G and a vertex v , identify if Relationship Detection, Link O(|V | + |E|) [125]
there is a wedge or triangle centered at v . Prediction
Maximum Triplet Sum Given a graph G , find the maximum sum Community Detection O(|V |3 ) [42]
of the weights of three vertices that are
connected.
Shortest Path Given a graph G and two nodes u and v , Navigation, Planning O(|E|) or O(V 2 ) [42], [124], [125]
find the shortest path between u and v .
Maximum Flow Given a directed graph G with a source node Transportation Planning, Net- O(|V ||E|2 ), [124]
s and a sink node t, find the maximum flow work Design O(|E||V | log |V |) or O(|V |3 )
from s to t.
p
Bipartite Graph Matching Given a bipartite graph G with two disjoint Recommendation, Resource O(|E| |V |) [42], [124]
sets of vertices V1 and V2 , find a matching Allocation, Scheduling
between V1 and V2 that maximizes the
number of matched pairs.
Graph Neural Networks Given a graph G with node features X of Node Classification, Graph- O(ld|V |2 ) [124]
dimension d, simulate a graph neural net- level Classification
works with l payers and return the encoded
node features
Clustering Coefficient Given a graph G , find the clustering coeffi- Community Detection, Node O(|V |3 ) [126]
cient of G . Clustering
Substrcuture Counting Given a graph G and a subgraph G′,count Pattern Matching, Subgraph NP-Complete [42]
the number of occurrences of G ′ in G . Detection, Abnormality De-
tection
Hamilton Path Given a graph G , find a path that visits every Route Planning, Drilling Ma- NP-Complete [124]
vertex exactly once. chine Planning, DNA Se-
quencing
(Knowledge) Graph QA Given a (knowledge) graph G and a question Dialogue System, Smart As- — [126], [129]–[132]
q , find the answer to q . sistant, Recommendation
Graph Query Language Gen- Given a graph G and a query q , generate a Graph Summarization, FAQ — [126]
eration query language that can be used to query G . Generation, Query Sugges-
tions
Node Classification Given a graph G , predict the class of a node Recommendation, User Pro- — [126], [127]
v. filing, Abnormality Detection
Graph Classification Given a graph G , predict the class of G . Molecule Property Prediction, — [126], [127]
Moledule QA, Graph QA
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 23

TABLE 6
Summary of large language models on text-attributed graphs. Role of LM: “TE”, “SE”, “ANN” and “AUG” denote text encoder, structure encoder,
annotator (labeling the node/edges), and augmentator (conduct data augmentation). Task: “NC”, “UAP”, “LP”, “Rec”, “QA”, “NLU”, “EC”, “LM”, “RG”
denote node classification, user activity prediction, link prediction, recommendation, question answering, natural language understanding, edge
classification, language modeling, and regression task.

Approach Category Role of LM LM Size Focus Task

GNN-LM [66] LLM as Encoder TE 237M Task LM
GIANT [58] LLM as Encoder TE 110M Task NC
TextGNN [77] LLM as Encoder TE 110M Task Search
AdsGNN [78] LLM as Encoder TE 110M Task Search
LM-GNN [68] LLM as Encoder TE 110M Efficiency NC, LP, EC
GraD [69] LLM as Encoder TE 110M/66M Efficiency LP, NC
TAPE [70] LLM as Encoder TE, AUG 129M/GPT-3.5 Task NC
SimTeG [35] LLM as Encoder TE 80M/355M Task NC, LP
LLM-GNN [64] LLM as Encoder ANN GPT-3.5 Task NC
ENG [71] LLM as Encoder TE, AUG 80M/GPT-3.5 Task NC
SPECTER [51] LLM as Predictor TE 110M Representation NC, UAP, LP, Rec
GraphFormers [72] LLM as Predictor TE, SE 110M Representation LP
GreaseLM [67] LLM as Predictor TE, SE 355M Task QA
SciNCL [52] LLM as Predictor TE 110M Representation NC, UAP, LP, Rec
MICoL [59] LLM as Predictor TE 110M Supervision NC
LinkBERT [30] LLM as Predictor TE 110M Pretraining QA, NLU
Heterformer [73] LLM as Predictor TE, SE 110M Representation NC, LP
E2EG [60] LLM as Predictor TE 66M Task NC
TwHIN-BERT [56] LLM as Predictor TE 110M/355M Pretraining NC, LP
Edgeformers [74] LLM as Predictor TE, SE 110M Representation NC, LP, EC
Patton [31] LLM as Predictor TE, RE 110M Pretraining NC, LP, Search
InstructGLM [46] LLM as Predictor TE, SE 250M/7B Generalization NC, LP
GNP [41] LLM as Predictor TE, SE 3B/11B Task QA
Touchup-G [54] LLM as Predictor TE 110M Representation NC, LP
DGTL [76] LLM as Predictor TE, SE 13B Task NC
GraphText [65] LLM as Predictor TE, SE GPT-3.5/4 Task NC
GraphGPT [45] LLM as Predictor TE, SE 7B Generalization NC
METERN [75] LLM as Predictor TE, RE 110M Representation NC, LP, Rec, RG
LTRN [57] LLM as Aligner TE 110M Supervision NC
GLEM [62] LLM as Aligner TE 110M Task NC
G2P2 [63] LLM as Aligner TE 110M Supervision NC
ConGraT [53] LLM as Aligner TE 110M/82M Representation LP, LM, NC
GRENADE [55] LLM as Aligner TE 110M Representation NC, LP
THLM [33] LLM as Aligner TE 110B Pretraining NC, LP
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 24

TABLE 7
A summarization of Graph-Aware LLM finetuning objectives on text-attributed graphs. vi+ and vi− denote a positive training node and a negative
training node to vi , respectively.

Method positive vi+ negative vi− Objective f (·)

(vi , vi− ) ∈
/ E; max{||hvi − hv+ ||2 − ||hvi − hv− ||2 + m, 0}
SPECTER [51] (vi , vi+ ) ∈ E
(vi , vu ) ∈ E , (vu , vi− ) ∈ E , (vi , vi− ) ∈
/E i i

−
SciNCL [52] ||hvi − hv+ ||2 ∈ (k+ − c+ ; k+ ] ||hvi − hv− ||2 ∈ (khard − c− −
hard ; khard ] max{||hvi − hv+ ||2 − ||hvi − hv− ||2 + m, 0}
i i i i

Touchup-G [54] (vi , vi+ ) ∈ E (vi , vi− ) ∈

/E log(hvi · hv+ ) + log(1 − hvi · hv− )
i i
exp(cos(hvi ,h + )/η)
v
TwHIN-BERT [56] cos(xvi , xv+ ) < k in-batch random − log P i
exp(cos(hvi ,h − )/η)
i −
vi v i
exp(cos(hvi ,h + )/η)
v
MICoL [59] vi+ ∈ NM (vi ) in-batch random − log P i
exp(cos(hvi ,h − )/η)
−
vi v i

(vi , vi+ ) ∈ E (vi , vi− ) ∈

P
E2EG [60] /E log(hvi · hv+ ) + vi−
log(1 − hvi · hv− )
i i

TABLE 8
Model collection in Section 6 for text-captioned graphs. “Lin.” and “Vec.” represent Linearized Graph Encoding and Vectorized Graph Encoding.
“Classif.”, “Regr.”, “NER”, “RE”, “Retr.”, “Gen.”, “Cap.” represent classification, regression, named entity recognition, relation extraction, (molecule)
graph retrieval, (molecule) graph generation, (molecule) graph captioning.

Model LM Encoder Graph Encoder Gen. Decoder LM Size Task

SMILES-BERT [179] Transformer [93] Linearized N.A. 30M-565M Classification
Text2Mol [122] SciBERT [25] GCN Transformer [93] ≥110M Retrieval
MolGPT [177] N.A. Linearized GPT 6M Generation
Chemformer [156] BART [28] Linearized BART [28] 45M-230M Regression, Gen.
Classif., NER,
KV-PLM [175] BERT [23] Linearized N.A 110M-340M
RE, Retrieval
LLM as Predictor in Section 6.1

MFBERT [176] RoBERTa [24] Linearized N.A. 110M-340M Classification

Galatica [178] N.A. Linearized Transformer [93] 125M-120B Classification
MolT5 [123] T5.1.1 [29] Linearized Transformer 80M-780M Gen., Cap.
Classif, Gen.,
Text+Chem T5 [171] T5 [29] Linearized T5 [29] 80M-780M
Caption
GPT-3.5/4 Classification
LLM-ICL [168] N.A. Linearized LLaMA2 [119] ≥ 780M Generation
Galactica [178] Caption
GIMLET [47] T5 [29] GT T5 [29] 80M-780M Classif., Regr.
Classif, Gen.,
MolXPT [169] N.A. Linearized GPT-2 350M
Caption
ChatMol [166] T5 [29] Linearized T5 [29] 80M-780M Gen., Cap.
MolReGPT [165] N.A. Linearized GPT-3.5 N.A. Gen., Cap.
RT [164] N.A. Linearized XLNet [27] 27M Regr. Gen.
LLM4Mol [163] RoBERTa [24] Linearized GPT-3.5 N.A. Classif, Regr.
LLaMA-Mol [160] N.A. Linearized LLaMA [119] 7B Regr., Gene.
Prot2Text [161] N.A. GNN GPT-2 256M-760M Caption
CatBERTa [159] N.A. Linearized RoBERTa [24] N.A. Regression
ReLM [157] N.A. GNN GPT-3.5 N.A. Classification
SciBERT [25] MolT5 [123] Classif, Gen.,
MoMu [174] GNN 82M-782M
KV-PLM [175] MoFlow [191] Caption, Retr.
LLM as Aligner in Section 6.2

GIN Classif, Gen.,

MoleculeSTM [172] BART [28] BART [28] 45M-230M
Linearized Caption
BioBERT [180] GNN Classification
CLAMP [170] N.A. ≤11B
CLIP [96], T5 [29] Lin., Vec. Retrieval
Classif., Gen.
MolFM [162] BERT [23] GIN MolT5 [123] 61.8M
Caption, Retr.
MoMu-v2 [173] SciBERT [25] GIN N.A. 82M-782M Classification
GIN Classif, Gen.
GIT-Mol [158] SciBERT [25] MolT5 [123] 190M-890M
Linearized Caption
Classif., Regr.,
MolCA [167] Galactica [178] GIN N.A. 100M-877M
Retrieval

The Future of AI and Machines by Slidesgo
No ratings yet
The Future of AI and Machines by Slidesgo
39 pages
Data Governance - Value Orders and Jurisdictional Conflicts - Anke Sophia Obendiek - 2022 - Oxford University Press - 9780192870193 - My Books
100% (1)
Data Governance - Value Orders and Jurisdictional Conflicts - Anke Sophia Obendiek - 2022 - Oxford University Press - 9780192870193 - My Books
305 pages
Open Ai
100% (2)
Open Ai
23 pages
Data Modeling Concept Latest
No ratings yet
Data Modeling Concept Latest
25 pages
Data Analytics Template - Task 3 - Final
No ratings yet
Data Analytics Template - Task 3 - Final
11 pages
Towards Automatically Extracting Story Graphs From Natural Language Stories
No ratings yet
Towards Automatically Extracting Story Graphs From Natural Language Stories
8 pages
Computational Semiotics
No ratings yet
Computational Semiotics
289 pages
Unit 4 LLM
No ratings yet
Unit 4 LLM
11 pages
Covertklyniac22 PDF
No ratings yet
Covertklyniac22 PDF
26 pages
Data Analyst Interview Questions by Jessica Ayodele
No ratings yet
Data Analyst Interview Questions by Jessica Ayodele
10 pages
02 - Data Preparation and Cleaning
No ratings yet
02 - Data Preparation and Cleaning
16 pages
Seminar 7 Introduction To Databases
No ratings yet
Seminar 7 Introduction To Databases
41 pages
Excel Adv Formulae & Functions
No ratings yet
Excel Adv Formulae & Functions
26 pages
Introduction To Natural Language Processing (NLP)
No ratings yet
Introduction To Natural Language Processing (NLP)
87 pages
NLP Python Intro 1-3
100% (1)
NLP Python Intro 1-3
79 pages
NLP 3 4 5
No ratings yet
NLP 3 4 5
105 pages
024 KnowledgeRepresentationMethods (Puhr)
No ratings yet
024 KnowledgeRepresentationMethods (Puhr)
18 pages
Visualisation For Data Science Predict Overview 3267
No ratings yet
Visualisation For Data Science Predict Overview 3267
15 pages
Databases and Data Modelling
No ratings yet
Databases and Data Modelling
53 pages
Proximity Search Operators Guidelines
No ratings yet
Proximity Search Operators Guidelines
2 pages
Word Semantics, Sentence Semantics and Utterance Semantics
No ratings yet
Word Semantics, Sentence Semantics and Utterance Semantics
11 pages
Leveraging Ms Office With Ai in Boosting Productivity
No ratings yet
Leveraging Ms Office With Ai in Boosting Productivity
57 pages
Text and Sentiment Analysis
No ratings yet
Text and Sentiment Analysis
41 pages
Data Cleaning
No ratings yet
Data Cleaning
16 pages
A Review On Large Language Models Architectures Ap
No ratings yet
A Review On Large Language Models Architectures Ap
31 pages
Data Analist
No ratings yet
Data Analist
5 pages
01 Basics of Data Analytics and Machine Learning
No ratings yet
01 Basics of Data Analytics and Machine Learning
16 pages
Lecture2 ImageFormationRepresentation
No ratings yet
Lecture2 ImageFormationRepresentation
34 pages
Text Mining in R (Intro)
0% (1)
Text Mining in R (Intro)
4 pages
50 50 Ai Prompt Prelim Exam
No ratings yet
50 50 Ai Prompt Prelim Exam
13 pages
Knowledge Graph Construction Using Large Language Models
No ratings yet
Knowledge Graph Construction Using Large Language Models
17 pages
Ai Course File
No ratings yet
Ai Course File
67 pages
A Review On Large Language Models Architectures Applications Taxonomies Open Issues and Challenges
No ratings yet
A Review On Large Language Models Architectures Applications Taxonomies Open Issues and Challenges
36 pages
Generative Ai in Higher Education and Beyond: Sciencedirect
No ratings yet
Generative Ai in Higher Education and Beyond: Sciencedirect
8 pages
Development of An Indian Legal Language Model (LLM) For Enhanced Legal Text Analysis and Assistance
No ratings yet
Development of An Indian Legal Language Model (LLM) For Enhanced Legal Text Analysis and Assistance
7 pages
Business Data Analyst Interview Questions 1701686362
No ratings yet
Business Data Analyst Interview Questions 1701686362
42 pages
Parsing
No ratings yet
Parsing
38 pages
LLM Paper
No ratings yet
LLM Paper
26 pages
Qurana: Corpus of The Quran Annotated With Pronominal Anaphora
No ratings yet
Qurana: Corpus of The Quran Annotated With Pronominal Anaphora
8 pages
Data Governance Final v2
100% (1)
Data Governance Final v2
514 pages
Generative Ai Course
No ratings yet
Generative Ai Course
3 pages
Unit I
No ratings yet
Unit I
30 pages
Data Analytics Vs
No ratings yet
Data Analytics Vs
3 pages
ATN
100% (1)
ATN
16 pages
Information Retrieval, Question Answering Systems, and Chatgpt: Technology, Capability, and Intelligence
No ratings yet
Information Retrieval, Question Answering Systems, and Chatgpt: Technology, Capability, and Intelligence
15 pages
Power Query in PowerBI
No ratings yet
Power Query in PowerBI
20 pages
Agent Based Models Are Here and Disrupting GPT RAG 1717410571
No ratings yet
Agent Based Models Are Here and Disrupting GPT RAG 1717410571
12 pages
Text Mining Techniques Applications and Issues2
No ratings yet
Text Mining Techniques Applications and Issues2
5 pages
2023 Intro To Generative Ai
No ratings yet
2023 Intro To Generative Ai
15 pages
Introduction To RAG (Retrieval Augmented Generation) and Vector Database - by Sachinsoni - Medium
No ratings yet
Introduction To RAG (Retrieval Augmented Generation) and Vector Database - by Sachinsoni - Medium
18 pages
Keyphrase Extraction (3rd Review)
No ratings yet
Keyphrase Extraction (3rd Review)
22 pages
Video-LLaMA: A Novel and Advanced Audio-Visual Language Model For Video Content
No ratings yet
Video-LLaMA: A Novel and Advanced Audio-Visual Language Model For Video Content
7 pages
W3 Data Models
No ratings yet
W3 Data Models
48 pages
Semantic Modelling - Understanding Semantic Layer - V2 (Autosaved)
No ratings yet
Semantic Modelling - Understanding Semantic Layer - V2 (Autosaved)
36 pages
Full Text Indexes in Postgresql
No ratings yet
Full Text Indexes in Postgresql
37 pages
Langauage Model
No ratings yet
Langauage Model
148 pages
Data Visualization PDF
No ratings yet
Data Visualization PDF
90 pages
Lecture 6-Text Mining and Sentiment Analysis
No ratings yet
Lecture 6-Text Mining and Sentiment Analysis
57 pages
Data Literacy Fundamentals: Understanding the Power & Value of Data
From Everand
Data Literacy Fundamentals: Understanding the Power & Value of Data
Ben Jones
No ratings yet
Building Data-Driven Applications with LlamaIndex: A practical guide to retrieval-augmented generation (RAG) to enhance LLM applications
From Everand
Building Data-Driven Applications with LlamaIndex: A practical guide to retrieval-augmented generation (RAG) to enhance LLM applications
Andrei Gheorghiu
No ratings yet
Python Natural Language Processing Cookbook: Over 60 recipes for building powerful NLP solutions using Python and LLM libraries
From Everand
Python Natural Language Processing Cookbook: Over 60 recipes for building powerful NLP solutions using Python and LLM libraries
Zhenya Antić
No ratings yet
AI Titans : the 3 Masters : Socrates, Perplexity.ai, ChatGPT 4: AI, #2
From Everand
AI Titans : the 3 Masters : Socrates, Perplexity.ai, ChatGPT 4: AI, #2
Christophe Paroni
No ratings yet
Python Roadmap 2024 1705755575
No ratings yet
Python Roadmap 2024 1705755575
14 pages
Nikita Prasad Polynomial Regression Basics 1710359781
No ratings yet
Nikita Prasad Polynomial Regression Basics 1710359781
16 pages
UAE Corporate Tax Summarized Notes 1706275338
No ratings yet
UAE Corporate Tax Summarized Notes 1706275338
43 pages
Annexure II (For Continuing Students) : Signature of Concerned Bank Official With Stamp
No ratings yet
Annexure II (For Continuing Students) : Signature of Concerned Bank Official With Stamp
2 pages
IMPACT Magazine Issue 4
No ratings yet
IMPACT Magazine Issue 4
68 pages
Adani Enterprises Australia
No ratings yet
Adani Enterprises Australia
21 pages
News Summarization Report 40pages
No ratings yet
News Summarization Report 40pages
9 pages
Iitmp Aaiml Brochure
No ratings yet
Iitmp Aaiml Brochure
30 pages
A Roadmap For Big Model
No ratings yet
A Roadmap For Big Model
200 pages
Artificial Intelligence: Lab Activity - 4.1 To Create A Form of A Future Job Advertisement
No ratings yet
Artificial Intelligence: Lab Activity - 4.1 To Create A Form of A Future Job Advertisement
14 pages
WP Demystifying Ai
No ratings yet
WP Demystifying Ai
33 pages
Ihub - IITR - PCP in Generative AI and Machine Learning - 41223
No ratings yet
Ihub - IITR - PCP in Generative AI and Machine Learning - 41223
30 pages
Rulebook For KUET CSE BitFest 2025 Datathon
No ratings yet
Rulebook For KUET CSE BitFest 2025 Datathon
13 pages
Researchdiscoveryandvisualizationusing Research Rabbit Ausecaseof AIinlibraries
No ratings yet
Researchdiscoveryandvisualizationusing Research Rabbit Ausecaseof AIinlibraries
25 pages
Download
No ratings yet
Download
1 page
TATA Assignment
No ratings yet
TATA Assignment
28 pages
The Role of Artificial Intelligence in Healthcare
No ratings yet
The Role of Artificial Intelligence in Healthcare
3 pages
History of Robots
No ratings yet
History of Robots
4 pages
The Power and Limits of Deep Learning: Research-Technology Management
No ratings yet
The Power and Limits of Deep Learning: Research-Technology Management
7 pages
Ai Development Services PDF
No ratings yet
Ai Development Services PDF
4 pages
Aishwarya DL Mini Project Report
No ratings yet
Aishwarya DL Mini Project Report
4 pages
Why Johny Cant Prompt
No ratings yet
Why Johny Cant Prompt
21 pages
Session 1.1 Understanding Ai
No ratings yet
Session 1.1 Understanding Ai
5 pages
Python Data Science 2024 - Explo - Wilson, Stephen
No ratings yet
Python Data Science 2024 - Explo - Wilson, Stephen
170 pages
Composite Exhibit 2 (Part 1)
No ratings yet
Composite Exhibit 2 (Part 1)
55 pages
Tong Hop de
No ratings yet
Tong Hop de
146 pages
AI & ML Notes VTU ECE
No ratings yet
AI & ML Notes VTU ECE
16 pages
Unit 2
No ratings yet
Unit 2
28 pages
Document From Arnab Bhattacharya
No ratings yet
Document From Arnab Bhattacharya
42 pages
Chapter 2 Our Relation With AI
No ratings yet
Chapter 2 Our Relation With AI
21 pages
Unit 1 - Problem Solving
No ratings yet
Unit 1 - Problem Solving
68 pages
Program Detail - Mastering AI Bootcamp v3
No ratings yet
Program Detail - Mastering AI Bootcamp v3
21 pages
Computer Science Study Abroad Handbook 2023 2024
No ratings yet
Computer Science Study Abroad Handbook 2023 2024
23 pages
PFE-RACHED Hamza
No ratings yet
PFE-RACHED Hamza
54 pages