Research Proposal LLMs and Knowledge Graphs

The document proposes a MSc dissertation project that investigates integrating symbolic and stochastic AI by combining knowledge graphs and large language models. The goal is to evaluate how structured knowledge from knowledge graphs can enhance reasoning in language models. Specifically, the project aims to integrate explicit and implicit knowledge from knowledge graphs with the learning capabilities of language models to improve performance on question answering and knowledge exploration tasks, while maintaining interpretability. The objectives are to conduct a literature review, design and implement an initial solution, evaluate the effectiveness, potentially extend the work to areas like explainability or knowledge graph updating, and deploy the final system online. Risks and uncertainties include potential changes to the proposal based on new findings and interactions with the advisor.


MSc AI Dissertation - Initial Project Proposal


Introduction

Artificial Intelligence (AI) has historically been viewed through two distinct lenses: symbolic AI and
stochastic AI. The former is characterised by methodologies associated with the explicit
representation of facts and knowledge, epitomised by logical reasoning. Stochastic AI, on the other
hand, encompasses techniques that encapsulate problems and their solutions in probabilistic terms,
where learning emerges from pattern recognition and exposure to ever-increasing amounts of data.
Though deep learning has heralded a surge in the development of stochastic AI, neither symbolic nor
stochastic AI individually possesses the full capacity to represent, understand, and reason across the
diverse complexities of real-world problems. Symbolic AI excels at logic and rule-based tasks but
struggles with the ambiguities inherent in real-world data. Conversely, stochastic AI can discern
patterns in data but offers little transparency into its underlying decision-making processes.

Problem Description

This initial proposal describes an investigation into the integration of symbolic and stochastic AI,
specifically focusing on the symbiosis of Knowledge Graphs (KGs) and Large Language Models
(LLMs). Given the opportunities KGs offer (Peng et al. 2023), I am interested in evaluating the extent
to which the structured knowledge representation of KGs, encapsulating both explicit and implicit
facts, can enhance the reasoning capabilities of LLMs, producing a system that exhibits some of the
beneficial characteristics of both symbolic and stochastic AI (Ilkou & Koutraki, 2020).

The problem my research project seeks to solve is: "How can we effectively integrate the explicit and
implicit structured knowledge in KGs with the learning capabilities of LLMs, and what impact
does this integration have on the model's performance and interpretability for question
answering and knowledge exploration?" This question arises from the observation that while
KGs can provide a structured representation of explicit facts and potentially inferred knowledge, the
mechanisms for effectively exploiting this wealth of structured information in stochastic models such
as LLMs are not fully explored. The challenge is therefore not only about the varying integration
strategies for knowledge graphs (Galkin et al. 2016), but also about how to maintain and exploit the
distinction between explicit and implicit knowledge, and how the integration impacts the model's
ability to reason, learn, and explain its outputs.
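To make the integration question tangible, one widely used strategy is to retrieve relevant KG triples and serialise them into the LLM's prompt as explicit context. The sketch below is illustrative only: the in-memory triple list and the naive keyword-overlap retriever are hypothetical stand-ins for a real KG store and entity linker, not part of any finalised design.

```python
# Sketch of KG-augmented prompting: retrieve triples relevant to a question
# and serialise them as explicit context for an LLM. The triple list and the
# keyword-overlap retriever are hypothetical placeholders.
TRIPLES = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "field", "physics"),
    ("Warsaw", "capital_of", "Poland"),
]

def retrieve(question, triples, k=2):
    """Rank triples by word overlap between the question and the entities."""
    q_words = set(question.lower().replace("?", "").split())
    scored = []
    for s, p, o in triples:
        overlap = len(q_words & set(f"{s} {o}".lower().split()))
        if overlap:
            scored.append((overlap, (s, p, o)))
    scored.sort(key=lambda x: -x[0])
    return [t for _, t in scored[:k]]

def build_prompt(question, triples):
    """Prepend retrieved facts so the LLM can ground its answer in the KG."""
    facts = "\n".join(f"- {s} {p.replace('_', ' ')} {o}" for s, p, o in triples)
    return f"Known facts:\n{facts}\n\nQuestion: {question}\nAnswer:"

prompt = build_prompt("Where was Marie Curie born?",
                      retrieve("Where was Marie Curie born?", TRIPLES))
```

The design question the project would evaluate is precisely how much such grounding helps, and whether the explicit/implicit distinction in the KG can be preserved through this kind of serialisation.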

Although symbolic AI is well suited to managing explicit facts and established relations, real-world
complexity tends to grow non-linearly as the number of facts and rules proliferates. This invariably
makes the representation of all potential facts and rules an exponentially arduous task, and it
introduces uncertainty about what knowledge is actually represented in the system and how accurately
it reflects the real world. It is therefore important to evaluate three main areas:

1. Uncertainty in knowledge representation: It is vital to assess the extent to which the
knowledge graph (KG) and rule-embedding systems can accurately represent real-world
knowledge. This includes evaluating their ability to correctly capture and encode facts,
relationships, and rules from a variety of domains and datatypes.

2. Uncertainty in inference: Here, the aim is to understand the system's capacity to draw
plausible inferences from the represented knowledge. This involves evaluating the system's
ability to correctly and coherently generate new facts based on the embedded rules. Metrics
such as precision and recall, along with other measures like logical consistency, can be
employed.

3. Uncertainty propagation: The system should also be assessed for how uncertainty
propagates through a chain of inferences. For instance, if a fact is inferred from a chain of
several other facts and rules, each carrying its own uncertainty, to what extent does this
affect the certainty of the final inference?
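The third area can be made concrete with a small sketch. Assuming, purely for illustration, that the confidences attached to premises are independent, the confidence of a derived fact is the product of the confidences along the chain, which shows how certainty erodes as inference chains lengthen:

```python
# Sketch of uncertainty propagation along an inference chain, under the
# (strong, illustrative) assumption that premise confidences are independent.
from functools import reduce

def chain_confidence(confidences):
    """Confidence of a conclusion derived from independent premises/rules."""
    return reduce(lambda a, b: a * b, confidences, 1.0)

# Three premises at 0.9 confidence each already drop the conclusion to 0.729,
# illustrating why long inference chains demand explicit uncertainty handling.
print(round(chain_confidence([0.9, 0.9, 0.9]), 3))  # 0.729
```

Real systems would need richer models than simple multiplication (premises are rarely independent), which is exactly the kind of evaluation this area targets.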

This problem, once addressed, could lead to the development of AI systems that leverage the strengths
of both KGs and LLMs. Such systems could potentially exhibit improved reasoning, generalisation,
and explanation capabilities, addressing some of the key limitations seen in both symbolic and
stochastic AI systems (Garnelo & Shanahan 2019).

Future Work

Should the core problem be solved earlier than expected, there are multiple avenues worth pursuing
for further insights and developments:

Scalability and Efficiency: Neuro-symbolic systems that integrate knowledge graphs with LLMs
and apply rule embedding may become very complex and computationally demanding. It would
be insightful to investigate techniques to optimise the computational efficiency and scalability of such
systems while reducing the cost of training models.

Development of Explainable AI Systems: With the complexity of Neuro-Symbolic systems and their
inferred outputs, it becomes crucial to develop methods for explaining the system's reasoning process,
enhancing transparency, and aiding in its deployment in risk-sensitive settings. This is especially
important to bridge the gap between academic insights and application of novel approaches in
industry while adhering to regulatory laws and compliances. For example, a neuro-symbolic system
integrating an LLM with a knowledge graph could provide a framework to ensure regulatory
compliance, by reasoning on a set of explicitly encoded rules. Research could focus on how to best
implement this while maintaining the model's performance.

Knowledge Graph Update Mechanisms: As the world is inherently changing and the interpretation
of knowledge can shift, it is important to study methods for automating updates to the KG and to
consider how the interpretation of knowledge may change over time. This includes both explicit
knowledge (new facts) and implicit knowledge (new inferences and rules derived from existing facts).
Effective updating ensures the system stays current with the latest knowledge and potentially novel
ways of interpreting it (Zhong et al. 2023).
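As a hedged illustration of what an update mechanism might look like (the data model below is a hypothetical sketch, not a committed design), one simple approach is to timestamp every triple so that newer assertions supersede older ones while the history is retained:

```python
# Sketch of a temporally versioned KG: each triple carries a validity date,
# so updates add new assertions rather than destructively overwriting facts.
from datetime import date

class TemporalKG:
    def __init__(self):
        self.triples = []  # (subject, predicate, object, valid_from)

    def assert_fact(self, s, p, o, valid_from):
        self.triples.append((s, p, o, valid_from))

    def current(self, s, p):
        """Return the most recently valid object for (subject, predicate)."""
        matches = [t for t in self.triples if t[0] == s and t[1] == p]
        return max(matches, key=lambda t: t[3])[2] if matches else None

kg = TemporalKG()
kg.assert_fact("ACME", "ceo", "Alice", date(2019, 1, 1))
kg.assert_fact("ACME", "ceo", "Bob", date(2023, 6, 1))   # newer fact supersedes
```

Retaining superseded triples also supports the proposal's interest in how interpretations change over time, since the full assertion history remains queryable.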

Interactive Learning and Feedback Loops: In many practical applications, models do not operate in
isolation but are part of a feedback loop, interacting with users and learning from these interactions.
Research could focus on how to incorporate this feedback into the neuro-symbolic framework,
allowing the model to continuously learn and update its knowledge while maintaining the model's
performance.

Project Objectives

Literature review: The literature review will serve a multi-faceted role within the thesis, providing a
rigorous and critical examination of the current knowledge landscape directly related to my chosen
topic of LLMs and KGs, assimilating, evaluating, and contextualising the corpus of academic work
on the subdisciplines related to the problem.

Solution Design and Data Curation: Here I aim to critically evaluate and finalise both the end-to-end
design I intend to build and the data I will be using for the thesis.

Initial Coding Section: Here the aim is to build the finalised design and showcase the reduction in
hallucinations achieved through knowledge graphs, comparing against a baseline LLM and LLMs of
varying sizes/designs/types without knowledge graphs. (I aim to leverage different approaches to
knowledge representation and inference to evaluate the extent to which different types of knowledge
graphs have different impacts on reducing hallucinations.)
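The comparison described above could be operationalised with a simple evaluation harness. In the hypothetical sketch below, the answers and gold references are placeholder strings and exact match is used as a crude hallucination proxy; a real evaluation would query the models on a shared QA benchmark and use more robust metrics.

```python
# Sketch of a hallucination-rate comparison between a baseline LLM and a
# KG-augmented variant. Exact string match against a gold answer is a crude
# proxy; all answers below are hypothetical placeholders, not model outputs.
def hallucination_rate(answers, gold):
    """Fraction of answers that fail to match the gold reference."""
    wrong = sum(a.strip().lower() != g.strip().lower()
                for a, g in zip(answers, gold))
    return wrong / len(gold)

gold = ["Warsaw", "1867", "physics"]
baseline_answers = ["Paris", "1867", "chemistry"]  # illustrative errors
kg_answers = ["Warsaw", "1867", "physics"]

print(round(hallucination_rate(baseline_answers, gold), 2))  # 0.67
print(hallucination_rate(kg_answers, gold))                  # 0.0
```

Running the same harness across LLMs of different sizes and KG types would yield the comparative table of hallucination rates this objective calls for.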

Post Coding Section: Here the aim is to ensure the implementation is complete and tested, and to
evaluate the effectiveness of each approach using specific metrics that measure the quality and
reliability of its outputs.

Second Coding Section: Assuming the initial coding section meets its target, the next aim is to extend
the project by tackling one of the topics noted in the 'Future Work' section above, meeting with my
thesis advisor to decide on the best approach if I reach this stage.

Deployment: Once coding and testing are complete, I aim to host the project online as a simple
full-stack project for people to interact with. This could take the form of a simple application hosted
on the cloud, along with a write-up detailing my findings.

The final objective is not only to develop more efficient and in-depth research skills throughout the
thesis, but also to develop the skills necessary to pursue a doctorate one day; I hope to learn these
along the way from my thesis advisor if possible.

Project Plan

Above is a Gantt chart I have created to roughly estimate the time allocated to each task. I have
decided not to develop a week-by-week breakdown of each task, as there are too many variables and
potential sources of error or risk this early in the research proposal to make a rigorous plan. However,
I have allocated sufficient time to the implementation of the solution in case alterations to the
proposal, or interesting deviations arising during the literature review and interactions with my thesis
advisor, require more time for coding and evaluating my implemented solution. I have also allocated
extra time to consider a topic from the 'Future Work' section if possible, and time to deploy the
solution online. The aim is to complete the entire thesis within seven months, including deployment
and the final write-up, with one month of contingency time.

Risks

As with any research, there are many risks and uncertainties to consider and approaches to mitigate
them. Below are the main risks and how I intend to approach them:

Risk - Costs: Training LLMs is both costly and time-intensive, so it may be necessary to find an
alternative to building LLMs myself.
Approach: Many open-source LLMs are available for research purposes on Hugging Face and
elsewhere, so time may need to be spent deciding on the criteria for picking the optimal ones for this
project. I could also search for cheaper remote resources such as https://lambdalabs.com to keep costs
down and allow faster training of LLMs if I can train them simultaneously.

Risk - Large scope: The breadth of the research proposal presents concerns. Given the accelerated
pace of emergent LLM research, there is a chance of becoming immersed in intricate detail and
falling behind.
Approach: A structured research design, a rigorous framework based on clearly defined objectives,
key research questions, and hypotheses, may help delineate the boundaries of the investigation; this is
something I aim to work on as I develop my final research proposal and interact with my thesis
advisor during the thesis.

Risk - Time management: Thesis and dissertation work can be time-consuming and, without proper
management, can result in missed deadlines; time management has a measurable impact on student
academic achievement (Nasrullah & Khan 2015).
Approach: I aim to improve my time management by implementing effective strategies such as
project management tools, realistic milestones, and regular progress tracking to keep the work on
schedule.

Risk - Dataset curation and collection: Collecting original data for research can be fraught with
issues, from gaining access to the necessary resources to the reliability and validity of the data, along
with handling different data types such as geospatial data.
Approach: I aim to consider multiple data collection methods, ensure ethical requirements are met,
and critically evaluate my data sources. Using datasets and data sources common in industry and
previous papers could make comparisons with prior work easier.

References

Expert.ai Team (2022). Implicit vs Explicit Knowledge for Language Understanding [Online]. Available from:
https://www.expert.ai/blog/implicit-vs-explicit-knowledge-for-language-understanding/ [Accessed 20 July
2023].

Garnelo, M. & Shanahan, M. (2019). Reconciling deep learning with symbolic artificial intelligence:
representing objects and relations. Current Opinion in Behavioral Sciences, 29, 17-23.
https://doi.org/10.1016/j.cobeha.2018.12.010

Ilkou, E. & Koutraki, M. (2020). Symbolic Vs Sub-symbolic AI Methods: Friends or Enemies?
https://doi.org/10.1145/3340531.3414072

Nasrullah, S. & Khan, M. S. (2015). The Impact of Time Management on the Students' Academic
Achievements.

Peng, C., Xia, F., Naseriparsa, M. et al. (2023). Knowledge Graphs: Opportunities and Challenges.
Artificial Intelligence Review. https://doi.org/10.1007/s10462-023-10465-9

Zhong, L., Wu, J., Li, Q., Peng, H. & Wu, X. (2023). A Comprehensive Survey on Automatic Knowledge
Graph Construction. https://doi.org/10.48550/arXiv.2302.05019
