
CN118779468A - A method for entity relationship extraction based on a large model and dynamic prompts


Info

Publication number
CN118779468A
CN118779468A
Authority
CN
China
Prior art keywords
entity
paragraph
relationship
template
list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411064169.3A
Other languages
Chinese (zh)
Inventor
赵霞
刘雪姣
黄江勇
于重重
周东岳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Technology and Business University
Original Assignee
Beijing Technology and Business University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Technology and Business University filed Critical Beijing Technology and Business University
Publication of CN118779468A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 - Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 - Ontology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G06F40/216 - Parsing using statistical methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 - Named entity recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 - Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/0455 - Auto-encoder networks; Encoder-decoder networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/047 - Probabilistic or stochastic networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/02 - Knowledge representation; Symbolic representation
    • G06N5/022 - Knowledge engineering; Knowledge acquisition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The present invention discloses an entity relationship extraction method based on a large model and dynamic prompts. The method comprises: defining the schema layer and entity sets of a domain knowledge graph; constructing an entity-relationship triple example set from the schema layer and entity sets; constructing an entity vector database DB; constructing a prompt template set PR; dividing the text to be processed into paragraphs and extracting a keyword list for each paragraph; using the keyword list to construct a paragraph-entity association list T for each paragraph; for each paragraph, constructing a dynamic prompt from the association list T, the prompt template set PR and the triple example set, feeding the dynamic prompt into the large model for entity relationship extraction, and checking the correctness of the result. The invention combines the advantages of knowledge graphs, entity vector representation, dynamic prompts and large models, achieves automatic extraction of entity relationships without fine-tuning the large model, reduces the cost of entity relationship extraction, and improves its efficiency and accuracy, giving it broad application value.

Description

A method for entity relationship extraction based on a large model and dynamic prompts

Technical Field

The present invention relates to entity relationship extraction, and in particular to an entity relationship extraction method based on a large model and dynamic prompts, belonging to the field of artificial intelligence applications.

Background Art

As large language models have shown excellent performance on a wide variety of tasks, their potential applications in different fields are being actively explored. Existing large models are trained on general-domain corpora; they learn language patterns from a broad range of texts and can extract entity relationships from unstructured text to build general-domain knowledge graphs. However, because they are not trained on domain-specific corpora, and because the terminology, concepts and entity relationships of specific domains are complex, existing general-purpose large models do not yet achieve the expected results when used for entity relationship extraction in a specific domain.

To improve the accuracy with which large language models extract entity relationships, researchers have proposed two kinds of methods: one builds a domain-specific corpus and fine-tunes the large model on it; the other uses prompt engineering to give the large model prompts that help it understand the text. In the patent "API entity-relationship joint extraction method and system based on dynamic prompts", Huang Jing et al. construct dynamic prompts for joint extraction of API entities and relationships and define a structured extraction language, from which training and test sets are built to fine-tune a large model for joint extraction of API entity relationships; that patent targets the API field of software development and uses dynamic prompts to improve the accuracy of extracting entity relationships from API description text. In the patent "Entity relationship extraction method, device, medium and equipment based on prompt learning", Yang Li et al. encode sentence features and tune them with prompt learning, decomposing the entity relationship extraction task into two stages, entity recognition and relation classification, and, combined with a question-answering task, use the semantic information of the text to be processed to extract triples from a small amount of annotated text; that patent splits entity relationship extraction into two independent stages and cannot perform joint extraction. In the patent "A few-shot nested relation extraction algorithm based on dynamic prompt learning", Huang Yihua et al. propose a few-shot nested relation extraction framework based on dynamic prompt learning, which improves the accuracy of relation recognition for nested relation extraction but does not involve joint extraction of entity relationships.

Summary of the Invention

The present invention discloses an entity relationship extraction method based on a large model and dynamic prompts. It optimizes the construction of dynamic prompts for domain-specific knowledge and uses a large language model to extract entity relationships from unstructured text (hereinafter simply "text"). The method comprises: 1) defining the schema layer and entity sets of a domain knowledge graph; 2) constructing an entity-relationship triple example set from the schema layer and entity sets; 3) constructing an entity vector database DB; 4) constructing a prompt template set PR; 5) dividing the text to be processed into paragraphs and extracting a keyword list for each paragraph; 6) using the keyword list to construct a paragraph-entity association list T for each paragraph; 7) for each paragraph, constructing a dynamic prompt from the association list T, the prompt template set PR and the triple example set; 8) for each paragraph, feeding the dynamic prompt into the large model for entity relationship extraction and checking the correctness of the result. In detail, the method of the present invention comprises the following steps:

A. Define the schema layer and entity sets of the domain knowledge graph. The specific steps are as follows:

A1. Define the set of entity categories C={C1,…,Ci,…,CN} (i∈[1,N]) and the set of relations between entity categories R={R1,…,Rj,…,RM} (j∈[1,M]), where N and M are the number of entity categories and the number of relations in the knowledge graph to be built;

A2. From a domain terminology dictionary or catalogue, extract the terms belonging to entity category Ci (i∈[1,N]) as entities, forming the entity set of that category Ei={ei1,…,eij,…,eik} (i∈[1,N], j∈[1,k]; k is the number of entities in the set and varies with the entity category Ci);

A3. For an entity category Ch (h∈[1,N]) and an entity category Ct (t∈[1,N]) linked by relation Rj (j∈[1,M]), record the entity-category triple (Ch, Rj, Ct), where Ch is called the head entity category with entity set Ehj, and Ct is called the tail entity category with entity set Etj; all entity-category triples together constitute the schema layer of the knowledge graph;
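By way of illustration only, and not as part of the claimed method, the schema layer of step A can be held in a few plain Python data structures. The sketch below uses hypothetical names and the knowledge-point example from the detailed description:

```python
# Minimal sketch of step A: schema layer and entity sets as plain data structures.
# All names are illustrative and follow the Python-course example used later on.

entity_categories = ["KnowledgePoint"]            # C = {C1, ..., CN}, here N = 1
relations = ["contains", "depends_on"]            # R = {R1, ..., RM}, here M = 2

# Entity sets per category, e.g. extracted from a textbook table of contents (step A2)
entity_sets = {
    "KnowledgePoint": ["basic data type", "control structure", "integer type",
                       "branch structure", "binary", "decimal", "Boolean type"],
}

# Schema layer: all entity-category triples (Ch, Rj, Ct) (step A3)
schema_layer = [
    ("KnowledgePoint", "contains", "KnowledgePoint"),
    ("KnowledgePoint", "depends_on", "KnowledgePoint"),
]
```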

B. Use the schema layer and entity sets to construct the entity-relationship triple example sets. Taking the entity-category triple (Ch, Rj, Ct) as an example, the specific operations are as follows:

B1. Select F entity pairs from entity set Eh and entity set Et such that each pair satisfies relation Rj;

B2. Combine each entity pair with relation Rj into an entity-relationship triple (ehi, Rj, eti), where ehi∈Eh (i∈[1,F]) and eti∈Et (i∈[1,F]);

B3. From the F entity-relationship triples built in step B2, construct the example set of relation Rj: Sj={(ehi, Rj, eti) for i in [1:F]}, where "for i in [1:F]" means i takes integer values from 1 to F;
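A minimal sketch of step B follows; the helper name build_example_set and the sample pairs are illustrative, taken from the worked example later in the description:

```python
# Minimal sketch of step B: build the example set Sj for a relation Rj from
# F hand-picked entity pairs that are known to satisfy the relation.

def build_example_set(relation, entity_pairs):
    """Return Sj = [(head, relation, tail), ...] for the given entity pairs (B1-B3)."""
    return [(head, relation, tail) for head, tail in entity_pairs]

S_contains = build_example_set("contains", [
    ("control structure", "branch structure"),
    ("branch structure", "two-branch structure"),
])
S_depends_on = build_example_set("depends_on", [
    ("control structure", "basic data type"),
    ("control structure", "Boolean type"),
])
```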

C. Construct the entity vector database DB. The specific steps are as follows:

C1. Encode each entity eij∈Ei (i∈[1,N], j∈[1,|Ei|]) of every entity set Ei (i∈[1,N]) into an entity vector veij using a pre-trained language model;

C2. Store each pair of entity eij and entity vector veij in the entity vector database DB;
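Step C can be sketched as follows. The sentence-transformers library and the model name are assumptions standing in for "a pre-trained language model" (the embodiment uses BERT), and a plain dictionary stands in for the vector database:

```python
# Minimal sketch of step C: encode every entity string into a vector and store
# the (entity, vector) pairs. A dict stands in for a real vector store.

from sentence_transformers import SentenceTransformer  # assumption: library installed

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # illustrative model name

def build_entity_vector_db(entity_sets):
    """Encode each entity e_ij into a vector ve_ij and store the pair (C1-C2)."""
    db = {}
    for category, entities in entity_sets.items():
        vectors = encoder.encode(entities)        # one vector per entity string
        for entity, vector in zip(entities, vectors):
            db[entity] = vector
    return db

entity_vector_db = build_entity_vector_db(
    {"KnowledgePoint": ["integer type", "decimal", "binary", "octal", "hexadecimal"]}
)
```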

D. Construct the prompt template set PR. The specific steps are as follows:

D1. Define the template set PR={pr1,…,prk,…,prn} (k∈[1,n]), where n is the number of templates in PR;

D2. Construct each prompt template prk∈PR; a template contains several template variables, defined as follows:

D2.1 Define the domain variable ${field}, whose value range is the set of domain identifiers;

D2.2 Define the paragraph variable ${Pi}, whose value range is the set of paragraph identifiers;

D2.3 Define the entity variables ${e1}, …, ${ej}, …, ${em}, where e1~em are the entities associated with the paragraph;

D2.4 Define the relation variables ${R1}, …, ${Rj}, …, ${RM}, where R1~RM are all relations of the knowledge graph to be built;

D2.5 Define the triple-example-set variables ${S1}, …, ${Sj}, …, ${SM}, where S1~SM are the entity-relationship triple example sets corresponding to relations R1~RM;
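As an informal illustration of steps D1-D2.5, a template can be written with Python's string.Template, whose ${...} placeholder syntax matches the notation above. For brevity the sketch collapses the per-entity, per-relation and per-example variables into single list-valued placeholders, and the wording is an abbreviated version of template pr1 from the worked example:

```python
# Minimal sketch of step D: one prompt template with ${...} template variables.

from string import Template

pr1 = Template(
    "You are a senior expert in the ${field} domain. Given the paragraph: ${Pi}\n"
    "and the entities found in it: ${entities}\n"
    "extract all entity-relationship triples in the form (head entity, relation, tail entity).\n"
    "Only the following relations are allowed between entities: ${relations}.\n"
    "Reference examples: ${examples}"
)
```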

E. Divide the text to be processed into paragraphs and extract a keyword list for each paragraph. The specific steps are as follows:

E1. Divide the unstructured text to be processed into paragraphs, as follows:

E1.1 Clean the text data, removing useless punctuation, special characters, etc.;
E1.2 Represent the text as word vectors using a pre-trained word-vector model, producing a word-vector sequence;
E1.3 Feed the word-vector sequence into a neural-network sequence-labelling model to detect topic boundaries in the text and obtain the start and end positions of each paragraph;

E1.4 Split the text into paragraphs at the detected topic boundaries, obtaining the paragraph set P={P1,…,Pi,…,Pt} (i∈[1,t]), where t is the total number of paragraphs in the set;

E2. Extract the keyword list of each paragraph Pi (i∈[1,t]), as follows:

E2.1 Extract the keywords of each paragraph Pi (i∈[1,t]) and add them to the set Ki={ki1,ki2,...,kip} (p is the total number of keywords of Pi); keyword extraction methods include, but are not limited to, TF-IDF, TextRank and methods based on pre-trained language models;

E2.2 Combine paragraph Pi and keyword set Ki into the keyword list of paragraph Pi: PLi=[Pi, Ki];
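One possible realisation of step E2 uses TF-IDF from scikit-learn, which is among the methods named above; the function name and the top_k parameter are illustrative, paragraph splitting (E1) is assumed done, and Chinese text would additionally need a language-appropriate tokenizer:

```python
# Minimal sketch of step E2: TF-IDF keyword extraction per paragraph.

from sklearn.feature_extraction.text import TfidfVectorizer

def extract_keywords(paragraphs, top_k=8):
    """Return PL = [(paragraph, [top-k keywords]), ...] (steps E2.1-E2.2)."""
    vectorizer = TfidfVectorizer()                     # default tokenizer; replace for Chinese
    tfidf = vectorizer.fit_transform(paragraphs)       # rows: paragraphs, cols: terms
    terms = vectorizer.get_feature_names_out()
    keyword_lists = []
    for i, paragraph in enumerate(paragraphs):
        row = tfidf[i].toarray().ravel()
        top_idx = row.argsort()[::-1][:top_k]          # indices of the k highest TF-IDF scores
        keywords = [terms[j] for j in top_idx if row[j] > 0]
        keyword_lists.append((paragraph, keywords))
    return keyword_lists
```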

F. Use the keyword list to construct a paragraph-entity association list T for each paragraph. Taking paragraph Pi as an example, the specific steps are as follows:

F1. Encode each keyword kij∈Ki (j∈[1,p]) into a keyword vector vkij using the pre-trained language model of step C1;

F2. Run a similarity search for the keyword vector vkij in the entity vector database and retrieve the Top-N most similar entities, written Top-N[kij]={e1,e2,…,eN};

F3. De-duplicate all similar entities retrieved for all keywords in Ki to form the entity list Li;

F4. Combine paragraph Pi and entity list Li into the paragraph-entity association list of Pi: Ti=[Pi, Li];
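Steps F1-F4 can be sketched as a cosine-similarity search over the in-memory entity vector database from the step-C sketch; the encoder and database are passed in as parameters and all names are illustrative:

```python
# Minimal sketch of step F: keyword-to-entity similarity search and de-duplication.

import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def paragraph_entity_list(paragraph, keywords, encoder, entity_vector_db, top_n=1):
    entities = []
    for kw in keywords:
        kw_vec = encoder.encode([kw])[0]                   # F1: encode the keyword
        scored = sorted(entity_vector_db.items(),
                        key=lambda item: cosine(kw_vec, item[1]),
                        reverse=True)                      # F2: rank entities by similarity
        entities.extend(entity for entity, _ in scored[:top_n])
    unique_entities = list(dict.fromkeys(entities))        # F3: de-duplicate, keep order
    return [paragraph, unique_entities]                    # F4: Ti = [Pi, Li]
```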

G. For each paragraph, construct a dynamic prompt from the association list T, the prompt template set PR and the triple example sets. Taking the association list Ti of paragraph Pi, template prk and the triple example sets as an example, the specific steps are as follows:

G1. Replace the template variables in prompt template prk with the corresponding values, as follows:

G1.1 Substitute paragraph Pi of association list Ti for the paragraph variable ${Pi};

G1.2 Substitute each entity of the entity list Li in association list Ti for the entity variables ${e1}~${em};

G1.3 Substitute the relations R1~Rj in which the entities of list Li participate for the relation variables ${R1}~${Rj};

G1.4 Substitute the triples of the example sets S1~Sj corresponding to relations R1~Rj for the triple-example-set variables ${S1}~${Sj};

G2. For the remaining variables in the template, fill in the corresponding values according to the variable definitions, forming a complete dynamic prompt QkPi (k∈[1,n], i∈[1,t]); for example, the domain name field is substituted for the domain variable ${field};
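Steps G1-G2 then reduce to filling the template placeholders. The sketch below reuses the pr1 template and example sets from the earlier sketches; all names and the sample paragraph are illustrative:

```python
# Minimal sketch of step G: substitute values from Ti into the step-D template
# to obtain the dynamic prompt QkPi.

def build_dynamic_prompt(template, field, paragraph, entity_list, relation_list, examples):
    return template.substitute(
        field=field,                                       # G2: remaining variables
        Pi=paragraph,                                      # G1.1: paragraph text
        entities=", ".join(entity_list),                   # G1.2: entities from Li
        relations=", ".join(relation_list),                # G1.3: allowed relations
        examples="; ".join(str(t) for t in examples),      # G1.4: triple example sets
    )

prompt = build_dynamic_prompt(
    pr1, "Python programming language",
    "Integers can be written in four bases: decimal, binary, octal and hexadecimal...",
    ["integer type", "decimal", "binary", "octal", "hexadecimal"],
    ["contains", "depends_on"],
    S_contains + S_depends_on,
)
```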

H. Feed the dynamic prompt into large models for entity relationship extraction and check the correctness of the results, as follows:

H1. Interact with the large models through API calls, sending the dynamic prompt to several large models for entity relationship extraction;

H2. Compare and analyse the extraction results of the different large models and select the best result.
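A minimal sketch of steps H1-H2 is given below. The OpenAI Python SDK call is one illustrative backend and assumes the openai package (version 1.x) and an API key are configured; other models can be plugged in as further callables, and selecting the best result is left to the comparison of step H2:

```python
# Minimal sketch of step H: fan the same dynamic prompt out to several large
# models and collect their raw answers for comparison.

from openai import OpenAI   # assumption: openai Python SDK >= 1.0 is installed

client = OpenAI()           # assumption: OPENAI_API_KEY is set in the environment

def ask_openai(prompt, model="gpt-4"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def extract_with_models(prompt, backends):
    """backends: {model_name: callable(prompt) -> str}; returns all raw outputs."""
    return {name: call(prompt) for name, call in backends.items()}

results = extract_with_models(prompt, {"gpt-4": ask_openai})
for name, text in results.items():
    print(f"--- {name} ---\n{text}\n")
```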

Compared with the prior art, the present invention has the following advantages: it implements a complete entity relationship extraction pipeline for domain-specific unstructured text, starting from the construction of the schema layer and entity sets of the domain knowledge graph, dynamically assembling the elements needed for the prompt and generating dynamic prompts that guide the large model to extract entity relationships in the specific domain; an entity vector database built with a pre-trained language model improves the accuracy of entity recognition and relation extraction in paragraph text. The invention combines the advantages of knowledge graphs, entity vector representation, dynamic prompts and large models, achieves automatic extraction of entity relationships without fine-tuning the large model, reduces the cost of entity relationship extraction, and improves its efficiency and accuracy, giving it broad application value.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1: Flowchart of an entity relationship extraction method based on a large model and dynamic prompts

DETAILED DESCRIPTION

The present invention is described in detail below with reference to the accompanying drawing and a specific implementation example.

The present invention discloses an entity relationship extraction method based on a large model and dynamic prompts. The method comprises: 1) defining the schema layer and entity sets of a domain knowledge graph; 2) constructing an entity-relationship triple example set from the schema layer and entity sets; 3) constructing an entity vector database DB; 4) constructing a prompt template set PR; 5) dividing the text to be processed into paragraphs and extracting a keyword list for each paragraph; 6) using the keyword list to construct a paragraph-entity association list T for each paragraph; 7) for each paragraph, constructing a dynamic prompt from the association list T, the prompt template set PR and the triple example set; 8) for each paragraph, feeding the dynamic prompt into the large model for entity relationship extraction and checking the correctness of the result. The technical solution is described clearly and completely below, step by step, using part of the knowledge graph of a Python programming language course as an example, with the extraction of some entities of one entity category and some of their relations as the worked instance.

1. Define the schema layer and entity sets of the domain knowledge graph. The specific steps are as follows:

1.1 One requirement of the Python programming language course knowledge graph is to extract the various knowledge points in the textbook together with the inclusion and dependency relations between them;

According to this requirement, define the entity category {knowledge point}, written C1, so the entity category set is C={knowledge point}; the set of relations between knowledge-point entities is R={inclusion relation, dependency relation}, with the inclusion relation written R1 and the dependency relation written R2; the number of entity categories in the knowledge graph to be built is N=1 and the number of relations is M=2;

1.2 From the textbook table of contents, extract the entity set of C1: E1={basic data type, control structure, numeric type, branch structure, integer type, single-branch structure, binary, decimal, octal, hexadecimal, Boolean type};

1.3 For the inclusion relation R1 and the dependency relation R2 that hold between entities of the knowledge-point category C1, record the corresponding entity-category triples and entity sets as follows:

1.3.1 The entity-category triple of the inclusion relation R1 is (knowledge point, inclusion relation, knowledge point); its head entity set is Eh1={basic data type, control structure, numeric type, branch structure, integer type} and its tail entity set is Et1={numeric type, branch structure, integer type, single-branch structure, binary, decimal, octal, hexadecimal};

1.3.2 The entity-category triple of the dependency relation R2 is (knowledge point, dependency relation, knowledge point); its head entity set is Eh2={control structure} and its tail entity set is Et2={basic data type, Boolean type};

2. Use the schema layer and entity sets to construct the entity-relationship triple example sets. Taking the entity-category triples (knowledge point, inclusion relation, knowledge point) and (knowledge point, dependency relation, knowledge point) as examples, the specific operations are as follows:

2.1 Select 2 entity pairs from entity sets Eh1 and Et1 that satisfy the inclusion relation R1, and 2 entity pairs from entity sets Eh2 and Et2 that satisfy the dependency relation R2;

2.2 The triple example set of the inclusion relation R1 is S1={(control structure, inclusion relation, branch structure), (branch structure, inclusion relation, two-branch structure)}; the triple example set of the dependency relation R2 is S2={(control structure, dependency relation, basic data type), (control structure, dependency relation, Boolean type)};

3. Construct the entity vector database DB. The specific steps are as follows:

3.1 Encode each entity e1j∈E1 (j∈[1,11]) of entity set E1 into an entity vector ve1j (j∈[1,11]) using BERT;

3.2 Store each pair of entity e1j and entity vector ve1j in the entity vector database DB;

4. Construct the prompt template set PR. The specific steps are as follows:

4.1 Define the template set PR={pr1, pr2, pr3};

4.2 Construct each prompt template prk∈PR (k∈[1,3]); a template contains several template variables. The specific steps are as follows:

4.2.1 Define the content of pr1: "You are a senior expert in the specific field ${field}. Given the paragraph ${Pi} and the entities in it: ${e1}, …, ${ej}, …, ${em}, please extract the entity-relationship triples in this paragraph; the format of a triple is (head entity, relation, tail entity); only the following relations are allowed between entities: ${R1}, ${R2}; the R1 relation is ${explainR1} and the R2 relation is ${explainR2}". The template contains the variables field (domain name), Pi (paragraph), e1~em (the entities associated with paragraph Pi), R1~R2 (the relations of the knowledge graph to be built) and explainR1~explainR2 (example-based explanations of relations R1~R2, built from their triple example sets);

4.2.2 Define the content of pr2: "From the perspective of a professional in the ${field} field, extract entity-relationship triples from paragraph ${Pi}, under the following requirements: 1. the entities of a triple must be among: ${e1}, …, ${ej}, …, ${em}; 2. the relation of a triple must be one of: ${R1}, ${R2}; 3. refer to the following examples: examples ${S1} of relation ${R1} and examples ${S2} of relation ${R2}". The template contains the variables field (domain name), Pi (paragraph), e1~em (the entities associated with paragraph Pi), R1~R2 (the relations of the knowledge graph to be built) and S1~S2 (the triple example sets corresponding to relations R1~R2);

4.2.3 Define the content of pr3: "Based on the following information, extract entity-relationship triples for the given paragraph text: paragraph: ${Pi}; entities: ${e1}, …, ${ej}, …, ${em}; relations: ${R1}, ${R2}; triple examples: ${S1}, ${S2}". The template contains the variables field (domain name), Pi (paragraph), e1~em (the entities associated with paragraph Pi), R1~R2 (the relations of the knowledge graph to be built) and S1~S2 (the triple example sets corresponding to relations R1~R2);

5. Divide the text to be processed into paragraphs and extract a keyword list for each paragraph. The specific steps are as follows:

5.1 Divide the unstructured text to be processed into paragraphs, as follows:

5.1.1 Clean the text data, removing useless punctuation, special characters, etc.;
5.1.2 Represent the text as word vectors using BERT, producing a word-vector sequence;
5.1.3 Feed the word-vector sequence into a BiLSTM-CRF model to detect topic boundaries in the text and obtain the start and end positions of each paragraph;
5.1.4 Split the text into paragraphs at the detected topic boundaries, obtaining the paragraph set P={P1,…,Pi,…,P10} (i∈[1,10]);

5.2 Extract the keywords of each paragraph Pi (i∈[1,10]). Taking paragraph P4 as an example, the specific steps are as follows:

5.2.1 Use the Tongyi Qianwen large model to extract the keywords of paragraph P4: "The integer type is consistent with the concept of an integer in mathematics; examples of integers are 1010, 99, -217, 0x9a, -0x89. Integers can be written in four bases: decimal, binary, octal and hexadecimal. By default integers are decimal; the other bases require a leading symbol. Binary numbers use the leading symbol 0b, octal numbers 0o and hexadecimal numbers 0x; both uppercase and lowercase letters may be used." The keywords are added to the set K4={integer type, integer, integer concept, base representation, decimal, binary, octal, hexadecimal, leading symbol};

5.2.2 Combine paragraph P4 and keyword set K4 into the keyword list of paragraph P4: PL4=[P4, K4];

6. Use the keyword list to construct a paragraph-entity association list T for each paragraph. Taking paragraph P4 as an example, the specific steps are as follows:

6.1 Encode each keyword k4j∈K4 (j∈[1,9]) into a keyword vector vk4j using BERT;

6.2 Run a similarity search for the keyword vector vk4j in the entity vector database and retrieve the Top-1 most similar entity; for the keyword "integer type", for example, the Top-1 similar entity is also "integer type", written Top-1[k4j]={integer type};

6.3 De-duplicate all similar entities retrieved for all keywords in K4 to form the entity list L4=[integer type, decimal, binary, octal, hexadecimal];

6.4 Combine paragraph P4 and entity list L4 into the paragraph-entity association list of P4: T4=[P4, L4];

7. For each paragraph, construct a dynamic prompt from the association list T, the prompt template set PR and the triple example sets. Taking the association list T4 of paragraph P4, template pr1 and the triple example sets as an example, the specific steps are as follows:

7.1 Replace the template variables in prompt template pr1 with the corresponding values, as follows:

7.1.1 Substitute paragraph P4 of association list T4 for the paragraph variable ${Pi};

7.1.2 Substitute each entity of the entity list L4 in association list T4 for the entity variables ${e1}~${e5};

7.1.3 Substitute the relations R1~R2 in which the entities of list L4 participate for the relation variables ${R1}~${R2};

7.1.4 Substitute the triples of the example sets S1~S2 corresponding to relations R1~R2 for the triple-example-set variables ${S1}~${S2};

7.2 For the remaining variables in the template, fill in the corresponding values as needed, forming a complete dynamic prompt QkP4 (k∈[1,3]). Taking pr1 as an example: "Python programming language" is substituted for the domain variable ${field}; "means that two knowledge points are concepts of the same kind and the scope of the former includes the latter, as in the relation between control structure and branch structure" is substituted for the template variable ${explainR1}; and "means that before learning or applying one knowledge point, one or more other knowledge points must first be mastered or understood, for example the Boolean type must be mastered before learning control structures" is substituted for the template variable ${explainR2};

The dynamic prompt Q1P4 is therefore: You are a senior expert in the specific field Python programming language. Given the paragraph "The integer type is consistent with the concept of an integer in mathematics; examples of integers are 1010, 99, -217, 0x9a, -0x89. Integers can be written in four bases: decimal, binary, octal and hexadecimal. By default integers are decimal; the other bases require a leading symbol. Binary numbers use the leading symbol 0b, octal numbers 0o and hexadecimal numbers 0x; both uppercase and lowercase letters may be used." and the entities in the paragraph: integer, decimal, binary, octal, hexadecimal, please extract the entity-relationship triples in this paragraph; the format of a triple is (head entity, relation, tail entity); only the following relations are allowed between entities: inclusion relation, dependency relation; the inclusion relation means that two knowledge points are concepts of the same kind and the scope of the former includes the latter, as in the relation between control structure and branch structure; the dependency relation means that before learning or applying one knowledge point, one or more other knowledge points must first be mastered or understood, for example the Boolean type must be mastered before learning control structures;

Likewise, the dynamic prompt Q2P4 is: From the perspective of a professional in the Python programming language field, extract entity-relationship triples from the paragraph "The integer type is consistent with the concept of an integer in mathematics; examples of integers are 1010, 99, -217, 0x9a, -0x89. Integers can be written in four bases: decimal, binary, octal and hexadecimal. By default integers are decimal; the other bases require a leading symbol. Binary numbers use the leading symbol 0b, octal numbers 0o and hexadecimal numbers 0x; both uppercase and lowercase letters may be used.", under the following requirements: 1. the entities of a triple must be among: integer, decimal, binary, octal, hexadecimal; 2. the relation of a triple must be one of: inclusion relation, dependency relation; 3. refer to the following examples: inclusion-relation examples (control structure, inclusion relation, branch structure), (branch structure, inclusion relation, two-branch structure); dependency-relation examples (control structure, dependency relation, basic data type), (control structure, dependency relation, Boolean type);

And the dynamic prompt Q3P4 is: Based on the following information, extract entity-relationship triples for the given paragraph text: paragraph: "The integer type is consistent with the concept of an integer in mathematics; examples of integers are 1010, 99, -217, 0x9a, -0x89. Integers can be written in four bases: decimal, binary, octal and hexadecimal. By default integers are decimal; the other bases require a leading symbol. Binary numbers use the leading symbol 0b, octal numbers 0o and hexadecimal numbers 0x; both uppercase and lowercase letters may be used."; entities: integer, decimal, binary, octal, hexadecimal; relations: inclusion relation, dependency relation; triple examples: (control structure, inclusion relation, branch structure), (branch structure, inclusion relation, two-branch structure), (control structure, dependency relation, basic data type), (control structure, dependency relation, Boolean type);

8. Feed the dynamic prompts into large models for entity relationship extraction. The specific steps are as follows:

8.1 Interact with the large models through API calls and send the dynamic prompts generated in step 7 to ChatGPT-4 and Claude-3-opus for entity relationship extraction. Taking the dynamic prompts Q1P4, Q2P4 and Q3P4 as examples, the extraction results of ChatGPT-4 and Claude-3-opus are shown in Table 1;

Table 1. Entity relationship extraction results

Q1P4, ChatGPT-4: Based on your requirements, the following entity-relationship triples are extracted from the paragraph. Inclusion relation: (integer, inclusion relation, decimal), (integer, inclusion relation, binary), (integer, inclusion relation, octal), (integer, inclusion relation, hexadecimal). Dependency relation: (binary, dependency relation, leading symbol 0b), (octal, dependency relation, leading symbol 0o), (hexadecimal, dependency relation, leading symbol 0x). These triples precisely describe the relations between the entities involved in the paragraph and satisfy your requirements.

Q1P4, Claude-3-opus: Based on the given paragraph and entities, I extracted the following entity-relationship triples: (integer type, contains, decimal), (integer type, contains, binary), (integer type, contains, octal), (integer type, contains, hexadecimal), (binary, depends on, 0b prefix), (octal, depends on, 0o prefix), (hexadecimal, depends on, 0x prefix). These triples state that the integer type includes the four base representations decimal, binary, octal and hexadecimal, and that binary, octal and hexadecimal numbers depend on the leading symbols 0b, 0o and 0x respectively.

Q2P4, ChatGPT-4: Based on the provided paragraph and the given requirements, the following entity-relationship triples can be extracted from the text: (integer, inclusion relation, decimal), (integer, inclusion relation, binary), (integer, inclusion relation, octal), (integer, inclusion relation, hexadecimal); these four triples state that integers include the different base representations, each base being one way of writing an integer. (binary, dependency relation, 0b leading symbol), (octal, dependency relation, 0o leading symbol), (hexadecimal, dependency relation, 0x leading symbol); these three triples state that the bases depend on specific leading symbols, 0b, 0o and 0x being the symbols used in Python for binary, octal and hexadecimal. Note that the text does not mention a specific leading symbol for decimal, because by default a number without any prefix is interpreted as decimal, so no triple depending on a leading symbol was created for it.

Q2P4, Claude-3-opus: Based on the given paragraph and requirements, I extracted the following entity-relationship triples: 1. (integer, inclusion relation, decimal) 2. (integer, inclusion relation, binary) 3. (integer, inclusion relation, octal) 4. (integer, inclusion relation, hexadecimal) 5. (binary, dependency relation, 0b prefix) 6. (octal, dependency relation, 0o prefix) 7. (hexadecimal, dependency relation, 0x prefix). These triples satisfy the given entity and relation constraints: the integer type includes the four representations decimal, binary, octal and hexadecimal, and binary, octal and hexadecimal depend on the leading symbols 0b, 0o and 0x respectively.

Q3P4, ChatGPT-4: Based on the information you provided, entity-relationship triples can be extracted for the specified paragraph. The key entities are "integer", "decimal", "binary", "octal" and "hexadecimal", and the relation types include "inclusion relation", "identity relation" and "dependency relation". Examples of these triples: 1. inclusion relation: (integer, inclusion relation, decimal), (integer, inclusion relation, binary), (integer, inclusion relation, octal), (integer, inclusion relation, hexadecimal); 2. dependency relation: (binary, dependency relation, 0b leading symbol), (octal, dependency relation, 0o leading symbol), (hexadecimal, dependency relation, 0x leading symbol), indicating that non-decimal representations depend on specific leading symbols such as 0b, 0o and 0x. These triples help describe the structure and characteristics of the integer type in its different base representations; each triple accurately reflects information mentioned in the paragraph, such as the different ways of writing integers and their dependencies.

Q3P4, Claude-3-opus: OK, given the paragraph and entities, I extract the following triples: (integer, inclusion relation, decimal), (integer, inclusion relation, binary), (integer, inclusion relation, octal), (integer, inclusion relation, hexadecimal), (binary, dependency relation, 0b), (octal, dependency relation, 0o), (hexadecimal, dependency relation, 0x). These triples cover the inclusion relations between the integer type mentioned in the paragraph and the different base representations: the integer type includes four base representations, each with its own notation, and the binary, octal and hexadecimal notations all depend on specific leading symbols.

8.2 Compare and analyse the entity relationship extraction results of the two large models and select the best result, as follows:

For the inclusion relation, the extraction results of the two large models under the three dynamic prompts are essentially identical; for the dependency relation, the tail entities extracted by the two models under the three dynamic prompts differ slightly in wording, so the dependency-relation triples need to be checked and the best extraction result selected.
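One simple automatic check that can support this comparison (an assumption offered for illustration, not part of the patented method) is to parse each model's output and keep only the triples whose entities and relation fall within the allowed lists, flagging the rest, such as differently worded tail entities, for manual review:

```python
# Minimal sketch of a correctness check for extracted triples.

import re

def filter_triples(model_output, allowed_entities, allowed_relations):
    """Split model output into (head, relation, tail) triples and sort them into
    those that satisfy the constraints and those needing manual review."""
    triples = re.findall(r"\(([^,()]+),\s*([^,()]+),\s*([^,()]+)\)", model_output)
    valid, to_review = [], []
    for head, rel, tail in (tuple(x.strip() for x in t) for t in triples):
        if head in allowed_entities and tail in allowed_entities and rel in allowed_relations:
            valid.append((head, rel, tail))
        else:
            to_review.append((head, rel, tail))
    return valid, to_review
```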

The present invention implements a complete entity relationship extraction pipeline for domain-specific unstructured text: starting from the construction of the schema layer and entity sets of the domain knowledge graph, it dynamically assembles the elements needed for the prompt and generates dynamic prompts that guide the large model to extract entity relationships in the specific domain; an entity vector database built with a pre-trained language model improves the accuracy of entity recognition and relation extraction in paragraph text. The invention combines the advantages of knowledge graphs, entity vector representation, dynamic prompts and large models, achieves automatic extraction of entity relationships without fine-tuning the large model, reduces the cost of entity relationship extraction, and improves its efficiency and accuracy, giving it broad application value.

Claims (5)

1.一种基于大模型和动态提示的实体关系抽取方法,其步骤包括:1. A method for extracting entity relationships based on a large model and dynamic prompts, the steps of which include: A.定义领域知识图谱的模式层及实体集,具体步骤如下:A. Define the model layer and entity set of the domain knowledge graph. The specific steps are as follows: A1. 定义实体类别的集合C={C1,…Ci,…CN}(i∈[1,N]),实体类别间关系的集合R={R1,…Rj,…RM}(j∈[1,M]),N和M表示待建知识图谱中的实体类别数和关系数;A1. Define the set of entity categories C={C 1 ,…C i ,…C N }(i∈[1,N]), the set of relationships between entity categories R={R 1 ,…R j ,…R M }(j∈[1,M]), where N and M represent the number of entity categories and the number of relationships in the knowledge graph to be built; A2. 从行业领域术语词典或目录中,抽取属于实体类别Ci(i∈[1,N])的词汇作为实体,构成该类别的实体集合Ei={ei1,…eij,…,eik}(i∈[1,N],j∈[1,k],k表示实体集合中实体的个数,因实体类别Ci而异);A2. Extract words belonging to entity category C i (i∈[1,N]) from the industry terminology dictionary or catalog as entities to form the entity set E i ={e i1 ,…e ij ,…,e ik } of this category (i∈[1,N], j∈[1,k], k represents the number of entities in the entity set, which varies depending on the entity category C i ); A3. 将具有关系Rj(j∈[1,M])的实体类别Ch(h∈[1,N])和实体类别Ct(t∈[1,N])记为实体类别三元组(Ch,Rj,Ct),其中Ch称为头实体类别,其实体集合记为Ehj,Ct称为尾实体类别,其实体集合记为Etj;所有的实体类别三元组构成知识图谱的模式层;A3. The entity category Ch (h∈[1,N]) and entity category C t (t∈[1,N]) with the relation R j (j∈[1,M]) are recorded as entity category triples ( Ch , R j , C t ), where Ch is called the head entity category and its entity set is recorded as E hj , C t is called the tail entity category and its entity set is recorded as E tj ; all entity category triplets constitute the pattern layer of the knowledge graph; B.利用模式层及实体集构建实体关系三元组示例集,以实体类别三元组(Ch,Rj,Ct)为例,具体操作如下:B. Use the pattern layer and entity set to construct an example set of entity relationship triples. Take the entity category triple (C h , R j , C t ) as an example. The specific operations are as follows: B1. 以满足Rj关系为条件,从实体集合Eh和实体集合Et中选取F个实体对;B1. Select F entity pairs from entity set E h and entity set E t under the condition that they satisfy the R j relationship; B2. 将每个实体对和关系Rj,组成一个实体关系三元组(ehi,Rj,eti),其中实体ehi∈Eh(i∈[1,F]),eti∈Et(i∈[1,F]);B2. Combine each entity pair and relation R j into an entity-relation triple (e hi ,R j ,e ti ), where entity e hi ∈E h (i∈[1,F]), e ti ∈E t (i∈[1,F]); B3. 用步骤B2构建的F个实体关系三元组,构建关系Rj的实体关系三元组示例集Sj={(ehi,Rj,eti) for i in [1:F]},其中for i in [1:F]表示i的取值为1到F的整数;B3. Using the F entity-relationship triples constructed in step B2, construct an example set of entity-relationship triples S j ={(e hi ,R j ,e ti ) for i in [1:F]} of relation R j , where for i in [1:F] indicates that the value of i is an integer from 1 to F; C.构建实体向量数据库DB,具体步骤如下:C. Construct entity vector database DB. The specific steps are as follows: C1. 对实体集Ei(i∈[1,N])里的每个实体eij∈Ei(i∈[1,N],j∈[1,|Ei|]),使用预训练语言模型编码为实体向量veijC1. For each entity e ij ∈E i (i∈[1,N],j∈[1,|Ei|]) in the entity set E i (i∈[1,N]), encode it into an entity vector ve ij using the pre-trained language model; C2. 将每一对实体eij和实体向量veij存入实体向量数据库DB;C2. storing each pair of entity e ij and entity vector ve ij into the entity vector database DB; D.构建prompt模板集PR,具体步骤如下:D. Build the prompt template set PR. The specific steps are as follows: D1. 定义模板集PR={pr1,…,prk,…,prn}(k∈[1,n]),n表示模板集PR中含有的模板数量;D1. Define the template set PR = {pr 1 ,…,pr k ,…,pr n } (k∈[1,n]), where n represents the number of templates contained in the template set PR; D2. 构建每个提示模板prk∈PR,模板中包含若干模板变量;D2. Construct each prompt template pr k ∈ PR, where the template contains several template variables; E. 
将待处理文本划分为段落,并提取段落的关键词列表,具体步骤如下:E. Divide the text to be processed into paragraphs and extract the keyword list of the paragraphs. The specific steps are as follows: E1. 将待处理的非结构化文本划分成段落;E1. Divide the unstructured text to be processed into paragraphs; E2. 提取每个段落Pi(i∈[1,t])中的关键词列表;E2. Extract the keyword list in each paragraph Pi (i∈[1,t]); F.利用关键词列表为每个段落构造段落-实体关联列表T,以段落Pi为例,具体步骤如下:F. Use the keyword list to construct a paragraph-entity association list T for each paragraph. Taking paragraph Pi as an example, the specific steps are as follows: F1. 将Ki中的每个关键词kij∈Ki(j∈[1,p]),使用步骤C1的预训练语言模型将其编码为关键词向量vkijF1. Encode each keyword k ij ∈K i (j∈[1,p]) in K i into a keyword vector vk ij using the pre-trained language model in step C1; F2. 将关键词向量vkij在实体向量库中进行相似度搜索,获取Top-N个相似的实体,记为Top-N[kij]={e1,e2,…,eN};F2. Perform similarity search on the keyword vector vk ij in the entity vector library to obtain the Top-N similar entities, denoted as Top-N[k ij ]={e 1 ,e 2 ,…,e N }; F3. 将Ki里的所有关键词对应的所有相似实体,去重,构成实体列表LiF3. Remove duplicates from all similar entities corresponding to all keywords in K i to form an entity list L i ; F4. 将段落Pi、实体列表Li组合为Pi的段落-实体关联列表Ti=[Pi, Li];F4. Combine paragraph Pi and entity list Li into Pi 's paragraph-entity association list Ti = [ Pi , Li ]; G.对每个段落,利用关联列表T、prompt模板集PR和实体关系三元组示例集构造动态提示;G. For each paragraph, construct a dynamic prompt using the association list T, the prompt template set PR, and the entity relationship triple example set; G1. 将提示模板prk中的模板变量,替换为对应的变量值;G1. Replace the template variables in the prompt template pr k with the corresponding variable values; G2. 对于模板中的剩余变量,按照模板变量的定义输入对应的值,形成一个完整的动态提示内容QkPi(k∈[1,n],i∈[1,t]),例如,输入field(领域名称)代入领域变量${field};G2. For the remaining variables in the template, enter the corresponding values according to the definition of the template variables to form a complete dynamic prompt content Q k P i (k∈[1,n],i∈[1,t]). For example, enter field (field name) and substitute it into the field variable ${field}; H.将动态提示内容送入大模型进行实体关系抽取,并检验结果的正确性,具体步骤如下:H. Send the dynamic prompt content to the big model for entity relationship extraction and verify the correctness of the results. The specific steps are as follows: H1. 通过调用API的方式与大模型对话,将动态提示内容送入多个大模型中进行实体关系抽取;H1. Communicate with the big model by calling the API, and send the dynamic prompt content to multiple big models for entity relationship extraction; H2. 对比分析多个大模型的实体关系抽取结果,选取最优抽取结果。H2. Compare and analyze the entity relationship extraction results of multiple large models and select the optimal extraction result. 2.如权利要求1所述的一种基于大模型和动态提示的实体关系抽取方法,构建每个提示模板prk∈PR,模板中包含若干模板变量,具体步骤如下:2. 
2. The method for extracting entity relationships based on a large model and dynamic prompts according to claim 1, wherein building every prompt template prk ∈ PR, each of which contains several template variables, comprises:
D2.1 Define the domain variable ${field}, where the value range of field is the set of domain identifiers;
D2.2 Define the paragraph variable ${Pi}, where the value range of Pi is the set of paragraph identifiers;
D2.3 Define the entity variables ${e1}, …, ${ej}, …, ${em}, where e1 to em are the corresponding entities in the paragraph;
D2.4 Define the relation variables ${R1}, …, ${Rj}, …, ${RM}, where R1 to RM are all the relations in the knowledge graph to be built;
D2.5 Define the entity-relationship triple example set variables ${S1}, …, ${Sj}, …, ${SM}, where S1 to SM are the example sets corresponding to relations R1 to RM.

3. The method for extracting entity relationships based on a large model and dynamic prompts according to claim 1, wherein dividing the unstructured text to be processed into paragraphs comprises:
E1.1 Clean the text data, removing useless punctuation marks, special characters and the like;
E1.2 Represent the text as word vectors with a pre-trained word vector model, generating a word vector sequence;
E1.3 Feed the word vector sequence into a neural-network sequence labeling model for topic boundary detection, obtaining the start and end positions of every paragraph;
E1.4 Divide the text into paragraphs according to the detected topic boundaries, obtaining the paragraph set P = {P1, …, Pi, …, Pt} (i ∈ [1, t]), where t is the total number of paragraphs in the set.

4. The method for extracting entity relationships based on a large model and dynamic prompts according to claim 1, wherein extracting the keyword list of every paragraph Pi (i ∈ [1, t]) comprises:
E2.1 Extract the keywords of every paragraph Pi (i ∈ [1, t]) and add them to the set Ki = {ki1, ki2, …, kip}, where p is the total number of keywords of Pi; the keyword extraction methods include, but are not limited to, TF-IDF, TextRank and methods based on pre-trained language models;
E2.2 Combine paragraph Pi and keyword set Ki into the keyword list of paragraph Pi: PLi = [Pi, Ki].
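Claim 4 leaves the keyword extractor open (TF-IDF, TextRank, or a pre-trained language model). As one concrete instance, the sketch below picks each paragraph's top-p TF-IDF terms with scikit-learn; the value of p is arbitrary, and Chinese text would additionally need a tokenizer passed to the vectorizer.

```python
# One possible realization of E2.1/E2.2 using TF-IDF (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer

def keyword_lists(paragraphs, p=5):
    """Return PLi = [Pi, Ki] for every paragraph, Ki being its top-p TF-IDF terms."""
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(paragraphs)       # rows: paragraphs, columns: terms
    terms = vectorizer.get_feature_names_out()
    lists = []
    for i, paragraph in enumerate(paragraphs):
        weights = tfidf[i].toarray().ravel()
        top = weights.argsort()[::-1][:p]              # indices of the p largest weights
        Ki = [terms[j] for j in top if weights[j] > 0]
        lists.append([paragraph, Ki])                  # E2.2: PLi = [Pi, Ki]
    return lists
```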
5. The method for extracting entity relationships based on a large model and dynamic prompts according to claim 1, wherein replacing the template variables in prompt template prk with the corresponding variable values comprises:
G1.1 Substitute the paragraph Pi of the association list Ti into the paragraph variable ${Pi};
G1.2 Substitute each entity of the entity list Li of the association list Ti into the entity variables ${e1} to ${em};
G1.3 Substitute the relations R1 to Rj held by the entities of entity list Li into the relation variables ${R1} to ${Rj};
G1.4 Substitute the entity-relationship triple example sets S1 to Sj corresponding to relations R1 to Rj into the example set variables ${S1} to ${Sj}.
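Because the template variables of claims 2 and 5 use the ${...} form, Python's string.Template maps onto them directly. The sketch below fills a toy template in the order G1.1–G1.4; the template wording, the sample paragraph and the sample triple are assumptions for illustration, not text from the patent.

```python
# Sketch of claim 5 (steps G1.1-G1.4): fill ${...} variables with values taken
# from the association list Ti, the relations and the triple example sets.
from string import Template

template = Template(
    "Domain: ${field}\n"
    "Paragraph: ${Pi}\n"
    "Candidate entities: ${e1}, ${e2}\n"
    "Relations: ${R1}\n"
    "Examples: ${S1}\n"
    "Extract all entity-relationship triples contained in the paragraph."
)

Ti = [
    "The boiler showed tube leakage during the overhaul.",  # Pi (toy paragraph)
    ["boiler", "tube leakage"],                             # Li (toy entity list)
]

prompt = template.safe_substitute(
    field="power equipment",                      # domain variable (claim 2, D2.1)
    Pi=Ti[0],                                     # G1.1: paragraph
    e1=Ti[1][0], e2=Ti[1][1],                     # G1.2: entities of Li
    R1="has_fault",                               # G1.3: relation held by the entities
    S1="(boiler, has_fault, tube leakage)",       # G1.4: triple example set
)
print(prompt)
```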
CN202411064169.3A 2024-04-30 2024-08-05 A method for entity relationship extraction based on large model and dynamic prompts Pending CN118779468A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202410535188 2024-04-30
CN2024105351883 2024-04-30

Publications (1)

Publication Number Publication Date
CN118779468A true CN118779468A (en) 2024-10-15

Family

ID=92991032

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411064169.3A Pending CN118779468A (en) 2024-04-30 2024-08-05 A method for entity relationship extraction based on large model and dynamic prompts

Country Status (1)

Country Link
CN (1) CN118779468A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119168052A (en) * 2024-11-22 2024-12-20 南昌大学 A forensics method and system based on virtual knowledge graph and big data

Similar Documents

Publication Publication Date Title
CN113987209B (en) Natural language processing method, device, computing device and storage medium based on knowledge-guided prefix fine-tuning
CN106407113B (en) A kind of bug localization method based on the library Stack Overflow and commit
WO2020010834A1 (en) Faq question and answer library generalization method, apparatus, and device
CN116127090B (en) Aviation system knowledge graph construction method based on fusion and semi-supervision information extraction
US20150066476A1 (en) Methods and Systems of Four Valued Analogical Transformation Operators Used in Natural Language Processing and Other Applications
Gong et al. Continual pre-training of language models for math problem understanding with syntax-aware memory network
Kashmira et al. Generating entity relationship diagram from requirement specification based on nlp
CN112487190A (en) Method for extracting relationships between entities from text based on self-supervision and clustering technology
CN118779468A (en) A method for entity relationship extraction based on large model and dynamic prompts
Cui et al. Simple question answering over knowledge graph enhanced by question pattern classification
Bao et al. Text generation from tables
CN116341569A (en) Professional document intelligent auxiliary reading method based on domain knowledge base
Kalo et al. Knowlybert-hybrid query answering over language models and knowledge graphs
Sunkle et al. Comparison and synergy between fact-orientation and relation extraction for domain model generation in regulatory compliance
CN111831624A (en) Data table creating method and device, computer equipment and storage medium
CN115658846A (en) Intelligent search method and device suitable for open-source software supply chain
CN113901224B (en) Confidential text recognition model training method, system and device based on knowledge distillation
CN111666374A (en) Method for integrating additional knowledge information into deep language model
CN114372148A (en) A data processing method and terminal device based on knowledge graph technology
WO2019043380A1 (en) Semantic parsing
CN116483990B (en) Internet news content automatic generation method based on big data
Bader et al. Facilitating User-Centric Model-Based Systems Engineering Using Generative AI.
CN108549633B (en) Text fine-grained emotion generation method based on probabilistic reasoning and emotion cognition
CN114757181B (en) Method and device for training and extracting event of end-to-end event extraction model based on prior knowledge
CN115309858A (en) Arithmetic expression generation method, device, equipment, medium and product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination