CN111881683A - Method and device for generating relation triples, storage medium and electronic equipment - Google Patents

Info

Publication number
CN111881683A
CN111881683A CN202010596226.8A CN202010596226A CN111881683A CN 111881683 A CN111881683 A CN 111881683A CN 202010596226 A CN202010596226 A CN 202010596226A CN 111881683 A CN111881683 A CN 111881683A
Authority
CN
China
Prior art keywords
representation
relation
semantic relationship
subject
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010596226.8A
Other languages
Chinese (zh)
Inventor
魏哲培
田原
常毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University
Priority to CN202010596226.8A
Publication of CN111881683A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295: Named entity recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The present application provides a method, apparatus, storage medium and electronic device for generating relation triples. The method includes: obtaining a representation encoding corresponding to an input text; identifying a candidate subject from the representation encoding; determining whether the representation encoding contains an object that has a target semantic relationship with the candidate subject; and, if so, generating a relation triple from the candidate subject, the target semantic relationship and the object. With the solution of the embodiments of the present application, the relation triples in the input text can be identified comprehensively and accurately.

Description

Method, apparatus, storage medium and electronic device for generating relation triples

Technical Field

The present application relates to the technical field of information processing, and in particular to a method, apparatus, storage medium and electronic device for generating relation triples.

Background

The basic building blocks of a knowledge graph are relational facts. Each fact connects two entities through a semantic relationship in the form (subject, relation, object) and is called a relation triple. Extracting relation triples from natural-language text is a key step in building large-scale knowledge graphs. However, current methods for generating relation triples are neither comprehensive nor accurate enough.

Summary of the Invention

To solve the above problems, the embodiments of the present application provide a method, apparatus, storage medium and electronic device for generating relation triples. The technical solution is as follows:

In a first aspect, an embodiment of the present application provides a method for generating relation triples, including the following steps:

obtaining a representation encoding corresponding to an input text;

identifying a candidate subject from the representation encoding;

determining whether the representation encoding contains an object that has a target semantic relationship with the candidate subject;

if so, generating a relation triple from the candidate subject, the semantic relationship and the object.

Optionally, obtaining the representation encoding corresponding to the input text includes:

obtaining the input text;

encoding the input text with a BERT encoder to generate the representation encoding corresponding to the input text.

Optionally, after determining whether the representation encoding contains an object that has a semantic relationship with the candidate subject, the method further includes:

if not, determining that the candidate subject cannot form a relation triple based on the target semantic relationship.

Optionally, identifying the candidate subject from the representation encoding includes:

using a subject tagger to identify a plurality of candidate subjects from the representation encoding;

determining whether the representation encoding contains an object that has a target semantic relationship with the candidate subject includes:

determining whether the representation encoding contains objects that have a semantic relationship with each of the candidate subjects;

and generating, if so, a relation triple from the candidate subject, the semantic relationship and the object includes:

if so, generating at least one relation triple from each candidate subject, the semantic relationship and each object.

Optionally, determining whether the representation encoding contains an object that has a target semantic relationship with the candidate subject includes:

using the object tagger corresponding to the target semantic relationship to determine whether the representation encoding contains an object that has the target semantic relationship with the candidate subject.

Optionally, using the object tagger corresponding to the target semantic relationship to make this determination includes:

using a plurality of object taggers to determine in parallel whether the representation encoding contains an object that has a target semantic relationship with the candidate subject, where each of the object taggers corresponds to a different target semantic relationship.

Optionally, using the object tagger corresponding to the target semantic relationship to make this determination includes:

using the object tagger corresponding to the target semantic relationship to compute, from the encoded representation of each target word in the representation encoding and the encoded representation of the candidate subject, the probability that each target word is the start position of an object;

using the object tagger corresponding to the target semantic relationship to compute, from the encoded representation of each target word in the representation encoding and the encoded representation of the candidate subject, the probability that each target word is the end position of an object;

determining, from the start-position probability and the end-position probability of each target word, whether the representation encoding contains an object that has the target semantic relationship with the candidate subject.

In a second aspect, an embodiment of the present application provides an apparatus for generating relation triples, including:

an encoding acquisition unit, configured to obtain a representation encoding corresponding to an input text;

a subject identification unit, configured to identify a candidate subject from the representation encoding;

an object determination unit, configured to determine whether the representation encoding contains an object that has a target semantic relationship with the candidate subject;

a relation generation unit, configured to generate, if such an object exists, a relation triple from the candidate subject, the semantic relationship and the object.

In a third aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the above methods.

In a fourth aspect, an embodiment of the present application provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of any one of the above methods when executing the program.

In the embodiments of the present application, a representation encoding corresponding to an input text is obtained; a candidate subject is identified from the representation encoding; it is determined whether the representation encoding contains an object that has a target semantic relationship with the candidate subject; and, if so, a relation triple is generated from the candidate subject, the target semantic relationship and the object. The determination of whether such an object exists is made on the basis of the candidate subject. With the solution of the embodiments of the present application, the relation triples in the input text can be identified comprehensively and accurately.

Description of Drawings

FIG. 1 shows the different ways in which relation triples can overlap, as provided by an embodiment of the present application;

FIG. 2 is a schematic flowchart of a method for generating relation triples provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of the overall framework of another method for generating relation triples provided by an embodiment of the present application;

FIG. 4 is a schematic flowchart of yet another method for generating relation triples provided by an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an apparatus for generating relation triples provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of an electronic device involved in an embodiment of the present application.

Detailed Description

The present application is further described below with reference to the accompanying drawings and embodiments.

In the following description, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance. The description provides multiple embodiments of the present application, and different embodiments may be substituted for one another or combined, so the present application is also considered to include all possible combinations of the same and/or different embodiments described. Thus, if one embodiment includes features A, B and C and another embodiment includes features B and D, the present application should also be regarded as including embodiments containing every other possible combination of one or more of A, B, C and D, even if such an embodiment is not explicitly described in the text below.

The following description provides examples and does not limit the scope, applicability or examples set forth in the claims. Changes may be made to the function and arrangement of the elements described without departing from the scope of the present application. Various examples may omit, substitute or add procedures or components as appropriate. For example, the methods described may be performed in an order different from the one described, and steps may be added, omitted or combined. In addition, features described with respect to some examples may be combined into other examples.

Relation triples can be generated by pipeline methods or by joint methods. A pipeline method splits relation triple extraction into two independent steps: it first extracts all entities from the sentence and combines the recognized entities into entity pairs, and then applies a relation classification method to determine the relationship between the entities, finally yielding relation triples. Unlike pipeline methods, joint methods aim to build a single model that extracts entities and the semantic relationships between them at the same time; they include feature-based joint methods and neural-network-based joint methods. At present, neural-network-based joint methods are among the more advanced methods in the field of relation triple extraction, and the method provided by the embodiments of the present application belongs to this category.

Because a pipeline method splits relation triple extraction into two independent steps, it ignores the correlation between entity extraction and relation classification and inevitably suffers from error propagation: errors introduced in the earlier step (entity extraction) degrade the performance of the later step (relation classification). Feature-based joint methods rely heavily on third-party natural language processing tools and require tedious feature engineering, which consumes considerable manpower and resources. Existing neural-network-based joint methods use neural networks for feature extraction, need no hand-designed features, and have achieved some improvement on the relation triple extraction task. As natural language processing has developed, relation triple extraction in simple contexts (for example, a sentence containing only one relation triple) already achieves good results. However, whether pipeline or joint, existing methods usually model relations as discrete labels on entity pairs. In complex contexts (a sentence containing multiple relation triples, sometimes more than five), and especially when multiple triples overlap, this modeling turns relation classification into an extremely difficult imbalanced multi-class classification problem, so the extracted relation triples are neither comprehensive nor accurate enough. Therefore, none of the existing methods handles relation triple extraction in complex contexts well.

The goal of the relation triple extraction task is to identify all possible (subject, relation, object) triples in a sentence. Some of these triples may share a subject or an object, that is, triples may overlap. According to how the triples overlap, three types are distinguished: Normal, Single Entity Overlap (SEO) and Entity Pair Overlap (EPO). FIG. 1 shows the different overlap patterns of relation triples provided by an embodiment of the present application. As shown in FIG. 1, from top to bottom are the Normal type, the Entity Pair Overlap type and the Single Entity Overlap type.
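The three overlap types can be illustrated with a small sketch. The text only says that overlapping triples share a subject or an object; the precise rules used here (EPO when two triples involve the same pair of entities, SEO when they share exactly one entity) are an assumption, and the function name `overlap_types` is hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def overlap_types(triples: List[Triple]) -> Set[str]:
    """Return the overlap labels that apply to the triples of one sentence.

    Assumed rules (not taken from the source text):
      EPO: two distinct triples involve the same pair of entities.
      SEO: two triples share at least one entity but not the whole pair.
      Normal: no two triples share any entity.
    """
    labels: Set[str] = set()
    unique = sorted(set(triples))  # ignore exact duplicates
    for (s1, _, o1), (s2, _, o2) in combinations(unique, 2):
        pair1, pair2 = {s1, o1}, {s2, o2}
        if pair1 == pair2:
            labels.add("EPO")
        elif pair1 & pair2:
            labels.add("SEO")
    return labels or {"Normal"}


# Illustrative entity and relation names only.
print(overlap_types([("A", "r1", "B")]))
print(overlap_types([("A", "r1", "B"), ("A", "r2", "B")]))
print(overlap_types([("A", "r1", "B"), ("A", "r2", "C")]))
```

A sentence can carry both labels at once, which is why the function returns a set rather than a single value.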

Based on the above analysis, the embodiments of the present application provide a novel cascaded binary tagging method and apparatus for relation triple extraction. Unlike traditional methods, which model relations as discrete labels on entity pairs, the embodiments of the present application model a relation as a function that maps a subject to an object. This gives the knowledge graph research community a fresh perspective on the classic relation triple extraction task. On this basis, a novel cascaded binary tagging framework that is not affected by the overlapping-triple problem is implemented for relation triple extraction.

Referring to FIG. 2, FIG. 2 is a schematic flowchart of a method for generating relation triples provided by an embodiment of the present application. The method includes:

S201. Obtain a representation encoding corresponding to the input text.

The representation encoding is an encoding generated from the input text. The input text can be converted into its representation encoding by a preset method, and all subsequent processing is performed on the representation encoding.

Optionally, S201 may include:

obtaining the input text;

encoding the input text with a BERT encoder to generate the representation encoding corresponding to the input text.

The encoder module is intended to extract rich semantic features from the input sentence. The embodiments of the present application use a pre-trained BERT model to encode the context information. BERT is a language representation model based on a multi-layer bidirectional Transformer structure; it learns deep representations of text by considering the left and right context of each word simultaneously. After the input text is encoded by the BERT encoder, the resulting N-th layer encoded representation h_N is fed into the cascade decoder, which finally decodes the relation triples contained in the sentence.

S202. Identify a candidate subject from the representation encoding.

The candidate subject can be identified by decoding the representation encoding obtained in S201. At least one candidate subject is identified. A candidate subject may be a noun or pronoun denoting a person, event, thing, place or abstract concept.

S203. Determine whether the representation encoding contains an object that has a target semantic relationship with the candidate subject.

The target semantic relationship describes the relational fact between a subject and an object. Multiple semantic relationships may exist in the same representation encoding. An object may be a noun or pronoun denoting a person, event, thing, place or abstract concept.

Unlike existing methods, the method of the embodiments of the present application uses a function that maps a subject to an object to determine whether the representation encoding contains an object that has the target semantic relationship with the candidate subject. That is, given the target semantic relationship and the target subject, the possible objects in the representation encoding are identified.

S204. If so, generate a relation triple from the candidate subject, the semantic relationship and the object.

A relation triple represents a relational fact between a subject and an object. It can be written in the form (subject, relation, object), abbreviated as (s, r, o).

Optionally, after S203, the method further includes:

if not, determining that the candidate subject cannot form a relation triple based on the target semantic relationship.

If, given the target semantic relationship and the target subject, no possible object can be identified in the representation encoding, the candidate subject can be judged not to be a true subject, and it cannot form a relation triple.
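The (s, r, o) form and the two outcomes of the object check can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the names `RelationTriple` and `build_triples` are hypothetical, and the list of objects is assumed to come from an upstream tagging step.

```python
from typing import List, NamedTuple


class RelationTriple(NamedTuple):
    """One relational fact in (subject, relation, object) form."""
    subject: str
    relation: str
    obj: str


def build_triples(subject: str, relation: str, objects: List[str]) -> List[RelationTriple]:
    """Generate one triple per object found for (subject, relation).

    An empty object list means the candidate subject forms no triple
    under this relation, so an empty list is returned.
    """
    return [RelationTriple(subject, relation, o) for o in objects]


# Illustrative values only.
found = build_triples("Alice", "born_in", ["Paris"])
none_found = build_triples("Paris", "born_in", [])
print(found)
print(none_found)
```

Returning an empty list for a candidate subject without objects lets the caller discard spurious subjects without a separate filtering pass.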

Optionally, S203 may include:

using the object tagger corresponding to the target semantic relationship to determine whether the representation encoding contains an object that has the target semantic relationship with the candidate subject.

Multiple object taggers can be set up in the system, one for each target semantic relationship. Each object tagger is used only to identify objects for its specific target semantic relationship. The method of the embodiments of the present application does not need to learn a traditional relation classifier f(s,o)→r; instead it learns relation-specific object taggers f_r(s)→o, each of which identifies all possible objects given a relation and a subject.

Optionally, using the object tagger corresponding to the target semantic relationship to determine whether the representation encoding contains an object that has the target semantic relationship with the candidate subject includes:

using a plurality of object taggers to determine in parallel whether the representation encoding contains an object that has a target semantic relationship with the candidate subject, where each of the object taggers corresponds to a different target semantic relationship.

Multiple object taggers are set up in the system, and each object tagger is used only to identify its corresponding target semantic relationship. Through these object taggers, the system identifies objects that have different semantic relationships with the candidate subject. Note that the same candidate subject and the same semantic relationship may correspond to multiple objects, and the same candidate subject with different semantic relationships may also correspond to different objects.

Optionally, using the object tagger corresponding to the target semantic relationship to determine whether the representation encoding contains an object that has the target semantic relationship with the candidate subject includes:

using the object tagger corresponding to the target semantic relationship to compute, from the encoded representation of each target word in the representation encoding and the encoded representation of the candidate subject, the probability that each target word is the start position of an object;

using the object tagger corresponding to the target semantic relationship to compute, from the encoded representation of each target word in the representation encoding and the encoded representation of the candidate subject, the probability that each target word is the end position of an object;

determining, from the start-position probability and the end-position probability of each target word, whether the representation encoding contains an object that has the target semantic relationship with the candidate subject.

The probability of an object start position and the probability of an object end position can be computed with the following formulas:

[Equation image BDA0002557458150000091]

[Equation image BDA0002557458150000092]
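The two equation images are not available in this text. Judging only from the symbol definitions that accompany them (a per-word encoding x_i, trainable weights W, biases b, a sigmoid σ and a subject encoding for the k-th candidate subject), the formulas plausibly take the form sketched here. The symbol names for the probabilities, the relation superscript r, and in particular the additive combination of the word encoding and the subject encoding are assumptions, not a transcription of the original images.

```latex
p_i^{\mathrm{start\_o}} = \sigma\!\left( W_{\mathrm{start}}^{r} \left( x_i + v_{\mathrm{sub}}^{k} \right) + b_{\mathrm{start}}^{r} \right)

p_i^{\mathrm{end\_o}} = \sigma\!\left( W_{\mathrm{end}}^{r} \left( x_i + v_{\mathrm{sub}}^{k} \right) + b_{\mathrm{end}}^{r} \right)
```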

Here, the symbols shown in images BDA0002557458150000093 and BDA0002557458150000094 denote, respectively, the probability that the i-th word in the input text is the start position and the end position of an object. If a probability exceeds a certain threshold, the word at that position is assigned the label "1"; otherwise it is assigned the label "0". The label "1" means that the i-th word is the start position or end position of an object, and the label "0" means that it is not. x_i is the encoded representation of the i-th word in the input sentence, W(·) denotes trainable weight parameters, b(·) denotes bias parameters, σ denotes the sigmoid activation function, and the symbol shown in image BDA0002557458150000095 is the encoded representation of the k-th identified candidate subject. The same decoding process is performed iteratively for each subject identified by the subject tagger. Through this decoding process, given a candidate subject, the objects and the corresponding relations can be identified at the same time.
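The thresholded start/end tagging can be sketched in plain Python. Several details are assumptions made for illustration: the word encoding and the subject encoding are combined by addition, the weights form a single vector so that each word yields one score, the threshold defaults to 0.5, and each start position is paired with the nearest end position at or after it. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tag_probabilities(
    token_vecs: Sequence[Sequence[float]],
    subject_vec: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> List[float]:
    """Per-word probability of being an object boundary for one relation."""
    probs = []
    for x in token_vecs:
        # Assumption: word and subject encodings are combined by addition.
        combined = [xi + vi for xi, vi in zip(x, subject_vec)]
        score = sum(w * c for w, c in zip(weights, combined)) + bias
        probs.append(sigmoid(score))
    return probs


def decode_spans(
    start_probs: Sequence[float],
    end_probs: Sequence[float],
    threshold: float = 0.5,
) -> List[Tuple[int, int]]:
    """Turn probabilities into 0/1 tags and pair starts with ends.

    Assumption: each start is matched with the nearest end at or after it.
    An empty result means no object exists for this subject and relation.
    """
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    spans = []
    for s in starts:
        later = [e for e in ends if e >= s]
        if later:
            spans.append((s, later[0]))
    return spans


# Illustrative probabilities for a four-word sentence.
print(decode_spans([0.9, 0.1, 0.2, 0.8], [0.1, 0.7, 0.1, 0.9]))
```

Using independent binary tags for start and end positions, instead of one label per word, is what allows several objects to be found for the same subject and relation.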

The method for generating relation triples provided by the embodiments of the present application determines, on the basis of the candidate subject, whether the representation encoding contains an object that has a target semantic relationship with the candidate subject. With the solution of the embodiments of the present application, the relation triples in the input text can be identified comprehensively and accurately.

参见图3,图3为本申请实施例提供的一种关系三元组的生成方法的流程示意图,所述方法包括:Referring to FIG. 3, FIG. 3 is a schematic flowchart of a method for generating a relation triplet provided by an embodiment of the present application. The method includes:

S301、获取输入文本对应的表示编码。S301. Obtain a representation code corresponding to the input text.

S302、使用主体标注器从所述表示编码中识别出多个候选主体。S302. Use a subject tagger to identify multiple candidate subjects from the representation encoding.

S303、判断所述表示编码中是否存在与各所述候选主体具有语义关系的各客体。S303. Determine whether each object having a semantic relationship with each candidate subject exists in the representation encoding.

S304、若存在,根据各所述候选主体、所述语义关系及各所述客体,生成至少一个关系三元组。S304. If there is, generate at least one relation triplet according to each of the candidate subjects, the semantic relation and each of the objects.

在表示编码中,使用主体标注器识别出多个候选主体。对于多个候选主体中的各候选主体,依次在表示编码中判断与各候选主体具有语义关系的各客体。本申请实施例的方法,可依次查找到与各候选主体具有语义关系的各客体。In representation encoding, a body tagger is used to identify multiple candidate bodies. For each candidate subject among the plurality of candidate subjects, each object having a semantic relationship with each candidate subject is sequentially determined in the representation encoding. With the method of this embodiment of the present application, each object that has a semantic relationship with each candidate subject can be found in sequence.

需要说明的是,本申请实施例的方法中涉及到两个过程。第一个过程是针对多个候选主体的,在表示编码中识别出多个候选主体。第二个过程是针对步骤S303中的判断过程,系统中可设有多个客体标注器,每个客体标注器都对应于特定的语义关系,不同客体标注器对应于不同语义关系。对于每个候选主体,利用多个客体标注器,分别判断表示编码中是否存在与该候选主体对应的客体,从而全面准确地识别出输入文本中的关系三元组。It should be noted that two processes are involved in the method of the embodiment of the present application. The first process is for multiple candidate subjects, which are identified in the representation encoding. The second process is for the judgment process in step S303. There may be multiple object taggers in the system, each object tagger corresponds to a specific semantic relationship, and different object taggers correspond to different semantic relationships. For each candidate subject, multiple object taggers are used to determine whether there is an object corresponding to the candidate subject in the representation encoding, so as to fully and accurately identify the relation triples in the input text.

In addition, when the object taggers are used to determine whether the representation encoding contains an object corresponding to a candidate subject, the relation-specific object taggers may work simultaneously without interfering with one another, each identifying the objects that correspond to the candidate subject. Alternatively, an execution order for the object taggers may be set in the system, and the object taggers are then run one after another in that order. This embodiment does not limit how the object taggers are scheduled.
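The two scheduling options can be sketched as follows. This is only an illustration under stated assumptions: the function `run_object_taggers`, the helper `lookup_tagger`, and the toy relations and tokens are hypothetical and do not come from the application, and a thread pool stands in for whatever parallel mechanism a real system would use.

```python
from concurrent.futures import ThreadPoolExecutor


def run_object_taggers(taggers, tokens, subject, parallel=True, order=None):
    """Run every relation-specific object tagger for one candidate subject.

    taggers maps a relation name to a callable (tokens, subject) -> list of objects.
    order, if given, fixes the sequence in which taggers are invoked.
    Returns {relation: objects} for the relations that found at least one object.
    """
    names = list(order) if order is not None else list(taggers)
    if parallel:
        # The taggers are independent, so they can be submitted concurrently.
        with ThreadPoolExecutor() as pool:
            futures = {name: pool.submit(taggers[name], tokens, subject) for name in names}
            found = {name: future.result() for name, future in futures.items()}
    else:
        # Sequential mode: one tagger after another, in the configured order.
        found = {name: taggers[name](tokens, subject) for name in names}
    return {name: objects for name, objects in found.items() if objects}


def lookup_tagger(table):
    """Hypothetical stand-in for a trained tagger: looks objects up in a table."""
    def tagger(tokens, subject):
        return [obj for obj in table.get(subject, []) if obj in tokens]
    return tagger


TAGGERS = {
    "born_in": lookup_tagger({"Alice": ["Paris"]}),
    "works_for": lookup_tagger({"Alice": ["Acme"]}),
}
TOKENS = ["Alice", "was", "born", "in", "Paris"]

parallel_result = run_object_taggers(TAGGERS, TOKENS, "Alice", parallel=True)
sequential_result = run_object_taggers(
    TAGGERS, TOKENS, "Alice", parallel=False, order=["works_for", "born_in"]
)
```

Because each tagger depends only on the shared encoding and the candidate subject, both modes return the same set of relations and objects; only the scheduling differs.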

To make the solution provided by the embodiments of this application easier to understand, a specific implementation is described below. Instead of learning a conventional relation classifier f(s, o) → r, the method learns relation-specific object taggers f_r(s) → o, each of which identifies all possible objects given a relation and a subject. Under this framework, relation triple extraction is decomposed into a two-step cascade: first, all possible subjects in the sentence are identified; then, for each subject, the relation-specific taggers simultaneously identify all possible relations and the corresponding objects.
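A minimal sketch of the idea of treating each relation as a function f_r(s) → o is given below. The names `extract_triples` and `RELATION_FUNCTIONS` and the toy facts are assumptions made for illustration; in the described method each function would be a trained tagger rather than a hand-written lambda.

```python
def extract_triples(subjects, relation_functions):
    """Apply every relation function f_r to every subject s and collect (s, r, o)."""
    triples = []
    for s in subjects:
        for r, f_r in relation_functions.items():
            for o in f_r(s):
                triples.append((s, r, o))
    return triples


# Hypothetical toy relation functions; a trained tagger would replace each lambda.
RELATION_FUNCTIONS = {
    "directed": lambda s: ["Django Unchained"] if s == "Quentin Tarantino" else [],
    "acted_in": lambda s: ["Django Unchained"] if s == "Quentin Tarantino" else [],
    "born_in": lambda s: [],
}

TRIPLES = extract_triples(["Quentin Tarantino"], RELATION_FUNCTIONS)
```

In this toy case the same subject and object are linked by two relations. A classifier f(s, o) → r that outputs a single relation per pair could return only one of them, whereas the per-relation functions return both.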

FIG. 4 is a schematic diagram of the overall framework of another method for generating relation triples provided by an embodiment of this application. As shown in FIG. 4, the framework adopts an encoder-decoder structure consisting of a BERT-based encoder (BERT Encoder) and a cascade decoder (Cascade Decoder). The two key modules are the subject tagger (Subject Tagger) and the relation-specific object taggers (Relation-Specific Object Taggers) in the cascade decoder, which are implemented as cascaded binary taggers built on top of the encoder. The technical details are as follows:

1. BERT encoder

The encoder module extracts rich semantic features from the input sentence. The present invention uses a pre-trained BERT model to encode the contextual information.

2. Cascade decoder

The cascade decoder in this embodiment consists of two parts: a subject tagger module and a relation-specific object tagger module. Decoding proceeds in two cascaded steps. First, the subject tagger identifies all possible subjects in the input sentence. Then, for each candidate subject, the relation-specific object taggers determine whether the sentence contains objects that have a semantic relationship with that candidate subject. If such objects exist, all of them are tagged, and relation triples are generated from the corresponding subject, relation, and objects. Otherwise, the candidate subject is not a real subject and cannot form a relation triple.
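The control flow of the cascade, including the case where a candidate subject is discarded, might look like the sketch below. The function `cascade_decode` and the two stand-in taggers are hypothetical; they operate on plain tokens instead of BERT encodings so that the example stays self-contained.

```python
def cascade_decode(tokens, subject_tagger, object_taggers):
    """Two-step cascade: tag subjects, then tag objects per subject and relation.

    Returns (triples, rejected), where rejected lists the candidate subjects
    for which no object tagger found any object.
    """
    triples, rejected = [], []
    for subject in subject_tagger(tokens):
        before = len(triples)
        for relation, tagger in object_taggers.items():
            triples.extend((subject, relation, obj) for obj in tagger(tokens, subject))
        if len(triples) == before:
            # No relation produced an object: not a real subject.
            rejected.append(subject)
    return triples, rejected


def capitalized_words(tokens):
    """Hypothetical subject tagger: every capitalized token is a candidate."""
    return [t for t in tokens if t[:1].isupper()]


def visited_tagger(tokens, subject):
    """Hypothetical 'visited' tagger that knows a single fact."""
    return ["Paris"] if subject == "Alice" and "Paris" in tokens else []


TOKENS = ["Alice", "visited", "Paris", "with", "Bob"]
TRIPLES, REJECTED = cascade_decode(TOKENS, capitalized_words, {"visited": visited_tagger})
```

Here the subject tagger over-generates three candidates, and the object step keeps only the one that actually takes part in a relation.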

2.1 Subject tagger

The subject tagger identifies all subjects by directly decoding the encoded representation produced by the BERT encoder. Specifically, it uses two binary classifiers with the same structure, which assign a 0/1 label to each word in the sentence to indicate whether that word is the start or end position of a subject, thereby determining the span of each subject. For each word, the subject tagger computes:

$$p_i^{start\_s} = \sigma\left(W_{start}\, x_i + b_{start}\right)$$

$$p_i^{end\_s} = \sigma\left(W_{end}\, x_i + b_{end}\right)$$

where $p_i^{start\_s}$ and $p_i^{end\_s}$ denote the probabilities that the i-th word in the input sentence is the start position and the end position of a subject, respectively. If a probability exceeds a given threshold, the word at that position is assigned the label "1"; otherwise it is assigned the label "0". $x_i$ is the encoded representation of the i-th word in the input sentence, $W_{(\cdot)}$ denotes a trainable weight, $b_{(\cdot)}$ denotes a bias, and $\sigma$ is the sigmoid activation function.
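The tagging step described above can be sketched in plain Python, assuming each binary classifier is a single linear layer followed by a sigmoid, as the definitions of W, b, and σ suggest. Pairing each start with the nearest end at or after it is an assumption of this sketch; the text only says that the labels determine the span. The toy vectors and parameters are invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_tags(X, w, b, threshold=0.5):
    """Label each token vector x in X with 1 if sigmoid(w.x + b) > threshold, else 0."""
    return [
        1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > threshold else 0
        for x in X
    ]


def pair_spans(start_tags, end_tags):
    """Pair each start position with the nearest end position at or after it."""
    spans = []
    for i, is_start in enumerate(start_tags):
        if not is_start:
            continue
        for j in range(i, len(end_tags)):
            if end_tags[j]:
                spans.append((i, j))
                break
    return spans


# Toy 2-dimensional "encodings" for four tokens (hypothetical values).
X = [[1, 0], [0, 1], [0, 0], [1, 1]]
START_TAGS = binary_tags(X, w=[10, 0], b=-5)
END_TAGS = binary_tags(X, w=[0, 10], b=-5)
SPANS = pair_spans(START_TAGS, END_TAGS)
```

With these values the first subject spans tokens 0 to 1 and the second is the single token 3; token 2 receives the label 0 from both classifiers.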

2.2 Relation-specific object taggers

Given the subjects extracted by the subject tagger, the relation-specific object taggers simultaneously determine the objects related to each subject and the semantic relationships between them. As shown in FIG. 4, the relation-specific object taggers consist of a series of binary classifiers, one group per relation, with the same structure as the binary classifiers in the subject tagger. Unlike the subject tagger, which decodes the encoded representation produced by the BERT encoder directly, the object taggers also take the feature v_sub of the candidate subject into account when decoding. For each word, they compute:

$$p_i^{start\_o} = \sigma\left(W_{start}^{r}\,(x_i + v_{sub}^{k}) + b_{start}^{r}\right)$$

$$p_i^{end\_o} = \sigma\left(W_{end}^{r}\,(x_i + v_{sub}^{k}) + b_{end}^{r}\right)$$

where $p_i^{start\_o}$ and $p_i^{end\_o}$ denote the probabilities that the i-th word in the input sentence is the start position and the end position of an object, respectively. If a probability exceeds a given threshold, the word at that position is assigned the label "1"; otherwise it is assigned the label "0". $v_{sub}^{k}$ is the encoded representation of the k-th subject identified by the subject tagger, and $W_{(\cdot)}^{r}$ and $b_{(\cdot)}^{r}$ are the weight and bias of the tagger for relation r. The same decoding process is applied iteratively to each subject identified by the subject tagger. Through this decoding process, given a candidate subject, the relation-specific object taggers proposed in this embodiment identify the objects and the corresponding relations at the same time.
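How the subject feature conditions the object tagger can be illustrated with a small sketch. Two choices here are assumptions rather than statements from the text: the subject feature is added to each word vector before the linear layer, and v_sub is taken as the mean of the word vectors inside the subject span. All names and numbers are hypothetical, and only the start-position classifier is shown; the end-position classifier would have the same form with its own parameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def subject_vector(X, span):
    """Mean of the token vectors inside an inclusive (start, end) span."""
    start, end = span
    rows = X[start:end + 1]
    return [sum(column) / len(rows) for column in zip(*rows)]


def object_start_tags(X, v_sub, w, b, threshold=0.5):
    """Tag object start positions for one relation, conditioned on one subject."""
    tags = []
    for x in X:
        fused = [xi + vi for xi, vi in zip(x, v_sub)]  # x_i + v_sub
        score = sum(wi * hi for wi, hi in zip(w, fused)) + b
        tags.append(1 if sigmoid(score) > threshold else 0)
    return tags


# Toy 2-dimensional "encodings" for three tokens (hypothetical values).
X = [[1, 0], [0, 1], [2, 0]]
W_R, B_R = [2, -2], -5  # parameters of one relation's start classifier

TAGS_FOR_SUBJECT_A = object_start_tags(X, subject_vector(X, (0, 0)), W_R, B_R)
TAGS_FOR_SUBJECT_B = object_start_tags(X, subject_vector(X, (1, 1)), W_R, B_R)
```

The same sentence and the same relation parameters yield an object for the first subject and none for the second, which is the behaviour that lets the cascade reject candidates with no related object.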

In research on knowledge graphs, the solution of this embodiment is the first to model a relation as a function that maps subjects to objects, and on this basis it implements a novel cascade binary tagging framework for relation triple extraction.

Because relations are modeled as functions, the cascade binary tagging framework proposed in this embodiment is not affected by the overlapping-triple problem and, compared with existing methods, extracts multiple relation triples from complex contexts more efficiently. Experiments show that the method substantially improves relation triple extraction in multiple scenarios, with particularly large gains on the overlapping-triple problem that existing methods struggle with: on the NYT and WebNLG datasets it raises the state-of-the-art F1 score from 72.1% to 89.6% and from 61.6% to 91.8%, respectively. This supports high-precision automatic construction of knowledge graphs in complex contexts.
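For readers unfamiliar with the metric, the sketch below shows one common way to compute precision, recall, and F1 over sets of triples. It assumes exact matching of whole (subject, relation, object) triples; the text does not state which matching criterion the reported figures use, and the function name and sample triples are hypothetical.

```python
def triple_f1(predicted, gold):
    """Precision, recall, and F1 over sets of (subject, relation, object) triples."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


GOLD = [
    ("A", "r1", "B"),
    ("A", "r2", "B"),
    ("B", "r3", "C"),
    ("C", "r1", "D"),
]
PREDICTED = [("A", "r1", "B"), ("B", "r3", "C")]

PRECISION, RECALL, F1 = triple_f1(PREDICTED, GOLD)
```

In this toy case every predicted triple is correct, but only half of the gold triples are found, so precision is 1.0, recall is 0.5, and F1 is about 0.667.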

Referring to FIG. 5, FIG. 5 is a schematic structural diagram of an apparatus for generating relation triples provided by an embodiment of this application. As shown in FIG. 5, the apparatus for generating relation triples includes:

an encoding acquisition unit 501, configured to obtain the representation encoding corresponding to the input text;

a subject identification unit 502, configured to identify a candidate subject from the representation encoding;

an object determination unit 503, configured to determine whether the representation encoding contains an object that has a target semantic relationship with the candidate subject;

a relation generation unit 504, configured to, if such an object exists, generate a relation triple from the candidate subject, the semantic relationship, and the object.

Optionally, the encoding acquisition unit 501 is specifically configured to:

obtain the input text;

encode the input text with a BERT encoder to generate the representation encoding corresponding to the input text.

Optionally, the relation generation unit 504 is further configured to:

if no such object exists, determine that the candidate subject cannot form a relation triple based on the target semantic relationship.

Optionally, the subject identification unit 502 is specifically configured to:

use a subject tagger to identify multiple candidate subjects from the representation encoding;

the object determination unit 503 is specifically configured to:

determine whether the representation encoding contains objects that have a semantic relationship with each of the candidate subjects;

the relation generation unit 504 is specifically configured to:

if so, generate at least one relation triple from the candidate subjects, the semantic relationship, and the objects.

Optionally, the object determination unit 503 is specifically configured to:

use the object tagger corresponding to the target semantic relationship to determine whether the representation encoding contains an object that has the target semantic relationship with the candidate subject.

Optionally, the object determination unit 503 is specifically configured to:

use multiple object taggers to determine whether the representation encoding contains an object that has a target semantic relationship with the candidate subject, where each of the multiple object taggers corresponds to a different target semantic relationship.

Optionally, the object determination unit 503 is specifically configured to:

use the object tagger corresponding to the target semantic relationship to compute, from the encoded representation of each target word in the representation encoding and the encoded representation of the candidate subject, the probability that each target word is the start position of an object;

use the object tagger corresponding to the target semantic relationship to compute, from the encoded representation of each target word in the representation encoding and the encoded representation of the candidate subject, the probability that each target word is the end position of an object;

determine, from the probability that each target word is the start position of an object and the probability that each target word is the end position of an object, whether the representation encoding contains an object that has the target semantic relationship with the candidate subject.

Those skilled in the art will clearly understand that the technical solutions of the embodiments of this application can be implemented by software and/or hardware. In this specification, "unit" and "module" refer to software and/or hardware that can perform a specific function independently or in cooperation with other components, where the hardware may be, for example, an FPGA (Field-Programmable Gate Array) or an IC (Integrated Circuit).

Each processing unit and/or module of the embodiments of this application may be implemented by an analog circuit that performs the functions described in the embodiments, or by software that executes those functions.

An embodiment of this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method for generating relation triples. The computer-readable storage medium may include, but is not limited to, any type of disk, including floppy disks, optical disks, DVDs, CD-ROMs, microdrives, and magneto-optical disks, as well as ROM, RAM, EPROM, EEPROM, DRAM, VRAM, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of medium or device suitable for storing instructions and/or data.

Referring to FIG. 6, which shows a schematic structural diagram of an electronic device according to an embodiment of this application, the electronic device can be used to implement the method for generating relation triples provided in the foregoing embodiments. Specifically:

The memory 1020 may be used to store software programs and modules, and the processor 1080 performs various functional applications and data processing by running the software programs and modules stored in the memory 1020. The memory 1020 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required for at least one function (such as a sound playback function or an image playback function); the data storage area may store data created through use of the terminal device (such as audio data or a phone book). In addition, the memory 1020 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices. Accordingly, the memory 1020 may further include a memory controller to provide the processor 1080 and the input unit 1030 with access to the memory 1020.

The input unit 1030 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control. Specifically, the input unit 1030 may include a touch-sensitive surface 1031 (for example, a touch screen, a touch pad, or a touch frame). The touch-sensitive surface 1031, also called a touch display or a trackpad, can collect touch operations by the user on or near it (such as operations performed on or near the touch-sensitive surface 1031 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected device according to a preset program. Optionally, the touch-sensitive surface 1031 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and passes the signal to the touch controller. The touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends the coordinates to the processor 1080, and can receive and execute commands sent by the processor 1080. The touch-sensitive surface 1031 may be implemented using resistive, capacitive, infrared, surface acoustic wave, or other technologies.

The display unit 1040 may be used to display information entered by the user or provided to the user, as well as the various graphical user interfaces of the terminal device, which may be composed of graphics, text, icons, video, and any combination thereof. The display unit 1040 may include a display panel 1041, which may optionally be configured as an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) display, or the like. Further, the touch-sensitive surface 1031 may cover the display panel 1041. When the touch-sensitive surface 1031 detects a touch operation on or near it, it passes the operation to the processor 1080 to determine the type of touch event, and the processor 1080 then provides the corresponding visual output on the display panel 1041 according to that type. Although the touch-sensitive surface 1031 and the display panel 1041 may be implemented as two separate components providing the input and output functions, in some embodiments the touch-sensitive surface 1031 may be integrated with the display panel 1041 to provide both functions.

The processor 1080 is the control center of the terminal device. It connects the parts of the terminal device through various interfaces and lines, and performs the functions of the terminal device and processes data by running or executing the software programs and/or modules stored in the memory 1020 and calling the data stored in the memory 1020, thereby monitoring the terminal device as a whole. Optionally, the processor 1080 may include one or more processing cores. The processor 1080 may integrate an application processor, which mainly handles the operating system, the user interface, and application programs, and a modem processor, which mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 1080.

Specifically, in this embodiment, the display unit of the terminal device is a touch screen display. The terminal device further includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for implementing the steps of the above method for generating relation triples.

In the embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be made through interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or of another form.

The functional units in the embodiments of this application may all be integrated into one processing unit, each unit may stand alone as a separate unit, or two or more units may be integrated into one unit. An integrated unit may be implemented in hardware or as a combination of hardware and software functional units.

The above describes only preferred embodiments of this application and is not intended to limit it. Those skilled in the art may make various modifications and changes to this application. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within its scope of protection.

Claims (10)

1. A method for generating relation triples, characterized in that the method comprises:
obtaining a representation encoding corresponding to an input text;
identifying a candidate subject from the representation encoding;
determining whether the representation encoding contains an object that has a target semantic relationship with the candidate subject;
if so, generating a relation triple from the candidate subject, the target semantic relationship, and the object.

2. The method according to claim 1, characterized in that obtaining the representation encoding corresponding to the input text comprises:
obtaining the input text;
encoding the input text with a BERT encoder to generate the representation encoding corresponding to the input text.

3. The method according to claim 1, characterized in that, after determining whether the representation encoding contains an object that has a target semantic relationship with the candidate subject, the method further comprises:
if not, determining that the candidate subject cannot form a relation triple based on the target semantic relationship.

4. The method according to claim 1, characterized in that identifying a candidate subject from the representation encoding comprises:
using a subject tagger to identify multiple candidate subjects from the representation encoding;
determining whether the representation encoding contains an object that has a target semantic relationship with the candidate subject comprises:
determining whether the representation encoding contains objects that have a semantic relationship with each of the candidate subjects;
and, if so, generating a relation triple from the candidate subject, the semantic relationship, and the object comprises:
if so, generating at least one relation triple from the candidate subjects, the semantic relationship, and the objects.

5. The method according to claim 1, characterized in that determining whether the representation encoding contains an object that has a target semantic relationship with the candidate subject comprises:
using the object tagger corresponding to the target semantic relationship to determine whether the representation encoding contains an object that has the target semantic relationship with the candidate subject.

6. The method according to claim 5, characterized in that using the object tagger corresponding to the target semantic relationship to determine whether the representation encoding contains an object that has the target semantic relationship with the candidate subject comprises:
using multiple object taggers to determine in parallel whether the representation encoding contains an object that has a target semantic relationship with the candidate subject, where each of the multiple object taggers corresponds to a different target semantic relationship.

7. The method according to claim 5, characterized in that using the object tagger corresponding to the target semantic relationship to determine whether the representation encoding contains an object that has the target semantic relationship with the candidate subject comprises:
using the object tagger corresponding to the target semantic relationship to compute, from the encoded representation of each target word in the representation encoding and the encoded representation of the candidate subject, the probability that each target word is the start position of an object;
using the object tagger corresponding to the target semantic relationship to compute, from the encoded representation of each target word in the representation encoding and the encoded representation of the candidate subject, the probability that each target word is the end position of an object;
determining, from the probability that each target word is the start position of an object and the probability that each target word is the end position of an object, whether the representation encoding contains an object that has the target semantic relationship with the candidate subject.

8. An apparatus for generating relation triples, characterized in that the apparatus comprises:
an encoding acquisition unit, configured to obtain a representation encoding corresponding to an input text;
a subject identification unit, configured to identify a candidate subject from the representation encoding;
an object determination unit, configured to determine whether the representation encoding contains an object that has a target semantic relationship with the candidate subject;
a relation generation unit, configured to, if such an object exists, generate a relation triple from the candidate subject, the semantic relationship, and the object.

9. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-7.

10. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method according to any one of claims 1-7.
CN202010596226.8A 2020-06-28 2020-06-28 Method and device for generating relation triples, storage medium and electronic equipment Pending CN111881683A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010596226.8A CN111881683A (en) 2020-06-28 2020-06-28 Method and device for generating relation triples, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010596226.8A CN111881683A (en) 2020-06-28 2020-06-28 Method and device for generating relation triples, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN111881683A true CN111881683A (en) 2020-11-03

Family

ID=73157067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010596226.8A Pending CN111881683A (en) 2020-06-28 2020-06-28 Method and device for generating relation triples, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111881683A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560460A (en) * 2020-12-08 2021-03-26 北京百度网讯科技有限公司 Method and device for extracting structured information, electronic equipment and readable storage medium
CN112560490A (en) * 2020-12-08 2021-03-26 吉林大学 Knowledge graph relation extraction method and device, electronic equipment and storage medium
CN115146068A (en) * 2022-06-01 2022-10-04 西北工业大学 Relation triple extraction method, device, equipment and storage medium
CN116483946A (en) * 2022-01-14 2023-07-25 腾讯科技(深圳)有限公司 Data processing method, device, equipment and computer program product

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598001A (en) * 2019-08-05 2019-12-20 平安科技(深圳)有限公司 Method, device and storage medium for extracting association entity relationship
CN111079431A (en) * 2019-10-31 2020-04-28 北京航天云路有限公司 Entity relation joint extraction method based on transfer learning
CN111324747A (en) * 2020-02-28 2020-06-23 北京百度网讯科技有限公司 Method and device for generating triples and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHEPEI WEI et al.: "A Novel Cascade Binary Tagging Framework for Relational Triple Extraction", arXiv:1909.03227v4 [cs.CL], pages 1-5 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560460A (en) * 2020-12-08 2021-03-26 北京百度网讯科技有限公司 Method and device for extracting structured information, electronic equipment and readable storage medium
CN112560490A (en) * 2020-12-08 2021-03-26 吉林大学 Knowledge graph relation extraction method and device, electronic equipment and storage medium
CN112560460B (en) * 2020-12-08 2022-02-25 北京百度网讯科技有限公司 Method and device for extracting structured information, electronic equipment and readable storage medium
CN116483946A (en) * 2022-01-14 2023-07-25 腾讯科技(深圳)有限公司 Data processing method, device, equipment and computer program product
CN116483946B (en) * 2022-01-14 2024-09-20 腾讯科技(深圳)有限公司 Data processing method, device, equipment and computer program product
CN115146068A (en) * 2022-06-01 2022-10-04 西北工业大学 Relation triple extraction method, device, equipment and storage medium
CN115146068B (en) * 2022-06-01 2023-10-03 西北工业大学 Method, device, equipment and storage medium for extracting relation triples

Similar Documents

Publication Publication Date Title
CN108984683B (en) Method, system, equipment and storage medium for extracting structured data
US11327978B2 (en) Content authoring
US11157490B2 (en) Conversational virtual assistant
CN111881683A (en) Method and device for generating relation triples, storage medium and electronic equipment
US12032906B2 (en) Method, apparatus and device for quality control and storage medium
CN106650780B (en) Data processing method and device, classifier training method and system
WO2021121198A1 (en) Semantic similarity-based entity relation extraction method and apparatus, device and medium
WO2021135469A1 (en) Machine learning-based information extraction method, apparatus, computer device, and medium
US20190347571A1 (en) Classifier training
CN110276023B (en) POI transition event discovery method, device, computing equipment and medium
WO2021139247A1 (en) Construction method, apparatus and device for medical domain knowledge map, and storage medium
WO2018045646A1 (en) Artificial intelligence-based method and device for human-machine interaction
US9703773B2 (en) Pattern identification and correction of document misinterpretations in a natural language processing system
CN111984589A (en) Document processing method, document processing device and electronic equipment
CN110795544B (en) Content searching method, device, equipment and storage medium
US11769013B2 (en) Machine learning based tenant-specific chatbots for performing actions in a multi-tenant system
WO2023173554A1 (en) Inappropriate agent language identification method and apparatus, electronic device and storage medium
CN112836013B (en) Data labeling method and device, readable storage medium and electronic equipment
CN112749558B (en) Target content acquisition method, device, computer equipment and storage medium
CN117390213A (en) Training method of image and text retrieval model based on OSCAR and method of implementing image and text retrieval
CN113505786A (en) Test question photographing and judging method and device and electronic equipment
CN112148862A (en) Question intention identification method and device, storage medium and electronic equipment
WO2021104274A1 (en) Image and text joint representation search method and system, and server and storage medium
CN117612533A (en) Method and device for processing data and electronic equipment
WO2024005950A1 (en) Advanced formatting of ink data using spatial information and semantic context

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 2020-11-03