CN115034529A

CN115034529A - Training method, device and equipment of state recognition model and readable storage medium

Info

Publication number: CN115034529A
Application number: CN202110246221.7A
Authority: CN
Inventors: 李旭瑞; 林田谦谨; 孙常龙; 刘晓钟; 李红松
Original assignee: Alibaba Singapore Holdings Pte Ltd
Current assignee: Alibaba Innovation Co
Priority date: 2021-03-05
Filing date: 2021-03-05
Publication date: 2022-09-09

Abstract

Embodiments of the present disclosure disclose a training method, apparatus, and device for a state recognition model, and a readable storage medium. The training method includes: obtaining feature data related to a subject and state data related to the subject; determining, according to a preset mapping relationship, a specific attribute value matching the feature data; determining a standardized state index of the subject based on the state data related to the subject and the specific attribute value; and training the state recognition model according to the standardized state index. The state data can thus be converted into a standardized state index, and a state recognition model trained on that index can identify the state of the subject more accurately.

Description

Training method, apparatus, device and readable storage medium for state recognition model

Technical Field

The present disclosure relates to the technical field of data processing, and in particular to a training method and apparatus for a state recognition model, an electronic device, and a readable storage medium.

Background

At present, when a certain state of a subject is to be identified, attention may be paid to the feature data of the subject itself and to state data related to the subject. However, the inventors found that, because real-world situations are complex, such data does not reflect the true state of the subject well, and state recognition results are prone to be inaccurate.

For example, enterprise risk identification usually considers an enterprise's litigation risk. A larger enterprise is more likely to be involved in lawsuits, but that does not mean it is a high-litigation-risk enterprise. Conversely, a smaller enterprise may be involved in only a few lawsuits, yet relative to comparable enterprises that number may already indicate a high litigation risk. It is therefore difficult for a user to judge an enterprise's litigation risk intuitively from the number of defendant records alone.

Summary of the Invention

To solve the problems in the related art, embodiments of the present disclosure provide a training method and apparatus for a state recognition model, a state recognition method and apparatus, a device, and a medium.

In a first aspect, an embodiment of the present disclosure provides a training method for a state recognition model.

Specifically, the training method for the state recognition model includes:

obtaining feature data related to a subject and state data related to the subject;

determining, according to a preset mapping relationship, a specific attribute value matching the feature data;

determining a standardized state index of the subject based on the state data related to the subject and the specific attribute value; and

training the state recognition model according to the standardized state index.

With reference to the first aspect, in a first implementation of the first aspect of the present disclosure, the mapping relationship includes at least:

a correspondence between feature data and specific attribute values, wherein feature data in different value ranges corresponds to different specific attribute values.
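A range-based correspondence of this kind can be sketched as a simple lookup table. The sketch below is only an illustration: the range boundaries, the attribute values, and the function name `lookup_attribute_value` are hypothetical and do not come from the disclosure.

```python
from bisect import bisect_right

# Hypothetical upper bounds of each feature range (for example, annual
# turnover) and the attribute value assigned to each range. All numbers are
# illustrative assumptions, not values from the disclosure.
RANGE_UPPER_BOUNDS = [1e6, 1e7, 1e8]            # three boundaries, four ranges
ATTRIBUTE_VALUES = [20.0, 150.0, 900.0, 5000.0]


def lookup_attribute_value(feature_value: float) -> float:
    """Return the attribute value of the range that contains feature_value."""
    # bisect_right counts how many boundaries are <= feature_value, which is
    # exactly the index of the range the value falls into.
    index = bisect_right(RANGE_UPPER_BOUNDS, feature_value)
    return ATTRIBUTE_VALUES[index]
```

A value that lies exactly on a boundary is assigned to the higher range here; that tie-breaking rule is a design choice of the sketch.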

With reference to the first aspect, in a second implementation of the first aspect of the present disclosure, determining the standardized state index of the subject based on the state data related to the subject and the specific attribute value includes:

determining an original state index based on the state data and the specific attribute value; and

mapping the original state index to a predetermined value interval according to the maximum and minimum values of the original state indexes of different subjects, to obtain the standardized state index.
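The two steps above can be sketched with NumPy as follows. This passage does not state how the original state index is computed from the state data and the specific attribute value; the sketch assumes a simple ratio, and the function name and the default interval [0, 1] are likewise assumptions.

```python
import numpy as np


def standardized_state_index(state_data, attribute_values, low=0.0, high=1.0):
    """Min-max map per-subject original indexes onto the interval [low, high]."""
    state_data = np.asarray(state_data, dtype=float)
    attribute_values = np.asarray(attribute_values, dtype=float)

    # Assumption: the original state index is the ratio of the state data
    # (e.g. an event count) to the specific attribute value.
    raw = state_data / attribute_values

    raw_min, raw_max = raw.min(), raw.max()
    if raw_max == raw_min:
        # All subjects are identical; avoid dividing by zero.
        return np.full_like(raw, low)
    return low + (raw - raw_min) / (raw_max - raw_min) * (high - low)
```

Because the minimum and maximum are taken over all subjects, the resulting index is relative to the population it was computed on.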

With reference to the first aspect, in a third implementation of the first aspect of the present disclosure, the feature data includes graph features determined based on the association relationships of the subject in a subject relationship graph.

With reference to the third implementation of the first aspect, in a fourth implementation of the first aspect of the present disclosure, training the state recognition model according to the standardized state index includes:

determining a first vector representation of the feature data of the subject;

converting the first vector representation from a high-dimensional sparse vector representation into a low-dimensional dense second vector representation;

concatenating the second vector representation with a third vector representation of the feature data of the subjects adjacent to the subject in the subject relationship graph, to obtain a fourth vector representation of the feature data of the subject;

encoding the fourth vector representation into a fifth vector representation of the feature data of the subject having a predetermined dimension, as training data for the state recognition model; and

training the state recognition model with the standardized state index as the label.
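The concatenation and encoding steps can be illustrated with a small NumPy sketch. Several details are assumptions made only for illustration: the third vector is taken as the mean of the neighbours' dense vectors, and the encoder is a single dense layer with a tanh activation. The disclosure does not fix either choice in this passage.

```python
import numpy as np


def encode_subject(second_vec, neighbor_vecs, weight, bias):
    """Build the fourth and fifth vector representations for one subject."""
    neighbor_vecs = np.asarray(neighbor_vecs, dtype=float)
    # Third vector: mean of the neighbours' dense vectors (assumption).
    if neighbor_vecs.size:
        third_vec = neighbor_vecs.mean(axis=0)
    else:
        third_vec = np.zeros_like(second_vec)
    fourth_vec = np.concatenate([second_vec, third_vec])  # concatenated vector
    fifth_vec = np.tanh(weight @ fourth_vec + bias)       # fixed-size encoding
    return fourth_vec, fifth_vec


# Illustrative data: one subject with two neighbours in the relationship graph.
rng = np.random.default_rng(0)
dense_dim, out_dim = 4, 3
second = rng.normal(size=dense_dim)
neighbors = rng.normal(size=(2, dense_dim))
weight = rng.normal(size=(out_dim, 2 * dense_dim))
bias = np.zeros(out_dim)
fourth, fifth = encode_subject(second, neighbors, weight, bias)
```

In a trained model, `weight` and `bias` would be learned together with the rest of the network rather than sampled at random.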

With reference to the fourth implementation of the first aspect, in a fifth implementation of the first aspect of the present disclosure, the first vector representation includes a one-hot feature representation, and converting the first vector representation from a high-dimensional sparse vector representation into a low-dimensional dense second vector representation includes:

converting the one-hot feature representation into sixth vector representations through an embedding operation; and

computing a weighted sum of the sixth vector representations to obtain the low-dimensional dense second vector representation.
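As a rough sketch of these two steps, each one-hot field can be multiplied by its own embedding table to obtain a dense vector, and the per-field vectors can then be combined by a weighted sum. The function name, the per-field tables, and the weights below are hypothetical; in practice the tables and weights would be learned.

```python
import numpy as np


def dense_from_one_hot(one_hot_fields, embedding_tables, field_weights):
    """Embed each one-hot field, then take a weighted sum of the embeddings."""
    # Multiplying a one-hot vector by a (vocab, dim) table selects one row,
    # which is the embedding (the "sixth vector") of that field.
    sixth_vecs = np.stack(
        [np.asarray(one_hot) @ np.asarray(table)
         for one_hot, table in zip(one_hot_fields, embedding_tables)]
    )                                   # shape: (num_fields, dim)
    # Weighted sum over fields gives the low-dimensional dense second vector.
    return np.asarray(field_weights) @ sixth_vecs


table_a = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]   # 3 categories -> 2 dims
table_b = [[4.0, 0.0], [0.0, 4.0]]               # 2 categories -> 2 dims
second_vec = dense_from_one_hot(
    one_hot_fields=[[0, 0, 1], [1, 0]],
    embedding_tables=[table_a, table_b],
    field_weights=[0.5, 0.25],
)
```

Here the first field selects `[2, 2]` and the second selects `[4, 0]`, so the weighted sum is `0.5 * [2, 2] + 0.25 * [4, 0] = [2, 1]`.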

With reference to the first aspect or any one of the first to fifth implementations of the first aspect, in a sixth implementation of the first aspect of the present disclosure, the method further includes: presenting the standardized state index graphically.

In a second aspect, an embodiment of the present disclosure provides a state recognition method.

Specifically, the state recognition method includes:

obtaining feature data of a subject to be identified; and

inputting the feature data into the state recognition model according to the first aspect or any one of the first to sixth implementations of the first aspect, to obtain a standardized state index of the subject to be identified.

In a third aspect, an embodiment of the present disclosure provides a training method for a risk identification model.

Specifically, the training method for the risk identification model includes:

obtaining feature data of an enterprise and the number of contract litigation cases in which the enterprise is the defendant, wherein the feature data includes turnover;

determining the number of contracts of the enterprise according to a mapping relationship between turnover and number of contracts;

determining a standardized risk index of the enterprise based on the number of contract litigation cases and the number of contracts of the enterprise; and

determining training data based on the feature data of the enterprise, and training the risk identification model with the standardized risk index as the label.
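The four steps of the third aspect can be strung together in a toy example. Everything concrete below is an assumption for illustration only: the placeholder mapping of one contract per 50,000 of turnover, the ratio used as the raw risk, the two features, the sample numbers, and the least-squares linear model that stands in for the risk identification model.

```python
import numpy as np


def contracts_from_turnover(turnover):
    """Placeholder mapping (assumption): one contract per 50,000 of turnover."""
    return np.maximum(np.asarray(turnover, dtype=float) / 50_000.0, 1.0)


def standardized_risk(defendant_cases, turnover):
    """Cases per estimated contract, min-max scaled to [0, 1] across enterprises."""
    raw = np.asarray(defendant_cases, dtype=float) / contracts_from_turnover(turnover)
    span = raw.max() - raw.min()
    return (raw - raw.min()) / span if span > 0 else np.zeros_like(raw)


def train_risk_model(features, labels):
    """Least-squares linear model standing in for the risk identification model."""
    design = np.column_stack([features, np.ones(len(features))])  # bias column
    coefficients, *_ = np.linalg.lstsq(design, labels, rcond=None)
    return coefficients


def predict_risk(coefficients, features):
    design = np.column_stack([features, np.ones(len(features))])
    return design @ coefficients


# Illustrative enterprises: turnover, defendant case count, head count.
turnover = np.array([5e5, 2e6, 8e7, 3e8])
defendant_cases = np.array([3, 4, 40, 60])
features = np.column_stack([np.log10(turnover), [12, 40, 900, 3000]])

labels = standardized_risk(defendant_cases, turnover)
model = train_risk_model(features, labels)
predictions = predict_risk(model, features)
```

With these made-up numbers the smallest enterprise has only 3 cases but the highest label, while the largest has 60 cases and the lowest label, which is the effect the background section describes.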

In a fourth aspect, an embodiment of the present disclosure provides a training apparatus for a state recognition model.

Specifically, the training apparatus for the state recognition model includes:

a first obtaining module, configured to obtain feature data related to a subject and state data related to the subject;

a first determining module, configured to determine, according to a preset mapping relationship, a specific attribute value matching the feature data;

a second determining module, configured to determine a standardized state index of the subject based on the state data related to the subject and the specific attribute value; and

a training module, configured to train the state recognition model according to the standardized state index.

With reference to the fourth aspect, in a first implementation of the fourth aspect of the present disclosure, the mapping relationship includes at least:

a correspondence between feature data and specific attribute values, wherein feature data in different value ranges corresponds to different specific attribute values.

With reference to the fourth aspect, in a second implementation of the fourth aspect of the present disclosure, determining the standardized state index of the subject based on the state data related to the subject and the specific attribute value includes:

mapping the original state index to a predetermined value interval according to the maximum and minimum values of the original state indexes of different subjects, to obtain the standardized state index.

With reference to the fourth aspect, in a third implementation of the fourth aspect of the present disclosure, the feature data includes graph features determined based on the association relationships of the subject in a subject relationship graph.

With reference to the third implementation of the fourth aspect, in a fourth implementation of the fourth aspect of the present disclosure, training the state recognition model according to the standardized state index includes:

With reference to the fourth implementation of the fourth aspect, in a fifth implementation of the fourth aspect of the present disclosure, the first vector representation includes a one-hot feature representation, and converting the first vector representation from a high-dimensional sparse vector representation into a low-dimensional dense second vector representation includes:

With reference to the fourth aspect or any one of the first to fifth implementations of the fourth aspect, in a sixth implementation of the fourth aspect of the present disclosure, the apparatus further includes:

a presentation module, configured to present the standardized state index graphically.

In a fifth aspect, an embodiment of the present disclosure provides a state recognition apparatus.

Specifically, the state recognition apparatus includes:

a second obtaining module, configured to obtain feature data of a subject to be identified; and

a recognition module, configured to input the feature data into the state recognition model according to the first aspect or any one of the first to sixth implementations of the first aspect, to obtain a standardized state index of the subject to be identified.

In a sixth aspect, an embodiment of the present disclosure provides an electronic device including a memory and a processor, wherein the memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method according to any one of the first aspect, the first to sixth implementations of the first aspect, the second aspect, or the third aspect.

In a seventh aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the method according to any one of the first aspect, the first to sixth implementations of the first aspect, the second aspect, or the third aspect.

According to the technical solutions provided by the embodiments of the present disclosure, feature data related to a subject and state data related to the subject are obtained; a specific attribute value matching the feature data is determined according to a preset mapping relationship; a standardized state index of the subject is determined based on the state data related to the subject and the specific attribute value; and the state recognition model is trained according to the standardized state index. The state data can thus be converted into a standardized state index, and a state recognition model trained on that index can identify the state of the subject more accurately.

According to the technical solutions provided by the embodiments of the present disclosure, the mapping relationship includes at least a correspondence between feature data and specific attribute values, wherein feature data in different value ranges corresponds to different specific attribute values. Based on this mapping relationship, the number of specific events can be converted into a standardized state index, and a state recognition model trained on that index can identify the state of the subject more accurately.

According to the technical solutions provided by the embodiments of the present disclosure, determining the standardized state index of the subject based on the state data related to the subject and the specific attribute value includes: determining an original state index based on the state data and the specific attribute value; and mapping the original state index to a predetermined value interval according to the maximum and minimum values of the original state indexes of different subjects, to obtain the standardized state index. The resulting standardized state index is more effective, and a state recognition model trained on it can identify the state of the subject more accurately.

According to the technical solutions provided by the embodiments of the present disclosure, the feature data includes graph features determined based on the association relationships of the subject in a subject relationship graph, so that the state of the subject can be better identified by taking the graph features into account.

According to the technical solutions provided by the embodiments of the present disclosure, a first vector representation of the feature data of the subject is determined; the first vector representation is converted from a high-dimensional sparse vector representation into a low-dimensional dense second vector representation; the second vector representation is concatenated with a third vector representation of the feature data of the subjects adjacent to the subject in the subject relationship graph, to obtain a fourth vector representation of the feature data of the subject; the fourth vector representation is encoded into a fifth vector representation of the feature data of the subject having a predetermined dimension, as training data for the state recognition model; and the state recognition model is trained with the standardized state index as the label. The graph features can thus be better used to identify the state of the subject.

According to the technical solutions provided by the embodiments of the present disclosure, the one-hot feature representation is converted into sixth vector representations through an embedding operation, and a weighted sum of the sixth vector representations yields the low-dimensional dense second vector representation, which solves the problem of excessive resource consumption caused by sparse features.

According to the technical solutions provided by the embodiments of the present disclosure, the standardized state index is presented graphically, so that the state of the subject can be displayed more intuitively.

According to the technical solutions provided by the embodiments of the present disclosure, feature data of a subject to be identified is obtained, and the feature data is input into a state recognition model trained by the above training method, to obtain a standardized state index of the subject to be identified. The state of the subject can thus be identified more accurately through the standardized state index.

According to the technical solutions provided by the embodiments of the present disclosure, feature data of an enterprise and the number of contract litigation cases in which the enterprise is the defendant are obtained, wherein the feature data includes turnover; the number of contracts of the enterprise is determined according to a mapping relationship between turnover and number of contracts; a standardized risk index of the enterprise is determined based on the number of contract litigation cases and the number of contracts of the enterprise; and training data is determined based on the feature data of the enterprise, and the risk identification model is trained with the standardized risk index as the label. The number of contract litigation cases can thus be converted into a standardized risk index, and a risk identification model trained on that index can identify the risk of the enterprise more accurately.

According to the technical solutions provided by the embodiments of the present disclosure, the first obtaining module is configured to obtain feature data related to a subject and state data related to the subject; the first determining module is configured to determine, according to a preset mapping relationship, a specific attribute value matching the feature data; the second determining module is configured to determine a standardized state index of the subject based on the state data related to the subject and the specific attribute value; and the training module is configured to train the state recognition model according to the standardized state index. The number of specific events can thus be converted into a standardized state index, and a state recognition model trained on that index can identify the state of the subject more accurately.

According to the technical solutions provided by the embodiments of the present disclosure, the mapping relationship includes at least a correspondence between feature data and specific attribute values, wherein feature data in different value ranges corresponds to different specific attribute values. Based on this mapping relationship, the number of specific events can be converted into a standardized state index, and a state recognition model trained on that index can identify the state of the subject more accurately.

According to the technical solutions provided by the embodiments of the present disclosure, an original state index is determined based on the state data and the specific attribute value, and the original state index is mapped to a predetermined value interval according to the maximum and minimum values of the original state indexes of different subjects, to obtain the standardized state index. The resulting standardized state index is more effective, and a state recognition model trained on it can identify the state of the subject more accurately.

According to the technical solutions provided by the embodiments of the present disclosure, graph features determined based on the association relationships of the subject in a subject relationship graph are used as feature data, so that the state of the subject can be better identified by taking the graph features into account.

According to the technical solutions provided by the embodiments of the present disclosure, the first vector representation, which is a one-hot feature representation, is converted into sixth vector representations through an embedding operation, and a weighted sum of the sixth vector representations yields the low-dimensional dense second vector representation, which solves the problem of excessive resource consumption caused by sparse features.

According to the technical solutions provided by the embodiments of the present disclosure, the presentation module is configured to present the standardized state index graphically, so that the state of the subject can be displayed more intuitively.

According to the technical solutions provided by the embodiments of the present disclosure, the second obtaining module is configured to obtain feature data of a subject to be identified, and the recognition module is configured to input the feature data into a state recognition model trained by the training method described above, to obtain a standardized state index of the subject to be identified. The state of the subject can thus be identified more accurately through the standardized state index.

It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.

Brief Description of the Drawings

Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, taken in conjunction with the accompanying drawings. In the drawings:

FIG. 1 shows a flowchart of a training method for a state recognition model according to an embodiment of the present disclosure;

FIG. 2 shows a flowchart of determining a mapping relationship between feature data and specific attribute values according to an embodiment of the present disclosure;

FIG. 3 shows a schematic diagram of a piecewise function according to an embodiment of the present disclosure;

FIG. 4 shows a flowchart of determining a standardized state index according to an embodiment of the present disclosure;

FIG. 5 shows a schematic diagram of predicting a state based on graph features according to an embodiment of the present disclosure;

FIG. 6 shows a schematic diagram of a subject relationship graph according to an embodiment of the present disclosure;

FIG. 7 shows a flowchart of training the state recognition model according to the standardized state index according to an embodiment of the present disclosure;

FIG. 8 shows a schematic diagram of feature data processing according to an embodiment of the present disclosure;

FIG. 9 shows a flowchart of determining a second vector representation according to an embodiment of the present disclosure;

FIG. 10 shows a schematic diagram of presenting a standardized state index graphically according to an embodiment of the present disclosure;

FIG. 11 shows a flowchart of a state recognition method according to an embodiment of the present disclosure;

FIG. 12 shows a flowchart of an enterprise risk identification method according to an embodiment of the present disclosure;

FIG. 13 shows a block diagram of a training apparatus for a state recognition model according to an embodiment of the present disclosure;

FIG. 14 shows a block diagram of a state recognition apparatus according to an embodiment of the present disclosure;

FIG. 15 shows a block diagram of an electronic device according to an embodiment of the present disclosure;

FIG. 16 shows a schematic structural diagram of a computer system suitable for implementing the methods and apparatuses of the embodiments of the present disclosure.

Detailed Description

Hereinafter, exemplary embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that those skilled in the art can readily implement them. For the sake of clarity, parts unrelated to the description of the exemplary embodiments are omitted from the drawings.

In the present disclosure, it should be understood that terms such as "including" or "having" are intended to indicate the presence of the features, numbers, steps, acts, components, parts, or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, acts, components, parts, or combinations thereof exist or are added.

It should also be noted that, where no conflict arises, the embodiments of the present disclosure and the features of those embodiments may be combined with one another. The present disclosure is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

As described above, the feature data of the subject itself and events related to the subject do not reflect the true state of the subject well, and state recognition results are prone to be inaccurate.

For example, enterprise risk identification usually considers an enterprise's litigation risk. A larger enterprise is more likely to be involved in lawsuits, but that does not mean a larger enterprise is necessarily a high-litigation-risk enterprise. Conversely, a smaller enterprise may be involved in only a few lawsuits, yet relative to comparable enterprises that number may already indicate a high litigation risk. It is therefore difficult for a user to judge an enterprise's litigation risk intuitively from the number of defendant records alone.

FIG. 1 shows a flowchart of a training method for a state recognition model according to an embodiment of the present disclosure.

As shown in FIG. 1, the training method for the state recognition model includes operations S110 to S140.

In operation S110, feature data related to a subject and state data related to the subject are obtained.

In operation S120, a specific attribute value matching the feature data is determined according to a preset mapping relationship.

In operation S130, a standardized state index of the subject is determined based on the state data related to the subject and the specific attribute value.

In operation S140, the state recognition model is trained according to the standardized state index.

According to an embodiment of the present disclosure, taking the identification of an enterprise's contract defendant risk as an example, the subject may be an enterprise, and the feature data may be various kinds of enterprise information, including information such as turnover. The state data related to the subject may be, for example, the number of specific events related to the enterprise, where a specific event may be a lawsuit, a contract lawsuit, or a lawsuit or contract lawsuit in which the enterprise is the defendant. That is, operation S110 may acquire the feature data of the enterprise and the number of contract lawsuits in which it is the defendant.

According to an embodiment of the present disclosure, the specific attribute value is information that is not easy to obtain. For example, in the embodiments of the present disclosure, the specific attribute value may be the number of contracts an enterprise signs each year. The preset mapping relationship may be a mapping established in advance between feature data and specific attribute values, so that once a subject's feature data has been acquired, the subject's specific attribute value can be determined from the mapping. For example, the inventors found that the number of contracts an enterprise signs each year is positively correlated to some degree with its annual turnover. A mapping between annual turnover and the number of contracts signed per year can therefore be established from the data that is available, so that when the annual turnover in the feature data is obtained, an estimate of the number of contracts signed per year can be determined from the mapping. The embodiments of the present disclosure fit a function to the relationship between the average number of contracts signed per year by enterprises of different scales and an enterprise scale factor (such as average turnover). For each enterprise, this function can be used to convert its number of times as a contract defendant into a normalized contract litigation risk for subsequent judgment and modeling.

According to an embodiment of the present disclosure, by considering the state data and the specific attribute value together, a standardized state index of the subject can be determined and used to evaluate the subject's state more reliably. For example, enterprises of different scales differ in how often they are involved in lawsuits, and large enterprises are more likely to be involved in lawsuits than small and micro enterprises. The standardized state index can therefore be determined from the ratio of the number of contract lawsuits in which the enterprise is the defendant to the number of contracts it signs each year.

According to an embodiment of the present disclosure, the state recognition model can use all of a subject's features from a preceding period to predict the subject's state in the following period. For example, all of an enterprise's features from a preceding period can be used to predict its contract litigation risk in the following period. These are features that may be related to contract defendant risk, such as the enterprise's industry, region, scale, and financial status. In this way, it is possible not only to understand an enterprise's past contract litigation risk but also to predict its upcoming contract litigation risk in a way that is likely to be closer to reality.

For example, the contract litigation risk of each enterprise in 2019 can be computed from the 2019 records in the acquired judgment documents and used as the label, and the enterprise's historical features from 2018, such as the number of times it was a defendant in 2018 and the number of administrative penalties in 2018, can be used to predict that label. Some basic information about an enterprise, such as registered capital, usually does not change, so the data from either 2018 or 2019 can be used. When a new enterprise needs to be analyzed, the trained model can then predict that enterprise's standardized contract litigation risk from its historical features.
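The time-shifted setup described above can be sketched as follows. This is a minimal illustration only, assuming hypothetical per-year records keyed by enterprise name; the helper `build_training_pairs` and the field names `defendant_count` and `penalty_count` are illustrative and do not come from the disclosure.

```python
def build_training_pairs(features_by_year, labels_by_year, feature_year, label_year):
    """Pair each enterprise's features from feature_year with its label from label_year."""
    feats = features_by_year[feature_year]
    labels = labels_by_year[label_year]
    pairs = []
    for enterprise in sorted(feats):
        if enterprise in labels:  # keep only enterprises that have a label
            pairs.append((feats[enterprise], labels[enterprise]))
    return pairs


# Hypothetical data: 2018 features, 2019 standardized risk as the label.
features_by_year = {
    2018: {
        "acme": {"defendant_count": 3, "penalty_count": 1},
        "bolt": {"defendant_count": 0, "penalty_count": 0},
    }
}
labels_by_year = {2019: {"acme": 7.5, "bolt": 1.0}}

pairs = build_training_pairs(features_by_year, labels_by_year, 2018, 2019)
```

Keeping the feature year strictly earlier than the label year is what lets the model be used for prediction rather than description.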

According to an embodiment of the present disclosure, the mapping relationship includes at least a correspondence between feature data and feature attribute values, wherein feature data in different value ranges correspond to different feature attribute values.

图2示出根据本公开实施例的确定特征数据与特定属性值之间的映射关系的流程图。FIG. 2 shows a flowchart of determining a mapping relationship between feature data and specific attribute values according to an embodiment of the present disclosure.

如图2所示，该方法还可以包括操作S210～S230。As shown in FIG. 2, the method may further include operations S210-S230.

在操作S210，基于所述特征数据将所述主体划分为多个类别，每个类别对应于所述特征数据的不同取值区间；In operation S210, the subject is divided into a plurality of categories based on the feature data, and each category corresponds to a different value interval of the feature data;

在操作S220，确定每个类别中的主体的特定属性值的平均值；In operation S220, an average value of specific attribute values of subjects in each category is determined;

在操作S230，基于所述特征数据的取值区间以及所述特定属性值的平均值，构建分段函数作为所述特征数据与特定属性值之间的映射关系。In operation S230, based on the value interval of the feature data and the average value of the specific attribute value, a piecewise function is constructed as a mapping relationship between the feature data and the specific attribute value.

仍以企业合同诉讼风险为例。可以首先从裁判文书中抽取企业合同相关的案件，从这批案件中，抽取被告方是企业的记录。抽取完成后，分别统计每个企业的合同相关被告次数。Take corporate contract litigation risk as an example. Cases related to enterprise contracts can be extracted from the judgment documents first, and from this batch of cases, the records that the defendant is an enterprise can be extracted. After the extraction is completed, the number of contract-related defendants of each enterprise is counted separately.

As mentioned above, the number of times an enterprise has been a defendant does not by itself reflect the enterprise's contract defendant risk and needs to be standardized according to certain rules. The standardized state index may be defined, for example, as the average risk of each contract signed with the current enterprise, R_contract = C / N_contract, where C is the number of times the current enterprise has historically been a defendant and N_contract is the number of contracts the enterprise has historically signed. The relationship between the average number of contracts signed per year by domestic enterprises of different scales and the average turnover of those enterprises, obtained through a sample survey, is shown in the following table.

表1Table 1

Within each enterprise-scale interval, the number of contracts is assumed to grow linearly. A linear function can then be constructed for each interval from the upper and lower turnover bounds of that interval and the average number of contracts in it, as an estimate of the actual situation, as shown in FIG. 3. Thus, for a subject to be identified, the piecewise function shown in FIG. 3 can be used to estimate the subject's specific attribute value from its feature data, and from that to obtain the subject's label, for example the enterprise's contract defendant risk value R_contract.
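Operations S210 to S230 and the per-interval linear estimate can be sketched as follows. The disclosure does not state how the value at each interval boundary is chosen, so this sketch assumes that a shared boundary takes the midpoint of the two adjacent interval averages and that the outermost boundaries take the average of their own interval. The function names, interval edges, and sample values are hypothetical and are not the contents of Table 1.

```python
from bisect import bisect_right
from statistics import mean


def fit_knots(edges, samples):
    """S210-S230: bucket enterprises by turnover, average the contract counts
    per bucket, and derive one function value per interval boundary."""
    buckets = [[] for _ in range(len(edges) - 1)]
    for turnover, contracts in samples:
        k = min(max(bisect_right(edges, turnover) - 1, 0), len(buckets) - 1)
        buckets[k].append(contracts)
    means = [mean(b) for b in buckets]  # assumes every bucket has samples
    # Assumption: a shared boundary takes the midpoint of the adjacent averages;
    # the outermost boundaries take the average of their own interval.
    inner = [(a + b) / 2 for a, b in zip(means, means[1:])]
    return [means[0]] + inner + [means[-1]]


def estimate_contracts(edges, knots, turnover):
    """Evaluate the piecewise-linear function, clamping turnover to the covered range."""
    t = min(max(turnover, edges[0]), edges[-1])
    k = min(bisect_right(edges, t) - 1, len(edges) - 2)
    lo, hi = edges[k], edges[k + 1]
    return knots[k] + (knots[k + 1] - knots[k]) * (t - lo) / (hi - lo)


# Hypothetical turnover intervals [0, 100) and [100, 1000] with sampled enterprises.
edges = [0, 100, 1000]
samples = [(50, 10), (80, 30), (500, 100), (900, 140)]
knots = fit_knots(edges, samples)
```

With these knots, `estimate_contracts(edges, knots, turnover)` gives the estimated yearly contract count that serves as the denominator of R_contract.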

According to the technical solution provided by the embodiments of the present disclosure, the mapping relationship includes at least a correspondence between feature data and specific attribute values, wherein feature data in different value ranges correspond to different feature attribute values. The number of specific events can therefore be converted into a standardized state index based on the mapping relationship, and a state recognition model trained on that standardized state index can identify the state of the subject more accurately.

图4示出根据本公开实施例的确定标准化状态指标的流程图。FIG. 4 shows a flow diagram of determining a normalized state indicator according to an embodiment of the present disclosure.

如图4所示，上述操作S130，即基于所述状态数据和所述特定属性值，确定所述主体的标准化状态指标可以包括操作S410和S420。As shown in FIG. 4 , the above operation S130, that is, determining the normalized state index of the subject based on the state data and the specific attribute value may include operations S410 and S420.

在操作S410，基于所述状态数据和所述特定属性值，确定原始状态指标；In operation S410, based on the state data and the specific attribute value, determine an original state indicator;

在操作S420，根据不同主体原始状态指标的最大值和最小值，将所述原始状态指标映射到预定取值区间，得到标准化状态指标。In operation S420, according to the maximum value and the minimum value of the original state index of different subjects, the original state index is mapped to a predetermined value interval to obtain a standardized state index.

According to an embodiment of the present disclosure, min-max scaling may also be applied to the R_contract values of all enterprises so that the overall score falls within a predetermined interval, for example 0 to 10 points, and the resulting R′_contract is used as the standardized state index.
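Operations S410 and S420 can be sketched as below: the raw index is the ratio C / N_contract, and min-max scaling maps the raw values of all enterprises onto the 0 to 10 range. The function names and sample numbers are hypothetical, and the handling of the degenerate case where all raw values are equal is an assumption of this sketch.

```python
def raw_risk(defendant_count, contract_count):
    """S410: raw state index, defendant count per signed contract."""
    return defendant_count / contract_count


def min_max_scale(values, low=0.0, high=10.0):
    """S420: map raw values linearly onto [low, high] using their min and max."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:  # assumption: all-equal inputs map to the lower bound
        return [low for _ in values]
    return [low + (v - vmin) * (high - low) / (vmax - vmin) for v in values]


# Hypothetical (defendant count, estimated contract count) pairs for three enterprises.
raw = [raw_risk(c, n) for c, n in [(2, 100), (5, 50), (30, 1000)]]
scaled = min_max_scale(raw)
```

In this example the enterprise with 30 defendant records ends up near the bottom of the scale, because its estimated contract volume is large.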

According to the technical solution provided by the embodiments of the present disclosure, an original state index is determined based on the state data and the specific attribute value of the subject, and the original state index is mapped to a predetermined value interval according to the maximum and minimum values of the original state indices of different subjects to obtain the standardized state index. The standardized state index obtained in this way is more effective, and a state recognition model trained on it can identify the state of the subject more accurately.

Current related techniques mostly consider how the historical information of the current subject affects that subject's state. The inventors found that, in practice, the information of subjects associated with the current subject is also very helpful for predicting the current subject's state. The data and records of such associated nodes were not well utilized in previous schemes.

根据本公开实施例，所述特征数据包括基于所述主体在主体关系图中的关联关系所确定的图特征。According to an embodiment of the present disclosure, the feature data includes graph features determined based on the association relationship of the subject in the subject relationship graph.

For example, still taking the prediction of an enterprise's contract defendant risk as an example, the association relationship may include an investment relationship, and may also include equity structure, cooperation, competition, and other potential relationships. These relationships between enterprises are also very helpful for predicting contract defendant risk. The litigation record of a single enterprise may be affected by chance factors, whereas training a more comprehensive contract litigation prediction model on the full data (especially graph features that combine the attributes and records of neighbor nodes) gives more accurate results.

图5示出根据本公开实施例的基于图特征预测状态的示意图。FIG. 5 shows a schematic diagram of predicting a state based on a graph feature according to an embodiment of the present disclosure.

As shown in FIG. 5, the left and right diagrams are exemplary schematic diagrams of the subject relationship graph. Nodes marked 1 represent samples whose subject state is the first state, for example enterprises whose contract defendant risk label is high risk. Nodes marked 0 represent samples whose subject state is the second state, for example enterprises whose contract defendant risk label is low risk. Unmarked nodes represent subjects to be identified. The states of the surrounding nodes also help in judging the state of the current subject, so the state of a subject to be identified can be predicted from its own attributes and the states of the surrounding nodes.

For example, if the subsidiaries a company has invested in are frequently sued, the company's own risk of being sued may also increase. Here the association relationship is an investment relationship. If a node in the subject relationship graph is surrounded by two or more nodes with high-risk contract defendant records, the defendant risk of that company increases. One feature that can be computed for the current company is therefore "the number of neighbor nodes that have a contract defendant record". Alternatively, if a node is considered high risk only when it has been a contract defendant more than 5 times, the feature becomes "the number of neighbor nodes with more than 5 contract defendant records". Each neighbor node can also carry its own attributes, so a feature such as "the neighbor node has a contract defendant record and is a private company" can be defined as well. There can be many graph features of this kind, and the model can learn them automatically, which improves the accuracy of state recognition.
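The three example neighbor features described above can be computed by hand as in the sketch below. The adjacency dictionary, the record fields `defendant_count` and `is_private`, and the function name are hypothetical; in the disclosure such features are meant to be learned by the model rather than hand-coded, so this only illustrates what each feature measures.

```python
def neighbor_features(graph, records, node, threshold=5):
    """Count neighbors of `node` that match each of the three example conditions."""
    neighbors = graph.get(node, [])
    with_record = [n for n in neighbors if records[n]["defendant_count"] > 0]
    return {
        # neighbors that have at least one contract defendant record
        "n_with_record": len(with_record),
        # neighbors with more than `threshold` contract defendant records
        "n_over_threshold": sum(
            1 for n in neighbors if records[n]["defendant_count"] > threshold
        ),
        # neighbors that have a record and are private companies
        "n_private_with_record": sum(
            1 for n in with_record if records[n]["is_private"]
        ),
    }


# Hypothetical investment graph: company A has invested in B, C, and D.
graph = {"A": ["B", "C", "D"]}
records = {
    "B": {"defendant_count": 6, "is_private": True},
    "C": {"defendant_count": 0, "is_private": True},
    "D": {"defendant_count": 2, "is_private": False},
}
features_a = neighbor_features(graph, records, "A")
```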

According to an embodiment of the present disclosure, graph features may be exploited using, for example, the principle of GraphSage. The basic idea of GraphSage is to learn how a node's information is aggregated from the features of its neighbor nodes. Methods such as the graph convolutional network (GCN) have the drawback that the graph features learned for each node are fixed. By contrast, the graph features that GraphSage learns for a node change as the node's neighbor relationships change. That is, even for an existing node, if new connections are established, its graph features change accordingly and can be learned conveniently. The principle of GraphSage is briefly described below with reference to FIG. 6.

图6示出根据本公开实施例的主体关系图的示意图。FIG. 6 shows a schematic diagram of a subject relationship diagram according to an embodiment of the present disclosure.

As shown in FIG. 6, A's neighbors are B, C, and D. Taking neighbor B as an example, that node also has two neighbors, A and C. Let $h_A^0$, $h_B^0$, $h_C^0$, $h_D^0$ denote the initial attributes of nodes A, B, C, and D, respectively. In the first round, for node B, the attributes $h_A^0$ and $h_C^0$ of its neighbor nodes A and C are first aggregated by weighting, and the resulting vector is concatenated with B's current attribute vector $h_B^0$ to obtain $h_B^1$. The first-round vectors of the other nodes, such as $h_C^1$ and $h_D^1$, are obtained in the same way. Then, in the second round, $h_B^1$, $h_C^1$, $h_D^1$ are aggregated by weighting to obtain the neighbor aggregation vector of node A, which is concatenated with node A's own current vector to obtain the new vector $h_A^2$ of node A. By analogy, feature vectors of any order can be obtained, and the order can be chosen according to actual needs. GraphSage thus makes good use of the attributes of neighbor nodes to predict the state of the current node.
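The two aggregation rounds on the graph of FIG. 6 can be sketched as follows. This is a simplified illustration under stated assumptions: the aggregator is an unweighted mean rather than a learned weighted aggregation, no trainable weight matrix or nonlinearity is applied after concatenation (so the vector length doubles each round), and the edges of C and D beyond those named in the text, as well as the initial attribute values, are hypothetical.

```python
def mean_vectors(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sage_round(graph, h):
    """One round: each node's own vector concatenated with the mean of its neighbors' vectors."""
    return {
        node: h[node] + mean_vectors([h[nb] for nb in graph[node]])
        for node in graph
    }


# A's neighbors are B, C, D; B's neighbors are A and C (as in FIG. 6).
graph = {"A": ["B", "C", "D"], "B": ["A", "C"], "C": ["A", "B"], "D": ["A"]}
h0 = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0], "D": [0.0, 0.0]}

h1 = sage_round(graph, h0)  # first round: uses initial attributes
h2 = sage_round(graph, h1)  # second round: uses first-round vectors
```

After the second round, `h2["A"]` depends on D's and C's initial attributes directly and on A's own attributes again through B, C, and D, which is how second-order neighborhood information reaches a node.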

However, the inventors found that when conventional GraphSage encounters many enumerated features, the sparsity of the one-hot variables of those features degrades model performance. For example, if the enterprises together span 1000 industries, the one-hot vector of the industry attribute has 1000 dimensions and the feature is very sparse. If there are several enumerated attributes of this kind, each must be one-hot encoded and then concatenated onto the other attribute vectors, which makes the result even sparser. In addition, many nodes in the graph may have no attributes. Ordinary GraphSage therefore cannot handle this sparsity of attribute feature vectors.

The embodiments of the present disclosure propose an improved GraphSage method based on compressing the embeddings of enumerated attributes. By learning an average pooling over the attribute embeddings of each node, the information of multiple enumerated attributes is compressed, which helps the model learn features better.

图7示出根据本公开实施例的根据所述标准化状态指标训练所述状态识别模型的流程图。FIG. 7 shows a flowchart of training the state recognition model according to the standardized state index according to an embodiment of the present disclosure.

如图7所示，该方法包括操作S710～S750。As shown in FIG. 7 , the method includes operations S710˜S750.

在操作S710，确定所述主体的特征数据的第一向量表示；In operation S710, determining a first vector representation of the characteristic data of the subject;

在操作S720，将所述第一向量表示由高维稀疏的向量表示转化为低维稠密的第二向量表示；In operation S720, the first vector representation is converted from a high-dimensional sparse vector representation to a low-dimensional dense second vector representation;

在操作S730，将所述第二向量表示与所述主体在所述主体关系图中的相邻主体的特征数据的第三向量表示拼接，得到所述主体的特征数据的第四向量表示；In operation S730, the second vector representation is spliced with the third vector representation of the feature data of the subject's adjacent subjects in the subject relationship diagram to obtain a fourth vector representation of the subject's feature data;

在操作S740，将所述第四向量表示编码为具有预定维度的所述主体的特征数据的第五向量表示，作为所述状态识别模型的训练数据；In operation S740, encoding the fourth vector representation as a fifth vector representation of the feature data of the subject having a predetermined dimension as training data for the state recognition model;

在操作S750，以所述标准化状态指标为标签，训练所述状态识别模型。In operation S750, the state recognition model is trained using the standardized state index as a label.

According to an embodiment of the present disclosure, the first vector representation may be, for example, various enumerated one-hot feature representations. Converting these high-dimensional sparse vector representations into a low-dimensional dense second vector representation at least partially solves the vector sparsity problem. Following the idea of GraphSage, the converted second vector representation is used in place of the initial attribute vector of a node X. By concatenating the node's feature vector with the third feature vector of its adjacent nodes to obtain a fourth feature vector, and re-encoding that into a fifth feature vector of a predetermined dimension, graph features can be merged into the original attribute features.

The node embedding can carry two kinds of information at once: the aggregation of neighbor nodes' attribute information into the current node, and the positional context between nodes. The first kind can be learned through a supervised loss: the attributes of the neighbor nodes are aggregated into the current node as its features, and the current node's state label is then used to optimize the neighbor nodes' embeddings in reverse. Intuitively, if a certain feature (for example, whether the enterprise has a record of dishonesty) has a large effect on the state, aggregating that feature over the neighbor nodes helps to judge the state of the current node. The second kind can be optimized through an unsupervised loss, whose main idea is to bring the embeddings of adjacent nodes closer together; specifically, a method similar to negative sampling in node2vec may be used. Intuitively, if many neighbor nodes have the first state as their state label, the probability that the current node is in the first state is also high.
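The unsupervised objective can be illustrated with a negative-sampling loss of the following form. The disclosure only says that a method similar to negative sampling in node2vec may be used, so the exact formula here (log-sigmoid of dot products, one positive neighbor, a list of sampled non-neighbors) is an assumption of this sketch, and all names and vectors are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unsupervised_loss(z_u, z_pos, z_negs):
    """Low when z_u is similar to its neighbor z_pos and dissimilar to the sampled negatives."""
    loss = -math.log(sigmoid(dot(z_u, z_pos)))
    for z_n in z_negs:
        loss -= math.log(sigmoid(-dot(z_u, z_n)))
    return loss


# Embedding close to its neighbor and far from the negative sample: small loss.
good = unsupervised_loss([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
# Embedding far from its neighbor and close to the negative sample: large loss.
bad = unsupervised_loss([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]])
```

Minimizing such a loss pulls the embeddings of adjacent nodes together, which is the positional-context effect described above; it would be combined with the supervised loss on the state label.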

下面结合图8所示意的实施例对图7的方法进行说明。The method of FIG. 7 will be described below with reference to the embodiment shown in FIG. 8 .

图8示出根据本公开实施例的特征数据处理的示意图。FIG. 8 shows a schematic diagram of feature data processing according to an embodiment of the present disclosure.

As shown in FIG. 8, nodes A, B, and C are each initially represented by one-hot feature vectors of their different attributes; that is, nodes A, B, and C each have their own first vector representation. Taking enterprises as an example again, the enterprise itself may have a Node one-hot vector, and each attribute i of the enterprise has a Feat_{i} one-hot vector. These one-hot vectors are converted into dense vectors of a predetermined dimension, such as embedding vectors. The Feat_{i} embeddings of the attributes and the Node embedding of the enterprise itself are combined by a weighted sum (or by taking the mean) to obtain the final embedding vector of the enterprise, that is, the second vector representation. If an attribute of an enterprise is empty, that attribute is simply left out when computing the enterprise's second vector representation. Nodes A, B, and C each compute their own second vector representation, as shown in FIG. 8. By concatenating and encoding the second vector representations of node B and node C, the third vector representation of node A's neighbor nodes is obtained. This is then concatenated with the second vector representation of node A to obtain the fourth vector representation of node A, which is encoded to a predetermined dimension to obtain the fifth vector representation, that is, the vector representation of node A fused with the graph features of its first-order neighbor nodes. With the above method, a vector representation fused with the graph features of neighbor nodes up to a given order can be obtained for each node and used as training data for the state recognition model, which can then be trained with the above standardized state index as the label.
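The path from second to fifth vector representation for node A can be sketched as below. This is an illustration under stated assumptions: pooling uses the mean option, "encoding" is modeled as a plain linear projection, and the projection weights are fixed by hand here whereas in training they would be learned. All embeddings, weights, and function names are hypothetical.

```python
def pool_embeddings(node_emb, attr_embs):
    """Second vector: mean of the node embedding and the non-empty attribute embeddings."""
    vectors = [node_emb] + [e for e in attr_embs if e is not None]  # None marks an empty attribute
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def encode(vector, weights):
    """Linear projection: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


# Hypothetical 2-dimensional embeddings; each node has two attribute slots.
second_a = pool_embeddings([1.0, 0.0], [[0.0, 1.0], None])
second_b = pool_embeddings([1.0, 1.0], [[1.0, 1.0], None])
second_c = pool_embeddings([0.0, 0.0], [None, [2.0, 0.0]])

# Hand-picked 2x4 projection that averages the two concatenated halves.
W = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]

third_a = encode(second_b + second_c, W)  # neighbors B and C, concatenated then encoded
fourth_a = second_a + third_a             # concatenated with A's own second vector
fifth_a = encode(fourth_a, W)             # encoded back to the predetermined dimension
```

Because `fifth_a` has the same dimension as `second_a`, the same step can be repeated to fold in higher-order neighbors.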

图9示出根据本公开实施例的确定第二向量表示的流程图。FIG. 9 shows a flowchart of determining a second vector representation according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, the first vector representation includes a one-hot feature representation. As shown in FIG. 9, operation S720, that is, converting the first vector representation from a high-dimensional sparse vector representation into a low-dimensional dense second vector representation, may include operations S910 and S920.

在操作S910，将所述独热特征表示通过嵌入操作转化为第六向量表示；In operation S910, converting the one-hot feature representation into a sixth vector representation through an embedding operation;

在操作S920，将所述第六向量表示加权求和得到所述低维稠密的第二向量表示。In operation S920, the sixth vector representation is weighted and summed to obtain the low-dimensional dense second vector representation.

According to an embodiment of the present disclosure, for a subject with 50 non-empty attributes, for example, the 50 attributes can each be represented as a one-hot feature representation, giving 50 one-hot vectors, but this representation is too sparse for subsequent processing. The method of the embodiments of the present disclosure can convert the 50 one-hot vectors through an embedding operation into 50 low-dimensional dense vector representations, commonly called embedding vectors, and then aggregate the 50 embedding vectors. For example, a weight can be set for each of the 50 attributes, and a single vector representation of the subject, that is, the second vector representation, is obtained by weighted summation.
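Operations S910 and S920 can be sketched with two attributes instead of 50. Multiplying a one-hot vector by an embedding table is equivalent to selecting one row of the table, which is what `embed` does. The tables, attribute names, and weights are hypothetical; in practice both the tables and the weights would be learned.

```python
def embed(one_hot, table):
    """S910: a one-hot vector times an embedding table selects a single row."""
    return table[one_hot.index(1)]


def weighted_sum(embeddings, weights):
    """S920: combine per-attribute embeddings into one dense vector."""
    dim = len(embeddings[0])
    return [sum(w * e[i] for w, e in zip(weights, embeddings)) for i in range(dim)]


# Hypothetical tables: 3 industries and 2 regions, each embedded in 2 dimensions.
industry_table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
region_table = [[1.0, 0.0], [0.0, 1.0]]

industry_emb = embed([0, 1, 0], industry_table)
region_emb = embed([0, 1], region_table)
second_vector = weighted_sum([industry_emb, region_emb], [0.5, 0.5])
```

The result has 2 dimensions regardless of how many categories each attribute has, which is the compression the method relies on.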

According to an embodiment of the present disclosure, if an attribute is not an enumerated (one-hot) type but a continuous numerical type, it can be handled in two ways. One is to discretize the continuous attribute directly by manually dividing it into segments, converting it into an enumerable type. The other is to normalize the value and append it to the Mixture embedding as one of its dimensions.
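Both options for continuous attributes can be sketched in a few lines. The segment boundaries, the choice of min-max normalization, and the function names are assumptions made for illustration; the disclosure does not specify how the value is normalized or where the segments are cut.

```python
from bisect import bisect_right


def discretize(value, boundaries):
    """Option 1: return a segment index that can be treated as an enumerated attribute."""
    return bisect_right(boundaries, value)


def append_normalized(embedding, value, vmin, vmax):
    """Option 2: append the min-max normalized value as one extra dimension."""
    return embedding + [(value - vmin) / (vmax - vmin)]


# Hypothetical turnover of 250 with segment boundaries at 100 and 1000.
segment = discretize(250, [100, 1000])
extended = append_normalized([0.2, 0.8], 250, 0, 1000)
```

Option 1 lets the attribute go through the same embedding path as the enumerated attributes, while option 2 keeps the exact magnitude at the cost of one extra dimension.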

According to an embodiment of the present disclosure, the method may further include presenting the standardized state index graphically. For example, FIG. 10 shows a schematic diagram of presenting the standardized state index graphically according to an embodiment of the present disclosure. As shown in FIG. 10, the standardized state index may be presented on a user interface in the style of a dashboard gauge, so that the user can see the standardized state index of the current subject at a glance.

图11示出根据本公开实施例的状态识别方法的流程图。FIG. 11 shows a flowchart of a state identification method according to an embodiment of the present disclosure.

如图11所示，该方法包括操作S1110和S1120。As shown in FIG. 11 , the method includes operations S1110 and S1120.

在操作S1110，获取待识别主体的特征数据；In operation S1110, acquiring characteristic data of the subject to be identified;

在操作S1120，将所述特征数据输入到如图1～图10所描述的任意一种状态识别模型的训练方法训练的状态识别模型，以获取所述待识别主体的标准化状态指标。In operation S1120, the feature data is input into a state recognition model trained by any one of the state recognition model training methods described in Figs. 1 to 10 to obtain a standardized state index of the subject to be recognized.

根据本公开实施例提供的技术方案，通过获取待识别主体的特征数据；将所述特征数据输入到如上述状态识别模型训练方法训练的状态识别模型，以获取所述待识别主体的标准化状态指标，从而可以通过标准化状态指标更为准确地识别主体的状态。According to the technical solutions provided by the embodiments of the present disclosure, by acquiring the characteristic data of the subject to be identified; and inputting the characteristic data into the state identification model trained by the above state identification model training method, to obtain the standardized state index of the subject to be identified , so that the state of the subject can be more accurately identified through the standardized state index.

图12示出根据本公开实施例的企业风险识别方法的流程图。FIG. 12 shows a flowchart of an enterprise risk identification method according to an embodiment of the present disclosure.

如图12所示，该方法包括操作S1210～S1240。As shown in FIG. 12 , the method includes operations S1210˜S1240.

在操作S1210，获取企业的特征数据以及所述企业作为被告的合同诉讼案件的数量，其中，所述特征数据包括营业额；In operation S1210, characteristic data of the enterprise and the number of contract litigation cases in which the enterprise is a defendant are acquired, wherein the characteristic data includes turnover;

在操作S1220，根据所述营业额与合同数量之间的映射关系，确定所述企业的合同数量；In operation S1220, the contract quantity of the enterprise is determined according to the mapping relationship between the turnover and the contract quantity;

在操作S1230，基于所述合同诉讼案件的数量和所述企业的合同数量，确定所述企业的标准化风险指标；In operation S1230, a standardized risk index of the enterprise is determined based on the number of contract litigation cases and the number of contracts of the enterprise;

在操作S1240，基于所述企业的特征数据确定训练数据，以所述标准化风险指标为标签，训练风险识别模型。In operation S1240, training data is determined based on the characteristic data of the enterprise, and a risk identification model is trained with the standardized risk index as a label.

例如，某企业在拟与其他主体签订合同前，该企业的公司法务可以应用本公开实施例的方法评估对方主体的风险状态，作为决策的依据。For example, before an enterprise signs a contract with another party, the enterprise's legal department may apply the method of the embodiments of the present disclosure to assess the risk status of the other party as a basis for decision-making.

本领域技术人员可以理解，本公开实施例中的以上方案不仅可以用于确定待识别企业的风险，还可以用于识别企业以内或以外的组织、人员的状态或风险，也可以用于识别自然界中的生物或非生物的状态或风险，也可以用于识别以数据形式存在的待识别主体的状态或风险。Those skilled in the art can understand that the above solutions in the embodiments of the present disclosure can be used not only to determine the risk of an enterprise to be identified, but also to identify the state or risk of organizations and personnel inside or outside an enterprise, the state or risk of living or non-living things in nature, and the state or risk of a subject to be identified that exists in the form of data.

例如，本公开实施例的主体可以是电商平台中的商家，根据本公开实施例的方法判断商家的标准化状态指标，并基于该标准化状态指标用于管理商家，或用于处理针对该商家的投诉等。又如，律师可以通过本公开实施例的方法确定对某一主体的标准化状态指标，作为对该主体的评价，可用于模拟法庭的观点。再如，检察院可以应用本公开实施例的方法进行预测，用于形成预案。除此以外，在其他各种民事、行政管理，或民事、行政纠纷的处理中，人民法院、仲裁委员会、政府和政府的工作部门等都可以应用本公开实施例提供的方法产生的标准化状态指标作为参考。本公开实施例的方法还可以用于检查争议的焦点是否发生了变化。For example, the subject in the embodiments of the present disclosure may be a merchant on an e-commerce platform. The standardized state index of the merchant is determined according to the method of the embodiments of the present disclosure and is then used to manage the merchant or to handle complaints against the merchant. As another example, a lawyer can use the method of the embodiments of the present disclosure to determine the standardized state index of a subject as an evaluation of that subject, which can be used to simulate the court's view. As a further example, a procuratorate can apply the method of the embodiments of the present disclosure to make predictions for preparing contingency plans. In addition, in various other civil and administrative matters, or in the handling of civil and administrative disputes, people's courts, arbitration commissions, governments, and government departments can all use the standardized state index produced by the method of the embodiments of the present disclosure as a reference. The method of the embodiments of the present disclosure can also be used to check whether the focus of a dispute has changed.

图13示出根据本公开实施例的状态识别模型训练装置的框图。其中，该装置可以通过软件、硬件或者两者的结合实现成为电子设备的部分或者全部。FIG. 13 shows a block diagram of a state recognition model training apparatus according to an embodiment of the present disclosure. The apparatus may be implemented as part or all of an electronic device through software, hardware, or a combination of the two.

如图13所示，所述状态识别模型训练装置1300包括第一获取模块1310、第一确定模块1320、第二确定模块1330和训练模块1340。As shown in FIG. 13, the state recognition model training apparatus 1300 includes a first acquisition module 1310, a first determination module 1320, a second determination module 1330, and a training module 1340.

第一获取模块1310，被配置为获得主体相关的特征数据，以及与所述主体相关的状态数据；The first acquisition module 1310 is configured to obtain feature data related to the subject and state data related to the subject;

第一确定模块1320，被配置为根据预设的映射关系，确定与所述特征数据相匹配的特定属性值；The first determining module 1320 is configured to determine a specific attribute value matching the feature data according to a preset mapping relationship;

第二确定模块1330，被配置为基于所述与主体相关的状态数据以及所述特定属性值，确定所述主体的标准化状态指标；The second determination module 1330 is configured to determine the standardized state index of the subject based on the state data related to the subject and the specific attribute value;

训练模块1340，被配置为根据所述标准化状态指标训练所述状态识别模型。The training module 1340 is configured to train the state recognition model according to the standardized state index.
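How the four modules hand data to one another can be sketched as a small pipeline. This is only an illustrative sketch: the class name, the callables plugged into it, the ratio used as the raw index, and the mean-predicting placeholder learner are all hypothetical and stand in for whatever concrete mapping and model an implementer chooses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Features = Dict[str, float]
Model = Callable[[Features], float]


@dataclass
class StateModelTrainingApparatus:
    """Mirrors modules 1310 to 1340; every callable is a hypothetical plug-in."""
    acquire: Callable[[], List[Tuple[Features, float]]]    # first acquisition module 1310
    attribute_of: Callable[[Features], float]              # first determination module 1320
    standardize: Callable[[Sequence[float]], List[float]]  # second determination module 1330
    fit: Callable[[List[Features], List[float]], Model]    # training module 1340

    def run(self) -> Model:
        samples = self.acquire()
        features = [f for f, _ in samples]
        # Assumed raw index: state data divided by the matched attribute value.
        raw = [state / self.attribute_of(f) for f, state in samples]
        labels = self.standardize(raw)
        return self.fit(features, labels)


def min_max(raw: Sequence[float]) -> List[float]:
    """Map raw indices into [0, 1] using their minimum and maximum."""
    lo, hi = min(raw), max(raw)
    return [0.0 if hi == lo else (r - lo) / (hi - lo) for r in raw]


def fit_mean(features: List[Features], labels: List[float]) -> Model:
    """Placeholder learner: ignores the features and predicts the mean label."""
    mean = sum(labels) / len(labels)
    return lambda _features: mean


apparatus = StateModelTrainingApparatus(
    acquire=lambda: [({"size": 1.0}, 2.0), ({"size": 2.0}, 2.0)],
    attribute_of=lambda f: 10.0 * f["size"],
    standardize=min_max,
    fit=fit_mean,
)
model = apparatus.run()
```

Keeping each module as a separate callable reflects the statement that the apparatus may be realized in software, hardware, or both: any one stage can be swapped without touching the others.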

根据本公开实施例，所述映射关系至少包括特征数据与特定属性值之间的对应关系，其中，不同取值范围的特征数据所对应的特征属性值不同。According to an embodiment of the present disclosure, the mapping relationship includes at least a corresponding relationship between the feature data and a specific attribute value, wherein the feature attribute values corresponding to the feature data in different value ranges are different.

根据本公开实施例，所述基于所述与主体相关的状态数据以及所述特定属性值，确定所述主体的标准化状态指标包括：According to an embodiment of the present disclosure, the determining of the standardized state index of the subject based on the state data related to the subject and the specific attribute value includes:

根据本公开实施例，所述根据所述标准化状态指标训练所述状态识别模型包括：According to an embodiment of the present disclosure, the training of the state recognition model according to the standardized state index includes:

根据本公开实施例，所述第一向量表示包括独热特征表示，所述将所述第一向量表示由高维稀疏的向量表示转化为低维稠密的第二向量表示包括：According to an embodiment of the present disclosure, the first vector representation includes a one-hot feature representation, and the converting the first vector representation from a high-dimensional sparse vector representation to a low-dimensional dense second vector representation includes:

根据本公开实施例，该装置还可以包括展示模块，被配置为以图形化方式呈现所述标准化状态指标，从而能够更加直观地展示主体的状态。According to an embodiment of the present disclosure, the apparatus may further include a presentation module configured to present the standardized status indicator in a graphical manner, so that the status of the subject can be displayed more intuitively.

图14示出根据本公开实施例的状态识别装置的框图。其中，该装置可以通过软件、硬件或者两者的结合实现成为电子设备的部分或者全部。FIG. 14 shows a block diagram of a state recognition apparatus according to an embodiment of the present disclosure. The apparatus may be implemented as part or all of an electronic device through software, hardware, or a combination of the two.

如图14所示，该状态识别装置1400包括第二获取模块1410和识别模块1420。As shown in FIG. 14, the state recognition apparatus 1400 includes a second acquisition module 1410 and a recognition module 1420.

第二获取模块1410，被配置为获取待识别主体的特征数据；The second obtaining module 1410 is configured to obtain characteristic data of the subject to be identified;

识别模块1420，被配置为将所述特征数据输入到如上文所述的状态识别模型，以获取所述待识别主体的标准化状态指标。The identification module 1420 is configured to input the feature data into the state identification model as described above, so as to obtain the standardized state index of the subject to be identified.

根据本公开实施例提供的技术方案，通过第二获取模块，被配置为获取待识别主体的特征数据；识别模块，被配置为将所述特征数据输入到如上文所述的状态识别模型训练方法训练的状态识别模型，以获取所述待识别主体的标准化状态指标，从而可以通过标准化状态指标更为准确地识别主体的状态。According to the technical solutions provided by the embodiments of the present disclosure, the second acquisition module is configured to acquire the feature data of the subject to be identified, and the recognition module is configured to input the feature data into the state recognition model trained by the state recognition model training method described above to obtain the standardized state index of the subject to be identified, so that the state of the subject can be identified more accurately through the standardized state index.
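The two-module recognition apparatus can be sketched in the same spirit. The class name, the feature store, and the toy model below are hypothetical stand-ins. In practice the model would be the one produced by the training method, which this sketch does not reproduce.

```python
from typing import Callable, Dict

Features = Dict[str, float]


class StateRecognitionApparatus:
    """Mirrors modules 1410 and 1420 around an already trained model."""

    def __init__(self, load_features: Callable[[str], Features],
                 model: Callable[[Features], float]) -> None:
        self._load_features = load_features  # second acquisition module 1410
        self._model = model                  # used by recognition module 1420

    def recognize(self, subject_id: str) -> float:
        """Return the standardized state index predicted for the subject."""
        features = self._load_features(subject_id)
        return self._model(features)


# Toy stand-ins for a feature store and a trained model (assumptions).
store = {"acme": {"turnover": 5e7}}


def toy_model(features: Features) -> float:
    return 0.25 if features["turnover"] > 1e7 else 0.75


apparatus = StateRecognitionApparatus(store.__getitem__, toy_model)
index = apparatus.recognize("acme")
```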

本公开还公开了一种电子设备，图15示出根据本公开实施例的电子设备的框图。The present disclosure also discloses an electronic device, and FIG. 15 shows a block diagram of the electronic device according to an embodiment of the present disclosure.

如图15所示，所述电子设备1500包括存储器1501和处理器1502，其中，存储器1501用于存储一条或多条计算机指令，其中，所述一条或多条计算机指令被所述处理器1502执行以实现如下操作：As shown in FIG. 15, the electronic device 1500 includes a memory 1501 and a processor 1502, wherein the memory 1501 is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor 1502 to implement the following operations:

根据本公开实施例，所述映射关系至少包括：特征数据与特定属性值之间的对应关系；其中，不同取值范围的特征数据所对应的特征属性值不同。According to an embodiment of the present disclosure, the mapping relationship includes at least: a correspondence relationship between the feature data and a specific attribute value; wherein the feature attribute values corresponding to the feature data in different value ranges are different.

根据本公开实施例，所述基于所述与主体相关的状态数据以及所述特定属性值，确定所述主体的标准化状态指标包括：According to an embodiment of the present disclosure, the determining of the standardized state index of the subject based on the state data related to the subject and the specific attribute value includes:

根据本公开实施例，处理器1502还用于执行：以图形化方式呈现所述标准化状态指标。According to the embodiment of the present disclosure, the processor 1502 is further configured to perform: presenting the standardized state indicator in a graphical manner.

根据本公开实施例，所述一条或多条计算机指令被所述处理器1502执行可以实现如下操作：According to an embodiment of the present disclosure, the execution of the one or more computer instructions by the processor 1502 may implement the following operations:

将所述特征数据输入到如上文所述的状态识别模型，以获取所述待识别主体的标准化状态指标。The feature data is input into the state recognition model as described above to obtain standardized state indicators of the subject to be recognized.

如图16所示，计算机系统1600包括处理单元1601，其可以根据存储在只读存储器(ROM)1602中的程序或者从存储部分1608加载到随机访问存储器(RAM)1603中的程序而执行上述实施例中的各种处理。在RAM 1603中，还存储有系统1600操作所需的各种程序和数据。处理单元1601、ROM 1602以及RAM 1603通过总线1604彼此相连。输入/输出(I/O)接口1605也连接至总线1604。As shown in FIG. 16, a computer system 1600 includes a processing unit 1601, which can perform the various processes in the above embodiments according to a program stored in a read-only memory (ROM) 1602 or a program loaded from a storage section 1608 into a random access memory (RAM) 1603. The RAM 1603 also stores various programs and data required for the operation of the system 1600. The processing unit 1601, the ROM 1602, and the RAM 1603 are connected to one another through a bus 1604. An input/output (I/O) interface 1605 is also connected to the bus 1604.

以下部件连接至I/O接口1605：包括键盘、鼠标等的输入部分1606；包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分1607；包括硬盘等的存储部分1608；以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分1609。通信部分1609经由诸如因特网的网络执行通信处理。驱动器1610也根据需要连接至I/O接口1605。可拆卸介质1611，诸如磁盘、光盘、磁光盘、半导体存储器等等，根据需要安装在驱动器1610上，以便于从其上读出的计算机程序根据需要被安装入存储部分1608。其中，所述处理单元1601可实现为CPU、GPU、TPU、FPGA、NPU等处理单元。The following components are connected to the I/O interface 1605: an input section 1606 including a keyboard, a mouse, and the like; an output section 1607 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, and the like; a storage section 1608 including a hard disk and the like; and a communication section 1609 including a network interface card such as a LAN card or a modem. The communication section 1609 performs communication processing via a network such as the Internet. A drive 1610 is also connected to the I/O interface 1605 as needed. A removable medium 1611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1610 as needed, so that a computer program read from it can be installed into the storage section 1608 as needed. The processing unit 1601 may be implemented as a CPU, GPU, TPU, FPGA, NPU, or similar processing unit.

特别地，根据本公开的实施例，上文描述的方法可以被实现为计算机软件程序。例如，本公开的实施例包括一种计算机程序产品，其包括有形地包含在机器可读介质上的计算机程序，所述计算机程序包含用于执行上述方法的程序代码。在这样的实施例中，该计算机程序可以通过通信部分1609从网络上被下载和安装，和/或从可拆卸介质1611被安装。In particular, according to embodiments of the present disclosure, the methods described above may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the above methods. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1609, and/or installed from the removable medium 1611.

附图中的流程图和框图，图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上，流程图或框图中的每个方框可以代表一个模块、程序段或代码的一部分，所述模块、程序段或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意，在有些作为替换的实现中，方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如，两个接连地表示的方框实际上可以基本并行地执行，它们有时也可以按相反的顺序执行，这依所涉及的功能而定。也要注意的是，框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合，可以用执行规定的功能或操作的专用的基于硬件的系统来实现，或者可以用专用硬件与计算机指令的组合来实现。The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

描述于本公开实施例中所涉及到的单元或模块可以通过软件的方式实现，也可以通过可编程硬件的方式来实现。所描述的单元或模块也可以设置在处理器中，这些单元或模块的名称在某种情况下并不构成对该单元或模块本身的限定。The units or modules involved in the embodiments of the present disclosure may be implemented in a software manner, or may be implemented in a programmable hardware manner. The described units or modules may also be provided in the processor, and the names of these units or modules do not constitute a limitation on the units or modules themselves in certain circumstances.

作为另一方面，本公开还提供了一种计算机可读存储介质，该计算机可读存储介质可以是上述实施例中电子设备或计算机系统中所包含的计算机可读存储介质；也可以是单独存在，未装配入设备中的计算机可读存储介质。计算机可读存储介质存储有一个或者一个以上程序，所述程序被一个或者一个以上的处理器用来执行描述于本公开的方法。In another aspect, the present disclosure also provides a computer-readable storage medium. The computer-readable storage medium may be the one included in the electronic device or computer system of the above embodiments, or it may be a standalone computer-readable storage medium that is not assembled into a device. The computer-readable storage medium stores one or more programs, which are used by one or more processors to perform the methods described in the present disclosure.

以上描述仅为本公开的较佳实施例以及对所运用技术原理的说明。本领域技术人员应当理解，本公开中所涉及的发明范围，并不限于上述技术特征的特定组合而成的技术方案，同时也应涵盖在不脱离所述发明构思的情况下，由上述技术特征或其等同特征进行任意组合而形成的其它技术方案。例如上述特征与本公开中公开的(但不限于)具有类似功能的技术特征进行互相替换而形成的技术方案。The above description covers only preferred embodiments of the present disclosure and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.

Claims

1. A training method of a state recognition model comprises the following steps:

obtaining feature data relating to a subject, and status data relating to the subject;

determining a specific attribute value matched with the characteristic data according to a preset mapping relation;

determining a standardized status indicator for the subject based on the status data related to the subject and the specific attribute value;

and training the state recognition model according to the standardized state index.

2. The method of claim 1, wherein the mapping relation comprises at least a correspondence between the characteristic data and the specific attribute value, and wherein the characteristic attribute values corresponding to characteristic data in different value ranges are different.

3. The method of claim 1, wherein the determining a normalized status indicator for the subject based on the status data related to the subject and the particular attribute value comprises:

determining an original state index based on the state data and the specific attribute value;

and mapping the original state index to a preset value interval according to the maximum value and the minimum value of the original state indexes of different subjects to obtain the standardized state index.

4. The method of claim 1, wherein the feature data comprises a graph feature determined based on an associative relationship of the subject in a subject relationship graph.

5. The method of claim 4, wherein said training said state recognition model according to said standardized state index comprises:

determining a first vector representation of feature data of the subject;

converting the first vector representation from a high-dimensional sparse vector representation to a low-dimensional dense second vector representation;

concatenating the second vector representation with a third vector representation of feature data of a neighboring subject of the subject in the subject relationship graph to obtain a fourth vector representation of the feature data of the subject;

encoding the fourth vector representation as a fifth vector representation of feature data of the subject having predetermined dimensions as training data of the state recognition model;

and training the state recognition model by taking the standardized state index as a label.

6. The method of claim 5, wherein the first vector representation comprises a one-hot feature representation, the converting the first vector representation from a high-dimensional sparse vector representation to a low-dimensional dense second vector representation comprises:

converting the one-hot feature representation into a sixth vector representation through an embedding operation;

and performing a weighted summation on the sixth vector representation to obtain the low-dimensional dense second vector representation.

7. A state recognition method, comprising:

acquiring characteristic data of a subject to be identified;

inputting the characteristic data into a state recognition model trained by the method according to any one of claims 1-6 to obtain a standardized state index of the subject to be identified.

8. A method of training a risk recognition model, comprising:

acquiring characteristic data of an enterprise and the number of contract litigation cases in which the enterprise is the defendant, wherein the characteristic data comprises turnover;

determining the number of contracts of the enterprise according to the mapping relation between the turnover and the number of contracts;

determining a normalized risk indicator for the enterprise based on the number of contract litigation cases and the number of contracts for the enterprise;

and determining training data based on the characteristic data of the enterprise, and training a risk identification model by taking the standardized risk index as a label.

9. A state recognition model training apparatus comprising:

a first acquisition module configured to acquire feature data related to a subject, and status data related to the subject;

a first determination module configured to determine a specific attribute value matched with the feature data according to a preset mapping relation;

a second determination module configured to determine a normalized status indicator for the subject based on the status data related to the subject and the specific attribute value;

a training module configured to train the state recognition model according to the normalized state index.

10. The apparatus of claim 9, wherein the mapping relationship comprises at least:

a correspondence between the characteristic data and the specific attribute value, wherein the characteristic attribute values corresponding to characteristic data in different value ranges are different.

11. The apparatus of claim 9, wherein the determining a normalized status indicator for the subject based on the status data related to the subject and the particular attribute value comprises:

12. The apparatus of claim 9, wherein the feature data comprises a graph feature determined based on an associative relationship of the subject in a subject relationship graph.

13. The apparatus of claim 12, wherein training the state recognition model according to the normalized state metrics comprises:

determining a first vector representation of feature data of the subject;

14. The apparatus of claim 13, wherein the first vector representation comprises a one-hot feature representation, the converting the first vector representation from a high-dimensional sparse vector representation to a low-dimensional dense second vector representation comprises:

15. A state recognition device comprising:

a second acquisition module configured to acquire characteristic data of the subject to be identified;

an identification module configured to input the feature data into the state recognition model according to any one of claims 1 to 7 to obtain a standardized state index of the subject to be recognized.

16. An electronic device comprising a memory and a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method steps of any of claims 1-8.

17. A readable storage medium having stored thereon computer instructions, characterized in that the computer instructions, when executed by a processor, carry out the method steps of any of claims 1 to 8.