CN117807282A

CN117807282A - Service data processing method, device, electronic equipment and readable storage medium

Info

Publication number: CN117807282A
Application number: CN202410232663.XA
Authority: CN
Inventors: 刘晨; 吴昊; 郑保卫; 李勇; 敖劲松
Original assignee: Encore Beijing Information Technology Co ltd
Current assignee: Encore Beijing Information Technology Co ltd
Priority date: 2024-03-01
Filing date: 2024-03-01
Publication date: 2024-04-02
Anticipated expiration: 2044-03-01
Also published as: CN117807282B

Abstract

This application provides a business data processing method and device, an electronic device and a readable storage medium, relating to the field of data processing. In the method, retrieval asset data is first obtained. In response to a selection operation on a preset target labeling model, the retrieval asset data is labeled using the target labeling model to obtain a labeling result. According to the labeling result, a first recommendation is performed using a preset recommendation strategy to obtain a relevant recommendation result, which includes recommended data and the tags corresponding to the recommended data. A preset asset labeling list is obtained, which includes multiple pieces of asset data to be labeled and the label corresponding to each piece. A matching result is obtained based on the recommended data, the tags corresponding to the recommended data, the asset data to be labeled, and the labels. Finally, a second recommendation is performed based on the matching result and the retrieval asset data to obtain recommended asset data. The method can recommend accurate asset data to users.

Description

Business data processing method and device, electronic device and readable storage medium

Technical field

This application relates to the field of data processing technology, and in particular to a business data processing method and device, an electronic device and a readable storage medium.

Background art

An enterprise's asset data is data that the enterprise legally owns or controls. It is usually divided into categories according to its properties; for example, by attribute it can be divided into product, market, order, cost, revenue, service and channel asset data. Asset data inventory and intelligent labeling can bring economic benefits directly or indirectly and open new development opportunities for enterprises. Intelligent recommendation is one application scenario of enterprise asset data inventory and intelligent labeling: it recommends suitable asset data to users and thereby realizes the value of enterprise assets. Intelligent recommendation returns the content a user wants based on the query data the user enters. However, users are often unsure exactly what they are searching for and tend to enter loosely related content; because this related content is relatively vague, accurate asset data cannot be recommended to the user.

Summary of the invention

This application provides a business data processing method and device, an electronic device and a readable storage medium, which can recommend accurate asset data to users.

The technical solutions of the embodiments of the present application are as follows:

In a first aspect, an embodiment of the present application provides a business data processing method, the method including:

obtaining retrieval asset data;

in response to a selection operation on a preset target labeling model, labeling the retrieval asset data using the target labeling model to obtain a labeling result;

performing, according to the labeling result, a first recommendation using a preset recommendation strategy to obtain a relevant recommendation result, the relevant recommendation result including recommended data and tags corresponding to the recommended data;

obtaining a preset asset labeling list, the asset labeling list including a plurality of pieces of asset data to be labeled and a label corresponding to each piece of asset data to be labeled;

obtaining a matching result based on the recommended data, the tags corresponding to the recommended data, the asset data to be labeled, and the labels; and

performing a second recommendation based on the matching result and the retrieval asset data to obtain recommended asset data.

In the above technical solution, the retrieval asset data, which is the retrieval content entered by the user, is obtained first; this provides the data basis for subsequently recommending accurate asset data to the user. In response to the selection operation on the preset target labeling model, the retrieval asset data is labeled with the target labeling model to obtain a labeling result; because the labeling result indicates the category to which the data belongs, labeling it first helps make the recommendation accurate. According to the labeling result, a first recommendation is performed with the preset recommendation strategy to obtain a relevant recommendation result containing the recommended data and their tags; using the recommendation strategy for this coarse recommendation first helps improve recommendation accuracy. A preset asset labeling list, containing multiple pieces of asset data to be labeled and the label of each, is obtained to support the subsequent accurate recommendation. A matching result is then obtained from the recommended data, the tags corresponding to the recommended data, the asset data to be labeled, and the labels. Finally, a second recommendation is performed based on the matching result and the retrieval asset data, which yields more accurate recommended asset data.
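The overall flow can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the labeling model is any callable from query text to a label, candidates carry a precomputed matching degree, the first recommendation applies a simple top-a quantity limit, and exact name equality stands in for the similarity-based matching. All names (`Candidate`, `recommend`, `top_a`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    data: str     # recommended data, e.g. a field or table name (hypothetical)
    tag: str      # tag (category) attached to the recommended data
    score: float  # assumed precomputed matching degree with the query, in [0, 1]


def recommend(
    query: str,
    label_model: Callable[[str], str],
    candidates: List[Candidate],
    labeling_list: List[Tuple[str, str]],  # (asset data, label) pairs
    top_a: int = 5,
) -> List[str]:
    # Label the retrieval asset data with the user-selected model.
    query_label = label_model(query)

    # First (coarse) recommendation: keep same-category candidates and
    # apply a top-a quantity limit on the matching degree.
    first = sorted(
        (c for c in candidates if c.tag == query_label),
        key=lambda c: c.score,
        reverse=True,
    )[:top_a]

    # Matching: keep labeled assets whose label equals the query label and
    # whose name equals a first-round result (a stand-in for the
    # similarity computation described in the text).
    score_of = {c.data: c.score for c in first}
    matched = [asset for asset, label in labeling_list
               if label == query_label and asset in score_of]

    # Second recommendation: order the matched assets by their matching
    # degree with the query.
    return sorted(matched, key=lambda a: score_of[a], reverse=True)
```

Each stage is kept as a separate, replaceable step so that a real labeling model, strategy or similarity measure can be swapped in without changing the overall flow.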

In some embodiments of the present application, before the preset asset labeling list is obtained, the method further includes:

obtaining a plurality of pieces of asset data to be labeled, and preprocessing each piece of asset data to be labeled to obtain a plurality of pieces of preprocessed asset data;

labeling each piece of preprocessed asset data using the target labeling model to obtain labels; and

obtaining the asset labeling list according to each piece of asset data to be labeled and the label corresponding to it.

In the above technical solution, preprocessing the asset data to be labeled, that is, data screening and data cleaning, both reduces the amount of computation and improves labeling accuracy. The target labeling model then labels the preprocessed asset data, and the asset data to be labeled together with its labels forms the asset labeling list, which supports subsequent matching against the list so that accurate asset data can be recommended.
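A minimal sketch of building the asset labeling list follows. It assumes that preprocessing consists of whitespace normalization, screening out empty records and removing duplicates (the text only says "data screening and data cleaning"), and that the target labeling model is any callable returning a label. `preprocess` and `build_labeling_list` are hypothetical names.

```python
from typing import Callable, List, Optional, Tuple


def preprocess(record: str) -> Optional[str]:
    """Assumed cleaning rule: collapse whitespace; empty records are screened out."""
    cleaned = " ".join(record.split())
    return cleaned or None


def build_labeling_list(
    records: List[str],
    label_model: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return (asset data to be labeled, label) pairs."""
    labeling_list = []
    seen = set()
    for raw in records:
        cleaned = preprocess(raw)
        if cleaned is None or cleaned in seen:
            continue  # screening: drop empty and duplicate records
        seen.add(cleaned)
        # The model labels the preprocessed text; the list keeps the original record.
        labeling_list.append((raw, label_model(cleaned)))
    return labeling_list
```

Keeping the original record next to its label mirrors the text, where the list pairs each piece of asset data to be labeled (not its cleaned form) with its label.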

In some embodiments of the present application, before each piece of preprocessed asset data is labeled using the target labeling model to obtain labels, the method further includes:

obtaining an asset sample data set, the asset sample data set including a plurality of pieces of asset sample data and a label corresponding to each piece of asset sample data;

preprocessing each piece of asset sample data to obtain a plurality of pieces of preprocessed sample data;

performing first sample balancing on the preprocessed sample data to obtain a training data set;

performing second sample balancing on the preprocessed sample data to obtain a validation data set;

tuning the parameters of a preset first initial labeling model using the training data set to obtain a second initial labeling model; and

validating the second initial labeling model using the validation data set, and taking it as the target labeling model when its accuracy is greater than a preset accuracy threshold.

In the above technical solution, obtaining the asset sample data set and preprocessing the asset sample data, that is, data screening and data cleaning, improves the generalization of the model. Sample balancing then yields the training set and the validation set; tuning parameters on the training set and validating on the validation set produces a target labeling model with good generalization and robustness.
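The train-then-validate gate can be sketched as follows. This assumes the parameter-tuning step is supplied as a `train` callable that returns a predictor, and that accuracy is the fraction of validation samples whose predicted label equals the reference label; the 0.9 default threshold is illustrative, not taken from the text. A real system would fine-tune a model such as BERT inside `train`.

```python
from typing import Callable, Optional, Sequence, Tuple

Sample = Tuple[str, str]          # (asset sample text, reference label)
Predictor = Callable[[str], str]  # a labeling model viewed as text -> label


def accuracy(model: Predictor, data: Sequence[Sample]) -> float:
    """Fraction of samples whose predicted label equals the reference label."""
    if not data:
        return 0.0
    return sum(model(text) == label for text, label in data) / len(data)


def select_target_model(
    train: Callable[[Sequence[Sample]], Predictor],
    train_set: Sequence[Sample],
    val_set: Sequence[Sample],
    threshold: float = 0.9,
) -> Optional[Predictor]:
    # Tune the first initial model on the training set -> second initial model.
    candidate = train(train_set)
    # Accept it as the target labeling model only if validation accuracy
    # is strictly greater than the preset threshold.
    if accuracy(candidate, val_set) > threshold:
        return candidate
    return None  # caller may re-tune and try again
```

Returning `None` rather than a weaker model keeps the acceptance rule explicit: only a model that clears the threshold is ever used for labeling.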

In some embodiments of the present application, performing first sample balancing on the preprocessed sample data to obtain the training data set includes:

deduplicating the preprocessed sample data by category, field Chinese name and field English name to obtain first deduplicated sample data;

selecting, from the first deduplicated sample data, the N categories with the largest sample counts to obtain selected sample data, where N is an integer greater than or equal to 2;

deduplicating the selected sample data by category and field Chinese name to obtain second deduplicated sample data; and

removing, from the second deduplicated sample data, the categories whose sample count does not exceed a preset number, to obtain the training data set.

Performing second sample balancing on the preprocessed sample data to obtain the validation data set includes:

removing, from the preprocessed sample data, the categories whose sample count does not exceed the preset number, to obtain the validation data set.

In the above technical solution, deduplicating first by category, field Chinese name and field English name, and then by category and field Chinese name, removes duplicate records effectively. Removing the categories whose sample count does not exceed the preset number reduces the impact of invalid data on model accuracy.
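The two balancing procedures can be sketched in plain Python, assuming each sample is a dict with the hypothetical keys `category`, `cn_name` and `en_name`, and that "number" means the sample count per category. When choosing the top N categories, ties are broken by first occurrence, which is how `Counter.most_common` orders equal counts.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, str]  # hypothetical keys: "category", "cn_name", "en_name"


def dedupe(records: List[Record], keys: Sequence[str]) -> List[Record]:
    """Keep the first record for each distinct combination of `keys`."""
    seen = set()
    out = []
    for r in records:
        k = tuple(r[key] for key in keys)
        if k not in seen:
            seen.add(k)
            out.append(r)
    return out


def drop_small_categories(records: List[Record], preset: int) -> List[Record]:
    """Remove categories whose sample count does not exceed `preset`."""
    counts = Counter(r["category"] for r in records)
    return [r for r in records if counts[r["category"]] > preset]


def first_balancing(records: List[Record], n: int, preset: int) -> List[Record]:
    """Training set: dedupe, keep top-N categories, dedupe again, drop small ones."""
    if n < 2:
        raise ValueError("N must be an integer >= 2")
    first = dedupe(records, ("category", "cn_name", "en_name"))
    top = {c for c, _ in Counter(r["category"] for r in first).most_common(n)}
    selected = [r for r in first if r["category"] in top]
    second = dedupe(selected, ("category", "cn_name"))
    return drop_small_categories(second, preset)


def second_balancing(records: List[Record], preset: int) -> List[Record]:
    """Validation set: only drop categories at or below the preset count."""
    return drop_small_categories(records, preset)
```

The second deduplication is looser than the first: two fields with the same Chinese name but different English names collapse into one training sample, while the validation set keeps them.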

In some embodiments of the present application, before the retrieval asset data is labeled using the target labeling model to obtain the labeling result, the method further includes:

vectorizing the retrieval asset data to obtain an asset data vector;

performing dimension-raising processing on the retrieval asset data to obtain a multi-dimensional data vector; and

fusing the asset data vector and the multi-dimensional data vector and inputting the result into the target labeling model.

In the above technical solution, dimension raising adds data associated with the retrieval asset data, which clarifies the recommendation target; fusing the data then strengthens its representational power, which helps improve the accuracy of the model output.

In some embodiments of the present application, obtaining the matching result based on the recommended data, the tags corresponding to the recommended data, the asset data to be labeled, and the labels includes:

classifying the tags corresponding to the recommended data and the labels by category, computing the similarity between the recommended data and the asset data to be labeled, and selecting the data whose tag categories are consistent and whose similarity is less than a preset threshold, to obtain the matching result.

In the above technical solution, the matching result is obtained from both category and similarity, so that more accurate asset data can subsequently be recommended to the user based on the matching result.
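The matching step can be sketched as below. Two assumptions are made here. First, category division is modeled as a lookup table from tag or label to category. Second, because the text keeps pairs whose similarity is less than a preset threshold, the sketch treats the measure as a distance (1 minus `difflib.SequenceMatcher.ratio()`), so small values mean close matches. The threshold value is illustrative.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def text_distance(a: str, b: str) -> float:
    """1 - SequenceMatcher ratio: 0.0 for identical strings, 1.0 for no overlap."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def match(
    recommended: List[Tuple[str, str]],    # (recommended data, tag)
    labeling_list: List[Tuple[str, str]],  # (asset data to be labeled, label)
    category_of: Dict[str, str],           # hypothetical tag/label -> category table
    threshold: float = 0.2,
) -> List[Tuple[str, str]]:
    matches = []
    for rec, tag in recommended:
        category = category_of.get(tag)
        if category is None:
            continue  # tag cannot be classified; skip it
        for asset, label in labeling_list:
            if category_of.get(label) != category:
                continue  # categories differ
            if text_distance(rec, asset) < threshold:
                matches.append((rec, asset))
    return matches
```

Checking the category first is cheap and prunes most pairs before the more expensive string comparison runs.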

In some embodiments of the present application, the recommendation strategy is at least one of a quantity limit, a matching-degree limit and a combined limit.

In the above technical solution, the quantity limit, the matching-degree limit and the combined limit each have different advantages; users can choose the strategy that suits their needs so that the recommended content meets those needs.

In a second aspect, an embodiment of the present application provides a business data processing device, the device including:

a first data acquisition module, configured to obtain retrieval asset data;

a labeling module, configured to, in response to a selection operation on a preset target labeling model, label the retrieval asset data using the target labeling model to obtain a labeling result;

a first recommendation module, configured to perform, according to the labeling result, a first recommendation using a preset recommendation strategy to obtain a relevant recommendation result, the relevant recommendation result including recommended data and tags corresponding to the recommended data;

a second data acquisition module, configured to obtain a preset asset labeling list, the asset labeling list including a plurality of pieces of asset data to be labeled and a label corresponding to each piece of asset data to be labeled;

a matching module, configured to obtain a matching result based on the recommended data, the tags corresponding to the recommended data, the asset data to be labeled, and the labels; and

a second recommendation module, configured to perform a second recommendation based on the matching result and the retrieval asset data to obtain recommended asset data.

In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a user interface and a network interface. The memory is configured to store instructions, the user interface and the network interface are configured to communicate with other devices, and the processor is configured to execute the instructions stored in the memory so that the electronic device performs any one of the methods provided in the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing instructions which, when executed, perform any one of the methods provided in the first aspect.

In summary, the one or more technical solutions provided in the embodiments of this application have at least the following technical effects or advantages:

1. The retrieval asset data is obtained and labeled with the target labeling model; a first recommendation is made based on the labeling result to obtain a relevant recommendation result, which screens out asset data similar to the retrieval asset data; a matching result is then obtained from the relevant recommendation result and the asset labeling list; and a second recommendation is made based on the matching result and the retrieval asset data to obtain the recommended asset data. These technical means effectively solve the problem in the related art that vague associated input prevents accurate asset data from being recommended to users. Through this multi-layer recommendation, accurate asset data can be recommended to users.

2. Processing the data with preprocessing and sample balancing improves the generalization and robustness of the model.

3. Vectorizing the data and raising its dimensionality before inputting it into the model improves the accuracy of the trained model.

Brief description of the drawings

Figure 1 is a first flow diagram of a business data processing method provided by an embodiment of the present application;

Figure 2 is a second flow diagram of a business data processing method provided by an embodiment of the present application;

Figure 3 is a third flow diagram of a business data processing method provided by an embodiment of the present application;

Figure 4 is a fourth flow diagram of a business data processing method provided by an embodiment of the present application;

Figure 5 is a schematic diagram of label division in a business data processing method provided by an embodiment of the present application;

Figure 6 is a flow diagram of the sub-steps of step S930 in Figure 4;

Figure 7 is a schematic diagram of sample balancing in a business data processing method provided by an embodiment of the present application;

Figure 8 is a schematic diagram of the overall flow of a business data processing method provided by an embodiment of the present application;

Figure 9 is a schematic structural diagram of a business data processing device provided by an embodiment of the present application;

Figure 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed description of the embodiments

To enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application.

In the description of the embodiments of this application, words such as "for example" or "for instance" are used to present an example, illustration or explanation. Any embodiment or design described with "for example" or "for instance" in the embodiments of this application should not be construed as preferred or more advantageous than other embodiments or designs. Rather, such words are intended to present the related concept in a concrete manner.

In the description of the embodiments of this application, the term "plurality" means two or more. For example, a plurality of systems means two or more systems, and a plurality of screen terminals means two or more screen terminals. In addition, the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more such features. The terms "including", "comprising", "having" and their variants all mean "including but not limited to", unless otherwise specifically emphasized.

Embodiments of the present application provide a business data processing method and device, an electronic device and a readable storage medium. The business data processing method first obtains retrieval asset data, which is the retrieval content entered by the user; this provides the data basis for subsequently recommending accurate asset data to the user. In response to a selection operation on a preset target labeling model, the retrieval asset data is labeled with the target labeling model to obtain a labeling result; because the labeling result indicates the category to which the data belongs, labeling first helps make the recommendation accurate. According to the labeling result, a first recommendation is performed with a preset recommendation strategy to obtain a relevant recommendation result containing the recommended data and their tags; this coarse recommendation helps improve recommendation accuracy. A preset asset labeling list, containing multiple pieces of asset data to be labeled and the label of each, is obtained to support the subsequent accurate recommendation. A matching result is obtained from the recommended data, their tags, the asset data to be labeled and the labels, and a second recommendation is then performed based on the matching result and the retrieval asset data, which yields more accurate recommended asset data. In the prior art, vague associated input prevents accurate asset data from being recommended to users. By contrast, the embodiments of the present application label the retrieval asset data with the target labeling model, make a first recommendation based on the labeling result to screen out asset data similar to the retrieval asset data, and then make a second recommendation based on the asset labeling list; this multi-layer recommendation allows accurate asset data to be recommended to users.

It should be noted that this business data processing method can be applied in fields such as banking and commercial asset recommendation, and also in other asset recommendation fields. Through asset inventory and labeling management, the compiled asset labeling list is used for intelligent recommendation, so that accurate asset data can be recommended to users.

The technical solutions provided by the embodiments of this application are further described below with reference to the accompanying drawings, taking the banking industry as an example.

Referring to Figure 1, Figure 1 is a flow diagram of the business data processing method provided by an embodiment of the present application. The business data processing method is applied to a business data processing device and is executed by a processor of an electronic device, or from a readable storage medium. The method includes steps S100, S200, S300, S400, S500 and S600.

Step S100: obtain retrieval asset data.

In one embodiment, the retrieval asset data is a brief summary or specific description of the data the user wants to obtain, and may include, for example, a loan policy search, a banking wealth management search or a deposit policy search. The user enters the retrieval asset data on a preset web page or app, and the retrieval asset data is obtained through a preset interface; this supplies the data on which the subsequent recommendations are based. The preset interface includes the input() function.

Step S200: in response to a selection operation on a preset target labeling model, label the retrieval asset data using the target labeling model to obtain a labeling result.

In one embodiment, the user selects a target labeling model on a web page; different models recommend in different ways, and the selected target labeling model makes the recommended data more accurate. In response to the selection operation on the preset target labeling model, the retrieval asset data is labeled using the target labeling model to obtain a labeling result. The labeling result is the specific subcategory obtained after the data is divided by attribute or category; the category division may follow industry classification standards. Because the target labeling model is pre-trained, it can accurately mark the category of the retrieval asset data, which supports the subsequent recommendations based on the labeling result. The target labeling model may be a BERT model, a variant of BERT, or another deep learning model, as long as it meets the labeling accuracy requirements; details are not repeated here.
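Model selection can be sketched as a registry keyed by the name the user picks on the web page. The keyword labeler below is a hypothetical stand-in for a pre-trained model such as BERT, used only so the example runs without model weights; the registry name `keyword-baseline` and its rules are assumptions.

```python
from typing import Callable, Dict

LabelModel = Callable[[str], str]


def keyword_labeler(text: str) -> str:
    """Hypothetical stand-in for a pre-trained labeling model such as BERT."""
    rules = {"loan": "loan", "deposit": "deposit", "wealth": "wealth_management"}
    lowered = text.lower()
    for keyword, label in rules.items():
        if keyword in lowered:
            return label
    return "other"


# Models the user can choose from; a real registry would hold trained models.
MODEL_REGISTRY: Dict[str, LabelModel] = {
    "keyword-baseline": keyword_labeler,
}


def label_query(query: str, selected_model: str) -> str:
    """Label the retrieval asset data with the model the user selected."""
    try:
        model = MODEL_REGISTRY[selected_model]
    except KeyError:
        raise ValueError(f"unknown labeling model: {selected_model!r}") from None
    return model(query)
```

Because every entry shares the same text-to-label signature, adding a fine-tuned BERT variant only requires registering another callable under a new name.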

As shown in Figure 2, before the retrieval asset data is labeled using the target labeling model to obtain the labeling result, the business data processing method further includes, but is not limited to, the following steps:

Step S710: vectorize the retrieval asset data to obtain an asset data vector.

In one embodiment, before the retrieval asset data is input into the target labeling model, it is first vectorized: a preset data representation algorithm converts the retrieval asset data into an asset data vector. Vectorization converts the retrieval asset data into a data type that the target labeling model can process, which helps improve computing efficiency.
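The text does not name the preset data representation algorithm. As one hedged possibility, the sketch below hashes character bigrams into a fixed-length vector and L2-normalizes it; `zlib.crc32` is used instead of Python's built-in `hash` so that vectors are stable across processes. Character bigrams also work for Chinese text. The dimension of 16 is illustrative only.

```python
import math
import zlib
from typing import List


def vectorize(text: str, dim: int = 16) -> List[float]:
    """Hashed character-bigram counts, L2-normalized (an assumed representation)."""
    vec = [0.0] * dim
    s = text.lower()
    for i in range(len(s) - 1):
        bucket = zlib.crc32(s[i:i + 2].encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```

A production system following the text would more likely use the embedding layer of the selected model; the hashed form is shown because it needs no trained weights.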

Step S720: perform dimension-raising processing on the retrieval asset data to obtain a multi-dimensional data vector.

In one embodiment, dimension raising expands a piece of data with its related data. As shown in Figure 8, dimension raising on user data can expand it into role, department, businesses of interest, historical search records, systems under the user's jurisdiction, and so on. The retrieval asset data can be dimension-raised according to a pre-built knowledge graph to obtain the multi-dimensional data vector. Dimension raising widens the search scope of the retrieval asset data, which helps improve recommendation accuracy. The knowledge graph reflects the associations between pieces of data, so it can be used to raise the data dimension quickly. It is built in advance for the specific business scenario, and its construction is similar to that of other knowledge graphs, so it is not described in detail here.
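Knowledge-graph-based dimension raising can be sketched as a lookup along the relations named in the text (role, department, businesses of interest, search history, systems under jurisdiction), followed by a multi-hot encoding over a fixed vocabulary. The graph layout, the relation keys and the `relation:value` vocabulary format are assumptions for illustration.

```python
from typing import Dict, List

# Hypothetical knowledge graph: entity -> {relation: [related entities]}
KnowledgeGraph = Dict[str, Dict[str, List[str]]]

RELATIONS = ["role", "department", "business_of_interest", "search_history", "system"]


def raise_dimensions(user_id: str, kg: KnowledgeGraph) -> Dict[str, List[str]]:
    """Expand a user into related attributes along the relations listed in the text."""
    node = kg.get(user_id, {})
    return {rel: list(node.get(rel, [])) for rel in RELATIONS}


def to_multi_hot(expanded: Dict[str, List[str]], vocabulary: List[str]) -> List[float]:
    """Encode the expanded attributes as a multi-hot vector over `vocabulary`."""
    index = {term: i for i, term in enumerate(vocabulary)}
    vec = [0.0] * len(vocabulary)
    for rel, values in expanded.items():
        for value in values:
            i = index.get(f"{rel}:{value}")
            if i is not None:
                vec[i] = 1.0
    return vec
```

Attributes missing from the vocabulary are ignored, so the vector length stays fixed, which the labeling model's input layer requires.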

Step S730: fuse the asset data vector and the multi-dimensional data vector and input the result into the target labeling model.

In one embodiment, the asset data vector and the multi-dimensional data vector may be fused by feature concatenation and then input into the target labeling model, which labels the data and can output a more accurate labeling category, thereby improving recommendation accuracy.
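Feature concatenation, the fusion method mentioned above, can be sketched in a few lines. The optional `expected_dim` check is an added assumption: since the labeling model has a fixed input size, a mismatch is reported early rather than failing inside the model.

```python
from typing import List, Optional, Sequence


def fuse(
    asset_vec: Sequence[float],
    multi_dim_vec: Sequence[float],
    expected_dim: Optional[int] = None,
) -> List[float]:
    """Feature concatenation: [asset data vector ; multi-dimensional data vector]."""
    fused = list(asset_vec) + list(multi_dim_vec)
    if expected_dim is not None and len(fused) != expected_dim:
        raise ValueError(f"fused length {len(fused)} != model input size {expected_dim}")
    return fused
```

Concatenation keeps both sources intact and leaves it to the model to learn how much weight each part deserves.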

Step S300: Based on the marking result, perform a first recommendation using a preset recommendation strategy to obtain a related recommendation result. This result includes the recommended data and the tags corresponding to the recommended data.

In one embodiment, the first recommendation is a coarse recommendation based on the retrieved asset data. Data whose category differs from that of the marking result is removed first, so only content related to the retrieved asset data is recommended. The first recommendation applies the preset recommendation strategy to the marking result obtained in step S200. The resulting related recommendation result forms the basis for the later steps that recommend accurate asset data to the user. It includes the recommended data and the corresponding tags, and these tags belong to the same category as the marking result. The recommended data is strongly correlated with the retrieved asset data. This correlation is expressed as a matching degree: the higher the matching degree, the stronger the correlation.

In another embodiment, the recommendation strategy is at least one of a quantity limit, a matching-degree limit, and a combined limit.

- Quantity limit: the top a data items ranked by matching degree are recommended. This strategy is fast. Here a is an integer greater than 0 and less than 100, for example 5, and can be adjusted as needed.
- Matching-degree limit: data whose matching degree exceeds a preset correlation threshold is recommended. This strategy is fast and relatively accurate. The correlation threshold may be, for example, 70%, and can be adjusted as needed.
- Combined limit: data is recommended only if its matching degree exceeds the preset correlation threshold and it is among the top a items by matching degree. This strategy is slower but the most accurate.

For example, with a default of 10 for a and a correlation threshold of 70%, the strategies work as follows. The quantity limit returns the top 5 items by matching degree. The matching-degree limit returns the items with a matching degree above 70%. The combined limit returns the top 5 items among those with a matching degree of at least 70%.
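The three strategies can be sketched as a single filtering function. This minimal illustration assumes that each candidate carries a matching degree in [0, 1]. The category pre-filter reflects the coarse filtering described for the first recommendation. The text uses both "greater than" and "≥" for the threshold, so this sketch treats the threshold as inclusive. The function name and the strategy strings are hypothetical.

```python
from typing import List, Tuple

# (recommended data, tag, matching degree in [0, 1])
Candidate = Tuple[str, str, float]


def first_recommendation(candidates: List[Candidate], marked_category: str,
                         strategy: str, top_a: int = 5,
                         threshold: float = 0.70) -> List[Candidate]:
    # Coarse filter: drop candidates whose tag category differs from the marking result.
    same = [c for c in candidates if c[1] == marked_category]
    ranked = sorted(same, key=lambda c: c[2], reverse=True)
    if strategy == "quantity":      # top a by matching degree
        return ranked[:top_a]
    if strategy == "matching":      # every item at or above the correlation threshold
        return [c for c in ranked if c[2] >= threshold]
    if strategy == "combination":   # at or above the threshold AND within the top a
        return [c for c in ranked if c[2] >= threshold][:top_a]
    raise ValueError(f"unknown strategy: {strategy!r}")
```

The list is sorted in descending order, so the items above the threshold form a prefix. As a result, "filter then take the top a" and "take the top a then filter" give the same combined-limit result.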

Step S400: Obtain a preset asset marking list. The list includes multiple items of asset data to be marked and the tag corresponding to each item.

In one embodiment, the preset asset marking list is produced by marking multiple items of asset data to obtain their tags. Each item of asset data to be marked, together with its tag, forms one record of the asset marking list. The list is obtained through a preset data acquisition interface, such as a read() function, and supplies the data needed to recommend more accurate asset data later.

As shown in Figure 3, before the preset asset marking list is obtained, the business data processing method further includes, but is not limited to, the following steps:

Step S810: Obtain multiple items of asset data to be marked, and preprocess each of them to obtain multiple items of preprocessed asset data.

In one embodiment, the asset data to be marked covers different data assets such as products, markets, orders, costs, revenue, services, and channels. It can be collected with web-crawler techniques or by compiling historical banking data. The items are then saved as CSV, TXT, or other tabular files and read through a data reading interface. This provides the data for the subsequent marking that produces the asset marking list.

In one embodiment, each item of asset data to be marked is preprocessed. Preprocessing consists of data cleaning and vectorization, and data cleaning in turn consists of data screening and removal of noisy data.

Data screening selects the columns that influence the tag classification, such as the field Chinese name, field English name, table Chinese name, table English name, field type, and field length. Keeping only these important fields reduces computation without sacrificing accuracy.

Removing noisy data can include removing blank lines, garbled characters, remarks, and duplicates. Duplicates can be identified by fields such as category, field Chinese name, and field English name, and only one row is kept among identical records.

Preprocessing in this way reduces computation and improves marking accuracy.
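The following is a minimal sketch of the cleaning step. It assumes that records arrive as dictionaries keyed by the columns listed above; the English keys such as `field_cn` are hypothetical. Garbled text is approximated by the Unicode replacement character U+FFFD, which is what a failed decode typically leaves behind. Real data may need additional rules, and vectorization would follow as a separate step.

```python
from typing import Dict, Iterable, List

# Hypothetical column keys for the screened fields.
KEEP_COLUMNS = ["category", "field_cn", "field_en", "table_cn", "table_en",
                "field_type", "field_length"]
DEDUP_KEY = ("category", "field_cn", "field_en")


def clean_records(rows: Iterable[Dict[str, object]]) -> List[Dict[str, str]]:
    cleaned: List[Dict[str, str]] = []
    seen = set()
    for row in rows:
        # Column screening: keep only columns that influence the tag; others (e.g. remarks) are dropped.
        rec = {col: "" if row.get(col) is None else str(row.get(col)).strip()
               for col in KEEP_COLUMNS}
        # Blank-row removal: skip rows with neither a Chinese nor an English field name.
        if not rec["field_cn"] and not rec["field_en"]:
            continue
        # Garbled-text removal, approximated by the presence of U+FFFD.
        if any("\ufffd" in value for value in rec.values()):
            continue
        # Deduplication: keep the first row for each (category, field_cn, field_en).
        key = tuple(rec[col] for col in DEDUP_KEY)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned
```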

Step S820: Mark each item of preprocessed asset data with the target marking model to obtain its tag.

In one embodiment, the target marking model is a trained model. The preprocessed asset data obtained in step S810 is input directly into the target marking model, which outputs the tags later used to build the asset marking list.

Step S830: Build the asset marking list from the items of asset data to be marked and their corresponding tags.

In one embodiment, each item of asset data to be marked forms one record together with its tag, and the records for all items make up the asset marking list. The items cover different attributes of the industry, so the list contains rich asset data and tags. This helps the later matching against the related recommendation result and the recommendation of accurate asset data to users.
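The structure of the asset marking list, one (asset data, tag) record per item, can be sketched as shown below. `MarkingRecord`, `build_marking_list`, and `index_by_tag` are hypothetical names. The `mark` callable stands in for the trained target marking model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class MarkingRecord:
    asset: str  # one item of asset data to be marked (e.g. a field name)
    tag: str    # tag output by the target marking model


def build_marking_list(assets: Sequence[str],
                       mark: Callable[[str], str]) -> List[MarkingRecord]:
    """One record per asset; `mark` is a placeholder for the trained marking model."""
    return [MarkingRecord(asset, mark(asset)) for asset in assets]


def index_by_tag(records: Sequence[MarkingRecord]) -> Dict[str, List[str]]:
    """Group assets by tag, which suits the later tag-then-similarity matching."""
    groups: Dict[str, List[str]] = {}
    for record in records:
        groups.setdefault(record.tag, []).append(record.asset)
    return groups
```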

As shown in Figure 4, before each item of preprocessed asset data is marked with the target marking model, the business data processing method further includes, but is not limited to, the following steps:

Step S910: Obtain an asset sample data set. The data set includes multiple items of asset sample data and the tag corresponding to each item.

In one embodiment, the asset sample data set contains multiple items of asset sample data covering different data assets such as products, markets, orders, costs, revenue, services, and channels. The samples can be collected with web-crawler techniques and saved as CSV, TXT, or other tabular files. The data set also contains a tag for each sample. The tags are produced by manual annotation and stored in the same file alongside their samples. The asset sample data set is read through a data reading interface and supplies the data for the later model training.

It should be noted that the asset sample data is annotated manually according to the standards published by the banking industry. Each sample is matched to its subcategory to obtain its tag.

As shown in Figure 5, the categories include: participant-general, individual-general, organization-general, customer information-basic information, risk information-credit risk, management-specific-individual customers, corporate customers, financial peer customers, employee information, institution information, merchant information-basic information, terminal information, and so on.

Each category has a corresponding level; for example, merchant information-basic information corresponds to level 2. When a sample is merchant information, the subcategory under merchant information is determined next. If that subcategory is basic information, the sample's tag is written as merchant information-basic information-/-2.

Step S920: Preprocess each item of asset sample data to obtain multiple items of preprocessed sample data.

In one embodiment, each item of asset sample data is preprocessed. Preprocessing consists of data cleaning and vectorization, and data cleaning in turn consists of data screening and removal of noisy data.

Data screening selects the columns that influence the tag classification, such as the field Chinese name, field English name, table Chinese name, table English name, field type, and field length. Keeping only these important fields reduces computation without sacrificing accuracy.

Removing noisy data can include removing blank lines, garbled characters, remarks, and duplicates. Duplicates can be identified by fields such as category, field Chinese name, and field English name, and only one row is kept among identical records.

Preprocessing in this way reduces computation and improves the accuracy of the model's marking.

Step S930: Perform first sample balancing on the preprocessed sample data to obtain a training data set.

As shown in Figure 6, first sample balancing of the preprocessed sample data to obtain the training data set includes, but is not limited to, the following steps:

Step S931: Deduplicate the preprocessed sample data by category, field Chinese name, and field English name to obtain first deduplicated sample data.

In some possible embodiments of this application, the preprocessed sample data obtained in step S920 is deduplicated by category, field Chinese name, and field English name. For example, if several items of preprocessed sample data share the same category, field Chinese name, and field English name, they are duplicates, and all but one of them are deleted. The result is the first deduplicated sample data, from which the training data set is then built.

Step S932: From the first deduplicated sample data, select the N categories with the most samples to obtain selected sample data, where N is an integer greater than or equal to 2.

In some possible embodiments of this application, N is an integer with 2 ≤ N ≤ 10. The N categories with the most samples are selected from the first deduplicated sample data; for example, when N is 2, the two largest categories are selected. The resulting selected sample data is then deduplicated a second time. Note that these are major categories, and each one contains multiple subcategories.

Step S933: Deduplicate the selected sample data by category and field Chinese name to obtain second deduplicated sample data.

In some possible embodiments of this application, the same field Chinese name may be given different English translations. The selected sample data obtained in step S932 is therefore deduplicated again by category and field Chinese name, producing the second deduplicated sample data. This removes records that have the same meaning but different English names, so the second deduplicated sample data contains no duplicates.

Step S934: From the second deduplicated sample data, remove the categories whose sample count does not exceed a preset number, yielding the training data set.

In some possible embodiments of this application, the samples in each subcategory of the second deduplicated sample data from step S933 are counted. Subcategories whose count does not exceed a preset number, such as 10 or 20, are removed. With a preset number of 10, for example, subcategories with 10 or fewer samples are removed, which yields the training data set. Building the training data set this way helps improve model accuracy.
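Steps S931 to S934 can be sketched as a small pipeline. This illustration rests on several assumptions. Each record is a dictionary with hypothetical keys: `major` (major category), `minor` (subcategory), `field_cn`, and `field_en`. The "category" used for deduplication is taken to be the full (major, minor) label. The top-N selection in S932 is applied to major categories, and the size filter in S934 is applied to subcategories.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]  # hypothetical keys: major, minor, field_cn, field_en


def dedupe(records: Sequence[Record], keys: Tuple[str, ...]) -> List[Record]:
    """Keep the first record for each distinct combination of `keys`."""
    seen, out = set(), []
    for r in records:
        k = tuple(r[key] for key in keys)
        if k not in seen:
            seen.add(k)
            out.append(r)
    return out


def drop_small_subcategories(records: Sequence[Record], min_count: int) -> List[Record]:
    """Remove every subcategory whose record count does not exceed `min_count`."""
    counts = Counter((r["major"], r["minor"]) for r in records)
    return [r for r in records if counts[(r["major"], r["minor"])] > min_count]


def build_training_set(records: Sequence[Record], top_n: int = 2,
                       min_count: int = 10) -> List[Record]:
    # S931: deduplicate on category + field Chinese name + field English name.
    first = dedupe(records, ("major", "minor", "field_cn", "field_en"))
    # S932: keep the top_n major categories by record count.
    top = {major for major, _ in Counter(r["major"] for r in first).most_common(top_n)}
    selected = [r for r in first if r["major"] in top]
    # S933: deduplicate on category + field Chinese name (collapses alternative English names).
    second = dedupe(selected, ("major", "minor", "field_cn"))
    # S934: drop subcategories with <= min_count records.
    return drop_small_subcategories(second, min_count)
```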

Step S940: Perform second sample balancing on the preprocessed sample data to obtain a verification data set.

In one embodiment, second sample balancing of the preprocessed sample data includes, but is not limited to, removing the categories whose sample count does not exceed a preset number, such as 10 or 20. With a preset number of 10, for example, categories with 10 or fewer samples are removed from the preprocessed sample data, which yields the verification data set. This helps improve model accuracy.
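Second sample balancing performs no deduplication and no top-N selection; it only drops small categories. The following minimal sketch assumes dictionary records with hypothetical keys `major` and `minor`, and treats a category as the (major, minor) pair.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, str]  # hypothetical keys: major, minor, field_cn, field_en


def build_validation_set(records: Sequence[Record], min_count: int = 10) -> List[Record]:
    """Keep every record whose (major, minor) category has more than `min_count` records."""
    counts = Counter((r["major"], r["minor"]) for r in records)
    return [r for r in records if counts[(r["major"], r["minor"])] > min_count]
```

Duplicates are deliberately kept here, so a duplicated record counts toward its category's size.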

As shown in Figure 7, suppose there are 5,789 items of preprocessed sample data. These items are first deduplicated by category, field Chinese name, and field English name, and the 2 largest categories are selected from the deduplicated data. The selected data is then deduplicated by category and field Chinese name, which leaves 2,252 items. Removing the categories with no more than 10 items then gives a training data set of 2,250 items. Separately, removing the categories with no more than 10 items from the preprocessed sample data gives a verification data set of 5,780 items.

Step S950: Tune the parameters of a preset first initial marking model with the training data set to obtain a second initial marking model.

In one embodiment, the training data set includes deduplicated preprocessed sample data whose per-category counts meet the preset number, together with the corresponding tags. These samples and tags are used to tune the parameters of the preset first initial marking model, which produces the second initial marking model. The target marking model is then derived from this second model. The first initial marking model can be a BERT model, a BERT variant, or another deep learning model, provided it meets the accuracy requirements of marking. Parameter tuning follows the usual procedure for BERT models and is not described further here.

Step S960: Verify the second initial marking model on the verification data set. If its accuracy exceeds a preset accuracy threshold, the model is taken as the target marking model.

In one embodiment, the verification data set includes the preprocessed sample data whose per-category counts meet the preset number. Its tags are not used for training; they serve only to check the model's predictions. The second initial marking model is verified on this data set. An accuracy above the preset accuracy threshold indicates that the model generalizes well, and the model is then taken as the target marking model, which marks retrieved asset data with high accuracy.

In another embodiment, if the accuracy is less than or equal to the preset accuracy threshold, dimension raising is applied to the training data set. The first initial marking model is then re-tuned on the expanded training data set until the accuracy exceeds the threshold.
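The tune-verify-retry logic of steps S950 and S960 and this fallback can be sketched independently of the model itself. In this hypothetical sketch, `fine_tune` stands for tuning the first initial marking model (for instance, BERT fine-tuning) and returns a prediction function. `raise_dimension` stands for dimension raising of the training data set. Both are placeholders supplied by the caller. `max_rounds` is an added safeguard that the text does not mention.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, str]          # (input text, tag)
Predictor = Callable[[str], str]


def accuracy(predict: Predictor, data: List[Sample]) -> float:
    if not data:
        raise ValueError("verification data set is empty")
    return sum(predict(x) == y for x, y in data) / len(data)


def train_until_accepted(fine_tune: Callable[[List[Sample]], Predictor],
                         raise_dimension: Callable[[List[Sample]], List[Sample]],
                         train: List[Sample], verify: List[Sample],
                         threshold: float = 0.8,
                         max_rounds: int = 3) -> Tuple[Predictor, float]:
    """Tune, verify, and expand the training data while accuracy stays too low."""
    for _ in range(max_rounds):
        predict = fine_tune(train)          # each round re-tunes the first initial model
        acc = accuracy(predict, verify)
        if acc > threshold:                 # strictly greater, as in step S960
            return predict, acc
        train = raise_dimension(train)      # hypothetical augmentation hook
    raise RuntimeError(f"accuracy did not exceed {threshold} within {max_rounds} rounds")
```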

In yet another embodiment, a bank provided new sample data without tags. After the categories whose count does not exceed the preset number were removed, the target marking model was tested on this data and reached an accuracy of 82%, which shows that it generalizes well.

Step S500: Obtain a matching result based on the recommended data, the tags of the recommended data, the asset data to be marked, and its tags.

In one embodiment, obtaining the matching result includes, but is not limited to, the following steps:

1. The tags of the recommended data and the tags in the asset marking list are classified. A preset classification algorithm determines whether a tag of the recommended data and a tag in the list belong to the same subcategory.
2. A similarity algorithm computes the similarity between the recommended data and the asset data to be marked.
3. Data whose tags fall in the same subcategory and whose similarity meets the preset threshold is selected as the matching result. For a distance measure such as Euclidean distance, meeting the threshold means a distance below it.

The matching result supports accurate asset data recommendation in the next step. The classification algorithm can be a support vector machine, a deep learning algorithm, or the like. The similarity algorithm can be Euclidean distance, a character similarity algorithm, or the like; these are not described further here.
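The following is a minimal sketch of the matching step. It assumes that tags are strings such as "客户信息-基本信息", so the same-subcategory check reduces to string equality; this stands in for the SVM or deep-learning classifier. It uses the ratio from Python's `difflib.SequenceMatcher` as the character similarity algorithm. That ratio is a similarity rather than a distance, so a match is kept when it is at or above `min_similarity`, a hypothetical threshold.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Match = Tuple[str, str, str, float]  # (recommended data, listed asset, shared tag, similarity)


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib's Ratcliff/Obershelp ratio)."""
    return SequenceMatcher(None, a, b).ratio()


def match(recommended: List[Tuple[str, str]], marking_list: List[Tuple[str, str]],
          min_similarity: float = 0.6) -> List[Match]:
    results: List[Match] = []
    for data, tag in recommended:
        for asset, asset_tag in marking_list:
            # Same-subcategory check; string equality replaces the SVM/deep classifier.
            if tag != asset_tag:
                continue
            sim = char_similarity(data, asset)
            if sim >= min_similarity:
                results.append((data, asset, tag, sim))
    # Highest similarity first, ready for the second recommendation.
    return sorted(results, key=lambda m: m[3], reverse=True)
```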

Step S600: Perform a second recommendation based on the matching result and the retrieved asset data to obtain recommended asset data.

In one embodiment, the second recommendation is a precise recommendation based on the matching result. Among data of the same category, it recommends the asset data that best meets the user's needs. The matching result from step S500 has high similarity and the same tag, so it closely matches the retrieved asset data. The top-ranked item is recommended as the recommended asset data. Because it has passed several rounds of screening, the recommended asset data fits the retrieval requirements more closely, so accurate asset data can be recommended to the user.

It should be noted that the tag of the recommended asset data can be used as the tag of the retrieved asset data. The retrieved asset data and this tag can then be added to the asset marking list. This enlarges the asset marking list and can improve recommendation accuracy.
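The second recommendation and the feedback into the asset marking list can be sketched together. The sketch makes three assumptions. Each match is a (recommended data, listed asset, tag, similarity) tuple. The top-ranked match is the one with the highest similarity. The marking list is an in-memory Python list of (asset, tag) pairs. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

Match = Tuple[str, str, str, float]  # (recommended data, listed asset, shared tag, similarity)


def second_recommendation(query: str, matches: List[Match],
                          marking_list: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Return the top-ranked (asset, tag) and record the query under that tag."""
    if not matches:
        return None
    _, asset, tag, _ = max(matches, key=lambda m: m[3])
    # Feed the result back so the marking list grows with each retrieval.
    if (query, tag) not in marking_list:
        marking_list.append((query, tag))
    return asset, tag
```

The feedback step mutates the list in place and skips pairs that are already present, so repeated searches for the same query do not create duplicate records.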

Figure 8 is an overall flowchart of a business data processing method provided by an embodiment of this application. As shown in Figure 8, the retrieved asset data is obtained first. In response to a selection operation on the preset target marking model, the retrieved asset data is vectorized and dimension-raised. The processed data is fused and input into the target marking model, which produces a marking result. The marking result indicates the category to which the data belongs, which supports accurate recommendation. Based on the marking result, a first recommendation is performed using one of the preset strategies (quantity limit, matching-degree limit, or combined limit). This produces a related recommendation result, which includes the recommended data and the corresponding tags. Running this coarse recommendation first helps improve recommendation accuracy.

In one embodiment, multiple items of asset data to be marked are obtained and preprocessed, and each preprocessed item is marked with the target marking model to obtain its tag. The marked data is then confirmed and published, and the asset data is inventoried, which enables effective asset management. The asset marking list is built from the confirmed and published asset data and the corresponding tags. The method then obtains this preset asset marking list, which includes multiple items of asset data to be marked and their tags. A matching result is obtained from the recommended data, its tags, the asset data to be marked, and their tags. Finally, a second recommendation is performed based on the matching result and the retrieved asset data, which yields more accurate recommended asset data.

As shown in Figure 9, an embodiment of this application provides a business data processing device 100, which works as follows:

- The first data acquisition module 110 obtains the retrieved asset data, that is, the search content entered by the user. This provides the data for later recommending accurate asset data to the user.
- In response to a selection operation on the preset target marking model, the marking module 120 marks the retrieved asset data with the target marking model to obtain a marking result. The marking result indicates the category to which the data belongs, which supports accurate recommendation.
- The first recommendation module 130 performs a first recommendation based on the marking result using a preset recommendation strategy. This produces a related recommendation result, which includes the recommended data and the corresponding tags. Running this coarse recommendation first helps improve accuracy.
- The second data acquisition module 140 obtains the preset asset marking list, which includes multiple items of asset data to be marked and their tags. This provides the data for the subsequent accurate recommendation.
- The matching module 150 obtains a matching result based on the recommended data, its tags, the asset data to be marked, and their tags.
- The second recommendation module 160 performs a second recommendation based on the matching result and the retrieved asset data, which yields relatively accurate recommended asset data.

It should be noted that the modules are connected in sequence: the first data acquisition module 110 to the marking module 120, the marking module 120 to the first recommendation module 130, the first recommendation module 130 to the second data acquisition module 140, the second data acquisition module 140 to the matching module 150, and the matching module 150 to the second recommendation module 160. The business data processing method described above runs on the business data processing device 100. The device marks the retrieved asset data with the target marking model and performs a first recommendation based on the marking result, which filters out asset data similar to the retrieved asset data. It then performs a second recommendation based on the asset marking list. This multi-layer recommendation allows accurate asset data to be recommended to users.
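The module chain 110 → 120 → 130 → 140 → 150 → 160 can be sketched as a class whose fields are pluggable callables. The class and field names are hypothetical. The sketch only fixes the order in which data flows between the modules and leaves each module's implementation to the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]               # (data, tag)
Match = Tuple[str, str, str, float]  # (recommended data, listed asset, tag, similarity)


@dataclass
class BusinessDataProcessor:
    """Hypothetical wiring of modules 110-160; each field is a pluggable callable."""
    acquire_query: Callable[[], str]                                # module 110
    mark: Callable[[str], str]                                      # module 120
    first_recommend: Callable[[str, str], List[Pair]]               # module 130
    load_marking_list: Callable[[], List[Pair]]                     # module 140
    match: Callable[[List[Pair], List[Pair]], List[Match]]          # module 150
    second_recommend: Callable[[str, List[Match]], Optional[Pair]]  # module 160

    def run(self) -> Optional[Pair]:
        query = self.acquire_query()
        marking_result = self.mark(query)
        candidates = self.first_recommend(query, marking_result)
        marking_list = self.load_marking_list()
        matches = self.match(candidates, marking_list)
        return self.second_recommend(query, matches)
```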

It should also be noted that the division into the functional modules above is only an example of how the device in the above embodiment implements its functions. In practice, these functions can be assigned to different functional modules as needed; that is, the device's internal structure can be divided into different modules to perform all or part of the functions described above. In addition, the device and method embodiments above share the same concept. For the specific implementation, see the method embodiment, which is not repeated here.

This application further discloses an electronic device. Figure 10 is a schematic structural diagram of an electronic device disclosed in an embodiment of this application. As shown in Figure 10, the electronic device 500 may include at least one processor 501, at least one network interface 504, a user interface 503, a memory 505, and at least one communication bus 502.

The communication bus 502 connects these components and carries communication among them.

The user interface 503 may include a display and a camera; optionally, the user interface 503 may also include a standard wired interface and a wireless interface.

The network interface 504 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface).

The processor 501 may include one or more processing cores. The processor 501 connects the various parts of the entire server through various interfaces and lines, and performs the various functions of the server and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 505 and by calling data stored in the memory 505. Optionally, the processor 501 may be implemented in at least one hardware form among digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA). The processor 501 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, and applications; the GPU renders and draws the content to be shown on the display; and the modem handles wireless communication. It can be understood that the modem may instead not be integrated into the processor 501 and may be implemented by a separate chip.

The memory 505 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 505 includes a non-transitory computer-readable storage medium. The memory 505 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 505 may include a program storage area and a data storage area. The program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the method embodiments described above, and the like; the data storage area may store the data involved in the method embodiments described above. Optionally, the memory 505 may also be at least one storage device located remotely from the processor 501. As shown in FIG. 10, the memory 505, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and an application program for the business data processing method.

In the electronic device 500 shown in FIG. 10, the user interface 503 mainly provides an input interface for the user and obtains the data entered by the user, and the processor 501 may be used to invoke the application program for the business data processing method stored in the memory 505. When executed by one or more processors 501, the application program causes the electronic device 500 to perform one or more of the methods in the above embodiments. It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions; however, those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in another order or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.

Each of the above embodiments is described with its own emphasis. For parts not described in detail in one embodiment, refer to the relevant descriptions of the other embodiments.

In the several implementations provided in this application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some service interfaces, devices, or units, and may be electrical or in other forms.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc.

The above are merely exemplary embodiments of the present disclosure and are not intended to limit its scope. That is, all equivalent changes and modifications made in accordance with the teachings of the present disclosure remain within its scope. Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the disclosure herein.

This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being defined by the claims.

Claims

1. A business data processing method, characterized in that the method includes:

Obtaining retrieval asset data;

In response to a selection operation on a preset target marking model, marking the retrieved asset data using the target marking model to obtain a marking result;

Performing, according to the marking result, a first recommendation using a preset recommendation strategy to obtain a relevant recommendation result, wherein the relevant recommendation result includes recommended data and a label corresponding to the recommended data;

Obtaining a preset asset marking list, wherein the asset marking list includes a plurality of asset data to be marked and a marking label corresponding to each of the asset data to be marked;

Obtaining a matching result based on the recommended data, the tag corresponding to the recommended data, the asset data to be marked, and the marking tag;

Performing a second recommendation based on the matching result and the retrieved asset data to obtain recommended asset data.
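The two-stage flow of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the target marking model is passed in as a plain callable, the first recommendation simply keeps up to `k` candidates that share the query's tag, matching against the marking list is reduced to tag equality (claim 6 describes a richer rule), and the second recommendation ranks matches by a character-set Jaccard similarity. The names `Tagged`, `jaccard`, and `recommend` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Tagged:
    data: str  # asset data, e.g. a field name
    tag: str   # label attached to that data


def jaccard(a: str, b: str) -> float:
    """Character-set Jaccard similarity (assumed similarity measure)."""
    sa, sb = set(a.lower()), set(b.lower())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def recommend(
    query: str,
    mark: Callable[[str], str],   # hypothetical stand-in for the selected target marking model
    candidates: List[Tagged],     # pool searched by the first recommendation
    marking_list: List[Tagged],   # preset asset marking list
    k: int = 3,                   # assumed quantity limit for the first recommendation
) -> List[str]:
    # Mark the retrieval asset data.
    query_tag = mark(query)
    # First recommendation: up to k candidates that carry the same tag.
    first = [c for c in candidates if c.tag == query_tag][:k]
    # Matching (simplified): marking-list entries whose tag appears among the recommended tags.
    rec_tags = {c.tag for c in first}
    matched = [m for m in marking_list if m.tag in rec_tags]
    # Second recommendation: rank matched assets by similarity to the query.
    matched.sort(key=lambda m: jaccard(query, m.data), reverse=True)
    return [m.data for m in matched]
```

For example, with a toy marker that tags any name containing "phone" as `phone`, a query such as `contact_phone` is first narrowed to phone-tagged candidates, then to phone-tagged entries of the marking list, which are returned in descending order of similarity to the query.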

2. The method according to claim 1, characterized in that before obtaining the preset asset marking list, the method further includes:

Obtaining multiple pieces of asset data to be marked, and preprocessing each piece of asset data to be marked to obtain multiple pieces of preprocessed asset data;

Marking each piece of preprocessed asset data using the target marking model to obtain a marking label;

Obtaining the asset marking list according to each piece of asset data to be marked and the marking label corresponding to it.
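Claim 2 builds the asset marking list before it is used for matching. A minimal sketch, assuming a simple preprocessing rule (trim, lowercase, and collapse whitespace or hyphens into underscores) that the patent itself does not specify; the marking model is again a caller-supplied callable, and `preprocess` and `build_marking_list` are hypothetical names.

```python
import re
from typing import Callable, List, Tuple


def preprocess(raw: str) -> str:
    """Assumed preprocessing: trim, lowercase, and turn runs of whitespace or hyphens into '_'."""
    return re.sub(r"[\s\-]+", "_", raw.strip().lower())


def build_marking_list(
    raw_assets: List[str],
    mark: Callable[[str], str],  # hypothetical stand-in for the trained target marking model
) -> List[Tuple[str, str]]:
    # Each entry pairs the original asset data with the label predicted
    # from its preprocessed form: (asset data to be marked, marking label).
    return [(raw, mark(preprocess(raw))) for raw in raw_assets]
```

Keeping the raw string in the list, rather than the preprocessed one, is a design choice here so that later recommendation steps can return the asset data in its original form.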

3. The method according to claim 2, characterized in that, before marking each piece of preprocessed asset data using the target marking model to obtain a marking label, the method further includes:

Obtaining an asset sample data set, wherein the asset sample data set includes multiple pieces of asset sample data and a marking label corresponding to each piece of asset sample data;

Preprocessing each of the asset sample data to obtain a plurality of preprocessed sample data;

Performing a first sample balancing process on each of the preprocessed sample data to obtain a training data set;

Performing a second sample balancing process on each of the preprocessed sample data to obtain a verification data set;

Adjusting the parameters of a preset first initial marking model using the training data set to obtain a second initial marking model;

Verifying the second initial marking model using the verification data set, and obtaining the target marking model when the accuracy is greater than a preset accuracy threshold.
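The train-then-verify gate of claim 3 can be illustrated as follows. Only the acceptance rule (accuracy on the verification set strictly greater than a preset threshold) comes from the claim; the token-vote model is a hypothetical stand-in for "adjusting the parameters of the first initial marking model", since the patent does not disclose the model architecture here. Samples are assumed to be `(field name, marking label)` pairs.

```python
from collections import Counter
from typing import Callable, Dict, Optional, Sequence, Tuple

Sample = Tuple[str, str]  # (field name, marking label)
Model = Callable[[str], str]


def accuracy(model: Model, samples: Sequence[Sample]) -> float:
    """Share of samples whose predicted label equals the reference label."""
    if not samples:
        return 0.0
    return sum(model(x) == y for x, y in samples) / len(samples)


def train_token_model(train: Sequence[Sample]) -> Model:
    """Hypothetical stand-in for parameter tuning: per-token label votes.

    Assumes a non-empty training set.
    """
    votes: Dict[str, Counter] = {}
    for text, label in train:
        for tok in text.lower().split("_"):
            votes.setdefault(tok, Counter())[label] += 1
    fallback = Counter(label for _, label in train).most_common(1)[0][0]

    def predict(text: str) -> str:
        total: Counter = Counter()
        for tok in text.lower().split("_"):
            total.update(votes.get(tok, Counter()))
        return total.most_common(1)[0][0] if total else fallback

    return predict


def fit_and_verify(train: Sequence[Sample], val: Sequence[Sample],
                   threshold: float) -> Optional[Model]:
    model = train_token_model(train)  # plays the role of the second initial marking model
    # Accept it as the target marking model only if accuracy exceeds the threshold.
    return model if accuracy(model, val) > threshold else None
```

Returning `None` when the gate fails leaves the caller free to retrain, for example with different balancing parameters; the patent does not say what happens in that case.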

4. The method according to claim 3, characterized in that performing the first sample balancing process on each of the preprocessed sample data to obtain a training data set includes:

Deduplicating the preprocessed sample data by category, field Chinese name, and field English name to obtain first deduplicated sample data;

Selecting the N categories with the largest numbers of samples from the first deduplicated sample data to obtain selected sample data, where N is an integer greater than or equal to 2;

Deduplicating the selected sample data by category and field Chinese name to obtain second deduplicated sample data;

Removing, from the second deduplicated sample data, the categories whose sample count does not exceed a preset number, to obtain the training data set;

Performing the second sample balancing process on each of the preprocessed sample data to obtain a verification data set includes:

Removing, from the preprocessed sample data, the categories whose sample count does not exceed a preset number, to obtain the verification data set.
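The two balancing procedures of claim 4 map fairly directly onto list operations. The sketch below assumes each sample is a `(category, field Chinese name, field English name)` triple, keeps the first occurrence when deduplicating, and reads "does not exceed a preset number" as `count <= min_count`; `n` and `min_count` stand for N and the preset number in the claim. All function names are hypothetical.

```python
from collections import Counter
from typing import Callable, List, NamedTuple, Tuple


class Sample(NamedTuple):
    category: str
    name_zh: str  # field Chinese name
    name_en: str  # field English name


def dedup(samples: List[Sample], key: Callable[[Sample], Tuple[str, ...]]) -> List[Sample]:
    """Keep the first sample for each key, preserving order."""
    seen, out = set(), []
    for s in samples:
        k = key(s)
        if k not in seen:
            seen.add(k)
            out.append(s)
    return out


def drop_small_categories(samples: List[Sample], min_count: int) -> List[Sample]:
    """Remove categories whose sample count does not exceed min_count."""
    counts = Counter(s.category for s in samples)
    return [s for s in samples if counts[s.category] > min_count]


def build_training_set(samples: List[Sample], n: int, min_count: int) -> List[Sample]:
    # 1) Deduplicate by (category, Chinese name, English name).
    first = dedup(samples, key=lambda s: (s.category, s.name_zh, s.name_en))
    # 2) Keep only the N most frequent categories.
    top_n = {c for c, _ in Counter(s.category for s in first).most_common(n)}
    selected = [s for s in first if s.category in top_n]
    # 3) Deduplicate again by (category, Chinese name).
    second = dedup(selected, key=lambda s: (s.category, s.name_zh))
    # 4) Drop categories that are still too small.
    return drop_small_categories(second, min_count)


def build_verification_set(samples: List[Sample], min_count: int) -> List[Sample]:
    # The verification set only drops small categories; no deduplication.
    return drop_small_categories(samples, min_count)
```

Note that the verification set skips deduplication and the top-N cut, so it keeps the original category distribution apart from very rare categories.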

5. The method according to claim 1, characterized in that, before using the target marking model to mark the retrieved asset data to obtain a marking result, the method further includes:

Performing vectorization processing on the retrieved asset data to obtain an asset data vector;

Performing dimension-raising processing on the retrieved asset data to obtain a multi-dimensional data vector;

Fusing the asset data vector and the multi-dimensional data vector, and inputting the fused vector into the target marking model.
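Claim 5 does not say how vectorization, dimension raising, or fusion are performed. One plausible reading, offered only as an assumption: a hashed character-bigram vector serves as the asset data vector, a few extra descriptive statistics of the raw string serve as the raised dimensions, and fusion is plain concatenation. `zlib.crc32` is used instead of Python's built-in `hash` so that bucket indices are stable across runs. The function names and the 16-bucket size are hypothetical.

```python
import zlib
from typing import List


def asset_vector(text: str, dim: int = 16, n: int = 2) -> List[float]:
    """Assumed vectorization: hashed character n-gram counts, normalised to sum to 1."""
    vec = [0.0] * dim
    s = text.lower()
    for i in range(len(s) - n + 1):
        gram = s[i:i + n]
        # crc32 gives a run-independent bucket, unlike the salted built-in hash().
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += 1.0
    total = sum(vec)
    return [v / total for v in vec] if total else vec


def raised_features(text: str) -> List[float]:
    """Assumed dimension-raising step: extra descriptive dimensions of the raw string."""
    length = len(text)
    if length == 0:
        return [0.0, 0.0, 0.0, 0.0]
    digits = sum(ch.isdigit() for ch in text)
    alphas = sum(ch.isalpha() for ch in text)
    seps = sum(ch in "_-. " for ch in text)
    return [float(length), digits / length, alphas / length, seps / length]


def fused_input(text: str) -> List[float]:
    # Fusion by concatenation (one simple option); the result would be fed to the marking model.
    return asset_vector(text) + raised_features(text)
```

Concatenation keeps both views intact and lets the downstream model weigh them; weighted sums or learned projections are equally compatible with the claim wording.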

6. The method according to claim 1, characterized in that obtaining the matching result based on the recommended data, the tag corresponding to the recommended data, the asset data to be marked, and the marking tag includes:

Classifying the tags corresponding to the recommended data and the marking tags into categories, calculating the similarity between the recommended data and the asset data to be marked, and filtering out data whose tag categories are consistent and whose similarity is less than a preset threshold, to obtain the matching result.
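The matching step of claim 6 can be sketched as a pairwise filter. Assumptions: tags are grouped into categories through a caller-supplied `tag_category` mapping, similarity is the Jaccard index of character bigram sets, and tags missing from the mapping never match. The filter keeps pairs whose tag categories agree and whose score is below the threshold, exactly as the translated claim words it; if the original intends a distance-like score, that is the natural comparison, otherwise the direction may need to be reversed. Function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def bigrams(s: str) -> Set[str]:
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def similarity(a: str, b: str) -> float:
    """Assumed measure: Jaccard similarity of character bigram sets."""
    A, B = bigrams(a), bigrams(b)
    union = A | B
    return len(A & B) / len(union) if union else 0.0


def match(
    recommended: List[Tuple[str, str]],  # (recommended data, its tag)
    to_mark: List[Tuple[str, str]],      # (asset data to be marked, marking tag)
    tag_category: Dict[str, str],        # assumed tag -> category mapping
    threshold: float,
) -> List[Tuple[str, str, float]]:
    results = []
    for rec_data, rec_tag in recommended:
        cat_r = tag_category.get(rec_tag)
        for asset, mark_tag in to_mark:
            # Keep only pairs whose tags fall into the same (known) category.
            if cat_r is None or cat_r != tag_category.get(mark_tag):
                continue
            sim = similarity(rec_data, asset)
            # Comparison follows the claim text: similarity less than the threshold.
            if sim < threshold:
                results.append((rec_data, asset, sim))
    return results
```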

7. The method according to claim 1, characterized in that the recommendation strategy is at least one of a quantity restriction, a matching-degree restriction, and a combination restriction.
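One way to read the three strategies of claim 7, assuming each candidate carries a numeric matching degree: a quantity restriction keeps at most `max_count` items, a matching-degree restriction keeps items scoring at least `min_score`, and setting both gives the combination. Treating "combination restriction" as applying both limits together is an assumption, as is the name `apply_strategy`.

```python
from typing import List, Optional, Tuple

Scored = Tuple[str, float]  # (candidate asset, matching degree)


def apply_strategy(
    scored: List[Scored],
    max_count: Optional[int] = None,    # quantity restriction
    min_score: Optional[float] = None,  # matching-degree restriction
) -> List[Scored]:
    # Rank candidates by matching degree, highest first.
    ranked = sorted(scored, key=lambda x: x[1], reverse=True)
    if min_score is not None:
        ranked = [x for x in ranked if x[1] >= min_score]
    if max_count is not None:
        ranked = ranked[:max_count]
    # With both limits set, this acts as the (assumed) combination restriction.
    return ranked
```

Applying the score filter before the count cut means the combination never pads the result with low-scoring items just to reach `max_count`.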

8. A business data processing device, characterized in that the device includes:

The first data acquisition module (110) is used to acquire retrieval asset data;

A marking module (120), configured to, in response to a selection operation on a preset target marking model, mark the retrieved asset data using the target marking model to obtain a marking result;

The first recommendation module (130) is configured to use a preset recommendation strategy to make a first recommendation based on the marking results, and obtain relevant recommendation results, where the relevant recommendation results include recommendation data and labels corresponding to the recommendation data;

A second data acquisition module (140) is used to acquire a preset asset marking list, wherein the asset marking list includes a plurality of asset data to be marked and a marking label corresponding to each of the asset data to be marked;

A matching module (150), configured to obtain a matching result based on the recommended data, the tag corresponding to the recommended data, the asset data to be marked, and the marking tag;

The second recommendation module (160) is used to make a second recommendation based on the matching result and the retrieved asset data to obtain recommended asset data.

9. An electronic device, characterized in that it includes a processor (501), a memory (505), a user interface (503), a communication bus (502), and a network interface (504), wherein the processor (501), the memory (505), the user interface (503), and the network interface (504) are each connected to the communication bus (502); the memory (505) is used to store instructions; the user interface (503) and the network interface (504) are used to communicate with other devices; and the processor (501) is used to execute the instructions stored in the memory (505), so that the electronic device (500) performs the method according to any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions, and when the instructions are executed, the method according to any one of claims 1-7 is performed.