CN118552985B

CN118552985B - Label identification method, object processing method, computing device, storage medium, and program product

Info

Publication number: CN118552985B
Application number: CN202411016900.5A
Authority: CN
Inventors: 马俊凯
Original assignee: Alibaba Cloud Computing Ltd
Current assignee: Alibaba Cloud Computing Ltd
Priority date: 2024-07-26
Filing date: 2024-07-26
Publication date: 2025-07-04
Anticipated expiration: 2044-07-26
Also published as: CN118552985A

Abstract

The embodiments of the present application provide a label identification method, an object processing method, a computing device, a storage medium, and a program product. The method includes: obtaining a plurality of labels; grouping the labels according to a hierarchical relationship to obtain a plurality of label combinations; and extracting the label features corresponding to the labels in the label combinations. The label combinations are matched against an object to be identified in order from the highest level to the lowest, based on the label features and the object features of the object to be identified, to obtain at least one target label of that object. The technical solution provided by the embodiments improves the accuracy of label identification.

Description

Label identification method, object processing method, computing device, storage medium, and program product

Technical Field

The embodiments of the present application relate to the field of data processing technology, and in particular to a label identification method, an object processing method, a computing device, a storage medium, and a program product.

Background Art

With the rapid development of artificial intelligence (AI) technology, object recognition and automatic labeling have become key components of information processing and data analysis. In recent years, for example, AI-based image recognition algorithms have matured and are widely used across industries, bridging the physical and digital worlds. Many service providers at home and abroad have launched object recognition services that offer convenient label generation: a user supplies an object to be identified, such as an image, and the service provider's object recognition model generates the labels corresponding to that object.

However, although existing object recognition services provide rich label libraries that meet most general needs, they match an object to be identified simply by filtering matching labels out of the label library, which cannot satisfy users' fine-grained labeling needs. How to accurately identify the labels that match an object to be identified has therefore become a technical problem to be solved by those skilled in the art.

Summary of the Invention

The embodiments of the present application provide a label identification method, an object processing method, a computing device, a storage medium, and a program product, to solve the technical problem of inaccurate label identification results in the prior art.

In a first aspect, an embodiment of the present application provides a label processing method, including:

obtaining a plurality of labels;

grouping the plurality of labels according to a hierarchical relationship to obtain a plurality of label combinations; and

extracting the label features corresponding to the labels in the plurality of label combinations;

wherein the plurality of label combinations are matched against an object to be identified in order from the highest level to the lowest, based on the label features and the object features of the object to be identified, to obtain at least one target label of the object to be identified.

Optionally, grouping the plurality of labels according to a hierarchical relationship to obtain a plurality of label combinations includes:

identifying the level to which each of the plurality of labels belongs; and

combining labels at the same level to obtain at least one label combination.

Optionally, identifying the level to which each of the plurality of labels belongs includes:

querying a vocabulary database for the hypernyms of each label until a root node is reached, and determining the level of the label from the length of the path from the label to the root node.

Optionally, querying the vocabulary database for the hypernyms of each label until the root node is reached includes:

determining the synonym set of each label in the vocabulary database; and

searching the vocabulary database level by level for the hypernyms of the synonym set until the root node is reached.
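The level lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `HYPERNYM` dictionary is a hypothetical stand-in for the vocabulary database, each label is assumed to resolve to a single entry (the synonym-set lookup is skipped), and the function names are invented for the example. The level is taken to be the number of hypernym steps to the root, so a smaller number means a higher (more general) level.

```python
# Toy stand-in for a vocabulary database: each term maps to its direct
# hypernym, and the root maps to None. All entries are illustrative.
HYPERNYM = {
    "entity": None,
    "animal": "entity",
    "mammal": "animal",
    "feline": "mammal",
    "canine": "mammal",
}

def path_to_root(label):
    """Follow hypernyms level by level until the root node is reached."""
    path = [label]
    while HYPERNYM[path[-1]] is not None:
        path.append(HYPERNYM[path[-1]])
    return path

def level_of(label):
    """Level = number of hypernym steps from the label up to the root."""
    return len(path_to_root(label)) - 1

# Levels for a small user-supplied label list.
levels = {label: level_of(label) for label in ["animal", "mammal", "feline", "canine"]}
```

With a real lexical database, the dictionary lookup would be replaced by a synonym-set query followed by repeated hypernym queries.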

Optionally, after obtaining the plurality of labels, the method further includes:

filtering out, from the plurality of labels, labels that meet a filtering condition, so as to update the plurality of labels.

Optionally, obtaining the plurality of labels includes:

obtaining a label list that is provided by a user and includes the plurality of labels;

and extracting the label features corresponding to the labels in the plurality of label combinations includes:

extracting, with a multimodal model, the label features corresponding to the labels in the plurality of label combinations, the multimodal model being a pre-trained model.

In a second aspect, an embodiment of the present application provides an object processing method, including:

obtaining an object to be identified;

extracting object features of the object to be identified;

determining the extracted label features corresponding to the labels in a plurality of label combinations, the plurality of label combinations being obtained by grouping a plurality of obtained labels according to a hierarchical relationship; and

matching the plurality of label combinations against the object to be identified, based on the object features and the label features, to obtain at least one target label corresponding to the object to be identified.

Optionally, matching the plurality of label combinations against the object to be identified, based on the object features and the label features, to obtain at least one target label corresponding to the object features includes:

matching the object to be identified against the labels in the highest-level label combination, based on the label features and the object features, to obtain successfully matched labels;

starting from the label combination one level below the highest-level label combination, determining, in the current-level label combination, at least one label that has a hypernym-hyponym relationship with a successfully matched label in the previous-level label combination;

matching the at least one label against the object to be identified to obtain successfully matched labels; and

taking the successfully matched labels of the plurality of label combinations as the at least one target label corresponding to the object to be identified.
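The level-by-level matching above can be sketched as follows. This is a hedged illustration under several assumptions that the source does not specify: `groups` is already ordered from the highest level to the lowest, `parent` records each label's direct hypernym, `score` is any similarity function between a label and the object, and a label counts as successfully matched when its score reaches a fixed `threshold`. All names and values are hypothetical.

```python
def match_by_level(groups, parent, score, threshold):
    """Match label groups from the highest level to the lowest.

    groups: lists of labels, ordered from most general to most specific.
    parent: maps a label to its direct hypernym label.
    score: callable returning the similarity between a label and the object.
    """
    targets = []
    matched_prev = set()
    for depth, group in enumerate(groups):
        if depth == 0:
            candidates = group
        else:
            # Keep only labels whose hypernym matched at the previous level.
            candidates = [lab for lab in group if parent.get(lab) in matched_prev]
        matched = [lab for lab in candidates if score(lab) >= threshold]
        targets.extend(matched)
        matched_prev = set(matched)
        if not matched:
            break  # nothing matched, so no lower level can have candidates
    return targets

# Illustrative data: "car" scores well on its own but is pruned because
# its hypernym "vehicle" did not match at the level above.
groups = [["animal", "vehicle"], ["mammal", "car"], ["feline", "canine"]]
parent = {"mammal": "animal", "car": "vehicle", "feline": "mammal", "canine": "mammal"}
scores = {"animal": 0.9, "vehicle": 0.1, "mammal": 0.8, "car": 0.7, "feline": 0.85, "canine": 0.3}

targets = match_by_level(groups, parent, scores.get, threshold=0.5)
```

Restricting each level to the hyponyms of labels that already matched is what lets the higher levels veto misleading lower-level matches.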

Optionally, matching the plurality of label combinations against the object to be identified, based on the object features and the label features, to obtain at least one target label corresponding to the object to be identified includes:

calculating the feature similarity between the label features corresponding to the labels in the plurality of label combinations and the object features of the object to be identified; and

determining at least one target label whose feature similarity satisfies a similarity condition.
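A minimal sketch of this similarity step is shown below. The source does not name a similarity measure or a similarity condition, so cosine similarity and a fixed threshold are assumptions made for illustration, and the two-dimensional feature vectors are made-up placeholders for the features a multimodal model would produce.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def select_targets(label_features, object_feature, threshold=0.8):
    """Return the labels whose similarity to the object meets the threshold."""
    return [
        label
        for label, vec in label_features.items()
        if cosine_similarity(vec, object_feature) >= threshold
    ]

# Placeholder features; real ones would come from the feature extractor.
label_features = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
object_feature = [0.9, 0.1]

selected = select_targets(label_features, object_feature)
```

Other similarity conditions, such as keeping the top-k labels per level, would fit the same structure.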

Optionally, extracting the object features of the object to be identified includes:

extracting the object features of the object to be identified with a multimodal model;

and determining the extracted label features corresponding to the labels in the plurality of label combinations includes:

determining the label features, extracted with the multimodal model, corresponding to the labels in the plurality of label combinations, wherein the plurality of obtained labels come from a label list that is provided by a user and includes a plurality of labels.

Optionally, the object to be identified is an image to be identified, and the method further includes:

determining an image category of the image to be identified based on the at least one target label;

or

searching for a target image that matches the image to be identified based on the at least one target label;

or

verifying, based on the at least one target label, whether the image to be identified meets image requirements.

In a third aspect, an embodiment of the present application provides a computing device including a processing component and a storage component. The storage component stores one or more computer instructions, which are called and executed by the processing component to implement the label processing method of the first aspect or the object processing method of the second aspect.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processing component, implements the label processing method of the first aspect or the object processing method of the second aspect.

In a fifth aspect, an embodiment of the present application provides a computer program product including a computer program/instructions which, when executed by a processing component, implement the label processing method of the first aspect or the object processing method of the second aspect.

In the embodiments of the present application, a plurality of labels are obtained; the labels are grouped according to a hierarchical relationship to obtain a plurality of label combinations; and the label features corresponding to the labels in the label combinations are extracted. The label combinations are matched against the object to be identified in order from the highest level to the lowest, based on the label features and the object features of the object to be identified, to obtain at least one target label of that object. Grouping the labels by hierarchical relationship allows the contextual and semantic associations between labels to be better understood and used, which makes the subsequent label identification process more fine-grained and precise. Matching the target labels of the object to be identified based on the text features of the labels and the visual features of the image to be identified helps improve identification accuracy. Matching in order from the highest label level to the lowest further reduces the probability of misidentification, improves label coverage, and further safeguards identification accuracy.

These and other aspects of the present application will be more readily understood from the description of the following embodiments.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present application; those of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of an embodiment of a label processing method provided by the present application;

FIG. 2 is a flowchart of an embodiment of an object processing method provided by the present application;

FIG. 3 is a schematic diagram of scenario interaction for the technical solution of the embodiments of the present application in a practical application;

FIG. 4 is a schematic structural diagram of an embodiment of a label processing apparatus provided by the present application;

FIG. 5 is a schematic structural diagram of an embodiment of an object processing apparatus provided by the present application;

FIG. 6 is a schematic structural diagram of a computing device provided by the present application.

Detailed Description

To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments.

Some of the processes described in the specification, the claims, and the above drawings of the present application contain multiple operations that appear in a particular order. It should be clearly understood that these operations may be executed out of the order in which they appear herein, or in parallel. Operation numbers such as 101 and 102 serve only to distinguish the operations from one another and do not in themselves imply any execution order. These processes may also include more or fewer operations, which may be executed sequentially or in parallel. Note that terms such as "first" and "second" herein are used to distinguish different messages, devices, modules, and so on; they do not indicate a sequence, nor do they require that the "first" and "second" items be of different types.

The technical solution of the embodiments of the present application is applicable to object recognition scenarios such as commodity classification on e-commerce platforms, disease diagnosis in medical image analysis, object recognition in smart homes, and customized content recommendation systems. In one practical application, it can be used for image recognition; in an e-commerce scenario, for example, clothing pictures uploaded by users must be accurately identified so that they can be matched to the corresponding commodity categories.

As described in the Background Art, conventional approaches usually rely on AI recognition models produced by service providers to perform object recognition. Such models are typically trained on predefined labels. In clothing image recognition, for example, clothing styles are numerous and change quickly, so preset labels can hardly cover every detail and emerging trend, and the recognition results may not meet user needs.

To overcome the limitations of conventional preset labels and generate more accurate labels, the inventor considered letting users retrain the model with custom labels. That approach, however, requires users to collect sample images and define labels themselves, run customized training on a model supplied by the service provider, and then deploy the result to a production environment to obtain a personalized label identification service. The process is cumbersome: it involves extensive data preparation and specialized algorithm knowledge, it requires clearing the technical hurdles of model training and deployment, and it becomes more complex still when multi-label identification is involved. After further in-depth research, the inventor therefore proposed the technical solution of the embodiments of the present application. A plurality of labels are obtained and grouped according to a hierarchical relationship to obtain a plurality of label combinations; that is, the system automatically groups the labels by their hierarchical relationship, so that the contextual and semantic associations between labels can be better understood and used, making the subsequent label identification process more fine-grained and precise. Matching the target labels of the object to be identified based on the text features of the labels and the visual features of the image to be identified helps improve identification accuracy. Matching in order from the highest label level to the lowest to obtain at least one target label of the object to be identified not only reduces the probability of misidentification and improves label coverage, but also safeguards identification accuracy.

The technical solution of the embodiments of the present application simplifies the label identification workflow and lowers the technical threshold, so that users without AI expertise can apply it easily. It also significantly improves the flexibility and precision of object recognition, enhances the user experience, and broadens the possibilities for personalized services, bringing a fundamental change to label identification workflows across industries.

Note that the embodiments of the present application may involve the use of user data. In practical applications, user-specific personal data may be used in the solutions described herein to the extent permitted by applicable laws and regulations, and in compliance with the requirements of the applicable laws and regulations of the country concerned (for example, with the user's explicit consent and with effective notice to the user).

Note that the user information (including but not limited to user device information and user personal information) and data (including but not limited to data used for analysis, stored data, and displayed data) involved in the present application are all information and data authorized by the user or fully authorized by all parties. The collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and a corresponding operation entry is provided so that users can choose to grant or refuse authorization.

Note that the technical solution of the embodiments of the present application applies to a networked virtual environment, and the users described are generally "virtual users". A real user can register a user account with the server to obtain a user identity in the network environment.

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art on the basis of the embodiments in the present application without creative effort fall within the scope of protection of the present application.

FIG. 1 is a flowchart of an embodiment of a label processing method provided by an embodiment of the present application. The technical solution of this embodiment can be executed by a server. In one practical application, the technical solution can be applied in a system architecture consisting of a user end and a server, where the user end interacts with the server over a network to receive or send messages and the like.

The user end may be a browser, an app (application), a web application such as an H5 (HyperText Markup Language 5) application, a light application (also called a mini program, a lightweight application), a cloud application, or the like. The user end may be deployed in an electronic device and runs in dependence on the device or on certain apps in the device. The electronic device may, for example, have a display screen and support information browsing; it may be a personal mobile terminal such as a mobile phone, a tablet computer, a personal computer, a desktop computer, a smart speaker, or a smart watch.

The server may include servers that provide various services, such as a server that performs label processing or a server that performs object recognition. The server may be implemented as a distributed server cluster consisting of multiple servers or as a single server. It may also be a server of a distributed system or a server combined with a blockchain. It may further be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology.

The method of the embodiment shown in FIG. 1 may include the following steps:

101: Obtain a plurality of labels.

The obtained labels may be provided directly by the user, or may be selected or configured by the user, as needed, from labels offered by the system. The labels may be obtained in forms including, but not limited to, a label list or a label set.

Obtaining a plurality of labels is the starting point of the whole label processing procedure; its core is collecting the labels that the user configures for the object to be identified. A "label" may be a keyword or phrase used to describe or classify the object to be identified. The object to be identified described herein may be an image, text, audio, video, or the like, which the present application does not limit. The labels provided by the user reflect the user's understanding of the object to be identified and how the user wishes to classify it, and they make the object easier to retrieve.

In this step, obtaining a plurality of labels includes obtaining a label list that is provided by the user and includes the plurality of labels. This step makes the user an active participant in the labeling process: the server allows the user to customize labels according to personal understanding or specific needs, which makes the labels more personalized and better targeted. In the prior art, the labels for an object to be identified are matched from a general-purpose label library, which makes label identification inflexible. By obtaining labels provided by the user, the present application addresses that problem, supports identification with custom labels, meets users' personalized needs, and improves the user experience.

The labels provided by the user are not restricted to a preset label library and can cover a wider range of topics and details, which makes them especially suitable for scenarios with a specific industry background or personalized needs. The labels can be highly varied, including but not limited to keywords, phrases, numbers, and even labels the user has coined. The user can provide the labels through the user end; for example, the display interface of the user end may show a label input field in which the user enters labels. Where the obtained labels are a label list submitted by the user, the label list may take the form of a string.
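As a small illustration of a label list submitted as a string, the sketch below splits such a string into individual labels. The comma separator, the trimming of whitespace, and the removal of exact duplicates are assumptions made for the example; the source only states that the list may be a string.

```python
def parse_label_list(raw, sep=","):
    """Split a user-supplied string into labels.

    Surrounding whitespace is trimmed, empty entries are skipped, and
    exact duplicates are dropped while the original order is preserved.
    """
    labels = []
    for part in raw.split(sep):
        label = part.strip()
        if label and label not in labels:
            labels.append(label)
    return labels

labels = parse_label_list("travel, food, pets, food")
```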

In the embodiments of the present application, the labels provided by the user may be obtained when pictures are posted on social media. For example, a user uploading photos to social media may want to add custom labels such as "travel", "food", and "pets", so that interested friends can find the pictures more easily and the user can review and search for them later. Another scenario is a merchant on an e-commerce platform adding trend labels to new fashion items. When listing new clothing, the merchant can manually add labels such as "dopamine", "retro style", and "sustainable materials" according to fashion trends and product features, helping consumers locate the products they want quickly and increasing product exposure. Other scenarios are also possible. The purpose of obtaining the labels provided by the user is that an object to be identified that is obtained later can be marked, according to the user's needs, with at least one target label from the label list rather than with the system's default labels.

Optionally, after the plurality of labels are obtained, the method may further include: filtering out, from the plurality of labels, labels that meet a filtering condition, so as to update the plurality of labels.

To further improve the accuracy and usefulness of the labels, the labels provided by the user can be validated and cleaned after they are obtained, for example by removing meaningless labels such as garbled ones. The filtering condition may be a rule that defines meaningless labels, such as a regular expression, so that labels matching the rule are filtered out.
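A regular-expression filtering condition might look like the sketch below. The rule itself is a hypothetical example, since the source does not define what counts as meaningless: here a label is discarded when it is empty or consists only of punctuation, symbols, whitespace, or underscores. Python's `\W` is Unicode-aware for `str` patterns, so labels written in Chinese characters are kept.

```python
import re

# Hypothetical rule for a meaningless label: the whole label is made up of
# non-word characters and underscores (an empty label also matches).
MEANINGLESS = re.compile(r"[\W_]*")

def filter_labels(labels):
    """Drop labels that fully match the meaningless-label rule."""
    return [label for label in labels if not MEANINGLESS.fullmatch(label)]

cleaned = filter_labels(["retro style", "@@##", "", "多巴胺", "___"])
```

A production rule would likely need more patterns (for example, mojibake sequences), which would be added as further expressions or a combined pattern.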

此外，也可以利用分类模型从多个标签中识别无意义标签，该筛除条件也即为分类模型的分类结果为无意义标签。该分类模型可以预先训练获得，例如可以根据样本标签以及样本标签是否为无意义标签的训练标签（例如无意义标签为0，其它情况为1）进行训练获得。In addition, a classification model can also be used to identify meaningless labels from multiple labels, and the screening condition is that the classification result of the classification model is a meaningless label. The classification model can be pre-trained, for example, it can be trained based on sample labels and training labels of whether the sample labels are meaningless labels (for example, meaningless labels are 0, and other cases are 1).

也可以通过NER（Named Entity Recognition，命名实体识别）算法识别多个标签中的命名实体，从而可以将非命名实体的标签去除。Named entities among the multiple tags can also be identified with an NER (Named Entity Recognition) algorithm, so that tags that are not named entities can be removed.

通过进行标签校验和清洗，可以通过智能化手段去除那些可能不准确、重复或与待识别对象关联性较弱的标签，确保最终用于匹配的多个标签更加精炼且相关度高。这有助于提高后续标签匹配的效率和准确性。By performing tag verification and cleaning, we can use intelligent means to remove tags that may be inaccurate, repeated, or have a weak correlation with the object to be identified, ensuring that the multiple tags used for matching are more refined and highly relevant. This helps improve the efficiency and accuracy of subsequent tag matching.

102：将多个标签按照层级关系进行分组，获得多组标签组合。102: Group multiple tags according to a hierarchical relationship to obtain multiple tag combinations.

步骤102涉及将用户提供的多个标签中的标签，根据标签之间的上下级关系或关联性进行分类和整合。标签分组的原则通常是基于标签的普遍性、特异性和语义上的包含关系，形成一个或多个层级的语义结构，每个层级代表一个更具体或更泛化的概念范畴。通过这种分组，可以构建出一个层次分明、便于理解和检索的标签系统。Step 102 involves classifying and integrating the tags among the multiple tags provided by the user according to the hierarchical relationship or association between the tags. The principle of tag grouping is usually based on the universality, specificity and semantic inclusion relationship of the tags, forming a semantic structure of one or more levels, each level representing a more specific or more generalized concept category. Through this grouping, a clearly hierarchical tag system that is easy to understand and retrieve can be constructed.

可理解为，层级关系是根据多个标签中的标签之间的上下位关系来确定的语义结构。In other words, the hierarchical relationship is a semantic structure determined by the hypernym-hyponym relationships between the tags.

其中，层级关系的构建基于标签的内在逻辑，其中，标签的内在逻辑是指根据结构和/或语义而确定的上下位关系。本申请的层级关系的构建其本质上是将标签的逻辑信息组织成一个层次分明的结构。这一过程主要包括：首先，确定最宽泛、最抽象的顶级标签，作为层级结构的根节点；其次，从顶级标签出发，逐步识别和定义更具体、更详细的下位概念，形成层级递进的关系。假设以“动物”作为顶级标签为例，其可以包含下级标签“哺乳动物”，“哺乳动物”之下又可细分出“猫科”、“犬科”等。Among them, the construction of the hierarchical relationship is based on the intrinsic logic of the label, wherein the intrinsic logic of the label refers to the hierarchical relationship determined according to the structure and/or semantics. The construction of the hierarchical relationship of the present application is essentially to organize the logical information of the label into a hierarchical structure. This process mainly includes: first, determine the broadest and most abstract top-level label as the root node of the hierarchical structure; second, starting from the top-level label, gradually identify and define more specific and detailed subordinate concepts to form a hierarchical progressive relationship. Assuming that "animal" is taken as the top-level label as an example, it can contain the subordinate label "mammal", and "mammal" can be further subdivided into "feline", "canine" and so on.

需要说明的是，标签分组不仅限于单一维度，还可以依据不同维度进行多层次划分，如按功能、按主题、按时间等，形成多维标签树状结构。It should be noted that label grouping is not limited to a single dimension, but can also be divided into multiple levels based on different dimensions, such as by function, by subject, by time, etc., to form a multi-dimensional label tree structure.

以下实施例均以获取的多个标签为用户提供的标签列表进行举例，但需要说明的是，本申请不限制于获取的多个标签均是用户所提供，也不限制于多个标签只能是标签列表的表现形式。The following embodiments all take as an example the case where the obtained multiple tags are a tag list provided by the user. However, it should be noted that the present application does not require that all of the obtained tags be provided by the user, nor that the multiple tags take the form of a tag list.

例如，以用户提供的标签列表包括：“动物”、“人物”、“猫”、“狗”、“女人”、“男人”为例，根据步骤102的要求，需要将用户输入的标签列表中的标签按照层级关系进行分组。对于给定的标签列表【“动物”、“人物”、“女人”、“男人”、“猫”、“狗”】，可以构建以下分组结构来反映它们之间的层级关系：For example, taking the tag list provided by the user including: "animal", "person", "cat", "dog", "woman", "man" as an example, according to the requirements of step 102, the tags in the tag list input by the user need to be grouped according to the hierarchical relationship. For a given tag list ["animal", "person", "woman", "man", "cat", "dog"], the following grouping structure can be constructed to reflect the hierarchical relationship between them:

第一层级包括：动物、人物，其中，动物和人物属于不同维度；The first level includes: animals and people, among which animals and people belong to different dimensions;

第二层级包括：“猫”、“狗”、“男人”、“女人”（其中“猫”、“狗”属于“动物”的下级标签，“男人”、“女人”属于“人物”的下级标签）。The second level includes: "cat", "dog", "man", "woman" (among which "cat" and "dog" are sub-tags of "animal", and "man" and "woman" are sub-tags of "person").

在上述分组中，“动物”和“人物”是最宽泛的分类，而“猫”、“狗”、“女人”、“男人”则是更为具体的子分类。其中，“猫”和“狗”作为“动物”的子类，而“女人”和“男人”作为“人物”的子类，体现了标签之间的包含关系。In the above grouping, "animals" and "people" are the broadest categories, while "cats", "dogs", "women" and "men" are more specific subcategories. Among them, "cats" and "dogs" are subcategories of "animals", while "women" and "men" are subcategories of "people", reflecting the inclusion relationship between labels.

通过这样的分组，可以有效地组织和管理标签，使得在后续在确定待识别对象对应的目标标签过程中，能够更精确地定位到相匹配的目标标签，从而提高标签处理的效率和标签识别的准确性。Through such grouping, tags can be effectively organized and managed, so that in the subsequent process of determining the target tag corresponding to the object to be identified, the matching target tag can be more accurately located, thereby improving the efficiency of tag processing and the accuracy of tag identification.
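The grouping in this example can be sketched as follows. The `PARENT` table is an assumed encoding of the hypernym relation among the six tags, and the function names are hypothetical; the application does not prescribe a data structure.

```python
from collections import defaultdict

# Assumed parent (hypernym) relation among the user's tags; None marks a top-level tag.
PARENT = {
    "animal": None, "person": None,
    "cat": "animal", "dog": "animal",
    "woman": "person", "man": "person",
}


def level_of(tag, parent=PARENT):
    """Level 1 is the top; each step down the hierarchy adds one."""
    level = 1
    while parent[tag] is not None:
        tag = parent[tag]
        level += 1
    return level


def group_by_level(tags, parent=PARENT):
    """Map each level number to the tags found at that level, in input order."""
    levels = defaultdict(list)
    for tag in tags:
        levels[level_of(tag, parent)].append(tag)
    return dict(sorted(levels.items()))


print(group_by_level(["animal", "person", "woman", "man", "cat", "dog"]))
```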

进一步地，在步骤102中，将多个标签中的标签按照层级关系进行分组之后，基于分组结果即可以获得多组标签组合。Furthermore, in step 102, after the tags in the plurality of tags are grouped according to the hierarchical relationship, a plurality of tag combinations can be obtained based on the grouping results.

103：提取多组标签组合中的标签分别对应的标签特征。103: Extract label features corresponding to labels in multiple groups of label combinations.

其中，上述多组标签组合用以按照层级从高到低的顺序，基于该多组标签组合中的标签对应的标签特征以及所述待识别对象的对象特征，与待识别对象进行匹配以获得所述待识别对象的至少一个目标标签。Among them, the above-mentioned multiple groups of label combinations are used to match the object to be identified in order from high to low levels based on label features corresponding to the labels in the multiple groups of label combinations and object features of the object to be identified to obtain at least one target label of the object to be identified.

在该步骤中，提取多组标签组合中的标签分别对应的标签特征的方式包括：利用多模态模型提取多组标签组合中的标签分别对应的标签特征，其中，多模态模型是一种能够处理和整合多种类型数据（如图像、文本、音频、视频、传感器数据等）的机器学习模型。其中，多模态模型为预训练模型，可以是预训练的大模型，预训练模型解决无需进行数据标定和模型训练的问题，降低识别复杂度。在此情况下，预训练的多模态模型能够更高效地提取有用的特征，无需从零开始训练，大幅缩短了训练时间并提高了识别精度。在本步骤中，利用多模态模型可以提取多组标签组合中的标签分别对应的标签特征，从而保证了后续标签识别的准确度。多模态模型具备零样本学习（zero-shot）能力，使得多模态模型能够对从未见过的新类别进行分类或识别，而无需进行重训练。本申请实施例中，多模态模型可以是实现文本和对象匹配的多模态算法模型，能够理解和处理文本与对象两种不同形式的数据。例如，图像识别场景中，多模态模型可以是CLIP (Contrastive Language-Image Pre-training，一种基于文本-图像对比学习的跨模态预训练模型)，它在大规模未标记的图像-文本对上进行预训练，能够学会图像和文本的联合嵌入表示。利用多模态模型所提取的标签特征中可以包含与对象的对齐关系，使得标签特征更为准确。In this step, the label features corresponding to the labels in the multiple groups of label combinations can be extracted using a multimodal model, that is, a machine learning model that can process and integrate multiple types of data (such as images, text, audio, video, and sensor data). The multimodal model is a pre-trained model and may be a pre-trained large model; using a pre-trained model removes the need for data annotation and model training and so reduces the complexity of recognition. A pre-trained multimodal model can extract useful features more efficiently without training from scratch, which greatly shortens training time and improves recognition accuracy. Extracting the label features with the multimodal model in this step therefore supports the accuracy of subsequent label recognition. The multimodal model also has zero-shot learning capability, so it can classify or recognize new categories it has never seen without retraining. In the embodiments of the present application, the multimodal model can be a multimodal algorithm model that matches text with objects and can understand and process both forms of data. For example, in an image recognition scenario, the multimodal model can be CLIP (Contrastive Language-Image Pre-training, a cross-modal pre-training model based on text-image contrastive learning), which is pre-trained on large-scale unlabeled image-text pairs and learns a joint embedding representation of images and text. The label features extracted by the multimodal model can capture the alignment with the object, which makes the label features more accurate.

可选地，待识别对象的对象特征可以是利用该多模态模型提取获得。Optionally, the object features of the object to be identified can be extracted using the multimodal model.

多模态模型能够将提取的标签特征与待识别对象的对象特征相结合，并在不同层级上进行匹配。从最高层（较为抽象的标签）到低层级（具体详细的标签），逐步缩小匹配范围，直至找到与待识别对象最为契合的至少一个目标标签。这一过程利用了层次化的标签组合，确保了识别的准确性和效率。The multimodal model can combine the extracted label features with the object features of the object to be identified and match them at different levels. From the highest level (more abstract labels) to the lower levels (specific and detailed labels), the matching range is gradually narrowed until at least one target label that best matches the object to be identified is found. This process utilizes a hierarchical combination of labels to ensure the accuracy and efficiency of recognition.
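The top-down narrowing described above can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the three-dimensional vectors stand in for features a multimodal model would produce, cosine similarity is used as the matching score, and the threshold of 0.8 is arbitrary.

```python
import math

# Toy 3-dimensional vectors standing in for label features from a multimodal model.
LABEL_FEATURES = {
    "animal": (0.9, 0.1, 0.0), "person": (0.1, 0.9, 0.0),
    "cat": (0.9, 0.0, 0.3), "dog": (0.7, 0.1, -0.6),
    "woman": (0.1, 0.8, 0.4), "man": (0.0, 0.9, -0.4),
}
CHILDREN = {"animal": ["cat", "dog"], "person": ["woman", "man"]}
TOP_LEVEL = ["animal", "person"]


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_top_down(object_feature, threshold=0.8):
    """Match level by level; only children of matched labels are tried next."""
    matched, candidates = [], TOP_LEVEL
    while candidates:
        hits = [t for t in candidates
                if cosine(object_feature, LABEL_FEATURES[t]) >= threshold]
        matched.extend(hits)
        candidates = [c for h in hits for c in CHILDREN.get(h, [])]
    return matched


print(match_top_down((0.85, 0.05, 0.35)))  # ['animal', 'cat']
```

Because "person" fails at the first level, "woman" and "man" are never scored, which is where the narrowing of the matching range comes from in this sketch.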

可选地，可以按照层级关系，保存多组标签组合中的标签分别对应的标签特征，从而在存在待识别对象情况下，可以从保存数据中，获取多组标签组合中的标签分别对应的标签特征，并按照层级从高到低的顺序，依次与每组标签组合中的标签进行匹配。Optionally, label features corresponding to the labels in multiple label combinations can be saved according to a hierarchical relationship, so that when there are objects to be identified, label features corresponding to the labels in multiple label combinations can be obtained from the saved data, and matched with the labels in each label combination in order from high to low levels.

在本实施例中，通过获取多个标签；将多个标签按照层级关系进行分组，获得多组标签组合；提取所述多组标签组合中的标签分别对应的标签特征；其中，所述多组标签组合用以按照层级从高到低的顺序，基于所述标签特征以及所述待识别对象的对象特征，与所述待识别对象进行匹配以获得所述待识别对象的至少一个目标标签。本申请实施例通过获取多个标签以便满足个性化需求。通过将多个标签按层级关系分组，能够更好地理解和利用标签之间的上下文和语义关联，以确保后续标签识别过程更加细致和精准。基于标签文本特征和待识别图像的视觉特征匹配待识别对象的目标标签，有助于提升识别的准确性。按照标签层级从高到低的顺序进行匹配以获得该待识别对象的至少一个目标标签，不仅能够减少误识别的概率，提高标签覆盖度，还能够保证标签识别准确性。In this embodiment, multiple labels are obtained; multiple labels are grouped according to a hierarchical relationship to obtain multiple groups of label combinations; label features corresponding to the labels in the multiple groups of label combinations are extracted; wherein the multiple groups of label combinations are used to match the object to be identified in a descending order of hierarchy based on the label features and the object features of the object to be identified to obtain at least one target label of the object to be identified. The embodiment of the present application obtains multiple labels to meet personalized needs. By grouping multiple labels according to a hierarchical relationship, the context and semantic associations between labels can be better understood and utilized to ensure that the subsequent label recognition process is more detailed and accurate. Matching the target label of the object to be identified based on the label text features and the visual features of the image to be identified helps to improve the accuracy of recognition. Matching in a descending order of the label hierarchy to obtain at least one target label of the object to be identified can not only reduce the probability of misidentification and improve label coverage, but also ensure label recognition accuracy.

一些实施例中，将该多个标签中的标签按照层级关系进行分组，获得多组标签组合可以包括：识别该多个标签中的标签的所属层级；将同一层级的标签进行组合，获得至少一组标签组合。In some embodiments, grouping the tags among the multiple tags according to a hierarchical relationship to obtain multiple groups of tag combinations may include: identifying the hierarchies to which the tags among the multiple tags belong; and combining tags at the same level to obtain at least one group of tag combinations.

首先，明确每个标签在语义结构中的所属层级。标签的层级关系定义了它们之间的包含和被包含关系。随后，可以将属于相同层级的标签组合在一起，形成至少一组标签组合。这样的操作有助于区分不同抽象层次的标签群集，便于后续的标签处理流程。First, the level to which each tag belongs in the semantic structure is determined. The hierarchical relationship of the tags defines which tags contain, and which are contained by, others. Tags belonging to the same level can then be grouped together to form at least one group of tag combinations. This helps distinguish clusters of tags at different levels of abstraction and simplifies subsequent tag processing.

作为一种可能实现的方式，可将同一层级的标签进行组合，获得一组标签组合。可理解为一层标签只有一组标签组合，可根据层级数量则获取相应数量的标签组合。As a possible implementation method, the labels at the same level can be combined to obtain a set of label combinations. It can be understood that there is only one set of label combinations for one layer of labels, and the corresponding number of label combinations can be obtained according to the number of levels.

例如，以上述例子中第二层包括：“猫”、“狗”、“男人”、“女人”为例，由于“猫”、“狗”、“男人”、“女人”均属于同一层级，因此可将“猫”、“狗”、“男人”、“女人”进行组合，得到一组“猫+狗+男人+女人”的标签组合。For example, taking the second layer in the above example including: "cat", "dog", "man", "woman" as an example, since "cat", "dog", "man", and "woman" all belong to the same level, "cat", "dog", "man", and "woman" can be combined to obtain a set of label combinations of "cat+dog+man+woman".

作为另一种可能实现的方式，还可以将同一层级同一维度的标签进行组合，获取一组或多组标签组合。As another possible implementation method, labels of the same level and the same dimension may be combined to obtain one or more groups of label combinations.

具体地，将同一层级的标签进行组合时，可将同一层级同一维度的标签进行组合，以得到多组标签组合，这样可以极大地扩展标签应用的灵活性和创造性。这意味着在进行同一层级的标签组合时，还需要考虑同一层级的单个维度内部的标签组合，以此来捕捉更复杂、更细腻的关联性和用户需求。以下是对这一概念的补充说明和具体示例：Specifically, when combining labels at the same level, labels at the same level and the same dimension can be combined to obtain multiple sets of label combinations, which can greatly expand the flexibility and creativity of label application. This means that when combining labels at the same level, it is also necessary to consider the label combinations within a single dimension at the same level, so as to capture more complex and delicate associations and user needs. The following is a supplementary explanation and specific examples of this concept:

同一层级同一维度的标签之间的自由排列组合是指在同一层级内选取属于相同维度的标签进行自由搭配。例如，以上述例子中第二层包括：“猫”、“狗”、“男人”、“女人”为例，其中，“猫”和“狗”都属于“动物”这一维度，可以将“猫”和“狗”进行组合，得到一组“猫+狗”的标签组合，而“男人”和“女人”都属于“人物”这一维度，可以将“男人”和“女人”进行组合，得到一组“男人+女人”的标签组合，此时可得到两组标签组合，分别为“猫+狗”的标签组合和“男人+女人”的标签组合。The free arrangement and combination of labels of the same level and dimension refers to the free combination of labels belonging to the same dimension within the same level. For example, in the above example, the second level includes: "cat", "dog", "man", and "woman". "Cat" and "dog" both belong to the dimension of "animal". "Cat" and "dog" can be combined to obtain a set of "cat + dog" label combination, while "man" and "woman" both belong to the dimension of "character". "Man" and "woman" can be combined to obtain a set of "man + woman" label combination. At this time, two sets of label combinations can be obtained, namely the label combination of "cat + dog" and the label combination of "man + woman".

针对上述“将同一层级同一维度的标签进行组合，获取多组标签组合”的方案，提供一个详细的实施例：For the above-mentioned solution of "combining labels of the same level and the same dimension to obtain multiple sets of label combinations", a detailed implementation example is provided:

例如，以标签列表包括A、B、C、D、E、F、G、H、I、1、2、3、4、5、6、7、8、9、0为例，通过识别出该标签列表中的标签的所属层级，并按照层级关系进行分组后可得到以下三个层级：For example, taking a tag list including A, B, C, D, E, F, G, H, I, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 as an example, by identifying the levels to which the tags in the tag list belong and grouping them according to the hierarchical relationship, the following three levels can be obtained:

第一层：A、B、C；其中，A、B、C之间属于不同维度；The first layer: A, B, C; A, B, C belong to different dimensions;

第二层：D、E、F、G、H、I（其中，D、E属于A的下层标签，F、G属于B的下层标签，H、I属于C的下层标签）；DE、FG、HI之间属于不同维度；The second layer: D, E, F, G, H, I (D and E belong to the lower-level labels of A, F and G belong to the lower-level labels of B, and H and I belong to the lower-level labels of C); DE, FG, and HI belong to different dimensions;

第三层：1、2、3、4、5、6、7、8、9、0（其中，1、2属于D的下层标签，3、4属于E的下层标签，5、6属于F的下层标签，7、8属于G的下层标签，9属于H的下层标签，0属于I的下层标签）；12、34、56、78、9、0之间属于不同维度；The third layer: 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 (1 and 2 belong to the lower-level labels of D, 3 and 4 belong to the lower-level labels of E, 5 and 6 belong to the lower-level labels of F, 7 and 8 belong to the lower-level labels of G, 9 belongs to the lower-level labels of H, and 0 belongs to the lower-level labels of I); 12, 34, 56, 78, 9, and 0 belong to different dimensions;

进一步地，一旦识别了每个标签的层级，接下来的工作就是把处于相同层级的标签合并到一起，形成不同的标签组合。这样做的好处是能够快速定位和操作同一级别的标签集合，便于后续的处理和应用。Furthermore, once the level of each tag is identified, the next step is to merge tags at the same level together to form different tag combinations. The advantage of this is that the tag set at the same level can be quickly located and operated, which is convenient for subsequent processing and application.

将同一层级同一维度的标签进行组合，可得到以下多组标签组合如下：By combining the labels of the same level and dimension, we can get the following sets of label combinations:

第一层的多组标签组合：The first layer of multiple sets of label combinations:

组合11：{A}；Combination 11: {A};

组合12：{B}；Combination 12: {B};

组合13：{C}；Combination 13: {C};

第二层的多组标签组合：The second layer of multiple label combinations:

组合21：{D，E}；Combination 21: {D, E};

组合22：{F，G}；Combination 22: {F, G};

组合23：{H，I}；Combination 23: {H, I};

第三层的多组标签组合：The third layer of multiple label combinations:

组合31：{1，2}；Combination 31: {1, 2};

组合32：{3，4}；Combination 32: {3, 4};

组合33：{5，6}；Combination 33: {5, 6};

组合34：{7，8}；Combination 34: {7, 8};

组合35：{9}；Combination 35: {9};

组合36：{0}；Combination 36: {0};

在上述分组中，反映了将同一层级的标签进行组合的方式可以是同一层级同一维度的标签之间的组合。通过这样的分组，可以清晰地看到标签之间的层次结构和归属关系，为后续的标签管理和内容处理提供了便利的基础。In the above grouping, it is reflected that the way to combine tags at the same level can be to combine tags at the same level and the same dimension. Through such grouping, the hierarchical structure and attribution relationship between tags can be clearly seen, which provides a convenient basis for subsequent tag management and content processing.
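The combinations listed above can be reproduced with a short sketch. The `PARENT` table encodes the parent of each tag as stated in the example; treating "same dimension" as "same parent" (and each first-level tag as its own dimension) is an assumption that matches this example, and the function names are hypothetical.

```python
from collections import defaultdict

# Parent of each tag in the worked example; None marks a first-level tag.
PARENT = {
    "A": None, "B": None, "C": None,
    "D": "A", "E": "A", "F": "B", "G": "B", "H": "C", "I": "C",
    "1": "D", "2": "D", "3": "E", "4": "E", "5": "F", "6": "F",
    "7": "G", "8": "G", "9": "H", "0": "I",
}


def level_of(tag):
    """Level 1 is the top; each step down the hierarchy adds one."""
    level = 1
    while PARENT[tag] is not None:
        tag = PARENT[tag]
        level += 1
    return level


def combinations_by_level_and_dimension(tags):
    """Map each level to its combinations; tags sharing a parent form one combination."""
    groups = defaultdict(lambda: defaultdict(list))
    for tag in tags:
        # A first-level tag is its own dimension; otherwise the parent is the dimension.
        dimension = PARENT[tag] if PARENT[tag] is not None else tag
        groups[level_of(tag)][dimension].append(tag)
    return {lvl: list(dims.values()) for lvl, dims in sorted(groups.items())}


combos = combinations_by_level_and_dimension(list("ABCDEFGHI1234567890"))
for lvl, level_groups in combos.items():
    print(lvl, level_groups)
```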

其中，识别上述多个标签中的标签的所属层级可以有多种实现方式。There are many implementation methods for identifying the level to which the tags among the multiple tags belong.

在一种可能实现方式中，识别多个标签中的标签的所属层级可以包括：在词汇数据库中查询多个标签中的标签的上位词直至达到根节点，并根据标签至根节点的路径长度，确定出标签的所属层级；In one possible implementation, identifying the level to which a tag among the multiple tags belongs may include: searching a vocabulary database for hypernyms of the tags among the multiple tags until a root node is reached, and determining the level to which the tag belongs based on a path length from the tag to the root node;

其中，上述层级关系是根据多个标签中的标签之间的上下位关系来确定的语义结构，上述根节点可以是词汇数据库中标签所对应的最高层级的词汇，其通常为实体(entity)或事件(event)，为较为宽泛的类别。Here, the hierarchical relationship is a semantic structure determined by the hypernym-hyponym relationships between the tags, and the root node can be the highest-level word corresponding to a tag in the vocabulary database, usually a broad category such as entity or event.

在该实施例中，通过查询在多个标签中的标签的上位词直至达到根节点的目的，是为了确定用户提供的标签在语义结构中的层级位置，首先要找到这些标签的上位概念，直到达到整个语义结构的最高层级——根节点。根节点是不同维度概念的起点（即根节点可存在多个），通过这一过程可以建立起标签之间的上下位关系。In this embodiment, the purpose of searching for hypernyms of tags among multiple tags until the root node is reached is to determine the hierarchical position of the tags provided by the user in the semantic structure. First, the hypernyms of these tags must be found until the highest level of the entire semantic structure, the root node, is reached. The root node is the starting point of concepts of different dimensions (i.e., there can be multiple root nodes). Through this process, the hypernym relationship between tags can be established.

其中，词汇数据库是一种将词汇按照语义关系组织起来形成了一个语义网络的词典，该语义关系即包括上下位关系。对于多个标签中的标签，在词汇数据库中，可以根据上下位关系，逐级查找该标签的上位词，直至找到根节点。从根节点往下追溯至该标签，即可以确定出标签所在层级。Among them, the vocabulary database is a dictionary that organizes words according to semantic relationships to form a semantic network, and the semantic relationship includes the hypernym relationship. For a tag among multiple tags, in the vocabulary database, the hypernym of the tag can be searched step by step according to the hypernym relationship until the root node is found. From the root node, tracing down to the tag, the level of the tag can be determined.

例如，用户输入的标签列表中包含“A、B”这两个标签，此时并不清楚“A”和“B”的所属层级，因此需要通过词汇数据库进行查询，假设在词汇数据库中查询到“A”已是根节点，而“B”根据其上位词找到“C”，再根据“C”进一步找到根节点“D”，此时可确定出，“A”的所属层级为第一层（最高层），也可以称为一级标签。而“B”的所属层级为第三层（即D＞C＞B），也可以称为三级标签。For example, the tag list input by the user contains two tags "A, B". At this time, it is not clear to which level "A" and "B" belong, so it is necessary to query through the vocabulary database. Assuming that "A" is found in the vocabulary database as the root node, and "B" finds "C" based on its hypernym, and then further finds the root node "D" based on "C", it can be determined that "A" belongs to the first level (the highest level), which can also be called a first-level tag. "B" belongs to the third level (that is, D>C>B), which can also be called a third-level tag.
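The lookup in this example can be sketched as below. The `HYPERNYM` dictionary is a toy stand-in for a vocabulary database, not the interface of any real one, and the function names are hypothetical.

```python
# Toy stand-in for a vocabulary database: each word maps to its hypernym;
# None marks a root node.
HYPERNYM = {"A": None, "D": None, "C": "D", "B": "C"}


def path_to_root(tag, hypernym=HYPERNYM):
    """Return the chain [tag, ..., root] found by repeatedly looking up the hypernym."""
    path = [tag]
    while hypernym[path[-1]] is not None:
        path.append(hypernym[path[-1]])
    return path


def level(tag, hypernym=HYPERNYM):
    """The root is level 1, so the level equals the number of nodes on the path."""
    return len(path_to_root(tag, hypernym))


print(path_to_root("B"), level("B"))  # ['B', 'C', 'D'] 3
print(level("A"))                     # 1
```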

其中，在词汇数据库中，每种词性的词汇被组织成同义词集，每个同义词集包含意义相近的词汇，同义词集之间通过语义关系相互链接形成语义网络，因此，一些实施例中，在词汇数据库中查询在标签列表中的标签的上位词直至达到根节点可以包括：确定多个标签中的标签在词汇数据库中的同义词集；查找同义词集在词汇数据库中的上位词直至达到根节点。In which, in the vocabulary database, words of each part of speech are organized into synonym sets, each synonym set contains words with similar meanings, and the synonym sets are linked to each other through semantic relationships to form a semantic network. Therefore, in some embodiments, querying the vocabulary database for hypernyms of tags in the tag list until the root node is reached may include: determining the synonym sets of tags among multiple tags in the vocabulary database; searching for hypernyms of the synonym sets in the vocabulary database until the root node is reached.

也即对于多个标签中的标签，可以从词汇数据库中查找命中的同义词集，同义词集可能包括该标签或者该标签的同义词，之后再逐级查找同义词集的上位词集直至达到根节点。That is, for each tag among the multiple tags, the matching synonym set can be looked up in the vocabulary database; the synonym set may include the tag itself or its synonyms. The hypernym sets of that synonym set are then looked up level by level until the root node is reached.

可选地，词汇数据库例如可以是WordNet（一种基于认知语言学的词典），其中，WordNet包含了大量的词汇条目及其间的语义关系，包括同义词集、上位词、下位词等。对于每一个用户输入的标签，可以在WordNet中查询，找到其对应的上位词，上位词再继续查找其上位词，重复这一过程直至达到对应的根节点。Optionally, the vocabulary database may be, for example, WordNet (a dictionary based on cognitive linguistics), wherein WordNet contains a large number of vocabulary entries and semantic relationships therebetween, including synonym sets, hypernyms, hyponyms, etc. For each tag input by the user, it is possible to search in WordNet to find its corresponding hypernym, and the hypernym can then continue to search for its hypernym, and this process is repeated until the corresponding root node is reached.

在该实施例中，考虑到如果用户输入的标签在词汇数据库中不存在，这将阻碍层级关系的确定。为了解决这个问题，通过确定多个标签中的标签在词汇数据库中的同义词集；逐级查找同义词集在词汇数据库中的上位词直至达到根节点。其中，同义词集包含了与输入标签意义相近或相同的词汇，这些词汇在数据库中有较大概率存在，并且能够代表输入标签的含义。通过在词汇数据库中搜索与输入标签意义相近的词汇集合，即便是输入标签本身未直接记录，也能通过其同义词在词汇数据库中的记录来间接确定其上下位关系。In this embodiment, it is considered that if the label input by the user does not exist in the vocabulary database, this will hinder the determination of the hierarchical relationship. In order to solve this problem, by determining the synonym set of the label among multiple labels in the vocabulary database; searching for the hypernym of the synonym set in the vocabulary database step by step until the root node is reached. Among them, the synonym set contains words that are similar or identical in meaning to the input label. These words have a high probability of existing in the database and can represent the meaning of the input label. By searching the vocabulary database for a set of words that are similar in meaning to the input label, even if the input label itself is not directly recorded, its hypernym relationship can be indirectly determined through the record of its synonyms in the vocabulary database.

例如，用户输入的标签列表中包含有“跑车”这一标签，当服务端为该标签分配一个合适的分类层级时，在词汇数据库（如WordNet或其他专业数据库）中，并没有直接记录“跑车”这个特定的标签。For example, the tag list input by the user includes the tag "sports car". When the server assigns an appropriate classification level to the tag, the specific tag "sports car" is not directly recorded in the vocabulary database (such as WordNet or other professional databases).

此时，服务端首先识别“跑车”可能的同义词集或近义词集。这些同义词集可能包括“赛车”、“高性能车”、“超级跑车”等，这些词汇在数据库中都有对应的条目，并且能够代表“跑车”的基本含义。At this point, the server first identifies possible synonyms or near-synonyms for "sports car". These synonyms may include "racing car", "high-performance car", "super sports car", etc. These words have corresponding entries in the database and can represent the basic meaning of "sports car".

接下来，服务端在词汇数据库中查找这些同义词集的上位词。比如，“赛车”可能的上位词是“汽车”，“高性能车”和“超级跑车”也可能指向“汽车”。这个过程会一直进行，直到找到根节点，比如“交通工具”。Next, the server searches for hypernyms of these synonym sets in the vocabulary database. For example, a possible hypernym for "racing car" is "car", and "high-performance car" and "supercar" may also point to "car". This process continues until a root node is found, such as "transportation tool".

通过上述步骤，服务端可以构建出“跑车”的层级关系，尽管“跑车”本身未直接出现在数据库中，但通过其同义词集，系统能够间接确定“跑车”的层级，如最终确定的语义结构为：“交通工具”＞“汽车”＞“跑车”，进而可间接确定“跑车”的层级为第三层，也可以称为三级标签。Through the above steps, the server can construct a hierarchical relationship of "sports car". Although "sports car" itself does not appear directly in the database, through its synonym set, the system can indirectly determine the hierarchy of "sports car". For example, the final semantic structure is: "transportation" > "car" > "sports car", which can indirectly determine that the hierarchy of "sports car" is the third level, which can also be called a third-level label.
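A minimal sketch of this synonym fallback follows. Both tables are toy data mirroring the example; how synonyms are actually discovered is not specified here, so the `SYNONYMS` table is purely an assumption, as are the function names.

```python
# Toy data mirroring the example; both tables are illustrative assumptions.
HYPERNYM = {
    "transportation": None,
    "car": "transportation",
    "racing car": "car",
    "high-performance car": "car",
}
SYNONYMS = {"sports car": ["racing car", "high-performance car"]}


def resolve(tag):
    """Return the tag itself if recorded, else its first recorded synonym, else None."""
    if tag in HYPERNYM:
        return tag
    for candidate in SYNONYMS.get(tag, []):
        if candidate in HYPERNYM:
            return candidate
    return None


def level(tag):
    """Level of the tag (root = 1), or None if neither it nor a synonym is recorded."""
    entry = resolve(tag)
    if entry is None:
        return None
    depth = 1
    while HYPERNYM[entry] is not None:
        entry = HYPERNYM[entry]
        depth += 1
    return depth


print(level("sports car"))  # 3
```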

进一步地，找到标签或其同义词集的上位词直至根节点，就可以通过标签至根节点的路径长度来确定标签的所属层级。距离根节点越远的标签，其在语义结构中的层级就越低，即更具体；反之，则更抽象或泛化。Furthermore, by finding the hypernyms of the tag or its synonym set up to the root node, the level of the tag can be determined by the path length from the tag to the root node. The farther the tag is from the root node, the lower its level in the semantic structure, that is, the more specific it is; otherwise, it is more abstract or generalized.

从根节点开始，服务端沿着上位词关系向下追溯，记录每个标签到根节点的路径长度。根据这个长度，服务端可以将标签分配到相应的层级，形成一个层次分明的标签结构。具体地，服务端需确定一个或多个最宽泛的类别作为根节点，这些类别覆盖了所有可能的下位概念。例如，在一个产品分类系统中，“商品”可以作为一个根节点。进一步地，服务端利用已有的词汇数据库（如WordNet等）查询每个标签的上位词，直至达到根节点。例如，对于标签“笔记本电脑”，其上位词可能为“电脑”、“电子设备”直至“商品”，在查询过程中，服务端记录每个标签到根节点的路径长度，即上位词的数量，根据每个标签到根节点的上位词数量确定其对应的层级。例如，标签“笔记本电脑”，其上位词包括“电脑”、“电子设备”直至根节点“商品”，将“商品”作为第一层时，通过“笔记本电脑”的上位词数据可确定“笔记本电脑”位于第四层。Starting from the root node, the server traces down along the hypernym relationship and records the path length from each label to the root node. Based on this length, the server can assign labels to corresponding levels to form a hierarchical label structure. Specifically, the server needs to determine one or more broad categories as root nodes, which cover all possible subordinate concepts. For example, in a product classification system, "commodity" can be used as a root node. Further, the server uses existing vocabulary databases (such as WordNet, etc.) to query the hypernyms of each label until the root node is reached. For example, for the label "laptop", its hypernyms may be "computer", "electronic equipment" and even "commodity". During the query process, the server records the path length from each label to the root node, that is, the number of hypernyms, and determines the corresponding level according to the number of hypernyms from each label to the root node. For example, the label "laptop", its hypernyms include "computer", "electronic equipment" and even the root node "commodity". When "commodity" is used as the first layer, the hypernym data of "laptop" can determine that "laptop" is located in the fourth layer.

通过以上步骤，即使用户输入的标签未直接记录在词汇数据库中，系统也能通过查找同义词集和追溯至根节点的方式来确定这些标签的语义层级，构建出一个多级、有序的标签体系。这不仅提高了标签处理的灵活性和适应性，还增强了基于语义理解的标签应用，如内容分类、信息检索和个性化推荐等场景的准确性和效率。Through the above steps, even if the tags input by the user are not directly recorded in the vocabulary database, the system can determine the semantic level of these tags by searching the synonym set and tracing back to the root node, and build a multi-level, orderly tag system. This not only improves the flexibility and adaptability of tag processing, but also enhances the accuracy and efficiency of tag applications based on semantic understanding, such as content classification, information retrieval, and personalized recommendation.

可选地，每层对应的路径长度可根据需求设定。例如将每层的路径长度设为1的情况下，对于“短袖T恤”，在词汇数据库中查询，发现其上位词是“T恤”，进一步向上追溯到“上衣”，最终达到根节点“服装”。路径为：“服装”＞“上衣”＞“T恤”＞“短袖T恤”，则确定出“短袖T恤”的路径长度为4（从根节点“服装”算起，经过“上衣”、“T恤”到“短袖T恤”），则可以确定短袖T恤的层级为第四层，以此类推，“T恤”的路径长度为3，则可以确定“T恤”的层级为第三层，其中，路径长度反映了标签的抽象程度，可理解为路径长度的数值越大，标签越具体。Optionally, the path length corresponding to each layer can be set according to demand. For example, when the path length of each layer is set to 1, for "short-sleeved T-shirt", the hypernym is found to be "T-shirt" in the vocabulary database, and further traced back to "top", and finally reached the root node "clothing". The path is: "clothing" > "top" > "T-shirt" > "short-sleeved T-shirt", then the path length of "short-sleeved T-shirt" is determined to be 4 (starting from the root node "clothing", through "top", "T-shirt" to "short-sleeved T-shirt"), then the level of short-sleeved T-shirt can be determined to be the fourth level, and so on, the path length of "T-shirt" is 3, then the level of "T-shirt" can be determined to be the third level, where the path length reflects the abstractness of the label, which can be understood as the larger the value of the path length, the more specific the label.

在又一种可能实现方式中，识别多个标签中的标签的所属层级可以包括：利用大模型识别多个标签中不同标签的所属层级。In yet another possible implementation, identifying the level to which a tag among the multiple tags belongs may include: using a large model to identify the level to which different tags among the multiple tags belong.

也即可以通过大模型将多个标签中的标签按照上下位关系进行分层，以确定每个标签所属层级以及属于同一层级的标签等。That is, the labels in multiple labels can be layered according to the hierarchical relationship through the large model to determine the level to which each label belongs and the labels belonging to the same level.

可选地，大模型可以是大语言模型（Large Language Model，简称LLM），大语言模型是指具有极高参数数量和计算能力的自然语言处理模型，可以用于各种自然语言处理任务如文本生成、机器翻译、问答系统等，大语言模型采用深度学习技术的 AI算法驱动，并使用庞大的数据集来评估、规范和生成相关内容，以及进行准确的预测。大语言模型例如可以是GPT-3（Generative Pre-trained Transformer 3，第三代生成式预训练模型）、GPT-4（Generative Pre-trained Transformer 4，第四代生成式预训练模型）、BERT（Bidirectional Encoder Representations from Transformers，一种基于Transformers的双向编码器模型）、Turing NLG（Turing Natural Language Generation，图灵自然语言生成）等等，本申请对此不进行限定。Optionally, the large model can be a large language model (LLM), that is, a natural language processing model with a very large number of parameters and substantial computing power. It can be used for various natural language processing tasks such as text generation, machine translation, and question answering. A large language model is driven by AI algorithms based on deep learning and uses very large data sets to evaluate, normalize, and generate relevant content and to make accurate predictions. The large language model can be, for example, GPT-3 (Generative Pre-trained Transformer 3, the third-generation generative pre-trained model), GPT-4 (Generative Pre-trained Transformer 4, the fourth-generation generative pre-trained model), BERT (Bidirectional Encoder Representations from Transformers, a bidirectional encoder model based on Transformers), or Turing NLG (Turing Natural Language Generation), which is not limited in this application.

一些实施例中，该方法还可以包括：获取待识别对象；提取待识别对象的对象特征；基于对象特征及标签特征，将多组标签组合分别与待识别对象进行匹配，以获得待识别对象对应的至少一个目标标签。In some embodiments, the method may further include: obtaining an object to be identified; extracting object features of the object to be identified; and, based on the object features and the label features, matching the multiple groups of label combinations with the object to be identified to obtain at least one target label corresponding to the object to be identified.

也即本申请实施例也可以实时对获取的多个标签以及待识别对象，从多个标签获得待识别对象匹配成功的至少一个目标标签。That is, the embodiment of the present application can also process the acquired multiple labels and the object to be identified in real time, obtaining from the multiple labels at least one target label that successfully matches the object to be identified.

一些实施例中，可利用多模态模型提取多组标签组合中标签对应的标签特征以及待识别对象的对象特征，其中，多模态模型通常为大模型，为了方便进行模型识别，利用多模态模型提取多组标签组合中的标签分别对应的标签特征的过程可以包括：In some embodiments, a multimodal model may be used to extract the label features corresponding to the labels in the multiple groups of label combinations as well as the object features of the object to be identified. The multimodal model is usually a large model. To make the labels easier for the model to recognize, the process of using the multimodal model to extract the label features corresponding to the labels in the multiple groups of label combinations may include:

针对任一组标签组合中的标签以及辅助文字信息，生成提示信息；该辅助文字信息用于辅助多模态模型理解标签，以便上述多模态模型提取标签对应的标签特征；For a label in any group of label combinations, generate prompt information from the label and auxiliary text information; the auxiliary text information helps the multimodal model understand the label so that the multimodal model can extract the label feature corresponding to the label;

利用多模态模型基于提示信息，提取标签对应的标签特征。The multimodal model is used to extract label features corresponding to labels based on prompt information.

在该实施例中，除了标签之外，还会为标签添加辅助文字信息。这些辅助文字信息旨在为多模态模型提供上下文，帮助模型更好地把握标签的语境和含义。比如，对于“山峰”这个标签，辅助文字信息可能是“高耸入云的自然地形特征，顶部尖锐”。In this embodiment, in addition to the label itself, auxiliary text information is added for the label. This auxiliary text information is intended to provide context for the multimodal model and help it better grasp the context and meaning of the label. For example, for the label "mountain", the auxiliary text information may be "a natural terrain feature towering into the clouds with a sharp top."

进一步地，基于标签和辅助文字信息，生成一种“提示”或者“查询”形式的信息。这一步骤就像是在为多模态模型准备一个精确的问题：例如“请根据这张图片的内容，结合‘高耸入云的自然地形特征，顶部尖锐’的描述，提取出与‘山峰’相关的特征。”这样的提示信息既包含了视觉元素（通过标签），也融入了语言的深度描述（辅助文字信息），为多模态模型的处理提供了丰富的输入。Furthermore, based on the labels and auxiliary text information, a "prompt" or "query" form of information is generated. This step is like preparing a precise question for the multimodal model: for example, "Please extract the features related to 'mountain' based on the content of this picture and the description of 'natural terrain features towering into the clouds with sharp tops'." Such prompt information contains both visual elements (through labels) and in-depth descriptions of language (auxiliary text information), providing rich input for the processing of the multimodal model.
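The prompt generation step can be illustrated with a short sketch. The application does not fix a prompt format, so the "label: auxiliary text" template and the names `build_prompt` and `build_prompts` below are assumptions made only for illustration; a real system would use whatever text format its multimodal model expects.

```python
def build_prompt(label, auxiliary_text=None):
    """Combine a label with optional auxiliary text into one prompt string."""
    label = label.strip()
    if not label:
        raise ValueError("label must not be empty")
    if auxiliary_text and auxiliary_text.strip():
        # Assumed template: the label followed by its description.
        return f"{label}: {auxiliary_text.strip()}"
    return label  # fall back to the bare label when no description exists


def build_prompts(label_group, auxiliary):
    """Build one prompt per label in a label combination."""
    return {label: build_prompt(label, auxiliary.get(label)) for label in label_group}
```

For the label "mountain" and the auxiliary text quoted above, the sketch produces a single string that can then be passed to the text side of the multimodal model to obtain the label feature.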

而多模态模型是一种能够同时处理和整合多种类型数据（如图像、文本等）的AI模型。在接收到含有图像信息和提示信息的输入后，它会尝试融合这些不同模态的数据，从而更深层次地理解和解析图像内容。模型会分析图像，并依据提示信息中的标签和描述，专门识别和抽取与之相符的特征。比如，对于“山峰”的提示，模型可能会寻找图像中的边缘轮廓、光影变化等特征，以确认并精确定位到山峰的位置和形态。这样的特征提取不仅限于直观的视觉特征，还可能包括抽象的语义理解，确保提取的特征与标签的深层含义相吻合。A multimodal model is an AI model that can process and integrate multiple types of data (such as images, text, etc.) at the same time. After receiving input containing image information and prompt information, it will try to fuse these different modal data to understand and parse the image content at a deeper level. The model will analyze the image and specifically identify and extract features that match the labels and descriptions in the prompt information. For example, for the prompt of "mountain", the model may look for features such as edge contours and light and shadow changes in the image to confirm and accurately locate the position and shape of the mountain. Such feature extraction is not limited to intuitive visual features, but may also include abstract semantic understanding to ensure that the extracted features are consistent with the deep meaning of the label.

综上所述，这种通过生成提示信息并利用多模态模型提取标签特征的方法，提高了图像分析的精度和深度，使得模型能更好地把握复杂场景中的细微差别和语境信息，从而在诸如图像分类、内容生成、检索等领域发挥重要作用。In summary, this method of generating prompt information and extracting label features using a multimodal model improves the accuracy and depth of image analysis, enabling the model to better grasp the subtle differences and contextual information in complex scenes, thereby playing an important role in fields such as image classification, content generation, and retrieval.

当然，由前文描述可知，利用多模态模型所提取的标签特征也可以预先分组保存，从而可以针对待识别的任意对象，基于所保存的标签特征即可以进行对象与标签的匹配，而无需再重新分组和计算。如图2所示的对象处理方法中，图2所示实施例的技术方案可以由服务端执行，该方法可以包括以下几个步骤：Of course, as the foregoing description shows, the label features extracted by the multimodal model can also be grouped and saved in advance, so that any object to be identified can be matched against the labels using the saved label features, without regrouping or recomputation. In the object processing method shown in FIG. 2, the technical solution of the illustrated embodiment may be executed by the server, and the method may include the following steps:

201：获取待识别对象；201: Get the object to be identified;

首先，获取到需要识别标签的对象。待识别对象可以是一张图片、一段视频、一段音频或者一段文本等多媒体内容。例如，在电商平台上，待识别对象可能是一件商品的图片。First, obtain the object whose labels need to be identified. The object to be identified may be multimedia content such as a picture, a video clip, an audio clip, or a piece of text. For example, on an e-commerce platform, the object to be identified may be a picture of a product.

202：提取所述待识别对象的对象特征；202: Extracting object features of the object to be identified;

使用多模态模型来分析并提取待识别对象的关键特征。多模态模型能同时处理不同模式的信息（如视觉、听觉、文本等），针对不同模式的信息能够提取出对应的关键特征。例如针对视觉内容（如图片）进行提取为例，多模态模型提取出颜色、形状、纹理、物体类别等视觉特征。具体地，对于一张衣服图片，模型可能会提取出“蓝色”、“长袖”、“圆领”等特征。Use multimodal models to analyze and extract key features of the object to be identified. Multimodal models can process information in different modes (such as vision, hearing, text, etc.) at the same time, and can extract corresponding key features for information in different modes. For example, for visual content (such as pictures), multimodal models extract visual features such as color, shape, texture, and object category. Specifically, for a picture of clothing, the model may extract features such as "blue", "long sleeves", and "round neck".

203：确定提取的多组标签组合中标签对应的标签特征；所述多组标签组合为获取的多个标签中的标签按照层级关系进行分组获得；203: Determine the extracted label features corresponding to the labels in multiple groups of label combinations; the multiple groups of label combinations are obtained by grouping the acquired multiple labels according to a hierarchical relationship;

按照图1实施例的描述，获取的多个标签被组织成具有层级关系的多组标签组合。这些标签根据其上下位关系被分组，以生成多组标签组合。例如，“服装”之下有“上衣”，“上衣”之下有“T恤”和“衬衫”。进一步地，模型会为多组标签组合中标签生成相应的标签特征，便于与对象特征进行匹配。According to the description of the embodiment of FIG. 1 , the obtained multiple tags are organized into multiple groups of tag combinations with a hierarchical relationship. These tags are grouped according to their hierarchical relationship to generate multiple groups of tag combinations. For example, under "clothing" there is "top", and under "top" there are "T-shirt" and "shirt". Furthermore, the model will generate corresponding tag features for the tags in the multiple groups of tag combinations to facilitate matching with object features.

204：基于对象特征及标签特征，将多组标签组合分别与待识别对象进行匹配，以获得待识别对象对应的至少一个目标标签。204: Based on the object features and the label features, match the multiple groups of label combinations with the object to be identified to obtain at least one target label corresponding to the object to be identified.

在此步骤中，基于提取出的对象特征和标签特征，服务端对每组标签组合进行匹配，以找到与待识别对象最匹配的标签组合以及该标签组合中的标签。匹配算法会考虑成功匹配标签的标签特征和对象特征之间的相似度，选择最能描述对象内容的目标标签。In this step, based on the extracted object features and label features, the server matches each group of label combinations against the object to be identified, to find the label combination that best matches the object and the matching labels within that combination. The matching algorithm considers the similarity between the label features of the successfully matched labels and the object features, and selects the target labels that best describe the content of the object.

一些实施例中，基于标签特征以及待识别对象的对象特征，与待识别对象进行匹配以获得待识别对象的至少一个目标标签的过程可以包括：In some embodiments, based on the tag feature and the object feature of the object to be identified, the process of matching with the object to be identified to obtain at least one target tag of the object to be identified may include:

基于所述标签特征以及所述对象特征，将待识别对象与最高层级标签组合中的标签进行匹配，以获得匹配成功标签；从最高层级标签组合的下一层级标签组合开始，确定当前层级标签组合中，与上一层级标签组合中的匹配成功标签具有上下位关系的至少一个标签；将所述至少一个标签与所述待识别对象进行匹配，以获得匹配成功标签；将所述多组标签组合对应的至少一个匹配成功标签，作为所述待识别对象对应的至少一个目标标签。Based on the label features and the object features, the object to be identified is matched with the labels in the highest-level label combination to obtain a successfully matched label; starting from the label combination of the next level of the highest-level label combination, at least one label in the current level label combination that has a hierarchical relationship with the successfully matched label in the previous level label combination is determined; the at least one label is matched with the object to be identified to obtain a successfully matched label; and at least one successfully matched label corresponding to the multiple groups of label combinations is used as at least one target label corresponding to the object to be identified.

该实施例描述了一种层次化标签匹配方法，用于从一系列预先定义的标签集合中，为待识别对象自动分配合适的标签。这种方法通过逐步细化，从最广泛的类别到更具体的细节，来提高标签的匹配精度。This embodiment describes a hierarchical label matching method for automatically assigning appropriate labels to objects to be identified from a series of predefined label sets. This method improves the matching accuracy of labels by gradually refining from the broadest categories to more specific details.

下面以待识别对象为待识别图像为例，对该过程进行详细解释：The following takes the object to be identified as an image to be identified as an example to explain the process in detail:

最高层级匹配：首先，服务端利用多模态模型从待识别对象中提取对象特征。其次，将这些对象特征与最高层级标签组合中的标签特征进行比较。最高层级的标签通常是最为广泛和抽象的分类，如“动物”、“植物”、“交通工具”等。匹配算法会识别出与对象特征最为吻合的最高层级标签，以作为匹配成功标签。Top-level matching: First, the server uses a multimodal model to extract object features from the object to be identified. Second, these object features are compared with the label features in the top-level label combination. The top-level labels are usually the most extensive and abstract categories, such as "animals", "plants", "transportation tools", etc. The matching algorithm will identify the top-level label that best matches the object features as the successful match label.

逐级细化匹配：一旦在最高层级的标签组合中找到了匹配成功的标签，服务端会向下一层级深入，寻找与上一层级匹配成功的标签具有上下位关系的标签。例如，如果在最高层级匹配到了“动物”，下一步可能考虑“哺乳动物”、“鸟类”等更具体的分类。这一步骤是基于上下位关系的逻辑，确保了标签的层次性和准确性。Gradually refined matching: Once a successfully matched tag is found in the highest-level tag combination, the server will go down to the next level to find tags that have a hierarchical relationship with the successfully matched tags in the previous level. For example, if "animal" is matched at the highest level, the next step may be to consider more specific categories such as "mammals" and "birds". This step is based on the logic of hierarchical relationships, ensuring the hierarchy and accuracy of the tags.

继续匹配过程：对于每个下一层级，服务端会重复匹配过程，寻找与当前层级中上一层已匹配成功标签具有上下位关系的标签。通过这种方式，系统逐步从宏观分类细化到更具体的细节，每一步都确保所选标签与对象特征的匹配度。Continue the matching process: For each next level, the server will repeat the matching process, looking for tags that have a hierarchical relationship with the successfully matched tags in the previous level in the current level. In this way, the system gradually refines from macro classification to more specific details, and each step ensures the matching degree of the selected tags with the object characteristics.

确定目标标签：最终，所有在这一逐级匹配过程中成功匹配的标签，无论它们来自哪一层级，都会被汇总起来作为待识别图像的“至少一个目标标签”。这些标签综合反映了图像的主要内容和细节，可以用于图像分类、搜索、推荐等多种用途。Determine the target label: Ultimately, all labels that are successfully matched in this step-by-step matching process, no matter which level they come from, will be summarized as "at least one target label" for the image to be identified. These labels comprehensively reflect the main content and details of the image and can be used for a variety of purposes such as image classification, search, and recommendation.
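The four stages above (top-level matching, level-by-level refinement, repetition, and collection of the target labels) can be sketched as one loop. This is a minimal sketch under stated assumptions: the label combinations are given as lists ordered from the highest level down, the hypernym of each label is given in a dictionary, and the feature comparison is abstracted as a caller-supplied predicate. The names `match_top_down`, `groups`, `parent_of`, and `is_match` are hypothetical.

```python
def match_top_down(groups, parent_of, is_match):
    """Match label combinations level by level, from the highest level down.

    groups    -- list of label lists, highest level first
    parent_of -- dict mapping a label to its hypernym on the level above
    is_match  -- callable(label) -> bool, e.g. a similarity test (assumed)
    """
    matched_per_level = []
    previous = None
    for group in groups:
        if previous is None:
            candidates = list(group)  # every top-level label is compared
        else:
            # keep only labels whose hypernym matched on the level above
            candidates = [lab for lab in group if parent_of.get(lab) in previous]
        matched = [lab for lab in candidates if is_match(lab)]
        if not matched:
            break  # nothing left to refine on lower levels
        matched_per_level.append(matched)
        previous = set(matched)
    return matched_per_level
```

The returned list holds the successfully matched labels of each level; taken together they form the at least one target label. Labels whose hypernym did not match are never passed to `is_match`, which is where the saving in computation comes from.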

本申请实施例中，以待识别对象包括待识别图像为例，设想一个简单的图像识别场景，目的是为一张包含“山峰和日落”的风景图片分配合适的标签。In the embodiment of the present application, taking the case where the object to be identified includes an image to be identified as an example, a simple image recognition scenario is envisioned, the purpose of which is to assign a suitable label to a landscape picture containing "mountains and sunsets".

首先需要对输入的标签列表（如标签列表包括：A、B、C、D、E、F、G、H、I、J、K、L、M、1、2、3、4、5、6、7、8、9等）按照层级关系进行分组，得到以下分组：First, you need to group the input label list (such as the label list includes: A, B, C, D, E, F, G, H, I, J, K, L, M, 1, 2, 3, 4, 5, 6, 7, 8, 9, etc.) according to the hierarchical relationship to obtain the following groups:

第一层级：A、B、C、D，其中C代表“山脉”，D代表“天空”；The first level: A, B, C, D, where C stands for "mountains" and D stands for "sky";

第二层级：E、F、G、H、I、J、K、L、M，其中，F、G代表C山脉的下一层级标签，F代表“山峰”，H、I代表D天空的下一层级标签，I代表“日落”；The second level: E, F, G, H, I, J, K, L, M. F and G represent the next level labels of the mountain range C, F represents "peak", H and I represent the next level labels of the sky D, and I represents "sunset";

第三层级：1、2、3、4、5、6、7、8、9，其中8、9代表I日落的下一层标签，9代表“晚霞”。The third level: 1, 2, 3, 4, 5, 6, 7, 8, 9, among which 8 and 9 represent the next layer of labels of I sunset, and 9 represents "sunset glow".

匹配过程如下：The matching process is as follows:

第一层级匹配：分析待识别图像，发现待识别图像包含有图像特征“山脉”和“天空”，而该图像特征与第一层级中的标签C山脉和D天空匹配成功。因此，得到了匹配成功标签：C、D。需要说明的是，在匹配成功的情况下需要继续进行下一层级的匹配，直至找到最为准确的目标标签。First-level matching: Analyze the image to be identified and find that the image to be identified contains the image features "mountains" and "sky", and the image features successfully match the labels C mountains and D sky in the first level. Therefore, the matching successful labels are obtained: C, D. It should be noted that if the match is successful, it is necessary to continue the next level of matching until the most accurate target label is found.

第二层级匹配：根据C山脉的匹配，从下一层级中找到与C山脉具有上下位关系的标签，即F、G中寻找与C山脉相关的标签，发现图像中的山峰特征与F山峰匹配。同样，根据D天空的匹配，从下一层级中找到与D天空具有上下位关系的标签H、I中，识别到图像中的日落景象与I日落匹配。因此，第二层级的匹配成功标签为F、I。Second-level matching: Based on the matching of mountain C, we find labels with a hierarchical relationship with mountain C from the next level, that is, we find labels related to mountain C in F and G, and find that the peak features in the image match the peak F. Similarly, based on the matching of sky D, we find labels H and I with a hierarchical relationship with sky D from the next level, and recognize that the sunset in the image matches sunset I. Therefore, the successful matching labels of the second level are F and I.

第三层级匹配：根据F山峰的匹配，从下一层级中找到与F山峰具有上下位关系的标签，未找到成功匹配的标签；而根据I日落的匹配，从下一层级中找到与I日落具有上下位关系的标签为8、9，发现图像中的日落时分山峰上空的晚霞特征与9（晚霞）匹配。因此，第三层的匹配成功标签为9。Third-level matching: Based on the matching of peak F, we find labels with a superior-subordinate relationship with peak F from the next level, but no label is successfully matched; based on the matching of sunset I, we find labels 8 and 9 with a superior-subordinate relationship with sunset I from the next level, and find that the sunset glow feature above the peak at sunset in the image matches 9 (sunset glow). Therefore, the label with a successful match in the third level is 9.

通过上述过程，最终为待识别图像分配的至少一个目标标签序列是CD-＞FI-＞9，即“山脉”和“天空”、“山峰”和“日落”以及“晚霞”，这反映了图像内容的多层次特征，从宏观的自然景观到具体的视觉元素，通过逐级细化标签，不仅提高了所识别标签的覆盖度，还提高了描述的准确性和丰富度。Through the above process, the target label sequence finally assigned to the image to be identified is CD -> FI -> 9, namely "mountain range" and "sky", then "peak" and "sunset", then "sunset glow". This reflects the multi-level characteristics of the image content, from the macro natural landscape down to specific visual elements. Refining the labels level by level improves not only the coverage of the identified labels but also the accuracy and richness of the description.

在上述实施例中，第一层级（A、B、C、D）所生成的标签组合包括：{A、B、C、D}，通过多模态模型根据标签特征以及待识别对象的对象特征，能够确定相匹配的标签为C、D，使得在待识别对象包括多个相匹配的标签时，通过特征匹配的方式能够覆盖到更多的匹配标签，从而提高了所识别标签的覆盖度。此外，在进行下一层匹配标签时，本申请只需考虑与标签C、D具有上下位关系的标签即可，无需再遍历其他无关标签的标签特征与该图像的图像特征是否匹配，从而减少了标签识别过程的计算量，并提高了标签识别的效率。In the above embodiment, the label combination generated for the first level (A, B, C, D) is {A, B, C, D}. Using the multimodal model, the label features, and the object features of the object to be identified, the matching labels are determined to be C and D. When the object to be identified has several matching labels, feature matching can therefore cover more of them, which improves the coverage of the identified labels. In addition, when matching the next level, the present application only needs to consider the labels that have a hypernym-hyponym relationship with labels C and D; there is no need to check whether the label features of other, unrelated labels match the image features of the image. This reduces the amount of computation in the label identification process and improves its efficiency.
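The saving in computation can be made concrete by counting label-to-image comparisons for the example. The sketch below assumes that the labels whose hypernyms are not stated in the example (E, J, K, L, M and 1 to 7) belong under labels that did not match, so they are never compared in the pruned procedure; the name `count_comparisons` is hypothetical.

```python
# Label combinations from the example, highest level first.
GROUPS = [
    list("ABCD"),
    list("EFGHIJKLM"),
    list("123456789"),
]
# Hypernym links stated in the example; links for other labels are not given.
PARENT = {"F": "C", "G": "C", "H": "D", "I": "D", "8": "I", "9": "I"}
# Labels the example reports as matching the image.
MATCHED = {"C", "D", "F", "I", "9"}


def count_comparisons(groups, parent, matched):
    """Return (exhaustive, pruned) numbers of label-to-image comparisons."""
    exhaustive = sum(len(group) for group in groups)
    pruned = 0
    previous = None
    for group in groups:
        if previous is None:
            candidates = group  # the whole top level is compared
        else:
            candidates = [lab for lab in group if parent.get(lab) in previous]
        pruned += len(candidates)
        previous = {lab for lab in candidates if lab in matched}
    return exhaustive, pruned
```

Under these assumptions, comparing every label costs 4 + 9 + 9 = 22 comparisons, whereas the level-by-level procedure compares A to D, then F, G, H, I, then 8 and 9, for 4 + 4 + 2 = 10 comparisons.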

一些实施例中，将所述多组标签组合中的标签对应的标签特征分别与所述待识别对象的对象特征进行匹配，以获得待识别对象对应的至少一个目标标签包括：计算所述多组标签组合中的标签对应的标签特征分别与所述待识别对象的对象特征之间的特征相似度；确定特征相似度满足相似条件的至少一个目标标签。In some embodiments, matching the label features corresponding to the labels in the multiple groups of label combinations with the object features of the object to be identified, to obtain at least one target label corresponding to the object to be identified, includes: calculating the feature similarity between the label feature of each label in the multiple groups of label combinations and the object features of the object to be identified; and determining at least one target label whose feature similarity satisfies a similarity condition.

从待识别对象中提取的对象特征以及从标签组合中的标签提取的标签特征，都需要转换成计算机可以理解的数值形式，通常是向量形式。特征提取的方式可以通过诸如词嵌入、深度学习特征提取等方法实现。进一步地，通过计算标签特征以及所述对象特征的特征相似度，来确定匹配成功标签，该相似条件例如可以是特征相似度最高或者特征相似度大于相似阈值，则可以是将特征相似度最高或者特征相似度大于相似阈值的标签确定为匹配成功标签。其中，特征相似度例如可以通过余弦相似度、欧氏距离等计算获得，本申请对此不进行具体限定。相似度值越高，表示标签特征与对象特征之间的匹配度越好。最后，根据预设相似阈值或排名规则，选择相似度高于相似阈值的标签作为匹配成功的标签。这可能包括直接选取特征相似度最高的标签，或是在满足一定相似度标准的前提下，选择所有或部分标签。The object features extracted from the object to be identified and the label features extracted from the labels in the label combination need to be converted into numerical forms that can be understood by the computer, usually in vector form. The feature extraction method can be implemented by methods such as word embedding, deep learning feature extraction, etc. Further, by calculating the feature similarity of the label features and the object features, the matching successful label is determined. The similarity condition can be, for example, the highest feature similarity or the feature similarity is greater than the similarity threshold, and the label with the highest feature similarity or the feature similarity greater than the similarity threshold can be determined as the matching successful label. Among them, the feature similarity can be obtained by calculating, for example, cosine similarity, Euclidean distance, etc., and this application does not specifically limit this. The higher the similarity value, the better the match between the label feature and the object feature. Finally, according to the preset similarity threshold or ranking rule, the label with a similarity higher than the similarity threshold is selected as the matching successful label. This may include directly selecting the label with the highest feature similarity, or selecting all or part of the labels on the premise of meeting certain similarity standards.
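As an illustration of the similarity condition, the following sketch computes cosine similarity between feature vectors and selects labels either by a threshold or by highest similarity. It assumes the features are plain lists of numbers; the function names and the toy vectors in the note below are hypothetical, and cosine similarity is only one of the measures the application mentions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def select_labels(object_feature, label_features, threshold=None):
    """Pick labels by similarity: all above threshold, or the single best one."""
    scores = {
        label: cosine_similarity(object_feature, feat)
        for label, feat in label_features.items()
    }
    if threshold is None:
        return [max(scores, key=scores.get)] if scores else []
    return [label for label, s in scores.items() if s > threshold]
```

Calling `select_labels` without a threshold corresponds to the "highest feature similarity" condition, and passing a threshold corresponds to the "greater than the similarity threshold" condition. If Euclidean distance were used instead, a smaller value would indicate a better match, so the comparison would have to be reversed.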

本申请实施例中，假设有一个图像识别任务，目标是为一张包含“山地自行车”的图像打标签，用户提供的标签列表包括：“户外运动”“攀岩”、“交通工具”、“山地自行车”，通过服务端预处理确定出该标签列表对应的标签组合为：第一层：户外运动、交通工具；第二层：山地自行车、攀岩，其中，山地自行车属于交通工具的下一层标签，攀岩属于户外运动的下一层标签。In an embodiment of the present application, assume that there is an image recognition task, and the goal is to label an image containing "mountain bike". The label list provided by the user includes: "outdoor sports", "rock climbing", "transportation", and "mountain bike". Through server-side preprocessing, it is determined that the label combination corresponding to the label list is: first layer: outdoor sports, transportation; second layer: mountain bike, rock climbing, among which mountain bike belongs to the next layer of labels under transportation, and rock climbing belongs to the next layer of labels under outdoor sports.

在获取到图像特征以及标签特征后，依次对第一层中的“户外运动”和“交通工具”计算特征相似度，假设对于“户外运动”标签特征，计算其与图片特征向量的相似度，得到0.2。对于“交通工具”标签特征，相似度为0.7，则确定匹配成功标签“交通工具”，因此需要进一步计算“交通工具”的下一层标签“山地自行车”与图像特征之间的相似度，例如，对于“山地自行车”标签，由于该标签更具体且与图片内容高度相关，计算得到的相似度为0.9，因此将“交通工具-山地自行车”作为该图像的目标标签。After obtaining the image features and label features, the feature similarity of "outdoor sports" and "transportation" in the first layer is calculated in turn. Assume that for the "outdoor sports" label feature, the similarity between it and the image feature vector is calculated to be 0.2. For the "transportation" label feature, the similarity is 0.7, so it is determined that the label "transportation" is successfully matched, so it is necessary to further calculate the similarity between the next layer label "mountain bike" of "transportation" and the image features. For example, for the "mountain bike" label, since this label is more specific and highly related to the image content, the calculated similarity is 0.9, so "transportation-mountain bike" is used as the target label of the image.
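The worked example can be traced with the similarity values quoted above. The sketch assumes the "highest similarity per level" condition and takes the scores as given rather than computing them from features; the names `SIMILARITY`, `LEVELS`, `PARENT`, and `best_path` are hypothetical.

```python
# Similarity scores quoted in the example; the "rock climbing" score is never
# computed there, so it is deliberately absent.
SIMILARITY = {"outdoor sports": 0.2, "transportation": 0.7, "mountain bike": 0.9}
LEVELS = [["outdoor sports", "transportation"], ["mountain bike", "rock climbing"]]
PARENT = {"mountain bike": "transportation", "rock climbing": "outdoor sports"}


def best_path(levels, parent, similarity):
    """Follow the highest-scoring label on each level, restricted to hyponyms."""
    path = []
    compared = []
    for group in levels:
        if path:
            # only hyponyms of the label chosen on the level above are compared
            group = [lab for lab in group if parent.get(lab) == path[-1]]
        if not group:
            break
        compared.extend(group)
        path.append(max(group, key=lambda lab: similarity[lab]))
    return path, compared
```

Running `best_path(LEVELS, PARENT, SIMILARITY)` selects "transportation" and then "mountain bike", and "rock climbing" never appears among the compared labels, which is why no score for it is needed.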

通过这一过程，系统不仅识别出了“山地自行车”这一高度相关的标签，确保了标签的准确性和针对性，而且只筛选相似度满足要求的“交通工具”的下一层标签“山地自行车”进行后续匹配，无需计算“户外运动”的下一层标签“攀岩”与图像特征之间的匹配，从而减少了标签识别过程的计算量，提高了标签识别的效率。Through this process, the system not only identifies the highly relevant label "mountain bike", ensuring the accuracy and pertinence of the label, but also only selects the next-layer label "mountain bike" of "transportation" that meets the similarity requirements for subsequent matching. There is no need to calculate the match between the next-layer label "rock climbing" of "outdoor sports" and the image features, thereby reducing the amount of calculation in the label recognition process and improving the efficiency of label recognition.

在一个实际应用中，待识别对象可以为待识别图像，对于所获得的待识别图像的至少一个目标标签，可以有多种应用场景。In a practical application, the object to be identified may be an image to be identified, and there may be multiple application scenarios for at least one target label obtained for the image to be identified.

作为一种可选方式，该方法还包括：基于所述待识别图像至少一个目标标签，确定所述待识别图像的图像类别。As an optional manner, the method further includes: determining the image category of the image to be identified based on at least one target label of the image to be identified.

确定图像类别的目的在于：根据待识别图像上的至少一个目标标签，将其归类到一个或多个预定义的图像类别中。例如，如果目标标签有“狗”和“户外”，系统可能会将这张图片归类为“宠物户外活动”类别。实现这一功能通常需要一个训练好的图像分类模型，该模型能够基于图像的内容和关联的标签信息，输出一个或多个最有可能的类别。The purpose of determining the image category is to classify it into one or more predefined image categories based on at least one target label on the image to be identified. For example, if the target labels are "dog" and "outdoor", the system may classify this picture into the "Pet Outdoor Activities" category. Achieving this function usually requires a trained image classification model that can output one or more most likely categories based on the content of the image and the associated label information.

作为另一种可选方式，该方法还包括：基于所述至少一个目标标签，搜索与所述待识别图像匹配的目标图像。As another optional manner, the method further includes: searching for a target image matching the image to be identified based on the at least one target tag.

通过搜索匹配的目标图像的目的在于：利用至少一个目标标签作为查询条件，从一个图像数据库中寻找与待识别图像内容相似或相关的其他图像。这在图像检索、内容推荐、版权检测等场景非常有用。例如，如果目标标签是“巴黎铁塔”，系统会搜索包含巴黎铁塔的其他图像。实现这一功能通常涉及到图像特征提取和相似度计算，以及高效的图像索引和检索算法。The purpose of searching for matching target images is to use at least one target tag as a query condition to find other images from an image database that are similar or related to the content of the image to be identified. This is very useful in scenarios such as image retrieval, content recommendation, and copyright detection. For example, if the target tag is "Eiffel Tower", the system will search for other images containing the Eiffel Tower. Implementing this function usually involves image feature extraction and similarity calculation, as well as efficient image indexing and retrieval algorithms.
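One simple way to use target labels as a query condition is an inverted index from labels to images. The sketch below is an assumption made for illustration and is not the retrieval algorithm of the application: it ranks stored images by the number of target labels they share with the query, and all names and data are hypothetical.

```python
from collections import defaultdict


def build_tag_index(image_tags):
    """Build an inverted index: tag -> set of image ids carrying that tag."""
    index = defaultdict(set)
    for image_id, tags in image_tags.items():
        for tag in tags:
            index[tag].add(image_id)
    return index


def search_by_tags(index, target_tags, exclude=None):
    """Rank images by how many target tags they share, most shared first."""
    hits = defaultdict(int)
    for tag in target_tags:
        for image_id in index.get(tag, ()):
            if image_id != exclude:  # optionally skip the query image itself
                hits[image_id] += 1
    # ties are broken by image id so the order is deterministic
    return sorted(hits, key=lambda image_id: (-hits[image_id], image_id))
```

For a query image labeled "Eiffel Tower", the search returns every stored image that carries the same label; a production system would typically combine such a label lookup with feature-based similarity, as the paragraph above notes.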

作为又一种可选方式，该方法还包括：基于所述至少一个目标标签，验证所述待识别图像是否满足图像要求。As another optional manner, the method further includes: verifying whether the image to be identified meets image requirements based on the at least one target tag.

验证图像是否满足图像要求的目的在于：检查待识别图像是否符合特定的标准或要求，这些标准可能是质量上的（如分辨率、清晰度）、内容上的（如必须包含特定元素）、或者是合规性上的（如不含敏感内容）。依据目标标签，系统可能会定义一套验证规则或使用机器学习模型来评估图像是否达标。例如，如果目标标签是“产品展示”，系统会验证图像是否清晰展示了产品的全貌，没有遮挡或模糊。The purpose of verifying whether an image meets the image requirements is to check whether the image to be identified meets specific standards or requirements. These standards may be quality (such as resolution, clarity), content (such as must contain specific elements), or compliance (such as no sensitive content). Depending on the target label, the system may define a set of verification rules or use a machine learning model to evaluate whether the image meets the standards. For example, if the target label is "product display", the system will verify whether the image clearly shows the full picture of the product without occlusion or blur.
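A rule-based check on content and compliance requirements might look like the sketch below. It assumes the requirements can be expressed as sets of required and forbidden labels; quality criteria such as resolution cannot be checked from labels alone and are outside this sketch. The function name and the example labels are hypothetical.

```python
def verify_image(target_tags, required=(), forbidden=()):
    """Check target labels against content rules; return (ok, reasons)."""
    tags = set(target_tags)
    reasons = []
    missing = sorted(set(required) - tags)
    if missing:
        reasons.append("missing required labels: " + ", ".join(missing))
    present = sorted(set(forbidden) & tags)
    if present:
        reasons.append("contains forbidden labels: " + ", ".join(present))
    return (not reasons, reasons)
```

The returned reasons make it possible to report why an image was rejected, for example that an expected "product display" label was not among the target labels.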

综上所述，通过基于目标标签的这三个操作（分类、搜索、验证），该方法能够有效地管理和处理图像数据，不仅提高了图像处理的自动化程度，也增强了应用的智能化水平，如在内容管理、搜索引擎、智能监控等多个领域发挥重要作用。In summary, through these three operations (classification, search, and verification) based on target labels, this method can effectively manage and process image data, which not only improves the degree of automation of image processing, but also enhances the intelligence level of applications, such as playing an important role in content management, search engines, intelligent monitoring and other fields.

下面以待识别对象为待识别图像为例，结合图3所示的场景交互示意图，对本申请实施例的技术方案进行描述。The technical solution of the embodiment of the present application is described below, taking an image to be identified as the object to be identified, with reference to the scene interaction diagram shown in FIG. 3.

在这个场景中，用户端301和服务端302通过一系列交互来实现对“待识别图像”的准确标签匹配。以下是具体交互流程的详细阐述：In this scenario, the user end 301 and the server end 302 achieve accurate label matching of the "image to be identified" through a series of interactions. The following is a detailed description of the specific interaction process:

步骤30：用户通过在用户端301中输入多个标签。Step 30: The user inputs multiple tags in the user terminal 301.

步骤31：用户端301向服务端302提供该多个标签。其中，用户端301（可以是移动应用、网页界面或其他客户端应用）的用户，根据其需求或观察到的物体特征，手动输入或选择一系列标签，比如“动物”、“犬科”、“金毛寻回犬”。这些标签代表了用户对待识别图像的基本认识或预期分类。在技术实现中，可通过设置用户界面，允许用户在用户界面中输入自由文本。Step 31: The client 301 provides the plurality of labels to the server 302. The user of the client 301 (which may be a mobile application, a web interface or other client application) manually inputs or selects a series of labels, such as "animal", "canine", "golden retriever", according to his needs or observed object features. These labels represent the user's basic understanding or expected classification of the image to be identified. In technical implementation, the user interface can be set to allow the user to enter free text in the user interface.

步骤32：服务端302可以将获取的多个标签中的标签按照层级关系进行分组，获得多组标签组合。其中，服务端302接收到用户端301提交的多个标签后，按照层级关系进行分组。比如，“动物”作为第一层，“犬科”作为第二层，“金毛寻回犬”作为第三层。Step 32: The server 302 may group the tags among the obtained multiple tags according to the hierarchical relationship to obtain multiple tag combinations. After receiving the multiple tags submitted by the user 301, the server 302 groups them according to the hierarchical relationship. For example, "animal" is the first layer, "canine" is the second layer, and "golden retriever" is the third layer.

步骤33：服务端302提取所述多组标签组合中的标签分别对应的标签特征。其中，服务端302对分组后的标签组合进行处理，提取出每个标签的特征向量。这一步骤通过分析标签的文字描述、相关图像或其它模态数据，形成对标签的深度理解。Step 33: The server 302 extracts the label features corresponding to the labels in the plurality of label combinations. The server 302 processes the grouped label combinations and extracts the feature vector of each label. This step forms a deep understanding of the label by analyzing the text description, related images or other modal data of the label.

步骤34：服务端302还用于获取待识别图像，并提取所述待识别图像的图像特征。服务端302同样从这些数据中提取出待识别图像的特征向量，这个特征向量反映了图像的视觉或其他模态信息。Step 34: The server 302 is also used to obtain the image to be identified and extract the image features of the image to be identified. The server 302 also extracts the feature vector of the image to be identified from these data, and this feature vector reflects the visual or other modal information of the image.

其中，待识别图像的来源可至少包括：用户通过用户端301提供、服务端302自动抓取、第三方API集成或者历史数据库调用等。The source of the image to be identified may at least include: provided by the user through the user terminal 301, automatically captured by the server 302, integrated by a third-party API, or called by a historical database, etc.

具体地，由用户通过用户端301提供可以是用户通过用户端应用程序（如手机应用、网页表单等）上传图像。服务端302自动抓取可以是服务端302自动抓取网络上的内容，如社交媒体平台上的图片、视频分享，新闻网站的媒体内容等，进行自动识别和处理，以获取待识别图像。第三方API集成可以是服务端302可以通过集成第三方服务的API来获取待识别图像，例如从云存储服务下载用户授权分享的文件，或从内容提供商那里接收数据流。历史数据库调用可以是服务端302从自身的历史数据库中选取数据以作为待识别图像，这些数据可能之前由用户上传，或通过其他途径收集，本申请对此不做限定。Specifically, the image provided by the user through the user terminal 301 may be an image uploaded by the user through a user terminal application (such as a mobile phone application, a web page form, etc.). The automatic capture by the server 302 may be that the server 302 automatically captures content on the Internet, such as pictures and video sharing on social media platforms, media content on news websites, etc., and automatically identifies and processes them to obtain the image to be identified. The third-party API integration may be that the server 302 can obtain the image to be identified by integrating the API of a third-party service, such as downloading a file authorized to be shared by the user from a cloud storage service, or receiving a data stream from a content provider. The historical database call may be that the server 302 selects data from its own historical database as the image to be identified. These data may have been uploaded by the user before, or collected through other channels, and this application does not limit this.

步骤35：服务端302基于所述图像特征及所述标签特征，将所述多组标签组合分别与所述待识别图像进行匹配，以获得所述待识别图像对应的至少一个目标标签。在这个过程中，服务端302利用之前提取的标签特征和待识别图像的特征，进行匹配操作。这一过程可能包括计算特征向量之间的相似度，从多组标签组合中找出与图像特征最为匹配的标签集。基于匹配结果，服务端302确定出与待识别图像最为相关的至少一个目标标签，如“金毛寻回犬”，并将此信息反馈给用户端301（如在用户端301的显示界面显示“金毛寻回犬”）。这样，用户就能获得关于待识别图像的精确分类信息，满足其识别、搜索或分类需求。Step 35: Based on the image features and the label features, the server 302 matches the multiple groups of label combinations with the image to be identified to obtain at least one target label corresponding to the image to be identified. In this process, the server 302 performs the matching using the previously extracted label features and the features of the image to be identified. This may include calculating the similarity between feature vectors and finding, among the multiple groups of label combinations, the set of labels that best matches the image features. Based on the matching result, the server 302 determines at least one target label most relevant to the image to be identified, such as "golden retriever", and feeds this information back to the client 301 (for example, by displaying "golden retriever" on the display interface of the client 301). In this way, the user obtains accurate classification information about the image to be identified, meeting their identification, search, or classification needs.

Step 36: When the image to be identified is provided by the user through the user terminal 301, the server 302 may also feed back the at least one target label corresponding to the image to be identified to the user terminal 301.

It should be noted that the user who provides the image to be identified and the user who provides the multiple labels may be the same user sharing one user account, or may be different users.

Of course, the server 302 may also perform other processing on the image to be identified using the at least one target label, such as determining the image category of the image, searching for a target image that matches it, or verifying whether it meets image requirements, and may then feed the processing result back to the relevant personnel.

FIG. 4 is a schematic structural diagram of an embodiment of a label processing device provided by the present application. As shown in FIG. 4, the device includes:

a first acquisition module 41, configured to acquire multiple labels and to group the labels according to a hierarchical relationship to obtain multiple groups of label combinations;

a first extraction module 42, configured to extract the label features corresponding to the labels in the multiple groups of label combinations;

wherein the multiple groups of label combinations are used for matching against an object to be identified, in order of level from high to low, based on the label features and the object features of the object to be identified, to obtain at least one target label of the object to be identified.

Optionally, in an embodiment of the present application, the first acquisition module 41 is specifically configured to identify the level to which each of the multiple labels belongs, and to combine labels of the same level to obtain at least one group of label combinations.

Optionally, in an embodiment of the present application, the first acquisition module 41 is further configured to query a vocabulary database for the hypernyms of each of the multiple labels until a root node is reached, and to determine the level to which the label belongs according to the path length from the label to the root node.

Optionally, in an embodiment of the present application, the first acquisition module 41 is further configured to determine the synonym set of each of the multiple labels in the vocabulary database, and to look up the hypernyms of the synonym set in the vocabulary database level by level until the root node is reached.
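
The level identification and grouping performed by the first acquisition module 41 can be sketched as follows. This is a minimal illustration under stated assumptions: the vocabulary database is replaced by a small hand-written table `HYPERNYMS` that maps each term to one direct hypernym, the synonym-set lookup is skipped, and a shorter path to the root is taken to mean a higher (more general) level. The names `level_of` and `group_by_level` are hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for the vocabulary database:
# each term maps to its direct hypernym; the root maps to None.
HYPERNYMS = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "golden retriever": "dog",
}


def level_of(label, hypernyms):
    """Count the hypernym steps from the label up to the root node."""
    depth = 0
    current = label
    while hypernyms[current] is not None:
        current = hypernyms[current]
        depth += 1
    return depth


def group_by_level(labels, hypernyms):
    """Group labels by path length to the root, ordered from shortest to longest path."""
    groups = defaultdict(list)
    for label in labels:
        groups[level_of(label, hypernyms)].append(label)
    return dict(sorted(groups.items()))


groups = group_by_level(["golden retriever", "dog", "cat", "animal"], HYPERNYMS)
print(groups)
```

A label missing from the table raises `KeyError` in this sketch; a real vocabulary database would also need a policy for terms with several synonym sets or several hypernym paths, which the sketch does not address.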

Optionally, in an embodiment of the present application, the device further includes a first processing module 43;

the first processing module 43 is configured to filter out, from the multiple labels, labels that meet a filtering condition, so as to update the multiple labels.
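
As a rough sketch of the filtering performed by the first processing module 43, the example below removes labels that satisfy a filtering condition and returns the updated list. The condition itself is not defined in this passage, so the one used here (empty labels and case-insensitive duplicates) is purely an assumption chosen for illustration; `filter_labels` and `make_example_condition` are hypothetical names.

```python
def filter_labels(labels, should_remove):
    """Return the labels for which the filtering condition does not hold."""
    return [label for label in labels if not should_remove(label)]


def make_example_condition():
    """Build an assumed filtering condition: drop empty labels and repeated labels."""
    seen = set()

    def should_remove(label):
        key = label.strip().lower()
        if not key or key in seen:
            return True
        seen.add(key)
        return False

    return should_remove


labels = ["Dog", "dog ", "", "cat"]
updated = filter_labels(labels, make_example_condition())
print(updated)
```

Passing the condition in as a function keeps the filtering step independent of whichever condition an implementation actually adopts.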

Optionally, in an embodiment of the present application, the first acquisition module 41 is specifically configured to acquire a label list, provided by a user, that includes multiple labels;

the first extraction module 42 is specifically configured to extract, using a multimodal model, the label features corresponding to the labels in the multiple groups of label combinations; the multimodal model is a pre-trained model.

The label processing device shown in FIG. 4 can execute the label processing method of the embodiment shown in FIG. 1; its implementation principle and technical effects are not repeated here. The specific manner in which each module and unit of the label processing device performs its operations has been described in detail in the method embodiments and is not elaborated here.

FIG. 5 is a schematic structural diagram of an embodiment of an object processing device provided by the present application. As shown in FIG. 5, the device includes:

a second acquisition module 51, configured to acquire an object to be identified;

a second extraction module 52, configured to extract object features of the object to be identified;

a second determination module 53, configured to determine the extracted label features corresponding to the labels in multiple groups of label combinations, wherein the multiple groups of label combinations are obtained by grouping the labels of the acquired multiple labels according to a hierarchical relationship;

a second matching module 54, configured to match the multiple groups of label combinations against the object to be identified, based on the object features and the label features, to obtain at least one target label corresponding to the object to be identified.

Optionally, in an embodiment of the present application, the second matching module 54 is specifically configured to: match, based on the label features and the object features, the object to be identified with the labels in the highest-level label combination to obtain a successfully matched label; starting from the label combination one level below the highest-level label combination, determine, in the current-level label combination, at least one label that has a hypernym-hyponym relationship with a successfully matched label in the previous-level label combination; match the at least one label with the object to be identified to obtain a successfully matched label; and use the at least one successfully matched label corresponding to the multiple groups of label combinations as the at least one target label corresponding to the object to be identified.
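
The level-by-level matching described for the second matching module 54 can be sketched as below. Several details are assumptions made only for illustration: a match is declared when cosine similarity reaches a threshold, the hypernym-hyponym relationship is represented by a direct-parent table `parent_of`, matching stops when a level yields no match, and the two-dimensional features are invented. None of these choices is stated in the text.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def match_hierarchically(object_feature, groups, label_features, parent_of, threshold):
    """Match label groups from the highest level down, narrowing candidates at each level.

    groups: list of label lists ordered from the highest level to the lowest.
    parent_of: maps each label to its direct hypernym in the level above.
    """
    targets = []
    previous = None
    for group in groups:
        if previous is None:
            candidates = group  # highest level: every label is a candidate
        else:
            # keep only labels whose hypernym matched at the previous level
            candidates = [label for label in group if parent_of.get(label) in previous]
        matched = [
            label for label in candidates
            if cosine(object_feature, label_features[label]) >= threshold
        ]
        if not matched:
            break  # assumed stopping rule: no match at this level ends the descent
        targets.extend(matched)
        previous = set(matched)
    return targets


groups = [["animal", "vehicle"], ["dog", "car"], ["golden retriever", "sports car"]]
parent_of = {"dog": "animal", "car": "vehicle",
             "golden retriever": "dog", "sports car": "car"}
label_features = {
    "animal": [1.0, 0.0], "vehicle": [0.0, 1.0],
    "dog": [0.9, 0.1], "car": [0.1, 0.9],
    "golden retriever": [0.95, 0.05], "sports car": [0.05, 0.95],
}
targets = match_hierarchically([0.9, 0.2], groups, label_features, parent_of, threshold=0.8)
print(targets)
```

Because "vehicle" fails at the top level, "car" and "sports car" are never compared with the object; this pruning is what the top-down order buys over comparing the object with every label.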

Optionally, in an embodiment of the present application, the second matching module 54 is specifically configured to calculate the feature similarity between the object features of the object to be identified and the label feature of each label in the multiple groups of label combinations, and to determine at least one target label whose feature similarity satisfies a similarity condition.
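
The similarity condition is not spelled out in this passage. The sketch below shows two common forms it could take, a minimum-similarity threshold and a top-k cut-off; both are assumptions, as are the function name `select_targets` and the example scores.

```python
def select_targets(similarities, threshold=None, top_k=None):
    """Pick labels by similarity: keep scores >= threshold, then keep at most top_k labels."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(label, score) for label, score in ranked if score >= threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [label for label, _ in ranked]


# Invented similarity scores between one object and three labels.
similarities = {"dog": 0.91, "cat": 0.34, "golden retriever": 0.88}

by_threshold = select_targets(similarities, threshold=0.8)
by_top_k = select_targets(similarities, top_k=1)
print(by_threshold, by_top_k)
```

A threshold lets an object receive zero or several labels, while top-k always returns a fixed number; which behavior is appropriate depends on the application.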

Optionally, in an embodiment of the present application, the second extraction module 52 is specifically configured to extract the object features of the object to be identified using a multimodal model;

the second determination module 53 is specifically configured to determine the label features, extracted using the multimodal model, that correspond to the labels in the multiple groups of label combinations; the acquired multiple labels include a label list, provided by a user, that includes multiple labels.

Optionally, in an embodiment of the present application, the object to be identified is an image to be identified, and the device further includes a second search module 55 and a second verification module 56;

the second determination module 53 is further configured to determine the image category of the image to be identified based on the at least one target label; or the second search module 55 is configured to search for a target image matching the image to be identified based on the at least one target label; or the second verification module 56 is configured to verify, based on the at least one target label, whether the image to be identified meets image requirements.

The object processing device shown in FIG. 5 can execute the object processing method of the embodiment shown in FIG. 2; its implementation principle and technical effects are not repeated here. The specific manner in which each module and unit of the object processing device performs its operations has been described in detail in the method embodiments and is not elaborated here.

An embodiment of the present application further provides a computing device. As shown in FIG. 6, the computing device may include a storage component 61 and a processing component 62;

the storage component 61 stores one or more computer instructions, wherein the one or more computer instructions are invoked and executed by the processing component 62 to: acquire multiple labels; group the labels of the multiple labels according to a hierarchical relationship to obtain multiple groups of label combinations; and extract the label features corresponding to the labels in the multiple groups of label combinations; wherein the multiple groups of label combinations are used for matching against an object to be identified, in order of level from high to low, based on the label features and the object features of the object to be identified, to obtain at least one target label of the object to be identified.

Of course, the computing device may also include other components, such as an input/output interface, a display component, and a communication component.

The input/output interface provides an interface between the processing component 62 and a peripheral interface module, which may be an output device, an input device, or the like. The communication component is configured to facilitate wired or wireless communication between the computing device and other devices.

The processing component 62 may include one or more processors that execute computer instructions to complete all or part of the steps of the above methods. The processing component may also be implemented as one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above methods.

The storage component is configured to store various types of data to support operations on the terminal. The storage component may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.

The display component may be an electroluminescent (EL) element, a liquid crystal display or a microdisplay of similar structure, or a direct retinal display or similar laser-scanning display.

It should be noted that, when the above computing device implements the label processing method shown in FIG. 1 or the object processing method shown in FIG. 2, it may be a physical device or an elastic computing host provided by a cloud computing platform. It may be implemented as a distributed cluster of multiple servers or terminal devices, or as a single server or a single terminal device.

An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a computer, implements the label processing method shown in FIG. 1 or the object processing method shown in FIG. 2. The computer-readable medium may be included in the computing device described in the above embodiments, or may exist separately without being assembled into the computing device.

An embodiment of the present application further provides a computer program product, which includes a computer program carried on a computer-readable storage medium. When executed by a computer, the computer program implements the label processing method shown in FIG. 1 or the object processing method shown in FIG. 2. In such an embodiment, the computer program may be downloaded and installed from a network, and/or installed from a removable medium. When the computer program is executed by a processor, the various functions defined in the system of the present application are performed.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may be understood by reference to the corresponding processes in the foregoing method embodiments, and are not repeated here.

The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

From the above description of the implementations, those skilled in the art will clearly understand that each implementation may be realized by software plus a necessary general-purpose hardware platform, or by hardware. Based on this understanding, the above technical solution, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements for some of the technical features therein, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A label processing method, comprising:

acquiring a plurality of labels provided by a user;

grouping the labels of the plurality of labels according to a hierarchical relationship to obtain a plurality of groups of label combinations, wherein the labels in the same group of label combinations belong to the same level;

extracting label features corresponding to the labels in the plurality of groups of label combinations respectively;

grouping and storing the label features;

wherein the plurality of groups of label combinations are used for matching with an object to be identified, in order of level from high to low, based on the label features and object features of the object to be identified, to obtain at least one target label of the object to be identified;

the plurality of groups of label combinations are matched with the object to be identified in the following manner:

matching, based on the label features and the object features, the object to be identified with the labels in the highest-level label combination to obtain a successfully matched label;

starting from the label combination one level below the highest-level label combination, determining, in the current-level label combination, at least one label that has a hypernym-hyponym relationship with a successfully matched label in the previous-level label combination;

matching the at least one label with the object to be identified to obtain a successfully matched label;

and using at least one successfully matched label corresponding to the plurality of groups of label combinations as the at least one target label corresponding to the object to be identified.

2. The method of claim 1, wherein grouping the labels of the plurality of labels according to a hierarchical relationship to obtain a plurality of groups of label combinations comprises:

identifying the level to which each label of the plurality of labels belongs;

and combining labels of the same level to obtain at least one group of label combinations.

3. The method of claim 2, wherein identifying the level to which each label of the plurality of labels belongs comprises:

querying a vocabulary database for the hypernyms of the label until a root node is reached, and determining the level to which the label belongs according to the path length from the label to the root node.

4. The method of claim 3, wherein querying the vocabulary database for the hypernyms of the label until the root node is reached comprises:

determining a synonym set of the label in the vocabulary database;

and looking up the hypernyms of the synonym set in the vocabulary database level by level until the root node is reached.

5. The method of claim 1, wherein after acquiring the plurality of labels, the method further comprises:

filtering out, from the plurality of labels, labels that meet a filtering condition, so as to update the plurality of labels.

6. The method of claim 1, wherein acquiring the plurality of labels comprises:

acquiring a label list, provided by a user, that comprises a plurality of labels;

and extracting the label features corresponding to the labels in the plurality of groups of label combinations respectively comprises:

extracting, using a multimodal model, the label features corresponding to the labels in the plurality of groups of label combinations, wherein the multimodal model is a pre-trained model.

7. An object processing method, comprising:

acquiring an object to be identified;

extracting object features of the object to be identified;

determining the extracted label features corresponding to the labels in a plurality of groups of label combinations, wherein the plurality of groups of label combinations are obtained by grouping the labels of an acquired plurality of labels according to a hierarchical relationship, and the labels in the same group of label combinations belong to the same level;

8. The method of claim 7, wherein matching the plurality of groups of label combinations with the object to be identified, based on the object features and the label features, to obtain at least one target label corresponding to the object to be identified comprises:

calculating the feature similarity between the object features of the object to be identified and the label feature of each label in the plurality of groups of label combinations;

and determining at least one target label whose feature similarity satisfies a similarity condition.

9. The method of claim 7, wherein extracting the object features of the object to be identified comprises:

extracting the object features of the object to be identified using a multimodal model;

and determining the extracted label features corresponding to the labels in the plurality of groups of label combinations comprises:

determining the label features, extracted using the multimodal model, that correspond to the labels in the plurality of groups of label combinations, wherein the acquired plurality of labels comprise a label list, provided by a user, that comprises a plurality of labels.

10. The method of claim 7, wherein the object to be identified is an image to be identified, the method further comprising:

determining an image category of the image to be identified based on the at least one target label;

or;

searching for a target image matching the image to be identified based on the at least one target label;

or;

verifying, based on the at least one target label, whether the image to be identified meets image requirements.

11. A computing device comprising a processing component and a storage component, the storage component storing one or more computer instructions for execution by the processing component to implement the label processing method of any one of claims 1 to 6 or the object processing method of any one of claims 7 to 10.

12. A computer-readable storage medium on which a computer program is stored which, when executed by a processing component, implements the label processing method of any one of claims 1 to 6 or the object processing method of any one of claims 7 to 10.

13. A computer program product comprising a computer program or instructions which, when executed by a processing component, implement the label processing method of any one of claims 1 to 6 or the object processing method of any one of claims 7 to 10.