CN117912054A - Training method, apparatus, device and medium for a person re-identification model - Google Patents
- Publication number
- CN117912054A CN117912054A CN202311870113.2A CN202311870113A CN117912054A CN 117912054 A CN117912054 A CN 117912054A CN 202311870113 A CN202311870113 A CN 202311870113A CN 117912054 A CN117912054 A CN 117912054A
- Authority
- CN
- China
- Prior art keywords
- image
- human body
- classification
- loss
- results
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/62—Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/7753—Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/776—Validation; Performance evaluation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
Abstract
Description
Technical Field
The present invention relates to the field of artificial intelligence, and in particular to a training method, apparatus, device and medium for a person re-identification model.
Background Art
With the advancement of artificial intelligence and the growing needs of enterprise management and public security, person re-identification technology has been widely applied across social life because it can track, match and identify target individuals across time and space; it has also been one of the research hotspots in computer vision in recent years. Re-identification essentially computes the similarity or distance between samples, sorts the samples by that similarity or distance, and thereby finds the images that belong to the same person as the query sample.
Supervised re-identification methods take human body images as the model input and manually annotated identity labels as the expected output, training the model to extract identity features from the images and classify identities. Because supervised learning requires manually annotating large numbers of paired labels, and annotating a large-scale dataset for every application scenario is costly, this approach is severely limited in practice. To address this, recent self-supervised re-identification methods mainly cluster unlabeled data or transfer knowledge from a labeled source domain to the target domain. However, the re-identification accuracy of models obtained by existing self-supervised methods is unsatisfactory and drops significantly compared with supervised algorithms. Therefore, how to improve the re-identification accuracy of the model in self-supervised re-identification learning has become an urgent problem to be solved.
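The ranking step described above (compute similarity to the query, sort, retrieve same-identity images) can be sketched as follows. The cosine metric and the toy feature vectors are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Rank gallery images by cosine similarity to the query feature."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                # cosine similarity per gallery image
    order = np.argsort(-sims)   # most similar first
    return order, sims[order]

# Toy example: gallery image 2 is identical to the query, so it ranks first.
query = np.array([1.0, 0.0, 1.0])
gallery = np.array([[0.0, 1.0, 0.0],
                    [1.0, 1.0, 1.0],
                    [1.0, 0.0, 1.0]])
order, sims = rank_gallery(query, gallery)
print(order[0])  # -> 2
```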
Summary of the Invention
In view of this, embodiments of the present invention provide a training method, apparatus, device and medium for a person re-identification model, to address the low re-identification accuracy of models trained by self-supervised re-identification learning.
In a first aspect, an embodiment of the present invention provides a training method for a person re-identification model, wherein the model includes a pre-trained encoder, and the training method includes:
acquiring an input human body image and a preset classification dimension for the input human body image;
partially encoding the input human body image with the pre-trained encoder to obtain a partial encoding result, feeding the partial encoding result into a decoder for reconstruction, and computing a reconstruction loss;
separately encoding two data-augmented versions of the input human body image with the pre-trained encoder to obtain two augmented encoding results, performing contrastive learning on the two augmented encoding results, and computing a contrastive learning loss;
fully encoding the input human body image with the pre-trained encoder to obtain a full encoding result, projecting and classifying the full encoding result to obtain a classification result, clustering the full encoding result according to the preset classification dimension to obtain a clustering result, and computing a classification loss from the clustering result and the classification result;
optimizing the pre-trained encoder according to the reconstruction loss, the contrastive learning loss and the classification loss to obtain an optimized encoder;
constructing a person re-identification model based on the optimized encoder to obtain the target person re-identification model.
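The optimization step above combines three task losses into one objective for the encoder. A toy numerical sketch, with invented quadratic stand-ins for the three losses, an assumed equal weighting, and plain gradient descent on a single scalar parameter:

```python
import numpy as np

# Stand-ins for the three losses as functions of an encoder parameter theta.
# The quadratic forms, weights and learning rate are invented for illustration.
def total_loss(theta):
    recon = (theta - 1.0) ** 2           # stand-in reconstruction loss
    contrast = 0.5 * (theta - 3.0) ** 2  # stand-in contrastive learning loss
    cls = 0.5 * (theta + 1.0) ** 2       # stand-in classification loss
    return recon + contrast + cls

def grad(theta):
    # Analytic gradient of the summed objective above.
    return 2 * (theta - 1.0) + (theta - 3.0) + (theta + 1.0)

theta = 0.0
for _ in range(200):
    theta -= 0.1 * grad(theta)  # joint optimization over all three losses

print(round(theta, 3))  # -> 1.0 (minimizer of the summed losses)
```

The point of the sketch is only that the three losses are summed and minimized together, so the optimized parameter trades off all three tasks rather than any single one.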
In a second aspect, an embodiment of the present invention provides a training apparatus for a person re-identification model, wherein the model includes a pre-trained encoder, and the training apparatus includes:
an acquisition module, configured to acquire an input human body image and a preset classification dimension for the input human body image;
a reconstruction loss calculation module, configured to partially encode the input human body image with the pre-trained encoder to obtain a partial encoding result, feed the partial encoding result into a decoder for reconstruction, and compute a reconstruction loss;
a contrastive learning loss calculation module, configured to separately encode two data-augmented versions of the input human body image with the pre-trained encoder to obtain two augmented encoding results, perform contrastive learning on the two augmented encoding results, and compute a contrastive learning loss;
a classification loss calculation module, configured to fully encode the input human body image with the pre-trained encoder to obtain a full encoding result, project and classify the full encoding result to obtain a classification result, cluster the full encoding result according to the preset classification dimension to obtain a clustering result, and compute a classification loss from the clustering result and the classification result;
an optimization module, configured to optimize the pre-trained encoder according to the reconstruction loss, the contrastive learning loss and the classification loss to obtain an optimized encoder;
a construction module, configured to construct a person re-identification model based on the optimized encoder to obtain the target person re-identification model.
In a third aspect, an embodiment of the present invention provides a computer device, including a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor implements the training method of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the training method of the first aspect.
Compared with the prior art, the present invention has the following beneficial effects:
An input human body image and a preset classification dimension for it are acquired. The pre-trained encoder partially encodes the image, the partial encoding result is fed into a decoder for reconstruction, and a reconstruction loss is computed. The pre-trained encoder separately encodes two data-augmented versions of the image, contrastive learning is performed on the two augmented encoding results, and a contrastive learning loss is computed. The pre-trained encoder fully encodes the image; the full encoding result is projected and classified to obtain a classification result, and clustered according to the preset classification dimension to obtain a clustering result, from which a classification loss is computed. The pre-trained encoder is then optimized according to the reconstruction loss, the contrastive learning loss and the classification loss, and the target person re-identification model is constructed on the optimized encoder.
In this application, the encoder of the re-identification model to be trained is optimized in a self-supervised manner, so no data annotation is required, which improves optimization efficiency. During encoder optimization, the encoder's losses on different tasks (reconstruction loss, similarity loss and classification loss) are computed separately and jointly used to optimize the encoder, which improves optimization accuracy. As a result, when the re-identification model built on the optimized encoder performs person re-identification tasks, its re-identification accuracy is improved.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application environment of a training method for a person re-identification model according to an embodiment of the present invention;
FIG. 2 is a flow chart of a training method for a person re-identification model according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a training apparatus for a person re-identification model according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
In the following description, specific details such as particular system structures and technologies are presented for illustration rather than limitation, to provide a thorough understanding of the embodiments. However, it should be clear to those skilled in the art that the present invention can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits and methods are omitted so that unnecessary detail does not obscure the description.
It should be understood that, as used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or combinations thereof.
It should also be understood that the term "and/or" used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
In addition, in the description of this specification and the appended claims, the terms "first", "second", "third", etc. are used only to distinguish descriptions and should not be understood as indicating or implying relative importance.
References in this specification to "one embodiment", "some embodiments" and the like mean that one or more embodiments of the present invention include a particular feature, structure or characteristic described in connection with that embodiment. Thus, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments" and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized. The terms "including", "comprising", "having" and their variants all mean "including but not limited to", unless otherwise specifically emphasized.
The embodiments of the present invention can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. AI software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
It should be understood that the numbering of the steps in the following embodiments does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not limit the implementation of the embodiments of the present invention in any way.
To illustrate the technical solution of the present invention, specific embodiments are described below.
The training method for a person re-identification model provided by an embodiment of the present invention can be applied in the application environment shown in FIG. 1, in which a client communicates with a server. The client includes but is not limited to handheld computers, desktop computers, laptops, ultra-mobile personal computers (UMPC), netbooks, cloud computing devices, personal digital assistants (PDA) and other computer devices. The server can be an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.
Referring to FIG. 2, which is a flow chart of a training method for a person re-identification model provided by an embodiment of the present invention, the method can be applied to the server in FIG. 1 and, as shown in FIG. 2, can include the following steps.
S201: Acquire an input human body image and a preset classification dimension for the input human body image.
In step S201, the person re-identification model includes a pre-trained encoder, where the pre-trained encoder is based on a self-attention structure, the input human body image is an image containing a human body, and the preset classification dimension of the input human body image is the number of classification categories for the image.
In this embodiment, the person re-identification model includes a pre-trained encoder based on a self-attention structure, comprising a self-attention layer and a multi-layer perceptron layer. The self-attention layer maps the input human body image to a first vector, a second vector and a third vector; the first vector is multiplied with the second vector to obtain an attention map, and the attention map is multiplied with the third vector to obtain the final output attention feature map. The output is then passed through multiple fully connected layers to obtain a feature vector. The pre-trained encoder in this embodiment may comprise 12 Transformer layers.
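The self-attention computation just described (first and second vectors multiplied into an attention map, which is then multiplied with the third vector) can be sketched as follows. The scaled dot-product form, dimensions and random weights are illustrative assumptions, not the patent's exact configuration:

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention over x of shape (tokens, dim)."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])    # scaled dot-product
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)   # softmax -> attention map
    return attn @ V                           # attention feature map

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                   # 4 tokens of dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)  # -> (4, 8)
```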
When pre-training the encoder, masking can be used: the input human body image is split into multiple image patches, the patches are divided into different patch groups, and several patches in each group are randomly occluded, yielding the occluded patches of each group. For the occluded patches of each group, learnable vectors can be determined, where a learnable vector can be understood as a randomly initialized vector parameter. The encoder can then be pre-trained using the feature vectors corresponding to the unoccluded patches and the learnable vectors corresponding to the occluded patches in each group, yielding the pre-trained encoder.
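The patch grouping and masking just described can be sketched as follows. The patch size, group size, number of masked patches per group, and the zero-valued stand-in for the learnable vector are all assumed for illustration:

```python
import numpy as np

def patchify(img, p):
    """Split an (h, w) image into non-overlapping p x p patches."""
    h, w = img.shape
    return img.reshape(h // p, p, w // p, p).swapaxes(1, 2).reshape(-1, p * p)

rng = np.random.default_rng(42)
img = rng.normal(size=(8, 8))
patches = patchify(img, 2)            # 16 patches of 2x2 pixels each
groups = patches.reshape(4, 4, -1)    # 4 groups of 4 patches

masked = groups.copy()
mask_token = np.zeros(4)              # stand-in for the learnable vector
for g in range(groups.shape[0]):
    idx = rng.choice(4, size=2, replace=False)  # occlude 2 patches per group
    masked[g, idx] = mask_token

print(patches.shape)  # -> (16, 4)
```

Because every group keeps some unoccluded patches, the encoder always sees local context from each region of the image, which is the property the text credits for better local feature extraction.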
In this embodiment, the encoder is pre-trained based on the feature vectors of the unoccluded patches and the learnable vectors of the occluded patches in each patch group. Since every patch group contains unoccluded patches, the pre-trained encoder learns to extract local features of the image, improving both the accuracy of the pre-trained encoder and the pre-training effect. In addition, an encoder pre-trained in this way can extract more discriminative features, improving the accuracy of person re-identification.
The input human body image and its preset classification dimension are acquired. The input image may be a raw image captured by an image acquisition device such as a camera, or an image obtained after preprocessing the raw image; the preprocessing may include denoising operations such as filtering. Person re-identification determines the identity corresponding to an image to be identified based on the human body images of multiple users stored in a database; for example, pedestrian A and pedestrian B are two different identity classes. In one example, for ease of recording, identity classes can be recorded and distinguished by identity numbers (IDs): id1 corresponds to the first identity class and id2 to the second. It should be noted that the body shown in the input image may be complete or incomplete; for example, the image may show only the upper half or only the lower half of the body. The preset classification dimension of the input image may include multiple classification dimensions, for example upper-body, lower-body and background classifications.
Optionally, the pre-training process of the pre-trained encoder includes:
acquiring an initial encoder based on a self-attention structure and training data, the training data being images containing human bodies;
pre-training the initial encoder with the training data to obtain the pre-trained encoder.
In this embodiment, an initial encoder based on a self-attention structure and training data (images containing human bodies) are acquired, and the initial encoder is pre-trained with the training data to obtain the pre-trained encoder. The pre-training in this embodiment may be supervised: label data for the training data is acquired, and the initial encoder is pre-trained with the training data and its labels. The training data can be a person re-identification dataset extracted from a large public dataset; the dataset is smoothed with a median filter as a preprocessing step, the preprocessed dataset is processed with a grayscale-stretching image enhancement algorithm to obtain an enhanced dataset, and the initial encoder is trained on the enhanced dataset to obtain the pre-trained encoder.
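The two preprocessing steps named above, median filtering and grayscale stretching, can be sketched in plain NumPy. The 3x3 window and the [0, 255] target range are assumed values not fixed by the text:

```python
import numpy as np

def median_filter3(img):
    """3x3 median filter with edge padding, for image smoothing."""
    padded = np.pad(img, 1, mode='edge')
    stack = [padded[i:i + img.shape[0], j:j + img.shape[1]]
             for i in range(3) for j in range(3)]
    return np.median(np.stack(stack), axis=0)

def grayscale_stretch(img):
    """Linearly stretch intensities to the full [0, 255] range."""
    lo, hi = img.min(), img.max()
    return (img - lo) * 255.0 / (hi - lo)

img = np.array([[10, 10, 10],
                [10, 200, 10],    # isolated bright pixel (noise)
                [10, 10, 10]], dtype=float)
smoothed = median_filter3(img)
print(smoothed[1, 1])  # -> 10.0 (the outlier is suppressed)

stretched = grayscale_stretch(np.array([[50., 100.], [150., 200.]]))
print(stretched[0, 0], stretched[1, 1])  # -> 0.0 255.0
```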
In another embodiment, the initial encoder may be trained in a self-supervised manner, for example with contrastive learning: each training image is augmented, the augmented image and its source image form a positive pair, and the augmented image and other training images form negative pairs. Contrastive learning increases the similarity between the images of a positive pair and the dissimilarity between the images of a negative pair.
In contrastive learning, learning proceeds by constructing positive and negative pairs. Suppose the training data contains N sample images, where N is an integer greater than 1. After augmentation, 2N augmented images are obtained: the two augmented images derived from the same sample image form the positive pair, and the remaining augmented images serve as negatives, so each augmented image has one positive counterpart and 2(N-1) negative images.
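The pair bookkeeping above can be sketched as follows. This is a minimal illustration of the counting described in this paragraph, not code from the patent; the view layout (first augmentation of all N samples, then the second) is an assumption made for the example.

```python
# Sketch of the positive/negative pair bookkeeping described above.
# With N source images and two augmentations each, there are 2N views;
# the two views of the same source form a positive pair, and every view
# of a different source is a negative.

def pair_counts(n):
    """Return (num_views, negatives_per_view) for N source images."""
    num_views = 2 * n
    negatives_per_view = 2 * (n - 1)  # all views except self and its positive
    return num_views, negatives_per_view

def positive_index(i, n):
    """Index of the positive partner of view i, assuming views are laid
    out as [aug1 of samples 0..N-1, aug2 of samples 0..N-1]."""
    return i + n if i < n else i - n
```

For N = 4 this gives 8 views, each with 1 positive and 6 negatives, matching the 2(N-1) count in the text.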
In this embodiment, augmenting the training data during self-supervised training yields similar data at multiple scales and angles, enriching the variety and increasing the quantity of the training data.
S202: partially encode the input human-body image with the pre-trained encoder to obtain a partial encoding result, feed the partial encoding result into a decoder for reconstruction, and compute the reconstruction loss.
In step S202, the input image is partially encoded by the pre-trained encoder, and the features of the remaining part are reconstructed from the partial encoding result, so that the reconstruction loss can be computed from the reconstructed features.
In this embodiment, the input image is partially encoded by the pre-trained encoder to obtain a partial encoding result, i.e., the encoding of a sub-region of the input image, for example its upper, lower, left, or right half. The partial encoding result is fed into a decoder, which reconstructs a human-body image from it; the reconstructed image is then compared with the input image to compute the reconstruction loss.
Note that the partial encoding result may be the encoding of a sub-region of the input image or the encoding of a subset of its channels; this embodiment imposes no restriction.
Note that the decoder used for reconstruction may itself be pre-trained, and the reconstruction loss may be computed by a pixel-by-pixel comparison. For example, the mean of the differences between corresponding pixels of the input image and the reconstructed image may be used as the reconstruction loss, where corresponding pixels are pixels at the same position, e.g., the difference between the pixel in the first row and first column of the input image and the pixel in the first row and first column of the reconstruction.
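The pixel-by-pixel loss just described can be sketched in a few lines. This is an illustrative L1-style instance under the assumption that images are grids of scalar gray values; the patent does not fix the exact difference measure (squaring the difference instead would give an MSE loss).

```python
def reconstruction_loss(original, reconstructed):
    """Mean absolute per-pixel difference between the input image and
    its reconstruction, as described above. Images are 2D lists of
    gray values of identical shape (an assumption for this sketch)."""
    total, count = 0.0, 0
    for row_o, row_r in zip(original, reconstructed):
        for o, r in zip(row_o, row_r):
            total += abs(o - r)  # compare pixels at the same position
            count += 1
    return total / count
```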
In another embodiment, the reconstruction loss may instead be computed only over the reconstructed remainder, i.e., over the part that was not encoded. For example, if the upper half of the input image is partially encoded to obtain the partial encoding result, then after decoding, the lower half of the reconstructed image, which was not encoded, is compared pixel by pixel with the lower half of the input image to compute the reconstruction loss.
In this embodiment, the input image is reconstructed and the loss of the reconstruction task is computed. The more accurate the pre-trained encoder, the closer the image reconstructed from the partial encoding is to the input image; the less accurate the encoder, the larger the difference. The reconstruction loss therefore gives a direct measure of the encoder's accuracy, and because it can be obtained in a self-supervised way, it improves the efficiency of encoder optimization.
Optionally, partially encoding the input image with the pre-trained encoder to obtain a partial encoding result, feeding the partial encoding result into a decoder for reconstruction, and computing the reconstruction loss includes:
dividing the input image into N image blocks, N being an integer greater than 1;
selecting M target blocks and N-M remaining blocks from the N blocks, and feature-encoding each of the M target blocks with the pre-trained encoder to obtain M encoding results, M being an integer greater than zero and less than N;
obtaining initialization features for the N-M remaining blocks, and feeding the M encoding results together with the initialization features into a decoder for reconstruction, obtaining reconstruction results for all N blocks;
computing the reconstruction loss from the reconstruction results of the N blocks and the N blocks themselves.
In this embodiment, the input image is divided into N blocks, N being an integer greater than 1. The block size is not restricted, provided that after division the input image contains multiple rows and columns of blocks, say m rows and n columns, with m and n both integers of at least 2. For example, an 8×8 input image can be divided into 4 blocks arranged in 2 rows and 2 columns, each block being 4×4.
M target blocks and N-M remaining blocks are selected from the N blocks, and the M target blocks are feature-encoded by the pre-trained encoder to obtain M encoding results. When selecting the M target blocks, the remaining blocks may be masked out while the M target blocks are left unmasked; the targets may be fixed blocks chosen per row or blocks chosen at random, without restriction in this embodiment. Initialization features are obtained for the N-M remaining blocks, and the M encoding results together with those features are fed into a decoder for reconstruction, giving reconstruction results that correspond one-to-one with the N blocks. For example, from four 4×4 blocks, two target blocks and two remaining blocks are selected; the two target blocks are feature-encoded by the pre-trained encoder to give two encoding results; the initial features of the two remaining blocks are obtained; and the two encoding results together with those initial features are fed into a decoder, yielding reconstructions of all four blocks, each reconstructed block being the same size as the block before reconstruction, the two remaining blocks being the unencoded ones.
Note that the initialization features obtained for the N-M remaining blocks are randomly initialized learnable features. In this embodiment they are extracted by an initialized encoder, i.e., an encoder whose parameters have been initialized but that has not been trained, or has not been fully trained. Features may be extracted separately for each of the N-M remaining blocks, so the initialization features form the set of image features of those blocks, one per block, covering all N-M remaining blocks. The initialized encoder may be any deep neural network, for example an encoder based on a self-attention structure.
The reconstruction loss is computed from the reconstruction results of the N blocks and the N blocks themselves; for example, the reconstructions may be subtracted from the original blocks pixel by pixel, and the resulting differences used as the reconstruction loss.
In this embodiment, the input image is divided into N blocks (N an integer greater than 1); M target blocks and N-M remaining blocks are selected from them; the M target blocks are feature-encoded by the pre-trained encoder to give M encoding results; and the M encoding results together with the initialization features of the N-M remaining blocks are decoded to reconstruct all N blocks. Because only the selected target blocks pass through the pre-trained encoder, while the remaining blocks need not be fed to it for feature encoding, the encoder's computational cost drops substantially and processing efficiency improves.
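The blocking and target-selection steps above can be sketched as follows. This is an illustrative helper, not the patent's implementation; the placeholder `None` stands in for the randomly initialized learnable features of the unencoded blocks.

```python
def split_into_blocks(image, block):
    """Split a 2D image (list of rows) into non-overlapping block*block
    patches, returned row-major -- e.g. an 8x8 image with block=4 yields
    N=4 patches of size 4x4, as in the example above."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, block):
        for left in range(0, w, block):
            patches.append([row[left:left + block]
                            for row in image[top:top + block]])
    return patches

def select_targets(patches, target_ids):
    """Keep the M target patches for encoding; return the indices of the
    N-M remaining patches, whose content would be replaced by randomly
    initialized learnable features before decoding."""
    targets = [patches[i] for i in target_ids]
    remaining = [i for i in range(len(patches)) if i not in target_ids]
    return targets, remaining
```

With an 8×8 image and 4×4 blocks, choosing target indices [0, 2] leaves blocks 1 and 3 as the unencoded remainder, matching the M=2, N-M=2 example in the text.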
S203: encode the two augmented versions of the input image with the pre-trained encoder to obtain two augmented encoding results, perform contrastive learning on the two results, and compute the contrastive-learning loss.
In step S203, the augmented images are encoded separately and the contrastive-learning loss is computed, the loss being the difference between the different encodings of the same input image.
In this embodiment, the input image is augmented in two different ways to obtain two augmented images; both are encoded by the pre-trained encoder to obtain two augmented encoding results, which are then compared by contrastive learning to compute the contrastive-learning loss.
Note that any two augmentation methods may be chosen: for example filtering, which removes noise from the input image; color conversion, which maps the input image into a different color space; or resolution enhancement, which produces a sharper image.
Note that the contrastive-learning loss may be determined by computing the cosine between the two augmented encoding results or by computing their difference; this embodiment imposes no restriction.
In this embodiment, computing the contrastive-learning loss over the augmented images lets the pre-trained encoder learn the intrinsic consistency between images and sharpens the distinction between features of different images, improving the accuracy of re-identifying different persons.
Optionally, encoding the two augmented versions of the input image with the pre-trained encoder to obtain two augmented encoding results, performing contrastive learning on the two results, and computing the contrastive-learning loss includes:
augmenting the input image with a preset first augmentation method to obtain a first augmented image;
augmenting the input image with a preset second augmentation method to obtain a second augmented image, the first and second augmented images being the two augmented versions of the input image;
feature-encoding the first and second augmented images separately to obtain a first encoding result for the first augmented image and a second encoding result for the second;
applying a projection transform to the first encoding result to obtain a transformed encoding result;
computing the contrastive-learning loss from the transformed encoding result and the second encoding result.
In this embodiment, the preset first augmentation method may be a color-space conversion: the input image, which may be any color image, is converted to another color space to obtain the first augmented image.
Note that color-space conversion proceeds as follows: first obtain the original color-space parameters of the input image; using them, traverse the image and obtain the three color components of each pixel; convert the three components to intermediate values using the absolute color parameters of an absolute color space, and normalize the intermediate values to obtain normalized components; numerically correct the normalized components using the target color parameters of the target color space to obtain corrected components for each pixel; and map the corrected components into the target color space to obtain the first augmented image.
Note that the original color space of the input image includes, without limitation, the RGB (Red, Green, Blue) color space and the CMYK (Cyan, Magenta, Yellow, blacK) color space. The color range displayed in the original color space varies with the display device, whereas an absolute color space is one whose displayed color range does not vary with the device; the absolute color parameters are the specific parameters that define the color range of the absolute color space.
The target color space includes the LAB color space (Lab color space). The target color parameters are the specific parameters that define the color range of the target space, whose displayed colors do not vary with the display device; and because the color range of the target space is modeled on human vision, it is better suited to displaying fine image detail. For example, if the original space of the input image is RGB and the target space is LAB, the input image must first be converted from RGB to the absolute color space and then from the absolute color space to LAB.
In this embodiment, because the color range of the original color space varies with the display device while that of the target color space does not, the input image cannot be converted directly from the original to the target space; it must first be converted from the original color space to the absolute color space, and from the absolute color space to the target space.
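The two-stage route described above can be made concrete by one common instance: sRGB as the original space, CIE XYZ as the absolute space, and CIELAB as the target. The patent gives no formulas; the sRGB gamma curve, the D65 conversion matrix, and the LAB companding function below are standard published constants, not values from the patent.

```python
def srgb_to_lab(r, g, b):
    """Convert one sRGB pixel (components 0-255) to CIELAB, going
    through the device-independent CIE XYZ space -- one concrete
    instance of the 'original -> absolute -> target' route above."""
    def linearize(c):  # undo the sRGB gamma (per-component correction)
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # sRGB -> XYZ, D65 white point (standard matrix)
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl

    def f(t):  # CIELAB companding function
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29

    # normalize by the D65 reference white, then map to L*, a*, b*
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)
```

As a sanity check, pure white maps to roughly L=100, a=0, b=0, and pure black to L=0.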
In this embodiment, the preset second augmentation method may be a filtering method, for example median-filter or Gaussian-filter enhancement, applied to the input image to obtain the second augmented image.
The pre-trained encoder feature-encodes the first and second augmented images to obtain the first and second encoding results; the first encoding result is projected to obtain a transformed encoding result; and the contrastive-learning loss is computed from the transformed encoding result and the second encoding result.
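The final comparison step can be sketched using the cosine option mentioned earlier. This is a minimal illustration assuming the encodings are flat feature vectors and the projection has already been applied to the first one; it is not the patent's exact loss.

```python
import math

def cosine_contrastive_loss(z1, z2):
    """Contrastive loss between two augmented-view encodings, taken here
    as 1 - cosine similarity: 0 when the two encodings point the same
    way, up to 2 when they point in opposite directions. z1 is assumed
    to be the (already projected) first encoding result."""
    dot = sum(a * b for a, b in zip(z1, z2))
    n1 = math.sqrt(sum(a * a for a in z1))
    n2 = math.sqrt(sum(b * b for b in z2))
    return 1.0 - dot / (n1 * n2)
```

Minimizing this loss pulls the two encodings of the same input image together, which is exactly the consistency objective described for S203.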
S204: fully encode the input image with the pre-trained encoder to obtain a full encoding result; projection-classify the full encoding result to obtain classification results; cluster the full encoding result along the preset classification dimensions to obtain clustering results; and compute the classification loss from the clustering results and the classification results.
In step S204, the input image is classified using the full encoding result, i.e., the result of globally feature-encoding the input image with the pre-trained encoder, and the loss of the classification task is computed.
In this embodiment, the input image is fully encoded by the pre-trained encoder to obtain the full encoding result, which is then projection-classified to obtain the classification results. The projection classification may be performed by a projection head, for which a multilayer perceptron (MLP) can be used: the full encoding result is fed into the projection head, which outputs the classification results. The MLP may contain at least a first and a second fully connected layer that perform feature mapping on the full encoding result; specifically, both fully connected layers apply an activation function when performing the feature-mapping transform, since activation functions accelerate model convergence and improve training speed and efficiency.
After the feature mapping of the fully connected layers, the output of the second fully connected layer is fed into the MLP's normalization layer, which normalizes it; the corresponding classification result is then determined from the normalized values.
In this embodiment, using an MLP as the projection head allows different category results to be obtained from the full encoding result, i.e., multi-class classification, improving classification efficiency.
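The projection head described above can be sketched as two fully connected layers with an activation, followed by a softmax normalization layer. Layer sizes, weights, and the choice of ReLU and softmax are illustrative assumptions; the patent only specifies two fully connected layers with activation functions plus a normalization layer.

```python
import math

def mlp_projection_head(x, w1, w2):
    """Minimal sketch of the MLP projection head: FC1 + ReLU, FC2, then
    a softmax normalization layer turning the output into class scores
    that sum to 1. Biases are omitted for brevity (an assumption)."""
    h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w1]  # FC1 + ReLU
    o = [sum(wi * hi for wi, hi in zip(row, h)) for row in w2]            # FC2
    m = max(o)
    exps = [math.exp(v - m) for v in o]                                   # softmax (stable)
    s = sum(exps)
    return [e / s for e in exps]
```

Feeding a feature vector through the head yields one score per preset class; the classification result is the class with the highest score.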
In this embodiment, the full encoding result is clustered along the preset classification dimensions to obtain clustering results. Clustering may operate on feature dimensions according to the preset classification dimensions: for example, if each image block's features in the full encoding result are 512-dimensional and there are three preset classes, clustering maps the 512-dimensional features to a 3-dimensional clustering result in which each dimension represents one class. The difference between classes in the input image is determined from the clustering and classification results, and the classification loss is computed from that difference.
Note that clustering may use k-means: all feature dimensions of the full encoding result are divided into K groups (K = 3 in this embodiment), the equal-division points of each group are taken as initial cluster centers, the Euclidean distances from the remaining features to the initial centers are computed, new clusters are formed according to those distances, and each new cluster's center is recomputed until the centers no longer change. Other clustering methods may also be used, without restriction in this embodiment.
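A plain k-means loop matching the description above can be sketched as follows. Initialization is left to the caller (the patent's "equal-division points" scheme or any other), and a fixed iteration cap stands in for the "centers no longer change" stopping test; both are simplifying assumptions.

```python
def kmeans(points, centers, iters=20):
    """Minimal k-means over feature vectors: assign each point to the
    nearest center by squared Euclidean distance, recompute each center
    as its cluster mean, and repeat. `centers` provides the K initial
    cluster centers (K = 3 in the embodiment above)."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            d = [sum((a - c) ** 2 for a, c in zip(p, ctr)) for ctr in centers]
            groups[d.index(min(d))].append(p)  # nearest-center assignment
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else ctr  # keep empty-cluster center
            for g, ctr in zip(groups, centers)
        ]
    return centers, groups
```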
In this embodiment, the classification loss is computed by comparing the classification results with the clustering results. Clustering processes the full encoding result automatically, without human intervention, which improves clustering efficiency and hence the efficiency of encoder optimization.
Optionally, the preset classification dimensions include a background dimension, an upper-body dimension, and a lower-body dimension, and projection-classifying the full encoding result to obtain classification results, clustering it along the preset classification dimensions to obtain clustering results, and computing the classification loss from the clustering and classification results includes:
performing background, upper-body, and lower-body projections on the full encoding result to obtain background, upper-body, and lower-body classification results for the input image;
clustering the full encoding result along the preset classification dimensions to obtain background, upper-body, and lower-body clustering results for the input image;
determining pseudo labels for the input image from the background, upper-body, and lower-body clustering results;
computing the classification loss from the background, upper-body, and lower-body classification results of the input image together with the pseudo labels.
In this embodiment, the preset classification dimensions include background, upper-body, and lower-body dimensions. Background, upper-body, and lower-body projections are performed on the full encoding result to obtain the corresponding background, upper-body, and lower-body classification results for the input image; trained projection heads may be used for each projection.
The full encoding result is clustered along the preset classification dimensions to obtain background, upper-body, and lower-body clustering results for the input image, from which the image's pseudo labels are determined. A pseudo label is the pseudo class of each pixel in the input image and can be recorded with a distinct number: for example, if the pixel in the first row and first column falls in the background cluster, that pixel is labeled 0; if the pixel in the fifth row and sixth column falls in the upper-body cluster, that pixel is labeled 1; and if the pixel in the twenty-fourth row and eighth column falls in the lower-body cluster, that pixel is labeled 2.
The classification loss is then computed from the background, upper-body, and lower-body classification results of the input human body image together with the pseudo labels. When computing the classification loss, the cross-entropy loss function can be used, or another classification loss function can be used instead.
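Treating the pseudo labels as targets, the cross-entropy classification loss can be computed as in this minimal NumPy sketch; the logits stand in for the projection-head outputs and their values are purely illustrative:

```python
import numpy as np

def cross_entropy(logits, pseudo_labels):
    """Mean cross-entropy of per-pixel logits (n, 3) against integer
    pseudo labels (n,): 0 = background, 1 = upper body, 2 = lower body."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(pseudo_labels)), pseudo_labels].mean()

logits = np.array([[4.0, 0.0, 0.0],   # confident background
                   [0.0, 4.0, 0.0],   # confident upper body
                   [1.0, 1.0, 1.0]])  # uncertain pixel
pseudo = np.array([0, 1, 2])
print(round(float(cross_entropy(logits, pseudo)), 4))  # ≈ 0.3902
```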
In this embodiment, pseudo labels for the input human body image are determined by clustering and used in place of ground-truth labels when computing the classification loss. No annotation of the input human body image is required, which saves time and thereby improves computational efficiency.
S205: Optimize the pre-trained encoder according to the reconstruction loss, the contrastive learning loss, and the classification loss to obtain an optimized encoder.
In step S205, the pre-trained encoder is optimized according to the reconstruction loss, the contrastive learning loss, and the classification loss to obtain the optimized encoder. The optimization can be performed by adjusting the parameters of the pre-trained encoder, for example by gradient backpropagation, until the reconstruction loss, the contrastive learning loss, and the classification loss satisfy a preset condition, at which point the optimized encoder is obtained.
In this embodiment, the pre-trained encoder is optimized according to the reconstruction loss, the contrastive learning loss, and the classification loss. During optimization, the parameters of the pre-trained encoder are adjusted continually until they reach an optimal state, at which point the adjustment ends. The optimal state is reached when the reconstruction loss, the contrastive learning loss, and the classification loss computed with the optimized encoder have converged; alternatively, the adjustment ends when the number of adjustments to the parameters of the pre-trained encoder reaches a preset count.
In this embodiment, the losses of the different learning tasks are all taken into account, which avoids the lower optimization accuracy that results from optimizing the pre-trained encoder with the loss of a single self-supervised learning task alone, so that optimization accuracy is improved while optimization efficiency is maintained.
Optionally, optimizing the pre-trained encoder according to the reconstruction loss, the contrastive learning loss, and the classification loss to obtain an optimized encoder includes:
Computing a weighted sum of the reconstruction loss, the contrastive learning loss, and the classification loss to obtain a total loss; and
Optimizing the pre-trained encoder according to the total loss to obtain the optimized encoder.
In this embodiment, the total loss is a weighted sum of the reconstruction loss, the contrastive learning loss, and the classification loss. The weighted sum can assign different fixed weights to the different losses, for example weights a, b, and c for the reconstruction, contrastive learning, and classification losses respectively, where a + b + c = 1. Dynamic weights can also be used: the three losses are first added to obtain their loss sum, and the proportion of each loss within that sum is taken as its weight in the weighted sum, yielding the total loss. The pre-trained encoder is then optimized according to the total loss to obtain the optimized encoder. During optimization, the parameters of the pre-trained encoder are adjusted continually; after each adjustment, the total loss is recomputed with the adjusted parameters, and the adjustment ends when the total loss converges.
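The two weighting schemes described above (fixed weights with a + b + c = 1, and dynamic weights equal to each loss's share of the loss sum) can be sketched as follows; the concrete weight and loss values are illustrative assumptions, not values fixed by the patent:

```python
def total_loss_fixed(rec, con, cls, a=0.4, b=0.3, c=0.3):
    """Weighted sum with fixed weights a + b + c = 1 (values assumed)."""
    assert abs(a + b + c - 1.0) < 1e-9
    return a * rec + b * con + c * cls

def total_loss_dynamic(rec, con, cls):
    """Each loss is weighted by its proportion of the loss sum."""
    s = rec + con + cls
    return (rec / s) * rec + (con / s) * con + (cls / s) * cls

rec, con, cls = 0.8, 0.5, 0.2
print(total_loss_fixed(rec, con, cls))    # 0.4*0.8 + 0.3*0.5 + 0.3*0.2 = 0.53
print(total_loss_dynamic(rec, con, cls))  # (0.64 + 0.25 + 0.04) / 1.5 = 0.62
```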
S206: Construct a human body re-identification model based on the optimized encoder to obtain a target human body re-identification model.
In step S206, the parameters of the pre-trained encoder in the human body re-identification model are replaced with the parameters of the optimized encoder to construct the human body re-identification model, and the constructed model is taken as the target human body re-identification model.
In this embodiment, the human body re-identification model is constructed from the optimized encoder to obtain the target human body re-identification model, so that the target model can extract high-precision human body image features through the optimized encoder, thereby improving re-identification accuracy.
In another embodiment, in order to better adapt the optimized encoder to the re-identification task of the human body re-identification model, the target model can be further trained with supervision after it is obtained. During supervised training, supervision data is acquired, comprising human body images and the corresponding re-identification result labels; the target human body re-identification model is trained on this supervision data to obtain a trained human body re-identification model, which is used to perform re-identification on human body images.
In another embodiment, after the trained human body re-identification model is obtained, it is used to re-identify a human body image to be re-identified. During re-identification, the human body image to be re-identified is acquired; human body localization is performed on it to obtain the human body detection box corresponding to the human body in the image; the detection box is cropped to obtain a cropped human body image; features are extracted from the cropped image to obtain human body image features; a preset human body database is searched according to these features to find the target human body image that matches them; and the re-identification result of the image to be re-identified is determined from the target human body image. When searching the preset human body database according to the human body image features, the similarity value between the human body image features and each human body feature in the database is computed, and the human body image corresponding to the database feature with the largest similarity value is taken as the target human body image.
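For illustration, the database search described above (take the gallery entry whose stored feature has the largest similarity to the query feature) can be sketched with cosine similarity; the similarity measure, the feature values, and the identifiers are assumptions made for the sketch:

```python
import numpy as np

def search_database(query_feat, gallery_feats, gallery_ids):
    """Return the id and similarity of the gallery image whose stored
    feature has the largest cosine similarity to the query feature."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarity per gallery entry
    return gallery_ids[int(sims.argmax())], float(sims.max())

gallery = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
ids = ["person_a", "person_b", "person_c"]
best_id, best_sim = search_database(np.array([0.6, 0.8]), gallery, ids)
print(best_id)  # the gallery feature closest in direction to the query
```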
It should be noted that extracting features from the cropped human body image to obtain human body image features can include: dividing the cropped image into blocks to obtain multiple image blocks; arranging the blocks in order to obtain an image block sequence; converting each block into a fixed-size token vector by projection; position-encoding each block; adding each block's token vector to its position encoding to obtain the block's input token vector; computing the average input token vector over all blocks; and inputting the average input token vector into the optimized encoder, which outputs the corresponding human body image features.
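The block-wise pipeline just described (split into blocks, project each flattened block to a fixed-size token vector, add a position encoding, then average) can be sketched as follows; the block size, the embedding dimension, and the random projection and position encodings are assumptions for illustration, not the trained parameters:

```python
import numpy as np

def patch_tokens(image, patch=4, dim=8, seed=0):
    """Split an (H, W) image into patch x patch blocks (H, W assumed
    divisible by patch), project each flattened block to a dim-sized
    token vector, add a position encoding, and return the average
    input token vector."""
    rng = np.random.default_rng(seed)
    H, W = image.shape
    blocks = [image[i:i + patch, j:j + patch].ravel()
              for i in range(0, H, patch)
              for j in range(0, W, patch)]
    blocks = np.stack(blocks)                     # (n_blocks, patch*patch)
    proj = rng.normal(size=(patch * patch, dim))  # projection (assumed)
    tokens = blocks @ proj                        # token vector per block
    pos = rng.normal(size=tokens.shape)           # position encodings (assumed)
    return (tokens + pos).mean(axis=0)            # average input token vector

avg_token = patch_tokens(np.ones((8, 8)))
print(avg_token.shape)  # this vector would be fed to the optimized encoder
```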
An input human body image and preset classification dimensions for the input human body image are acquired. The input image is partially encoded by the pre-trained encoder to obtain partial encoding results, which are input into a decoder for reconstruction, and the reconstruction loss is computed. Two data-augmented versions of the input image are separately encoded by the pre-trained encoder to obtain two augmented encoding results, contrastive learning is performed on the two augmented encoding results, and the contrastive learning loss is computed. The input image is fully encoded by the pre-trained encoder to obtain full encoding results; projection classification is performed on the full encoding results to obtain classification results; the full encoding results are clustered according to the preset classification dimensions to obtain clustering results; and the classification loss is computed from the clustering results and the classification results. The pre-trained encoder is optimized according to the reconstruction loss, the contrastive learning loss, and the classification loss to obtain an optimized encoder, and a human body re-identification model is constructed based on the optimized encoder to obtain the target human body re-identification model. In this application, the encoder of the human body re-identification model to be trained is optimized in a self-supervised manner: no data labeling is required during optimization, which improves optimization efficiency; and during encoder optimization, the losses of the encoder on different tasks, including the reconstruction loss, the similarity (contrastive) loss, and the classification loss, are computed separately and jointly used to optimize the encoder, which improves optimization accuracy. As a result, when the model built from the optimized encoder performs the re-identification task, its re-identification accuracy is improved.
Please refer to Figure 3, which is a schematic structural diagram of a training device for a human body re-identification model provided by an embodiment of the present invention. The units included in the device in this embodiment are used to execute the steps in the embodiment corresponding to Figure 2; for details, refer to the relevant description of the embodiment corresponding to Figure 2. For ease of explanation, only the parts related to this embodiment are shown. Referring to Figure 3, the training device 30 includes: an acquisition module 31, a reconstruction loss calculation module 32, a contrastive learning loss calculation module 33, a classification loss calculation module 34, an optimization module 35, and a construction module 36.
The acquisition module 31 is used to acquire an input human body image and preset classification dimensions for the input human body image.
The reconstruction loss calculation module 32 is used to partially encode the input human body image based on the pre-trained encoder to obtain partial encoding results, input the partial encoding results into a decoder for reconstruction, and compute the reconstruction loss.
The contrastive learning loss calculation module 33 is used to separately encode two data-augmented versions of the input human body image based on the pre-trained encoder to obtain two augmented encoding results, perform contrastive learning on the two augmented encoding results, and compute the contrastive learning loss.
The classification loss calculation module 34 is used to fully encode the input human body image based on the pre-trained encoder to obtain full encoding results, perform projection classification on the full encoding results to obtain classification results, cluster the full encoding results according to the preset classification dimensions to obtain clustering results, and compute the classification loss from the clustering results and the classification results.
The optimization module 35 is used to optimize the pre-trained encoder according to the reconstruction loss, the contrastive learning loss, and the classification loss to obtain an optimized encoder.
The construction module 36 is used to construct a human body re-identification model based on the optimized encoder to obtain a target human body re-identification model.
Optionally, the training device 30 further includes:
An initial encoder and training data acquisition module, used to acquire an initial encoder based on a self-attention structure and training data, the training data being images containing human bodies.
A pre-training module, used to pre-train the initial encoder with the training data to obtain the pre-trained encoder.
Optionally, the reconstruction loss calculation module 32 includes:
A blocking unit, used to divide the input human body image into blocks to obtain N image blocks of the input human body image, N being an integer greater than 1.
A selection unit, used to select M target image blocks and N-M remaining image blocks from the N image blocks, and perform feature encoding on the M target image blocks based on the pre-trained encoder to obtain M encoding results.
A reconstruction unit, used to acquire initialization features for the N-M remaining image blocks, and input the M encoding results and the initialization features into a decoder for reconstruction to obtain reconstruction results for the N image blocks.
A first calculation unit, used to compute the reconstruction loss from the reconstruction results of the N image blocks and the N image blocks themselves.
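The masked-reconstruction scheme implemented by these units resembles masked-autoencoder training: encode M of the N blocks, give the remaining N-M blocks initialization features, decode, and penalize the reconstruction error against all N original blocks. The NumPy sketch below only shows where a mean-squared reconstruction loss is computed; the zero initialization features and the pass-through stand-in for the decoder output are assumptions, not the patent's implementation:

```python
import numpy as np

def reconstruction_loss(blocks, reconstructed):
    """Mean squared error between the N original image blocks and the
    N reconstructed blocks."""
    return float(((blocks - reconstructed) ** 2).mean())

rng = np.random.default_rng(0)
N, M, d = 8, 6, 16
blocks = rng.normal(size=(N, d))          # N flattened image blocks
keep = rng.choice(N, M, replace=False)    # M target blocks to encode
masked = blocks.copy()
masked[[i for i in range(N) if i not in keep]] = 0.0  # init features (assumed zero)
# a real decoder would map the encoded + initialized features back to pixels;
# here the masked tensor simply stands in for the decoder output
loss = reconstruction_loss(blocks, masked)
print(loss > 0.0)  # masked blocks differ from the originals
```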
Optionally, the contrastive learning loss calculation module 33 includes:
A first augmentation unit, used to perform data augmentation on the input human body image with a preset first augmentation method to obtain a first augmented image.
A second augmentation unit, used to perform data augmentation on the input human body image with a preset second augmentation method to obtain a second augmented image, the first and second augmented images being the two data-augmented versions of the input human body image.
An encoding unit, used to perform feature encoding on the first augmented image and the second augmented image respectively to obtain a first encoding result of the first augmented image and a second encoding result of the second augmented image.
A projection unit, used to apply a projection transformation to the first encoding result to obtain a transformed encoding result.
A second calculation unit, used to compute the contrastive learning loss from the transformed encoding result and the second encoding result.
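The patent does not fix a particular loss for comparing the transformed first encoding result with the second encoding result; a negative cosine similarity, as used in BYOL/SimSiam-style methods, is one common choice and is assumed in this sketch:

```python
import numpy as np

def contrastive_loss(transformed_first, second):
    """Negative cosine similarity between the projected encoding of the
    first augmented view and the encoding of the second view; the more
    the two views agree, the lower (more negative) the loss."""
    a = transformed_first / np.linalg.norm(transformed_first)
    b = second / np.linalg.norm(second)
    return float(-(a @ b))

z1 = np.array([1.0, 2.0, 3.0])   # transformed encoding of view 1
z2 = np.array([1.0, 2.0, 3.0])   # encoding of view 2
print(contrastive_loss(z1, z2))  # approaches -1 when the views agree perfectly
```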
Optionally, the classification loss calculation module 34 includes:
A classification unit, used to perform background, upper-body, and lower-body projections on the full encoding results respectively to obtain the background, upper-body, and lower-body classification results of the input human body image.
A clustering unit, used to cluster the full encoding results according to the preset classification dimensions to obtain the background, upper-body, and lower-body clustering results of the input human body image.
A determination unit, used to determine the pseudo label of the input human body image from the background, upper-body, and lower-body clustering results.
A third calculation unit, used to compute the classification loss from the background, upper-body, and lower-body classification results of the input human body image and the pseudo label.
Optionally, the optimization module 35 includes:
An addition unit, used to compute a weighted sum of the reconstruction loss, the contrastive learning loss, and the classification loss to obtain the total loss.
An optimization unit, used to optimize the pre-trained encoder according to the total loss to obtain the optimized encoder.
It should be noted that, since the information interaction and execution processes among the above modules, units, and sub-units are based on the same concept as the method embodiments of the present invention, their specific functions and technical effects can be found in the method embodiment section and are not repeated here.
Fig. 4 is a schematic structural diagram of a computer device provided by an embodiment of the present invention. As shown in Fig. 4, the computer device of this embodiment includes: at least one processor (only one is shown in Fig. 4), a memory, and a computer program stored in the memory and executable on the at least one processor; when the processor executes the computer program, the steps in any of the above embodiments of the training method for a human body re-identification model are implemented.
The computer device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that Fig. 4 is merely an example of a computer device and does not constitute a limitation on it; the computer device may include more or fewer components than shown, combine certain components, or use different components.
The processor may be a CPU, or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory includes a readable storage medium, an internal memory, and the like, where the internal memory may be the memory of the computer device and provides an environment for running the operating system and computer-readable instructions stored on the readable storage medium. The readable storage medium may be the hard disk of the computer device, or, in other embodiments, an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device. Further, the memory may include both an internal storage unit of the computer device and an external storage device. The memory is used to store the operating system, application programs, a boot loader, data, and other programs, such as the program code of the computer program; it may also be used to temporarily store data that has been output or is to be output.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is only illustrative; in practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, each unit can exist physically on its own, or two or more units can be integrated into one unit; the integrated unit can be implemented in the form of hardware or of a software functional unit. In addition, the specific names of the functional units and modules are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present invention. For the specific working process of the units and modules in the above device, reference can be made to the corresponding processes in the foregoing method embodiments, which are not repeated here. If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the present invention can implement all or part of the processes of the above method embodiments by instructing the relevant hardware through a computer program; the computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the above method embodiments. The computer program includes computer program code, which can be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium can include at least: any entity or device capable of carrying computer program code, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, according to legislation and patent practice, computer-readable media cannot be electrical carrier signals or telecommunication signals.
The present invention can also implement all or part of the processes in the above method embodiments by means of a computer program product; when the computer program product runs on a computer device, the computer device, upon executing it, implements the steps in the above method embodiments.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described or recorded in detail in a certain embodiment, reference can be made to the relevant descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed device/computer equipment and method can be implemented in other ways. For example, the device/computer equipment embodiments described above are merely illustrative; for instance, the division into modules or units is only a logical functional division, and there can be other divisions in actual implementation, such as combining multiple units or components or integrating them into another system, or omitting or not executing some features. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed can be indirect couplings or communication connections through some interfaces, devices, or units, and can be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or replace some of the technical features with equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311870113.2A CN117912054A (en) | 2023-12-29 | 2023-12-29 | Training method, device, equipment and medium for human body weight recognition model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117912054A (en) | 2024-04-19 |
Family
ID=90689131
Similar Documents
| Publication | Title |
|---|---|
| Kola et al. | A novel approach for facial expression recognition using local binary pattern with adaptive window |
| US20230087526A1 | Neural network training method, image classification system, and related device |
| CN108171209B | Face age estimation method for metric learning based on convolutional neural network |
| WO2020238293A1 | Image classification method, and neural network training method and apparatus |
| CN111931637A | Cross-modal pedestrian re-identification method and system based on double-current convolutional neural network |
| CN107273458B | Depth model training method and device, and image retrieval method and device |
| US12067730B2 | Panoptic segmentation refinement network |
| CN110222718B | Image processing method and device |
| CN115063832B | A cross-modal person re-identification method based on adversarial learning of global and local features |
| CN114358205B | Model training method, model training device, terminal device and storage medium |
| JP2011248879A | Method for classifying object in test image |
| CN108021908A | Face age bracket recognition method and device, computer device and readable storage medium |
| CN116721315B | Living body detection model training method, living body detection model training device, medium and electronic equipment |
| CN110414431B | Face recognition method and system based on elastic context relation loss function |
| CN109492610B | Pedestrian re-identification method and device and readable storage medium |
| CN112651940A | Collaborative visual saliency detection method based on dual-encoder generative adversarial network |
| CN118155214B | Prompt learning method, image classification method and related devices |
| CN110852327A | Image processing method, device, electronic device and storage medium |
| CN112613341A | Training method and device, fingerprint identification method and device, and electronic device |
| CN117197543A | Network anomaly detection method and device based on GMD imaging and improved ResNeXt |
| CN115082963B | Human attribute recognition model training and human attribute recognition method and related device |
| CN112434731A | Image recognition method and device and readable storage medium |
| CN116152551A | Classification model training method, classification method, device, equipment and medium |
| CN113920415A | Scene recognition method, device, terminal and medium |
| Özyurt et al. | A new method for classification of images using convolutional neural network based on DWT-SVD perceptual hash function |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |