CN117373076A - Attribute identification method, attribute identification system and related device
- Publication number
- CN117373076A CN117373076A CN202311149747.9A CN202311149747A CN117373076A CN 117373076 A CN117373076 A CN 117373076A CN 202311149747 A CN202311149747 A CN 202311149747A CN 117373076 A CN117373076 A CN 117373076A
- Authority
- CN
- China
- Prior art keywords
- attribute
- image
- student
- training
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/096—Transfer learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/178—Human faces, e.g. facial parts, sketches or expressions estimating age from face image; using age information for improving recognition
Abstract
The application discloses an attribute identification method, an attribute identification system and a related device. The method includes: acquiring an image to be identified, wherein the image to be identified belongs to a preset object; and inputting the image to be identified into a trained student model for attribute identification to obtain an attribute identification result of the preset object in the image to be identified. Both the student model and its corresponding teacher model include an attribute prediction network, and the student model further includes a spatial transformation network. The teacher model is trained on transformed training images, the transformed training images are obtained by applying affine transformation to original training images, and the trained teacher model is used to guide the student model in performing attribute identification on the original training images, so as to obtain the trained student model. In this way, the efficiency and accuracy of attribute identification can be improved.
Description
Technical Field
The present application relates to the field of image recognition technology, and in particular to an attribute identification method, an attribute identification system, and a related device.
Background
With the continuous development of intelligent technology, attribute identification of images is applied in more and more scenarios. At present, when attribute identification is performed on a face region image, key point information of the face region is usually detected in advance by a front-end model, and the relevant attributes are then identified from the image containing the key point information. This approach consumes considerable computing resources for key point detection, so attribute identification is inefficient; meanwhile, the accuracy of attribute identification is easily affected by the quality of the key point detection, so accuracy is also low. In view of this, how to provide a more efficient and accurate attribute identification method is a problem to be solved.
Disclosure of Invention
The technical problem mainly solved by this application is to provide an attribute identification method, an attribute identification system, and a related device that can improve the efficiency and accuracy of attribute identification.
In order to solve the above technical problem, one technical solution adopted by this application is to provide an attribute identification method, including: acquiring an image to be identified, wherein the image to be identified belongs to a preset object; and inputting the image to be identified into a trained student model for attribute identification to obtain an attribute identification result of the preset object in the image to be identified. Both the student model and its corresponding teacher model include an attribute prediction network, and the student model further includes a spatial transformation network; the teacher model is trained on transformed training images, the transformed training images are obtained by applying affine transformation to original training images, and the trained teacher model is used to guide the student model in performing attribute identification on the original training images, so as to obtain the trained student model.
In order to solve the above technical problem, another technical solution adopted by this application is to provide an attribute identification system, including: an acquisition module configured to acquire an image to be identified, wherein the image to be identified belongs to a preset object; and an identification module configured to input the image to be identified into a trained student model for attribute identification to obtain an attribute identification result of the preset object in the image to be identified. Both the student model and its corresponding teacher model include an attribute prediction network, and the student model further includes a spatial transformation network; the teacher model is trained on transformed training images, the transformed training images are obtained by applying affine transformation to original training images, and the trained teacher model is used to guide the student model in performing attribute identification on the original training images, so as to obtain the trained student model.
In order to solve the above technical problem, another technical solution adopted by this application is to provide an electronic device, including a memory and a processor coupled to each other, the memory storing program instructions and the processor being configured to execute the program instructions to implement the attribute identification method mentioned in the above technical solutions.
In order to solve the above technical problem, another technical solution adopted by this application is to provide a computer-readable storage medium storing program instructions which, when executed by a processor, implement the attribute identification method mentioned in the above technical solutions.
The beneficial effects of this application are as follows: unlike the prior art, the attribute identification method provided by this application performs attribute identification on the image to be identified using a trained student model that includes a spatial transformation network, so as to obtain the attribute identification result of the preset object in the image to be identified. Because the student model is trained under the guidance of its corresponding teacher model, and the teacher model is trained on transformed training images obtained through affine transformation, the trained student model has good spatial transformation and attribute identification capabilities; performing attribute identification on the image to be identified with this model therefore improves both identification accuracy and identification efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
FIG. 1 is a flow chart of an embodiment of a method for attribute identification of the present application;
FIG. 2 is a schematic diagram of an embodiment of a teacher model of the present application;
FIG. 3 is a schematic diagram of an embodiment of a student model of the present application;
FIG. 4 is a flow chart of one embodiment of the teacher model and student model training process of the present application;
FIG. 5 is a flow chart corresponding to an embodiment of step S203;
FIG. 6 is a schematic diagram of an embodiment of an adjustment module of the present application;
FIG. 7 is a schematic diagram of an embodiment of an attribute identification system of the present application;
- FIG. 8 is a schematic structural diagram of an embodiment of an electronic device of the present application;
FIG. 9 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Referring to FIG. 1, FIG. 2 and FIG. 3, FIG. 1 is a flow chart of an embodiment of the attribute identification method of the present application, FIG. 2 is a schematic diagram of an embodiment of the teacher model of the present application, and FIG. 3 is a schematic diagram of an embodiment of the student model of the present application. The attribute identification method specifically includes the following steps:
S101: an image to be identified is acquired, wherein the image to be identified belongs to a preset object.
In one embodiment, an image to be identified, whose attributes need to be identified, is acquired. The image to be identified may be captured by an associated acquisition device, and may be an image in RGB format or an infrared image.
The image to be identified belongs to a preset object. In the corresponding embodiments of this application, the preset object is a face; that is, the image to be identified is an image of a face region, and the attributes to be identified for it are face-related attributes, for example at least one of expression, family name, gender, age, hairstyle, whether a facial accessory is worn, and so on. Of course, in other embodiments, the image to be identified may be an image of a region other than the face region, in which case the corresponding attributes to be identified are those related to that region.
In another embodiment, the image to be identified may be obtained in other ways, for example received or downloaded over a network.
In still another embodiment, after an original image is acquired, the original image is preprocessed, and the preprocessed image is used as the image to be identified. Preprocessing of the original image includes, but is not limited to, format conversion and/or size conversion and/or noise reduction, etc. The specific manner of obtaining the original image can refer to the above embodiments.
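As an illustration of this preprocessing step, the following is a minimal sketch covering the three operations named above; the 112×112 target size, the Gaussian denoising, and the normalization constants are assumptions made for the example, not values fixed by this disclosure.

```python
import cv2
import numpy as np

def preprocess(original_bgr: np.ndarray, size: int = 112) -> np.ndarray:
    """Format conversion, size conversion and noise reduction on an original image."""
    rgb = cv2.cvtColor(original_bgr, cv2.COLOR_BGR2RGB)  # format conversion
    rgb = cv2.resize(rgb, (size, size))                  # size conversion
    rgb = cv2.GaussianBlur(rgb, (3, 3), 0)               # simple noise reduction
    x = rgb.astype(np.float32) / 255.0                   # scale to [0, 1]
    return (x - 0.5) / 0.5                               # normalize to [-1, 1]
```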
S102: the image to be identified is input into the trained student model for attribute identification to obtain an attribute identification result of the preset object in the image to be identified. Both the student model and its corresponding teacher model include an attribute prediction network, and the student model further includes a spatial transformation network; the teacher model is trained on transformed training images, the transformed training images are obtained by applying affine transformation to original training images, and the trained teacher model is used to guide the student model in performing attribute identification on the original training images, so as to obtain the trained student model.
In one embodiment, because the student model 200 includes the attribute prediction network 10 and a spatial transformation network, the image to be identified is input into the trained student model 200; the student model 200 first uses the spatial transformation network to detect key points in the input image, and then feeds the image containing the key point information into the attribute prediction network 10 to obtain the attribute identification result of the preset object in the image to be identified.
Specifically, the teacher model 100 corresponding to the student model 200 is trained using a number of transformed training images that contain corresponding key point information. The trained student model 200 is obtained by training under the guidance of the trained teacher model 100. Both the student model 200 and the teacher model 100 include the attribute prediction network 10, but unlike the teacher model 100, the student model 200 further includes a spatial transformation network. Using the trained teacher model 100 to guide the training of the student model 200, which includes the spatial transformation network, enables the trained student model 200 to perform key point detection and attribute identification directly on an input image, thereby obtaining the attribute identification result of the preset object in the image to be identified.
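A minimal sketch of this composition is given below, assuming a PyTorch implementation; the class and attribute names are illustrative and are not taken from the patent.

```python
import torch
import torch.nn as nn

class StudentModel(nn.Module):
    """Assumed layout of student model 200: a spatial transformation network
    placed in front of the shared attribute prediction network."""
    def __init__(self, stn: nn.Module, attribute_net: nn.Module):
        super().__init__()
        self.stn = stn                      # spatial transformation network
        self.attribute_net = attribute_net  # attribute prediction network 10

    def forward(self, image: torch.Tensor):
        aligned = self.stn(image)           # key-point-aware spatial alignment first
        return self.attribute_net(aligned)  # then the per-branch attribute predictions
```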
In one implementation scenario, the transformed training image used to train the teacher model 100 is obtained by applying an affine transformation to an original training image in order to detect a plurality of key points; the original training image containing the detected key point information is then used as the transformed training image.
In addition, it should be noted that when attribute identification is performed on the image to be identified, either a single attribute or multiple attributes may be identified at the same time. When multiple attributes are identified, the obtained attribute identification result includes an identification sub-result for each attribute to be identified.
According to the above attribute identification method, the trained student model 200, which includes a spatial transformation network, performs attribute identification on the image to be identified to obtain the attribute identification result of the preset object. Because the student model 200 is trained under the guidance of its corresponding teacher model 100, and the teacher model 100 is trained on transformed training images obtained through affine transformation, the trained student model 200 has good spatial transformation and attribute identification capabilities; using it for attribute identification therefore improves both identification accuracy and identification efficiency.
Referring to FIG. 4 in conjunction with FIG. 2 and FIG. 3, FIG. 4 is a schematic flow chart of an embodiment of the training process of the teacher model and the student model of the present application. Since the attribute identification method provided by this application performs attribute identification with the trained student model 200, the training process of the student model 200 and the corresponding teacher model 100 is described below, and specifically includes:
S201: the transformed training image is input into the teacher model for attribute identification to obtain a first prediction result, and a first prediction loss is obtained based on the first prediction result and the attribute labels.
Specifically, before step S201, the method includes: acquiring an original training image; preprocessing the original training image so that it corresponds to the training size; and performing affine transformation on the original training image relative to a standard image, then adjusting the result to the training size, so as to obtain the transformed training image corresponding to the original training image.
In one embodiment, preprocessing the original training image proceeds as follows. First, the original training image is converted to a preset training size and normalized, which helps improve the subsequent recognition rate. Then, because the original training image may suffer from occlusion, angle differences, and similar problems, an affine matrix between the original training image and the standard image is determined in advance by a key point detection algorithm. Through this affine matrix, each pixel of the face region in the original training image can be mapped into the standard image. Since the standard image contains a plurality of predetermined standard key points, the training key points in the original training image that correspond to the respective standard key points can be determined through the affine matrix. To prevent the size of the original training image from changing during the affine transformation, after the key point information in the original training image is determined, the original training image containing the key point information is converted back to the training size, which improves the accuracy of subsequent training.
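This alignment step can be sketched with OpenCV as follows; the five-point standard template and its coordinates are assumptions made for illustration, since the disclosure does not specify the standard key points.

```python
import cv2
import numpy as np

# Assumed 5-point standard-face template; the number of standard key points
# and their coordinates are illustrative, not taken from the patent.
STANDARD_KEYPOINTS = np.float32([[38.3, 51.7], [73.5, 51.5], [56.0, 71.7],
                                 [41.5, 92.4], [70.7, 92.2]])

def make_transformed_training_image(original: np.ndarray,
                                    training_keypoints: np.ndarray,
                                    train_size: int = 112) -> np.ndarray:
    """Estimate the affine matrix between training and standard key points,
    map the face region onto the standard image, and keep the training size."""
    matrix, _ = cv2.estimateAffinePartial2D(np.float32(training_keypoints),
                                            STANDARD_KEYPOINTS)
    return cv2.warpAffine(original, matrix, (train_size, train_size))
```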
In another embodiment, after the original training image is preprocessed, the key point information in the original training image may instead be determined by manual labeling, and the original training image containing the manually labeled key point information is used as the transformed training image.
Further, after the transformed training image is obtained, step S201 proceeds as follows. Both the student model 200 and the teacher model 100 include the attribute prediction network 10, which comprises a plurality of task branches, each matching at least one attribute to be identified. After the transformed training image is input to the teacher model 100, a corresponding first prediction result is obtained, which includes the identification sub-result of the attribute to be identified output by each task branch. It should be noted that FIG. 2 and FIG. 3 schematically show only two task branches, a first task branch and a second task branch; in other embodiments the number of task branches may be 3, 4, 5, or more, and may be set according to the actual situation.
In a specific application scenario, suppose a task branch in the teacher model 100 is used to identify whether the face region in an image wears related accessories, the branch covering multiple attributes to be identified such as "whether ear studs are worn", "whether a mask is worn", and "whether a hat is worn". The identification sub-result of that task branch output by the teacher model 100 then includes a result for each of these attributes.
In addition, because the student model 200 and the teacher model 100 include attribute prediction networks 10 of the same structure, the specific process by which the teacher model 100 performs attribute identification on the transformed training image to obtain the first prediction result can refer to S203 below and is not detailed here.
Further, because the transformed training image carries corresponding attribute labels, the first prediction loss is calculated from the identification sub-result of each attribute to be identified and the attribute labels corresponding to the transformed training image.
In one embodiment, a cross-entropy loss function together with the mean absolute error (MAE) is used as the first target loss function to calculate the first prediction loss.
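A sketch of such a first target loss is shown below; how the cross-entropy and MAE terms are combined per attribute is not spelled out in the disclosure, so equal weighting of the two terms and of the task branches is assumed.

```python
import torch
import torch.nn.functional as F

def first_prediction_loss(branch_logits, branch_targets):
    """Cross entropy plus mean absolute error, summed over task branches."""
    total = torch.zeros(())
    for logits, target in zip(branch_logits, branch_targets):
        ce = F.cross_entropy(logits, target)
        one_hot = F.one_hot(target, num_classes=logits.shape[-1]).float()
        mae = F.l1_loss(logits.softmax(dim=-1), one_hot)  # MAE term
        total = total + ce + mae
    return total
```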
S202: based on the first prediction loss, parameters of the teacher model are adjusted until a first convergence condition is met, and the trained teacher model is obtained.
In one embodiment, after the first prediction loss is obtained, the parameters of the teacher model 100 are adjusted according to it until the first convergence condition is satisfied, yielding the trained teacher model 100.
In one implementation scenario, when a preset number of training rounds has been reached or the first prediction loss has converged, the first convergence condition is deemed satisfied, training of the teacher model 100 is stopped, and the trained teacher model 100 is obtained.
S203: and inputting the original training image into the student model for attribute identification to obtain a second prediction result, and obtaining a second prediction loss based on the second prediction result and the attribute label.
In one embodiment, as shown in FIG. 2 and FIG. 3, the attribute prediction network 10 in the student model 200 and the teacher model 100 further includes a feature extraction branch consisting of a plurality of sequentially cascaded feature layers, and each task branch includes an adjustment module matched with each feature layer and an output layer matched with the attribute to be identified. It should be noted that FIG. 2 and FIG. 3 schematically show only three feature layers, feature layer a, feature layer b, and feature layer c; in practical applications the number of feature layers may differ and may be set according to the actual situation.
Referring to FIG. 5 in conjunction with FIG. 3, FIG. 5 is a flow chart corresponding to an embodiment of step S203. The specific implementation of step S203 includes:
S301: the original training image is input into the spatial transformation network of the student model to obtain a spatially transformed image.
In one embodiment, the original training image is input to the spatial transformation network in the student model 200, which performs key point detection on it, thereby producing a spatially transformed image containing key point information. Because the student model 200 contains the spatial transformation network, the network's key point detection capability is trained together with the rest of the model, so that the trained student model 200 has both good key point detection capability and good attribute identification capability.
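For concreteness, the following is a minimal spatial transformation network in the classical STN style; the localization-network layer sizes are assumptions, and the patent's network may predict key points rather than a single affine matrix.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    """A localization network regresses a 2x3 affine matrix used to resample
    the input image; initialized to the identity transform."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.localization = nn.Sequential(
            nn.Conv2d(channels, 8, kernel_size=7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(10, 6))
        self.localization[-1].weight.data.zero_()
        self.localization[-1].bias.data.copy_(
            torch.tensor([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        theta = self.localization(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```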
S302: and inputting the space transformation image into a feature layer of a feature extraction branch in an attribute prediction network of the student model to obtain extracted features, and inputting the extracted features into an adjustment module matched with the current feature layer to obtain adjusted features.
In one embodiment, as shown in FIG. 3, after the spatially transformed image output by the spatial transformation network is acquired, it is input to the feature extraction branch of the attribute prediction network 10 in the student model 200, so that the first feature layer a of the branch performs feature extraction on it and produces extracted features in the corresponding dimension.
Further, the extracted features output by the first feature layer a are passed to the adjustment module a matched with feature layer a in each task branch, so that adjustment module a extracts the relevant attribute features from them to obtain the adjusted features. The relevant attribute features are related at least to the attributes to be identified of the task branch in which the corresponding adjustment module resides.
In another embodiment, referring to FIG. 6 in conjunction with FIG. 3, FIG. 6 is a schematic structural diagram of an embodiment of the adjustment module of the present application. The adjustment module of a task branch includes an attention gate layer and a gating layer. The attention gate layer performs feature extraction on its input based on a channel attention mechanism and a spatial attention mechanism to obtain a self-attention feature containing information on multiple attributes; the gating layer performs a weighted fusion of the attribute information contained in the self-attention feature to obtain the adjusted features. The weights of the gating layer are related to the attributes to be identified that match the task branch.
Specifically, after the extracted features output by the matched feature layer are obtained, the attention gate layer in the adjustment module applies the channel attention mechanism and then the spatial attention mechanism to extract them further, so that the attention gate focuses on the features related to the attributes to be identified of each task branch, and extracts the features related to the attributes of all task branches as the self-attention feature. The self-attention feature is then input to the gating layer of the current adjustment module, which performs a weighted fusion of the features related to each attribute to be identified and emphasizes the features related to the attributes of the task branch in which the current adjustment module resides, i.e., increases the weight of those features within the self-attention feature, thereby obtaining the adjusted features. The adjusted features obtained in this way contain salient features related to the corresponding attributes to be identified, which helps optimize the training effect and improves the accuracy with which the trained student model 200 identifies those attributes.
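A sketch of one adjustment module under this description follows; the CBAM-style attention shapes and the channel-wise form of the gate are assumptions, and the previous adjusted feature is assumed shape-compatible with the current extracted feature.

```python
import torch
import torch.nn as nn

class AdjustmentModule(nn.Module):
    """Attention gate layer (channel then spatial attention) followed by a
    gating layer that re-weights attribute-related channels."""
    def __init__(self, channels: int):
        super().__init__()
        hidden = max(channels // 4, 1)
        self.channel_att = nn.Sequential(                 # channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, 1), nn.ReLU(),
            nn.Conv2d(hidden, channels, 1), nn.Sigmoid())
        self.spatial_att = nn.Sequential(                 # spatial attention
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())
        self.gate = nn.Parameter(torch.zeros(1, channels, 1, 1))  # gating layer

    def forward(self, extracted, prev_adjusted=None):
        feat = extracted if prev_adjusted is None else extracted + prev_adjusted
        feat = feat * self.channel_att(feat)
        pooled = torch.cat([feat.mean(dim=1, keepdim=True),
                            feat.amax(dim=1, keepdim=True)], dim=1)
        self_attention = feat * self.spatial_att(pooled)  # self-attention feature
        return self_attention * torch.sigmoid(self.gate)  # weighted fusion
```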
In a specific application scenario, suppose the student model 200 includes a first task branch and a second task branch: the first task branch identifies the "expression" attribute and includes a plurality of first adjustment modules, while the second task branch identifies the "face" attribute and includes a plurality of second adjustment modules. For any adjustment module in the first task branch, after the extracted features output by the corresponding feature layer are obtained, the attention gate layer in that module extracts features first through the channel attention mechanism and then through the spatial attention mechanism, so as to extract the features related to both the "expression" and "face" attributes as the self-attention feature. The gating layer in that adjustment module then weights the self-attention feature to increase the weight of the features related to "expression", yielding the adjusted features.
S303: the extracted features output by the current feature layer are input to the next feature layer of the feature extraction branch to obtain updated extracted features, the updated extracted features and the current adjusted features are input to an adjustment module matched with the next feature layer to obtain updated adjusted features, and the next feature layer is updated to be the current feature layer.
In one embodiment, after the extracted features output by the current feature layer are obtained, they are input to the next feature layer of the feature extraction branch, which performs feature extraction on them to produce updated extracted features; the updated extracted features belong to a different dimension than the pre-update extracted features output by the current feature layer. Together with the current adjusted features output by the adjustment module matched with the current feature layer, the updated extracted features are input to the adjustment module matched with the next feature layer to obtain the updated adjusted features. After the updated adjusted features are obtained, the next feature layer becomes the current feature layer. The specific process by which an adjustment module processes the input updated extracted features and current adjusted features to produce the updated adjusted features can refer to the corresponding embodiment above and is not repeated here.
In a specific application scenario, as shown in FIG. 3, after the extracted features output by feature layer a are obtained, they are input to the next feature layer b, which performs feature extraction on them to obtain the updated extracted features. After the current adjusted features are obtained through adjustment module a matched with the current feature layer a, the current adjusted features and the updated extracted features are input to adjustment module b matched with feature layer b, which extracts features from these inputs to obtain the updated adjusted features. Once the updated adjusted features output by adjustment module b are obtained, feature layer b is taken as the current feature layer.
Further, because the feature extraction branch includes a plurality of feature layers, after the updated extracted features are obtained in step S303, the method returns to the step of inputting the extracted features output by the current feature layer to the next feature layer of the feature extraction branch, until the last adjustment module of each task branch outputs its final adjusted features; the final adjusted features are then input to the output layer to obtain the identification sub-result of the attribute to be identified. The identification sub-results output by all output layers together form the second prediction result.
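The cascade just described can be summarized in a few lines; the names are assumptions, and the extracted and adjusted features are assumed shape-compatible across layers (in practice a projection would be needed where dimensions change).

```python
import torch.nn as nn

def task_branch_forward(feature_layers: nn.ModuleList,
                        adjust_modules: nn.ModuleList,
                        output_layer: nn.Module, x):
    """One task branch: each feature layer's output goes to its matched
    adjustment module together with the previous adjusted feature."""
    adjusted = None
    for layer, adjust in zip(feature_layers, adjust_modules):
        x = layer(x)                    # (updated) extracted feature
        adjusted = adjust(x, adjusted)  # (updated) adjusted feature
    return output_layer(adjusted)       # identification sub-result of this branch
```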
By providing a plurality of feature layers and the adjustment modules matched with them, the adjusted features output by the last adjustment module of each task branch contain features across multiple dimensions, which improves the accuracy of attribute identification when the trained student model 200 is applied to the image to be identified.
S304: a second prediction loss is obtained based on the second prediction result and the attribute labels.
In one embodiment, the second prediction loss is calculated from the second prediction result and the attribute labels corresponding to the original training image.
In one implementation scenario, a cross-entropy loss function and the mean absolute error may be used as the second target loss function for the second prediction result to calculate the second prediction loss; the specific process can refer to the corresponding embodiment above.
S204: the features extracted by the student model from the original training image are distilled using the features extracted by the trained teacher model from the transformed training image corresponding to the original training image, so as to obtain a distillation loss.
In one embodiment, the extracted features output by the student model 200 at each feature layer of the feature extraction branch are distilled using the extracted features output by the trained teacher model 100 at each feature layer of its feature extraction branch; and the adjusted features output by the student model 200 at the adjustment modules of each task branch are distilled using the adjusted features output by the trained teacher model 100 at the adjustment modules of each task branch, so as to obtain the distillation loss.
Specifically, the trained teacher model 100 performs attribute identification again on the transformed training image corresponding to the original training image, and the extracted features output by each of its feature layers during this identification are collected; likewise, the student model 200 performs attribute identification on the corresponding original training image, and the extracted features output by each of its feature layers are collected. The extracted features output by each feature layer in the teacher model 100 are used to distill the extracted features output by the corresponding feature layer in the student model 200, yielding a first distillation sub-loss. The adjusted features output by each adjustment module in the teacher model 100 during attribute identification of the transformed training image are collected, as are the adjusted features output by each adjustment module in the student model 200 during attribute identification of the corresponding original training image; the former are used to distill the latter, yielding a second distillation sub-loss. The sum of the first distillation sub-loss and the second distillation sub-loss is taken as the distillation loss.
In another embodiment, only the extracted features output by the last feature layer of the feature extraction branch in the teacher model 100 during attribute identification of the transformed training image, and those output by the last feature layer in the student model 200 during attribute identification of the corresponding original training image, are collected; the former are used to distill the latter, yielding the first distillation sub-loss. Similarly, the adjusted features output by the last adjustment module of each task branch in the teacher model 100, and those output by the last adjustment module of each task branch in the student model 200, are collected; the former are used to distill the latter, yielding the second distillation sub-loss. The sum of the first distillation sub-loss and the second distillation sub-loss is taken as the distillation loss.
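A sketch of the distillation loss under either variant is given below; MSE feature matching is an assumption, since the disclosure only states that the student's features are distilled against the trained teacher's features.

```python
import torch.nn.functional as F

def distillation_loss(teacher_extracted, student_extracted,
                      teacher_adjusted, student_adjusted):
    """Sum of the first and second distillation sub-losses over matched pairs."""
    first_sub_loss = sum(F.mse_loss(s, t.detach())
                         for s, t in zip(student_extracted, teacher_extracted))
    second_sub_loss = sum(F.mse_loss(s, t.detach())
                          for s, t in zip(student_adjusted, teacher_adjusted))
    return first_sub_loss + second_sub_loss
```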
S205: the parameters of the student model are adjusted based on the second prediction loss and the distillation loss until a second convergence condition is satisfied, so as to obtain the trained student model.
In one embodiment, after the second prediction loss and the distillation loss of the student model 200 are obtained through the above steps, their sum is taken as the total loss of the student model 200. The total loss is used to adjust the parameters of the student model 200 until the second convergence condition is satisfied, yielding the trained student model 200.
In another embodiment, a prediction weight may be set for the second prediction loss and a distillation weight for the distillation loss, the two weights summing to 1. A first product of the second prediction loss and its prediction weight and a second product of the distillation loss and its distillation weight are computed, and the sum of the two products is taken as the total loss of the student model 200; the total loss is then used to adjust the parameters of the student model 200 until the second convergence condition is satisfied, yielding the trained student model 200. The specific calculation formula is as follows:
L = L₁ × a + L₂ × b

where L denotes the total loss of the student model 200, L₁ denotes the second prediction loss, a denotes the prediction weight, L₂ denotes the distillation loss, and b denotes the distillation weight.
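In code this weighting is a one-liner; the default a = 0.5 below is an assumption, as the disclosure only requires that the two weights sum to 1.

```python
def student_total_loss(second_prediction_loss, distillation_loss, a: float = 0.5):
    """L = L1 * a + L2 * b with b = 1 - a."""
    b = 1.0 - a
    return second_prediction_loss * a + distillation_loss * b
```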
Referring to FIG. 7, FIG. 7 is a schematic structural diagram of an embodiment of the attribute identification system of the present application. The attribute identification system includes an acquisition module 30 and an identification module 40 coupled to each other.
Specifically, the acquiring module 30 is configured to acquire an image to be identified. Wherein the image to be identified belongs to a preset object.
The identification module 40 is configured to input the image to be identified into the trained student model for attribute identification to obtain the attribute identification result of the preset object in the image to be identified. Both the student model and its corresponding teacher model include an attribute prediction network, and the student model further includes a spatial transformation network; the teacher model is trained on transformed training images, the transformed training images are obtained by applying affine transformation to original training images, and the trained teacher model is used to guide the student model in performing attribute identification on the original training images, so as to obtain the trained student model.
In one embodiment, referring to FIG. 7, the attribute identification system further includes a training module 50 coupled to the identification module 40. The training module 50 is configured to train the teacher model and the student model; the specific training process includes: inputting the transformed training image into the teacher model for attribute identification to obtain a first prediction result, and obtaining a first prediction loss based on the first prediction result and the attribute labels; adjusting the parameters of the teacher model based on the first prediction loss until a first convergence condition is satisfied, so as to obtain the trained teacher model; inputting the original training image into the student model for attribute identification to obtain a second prediction result, and obtaining a second prediction loss based on the second prediction result and the attribute labels; distilling the features extracted by the student model from the original training image using the features extracted by the trained teacher model from the transformed training image corresponding to the original training image, so as to obtain a distillation loss; and adjusting the parameters of the student model based on the second prediction loss and the distillation loss until a second convergence condition is satisfied, so as to obtain the trained student model.
In one implementation scenario, the attribute prediction network includes a plurality of task branches, each task branch matching at least one attribute to be identified; the attribute identification result, the first prediction result and the second prediction result comprise identification sub-results of the attribute to be identified, which are output by each task branch.
In an implementation scenario, the attribute prediction network further includes a feature extraction branch consisting of a plurality of sequentially cascaded feature layers, and each task branch includes an adjustment module matched with each feature layer and an output layer matched with the attribute to be identified. The training module 50 inputs the original training image into the student model for attribute identification to obtain the second prediction result as follows: inputting the original training image into the spatial transformation network of the student model to obtain a spatially transformed image; inputting the spatially transformed image into a feature layer of the feature extraction branch in the attribute prediction network of the student model to obtain extracted features, and inputting the extracted features into the adjustment module matched with the current feature layer to obtain adjusted features; inputting the extracted features output by the current feature layer to the next feature layer of the feature extraction branch to obtain updated extracted features, inputting the updated extracted features and the current adjusted features into the adjustment module matched with the next feature layer to obtain updated adjusted features, and updating the next feature layer to be the current feature layer; and returning to the step of inputting the extracted features output by the current feature layer to the next feature layer of the feature extraction branch to obtain updated extracted features, until the last adjustment module of each task branch outputs its final adjusted features, which are input to the output layer to obtain the identification sub-result of the attribute to be identified.
The adjustment module corresponding to a task branch includes an attention gate layer and a gating layer; the attention gate layer performs feature extraction on its input based on a channel attention mechanism and a spatial attention mechanism to obtain a self-attention feature containing information on multiple attributes, and the gating layer performs a weighted fusion of the attribute information contained in the self-attention feature to obtain the adjusted features. The weights of the gating layer are related to the attributes to be identified that match the task branch.
In an implementation scenario, the training module 50 distills the features extracted by the student model from the original training image using the features extracted by the trained teacher model from the corresponding transformed training image as follows: the extracted features output by the student model at each feature layer of the feature extraction branch are distilled using the extracted features output by the trained teacher model at each feature layer of its feature extraction branch, and the adjusted features output by the student model at the adjustment modules of each task branch are distilled using the adjusted features output by the trained teacher model at the adjustment modules of each task branch, so as to obtain the distillation loss.
In another embodiment, referring to FIG. 7, the attribute identification system further includes a transformation module 60 coupled to the training module 50. Before the transformed training image is input into the teacher model for attribute identification to obtain the first prediction result and the first prediction loss is derived from the first prediction result and the attribute labels, the transformation module 60 is configured to: acquire an original training image, the original training image being preprocessed to correspond to the same training size; and perform affine transformation on the original training image relative to the standard image and adjust it to the training size, so as to obtain the transformed training image corresponding to the original training image.
Referring to FIG. 8, FIG. 8 is a schematic structural diagram of an embodiment of an electronic device of the present application. The electronic device includes a memory 70 and a processor 80 coupled to each other. The memory 70 stores program instructions, and the processor 80 executes the program instructions to implement the attribute identification method mentioned in any of the above embodiments. Specifically, the electronic device includes, but is not limited to: a desktop computer, a notebook computer, a tablet computer, a server, and the like, without limitation here. Further, the processor 80 may also be referred to as a CPU (Central Processing Unit). The processor 80 may be an integrated circuit chip with signal processing capability. The processor 80 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. In addition, the processor 80 may be implemented jointly by integrated circuit chips.
Referring to FIG. 9, FIG. 9 is a schematic structural diagram of an embodiment of a computer-readable storage medium 90 of the present application. The computer-readable storage medium 90 stores program instructions 95 executable by a processor; when executed by the processor, the program instructions 95 implement the attribute identification method mentioned in any of the above embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed methods and apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical, or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of this application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform all or some of the steps of the methods of the embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing description is only of embodiments of the present application, and is not intended to limit the scope of the patent application, and all equivalent structures or equivalent processes using the descriptions and the contents of the present application or other related technical fields are included in the scope of the patent application.
Claims (10)
1. A method for identifying an attribute, comprising:
acquiring an image to be identified; wherein the image to be identified belongs to a preset object;
inputting the image to be identified into a trained student model for attribute identification to obtain an attribute identification result of the preset object in the image to be identified; wherein both the student model and its corresponding teacher model comprise an attribute prediction network, the student model further comprises a spatial transformation network, the teacher model is trained based on transformed training images, the transformed training images are obtained by applying affine transformation to original training images, and the trained teacher model is used to guide the student model in performing attribute identification on the original training images, so as to obtain the trained student model.
2. The method of claim 1, wherein the original training image corresponds to an attribute label, and the training process of the teacher model and the student model comprises:
inputting the transformed training image into the teacher model for attribute identification to obtain a first prediction result, and obtaining a first prediction loss based on the first prediction result and the attribute label;
based on the first prediction loss, adjusting parameters of the teacher model until a first convergence condition is met, so as to obtain the trained teacher model;
inputting the original training image into the student model for attribute identification to obtain a second prediction result, and obtaining a second prediction loss based on the second prediction result and the attribute label;
distilling the features extracted by the student model from the original training image using the features extracted by the trained teacher model from the transformed training image corresponding to the original training image, to obtain a distillation loss;
and adjusting parameters of the student model based on the second predicted loss and the distillation loss until a second convergence condition is met, so as to obtain the trained student model.
3. The method of claim 2, wherein the attribute prediction network comprises a plurality of task branches, each task branch matching at least one attribute to be identified; the attribute identification result, the first prediction result and the second prediction result all comprise identification sub-results of the attribute to be identified, which are output by each task branch.
4. A method according to claim 3, wherein the attribute prediction network further comprises a feature extraction branch comprising a plurality of feature layers cascaded in sequence, the task branch comprising an adjustment module matching each of the feature layers, and an output layer matching the attribute to be identified;
and inputting the original training image into the student model for attribute identification to obtain a second prediction result comprises:
inputting the original training image to a spatial transformation network of the student model to obtain a spatial transformation image;
inputting the space transformation image to a feature layer of a feature extraction branch in an attribute prediction network of the student model to obtain extracted features, and inputting the extracted features to an adjustment module matched with the current feature layer to obtain adjusted features;
the extracted features output by the current feature layer are input to the next feature layer of the feature extraction branch to obtain updated extracted features, the updated extracted features and the current adjusted features are input to an adjustment module matched with the next feature layer to obtain updated adjusted features, and the next feature layer is updated to be the current feature layer;
and returning to the step of inputting the extracted features output by the current feature layer to the next feature layer of the feature extraction branch to obtain updated extracted features, until the last adjustment module of each task branch outputs final adjusted features, and inputting the final adjusted features to the output layer to obtain the identification sub-result of the attribute to be identified.
5. The method of claim 4, wherein the adjustment module corresponding to the task branch comprises an attention layer and a gating layer; the attention layer is configured to perform feature extraction on an input feature based on a channel attention mechanism and a spatial attention mechanism to obtain a self-attention feature containing multiple kinds of attribute information, and the gating layer is configured to perform weighted fusion on the multiple kinds of attribute information contained in the self-attention feature to obtain an adjusted feature; the fusion weights of the gating layer are related to the attribute to be identified matched with the task branch.
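One plausible realization of claim 5's adjustment module, assuming a CBAM-style attention layer (channel attention followed by spatial attention) and a learned per-channel sigmoid gate as the weighted fusion. The patent does not fix these choices, and the residual connection to the previous adjusted feature is an illustrative assumption:

```python
from typing import Optional
import torch
import torch.nn as nn

class AdjustmentModule(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.channel_att = nn.Sequential(            # channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.spatial_att = nn.Sequential(            # spatial attention
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())
        # gating layer: per-channel fusion weights, specific to this task branch
        self.gate = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x: torch.Tensor,
                prev: Optional[torch.Tensor] = None) -> torch.Tensor:
        x = x * self.channel_att(x)                  # re-weight channels
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        self_attention = x * self.spatial_att(pooled)         # self-attention feature
        adjusted = self_attention * torch.sigmoid(self.gate)  # gated weighted fusion
        return adjusted if prev is None else adjusted + prev  # shapes assumed equal
```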
6. The method of claim 4, wherein the distilling the features extracted by the student model from the original training image using the features extracted by the trained teacher model from the transformation training image corresponding to the original training image to obtain a distillation loss comprises:
distilling the extracted features output by the student model at each feature layer of the feature extraction branch using the extracted features output by the trained teacher model at each feature layer of its feature extraction branch, and distilling the adjusted features output by the student model at the adjustment module of each task branch using the adjusted features output by the trained teacher model at the adjustment module of each task branch, so as to obtain the distillation loss.
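A sketch of the distillation loss in claim 6, using MSE as the distance function (an assumption; the patent does not name one). It pairs the student's and teacher's extracted features per feature layer and their adjusted features per task branch:

```python
import torch.nn.functional as F

def distillation_loss(student_layer_feats, teacher_layer_feats,
                      student_adjusted, teacher_adjusted):
    """Feature-level distillation: teacher features are detached so that
    only the student receives gradients."""
    layer_term = sum(F.mse_loss(s, t.detach())
                     for s, t in zip(student_layer_feats, teacher_layer_feats))
    branch_term = sum(F.mse_loss(s, t.detach())
                      for s, t in zip(student_adjusted, teacher_adjusted))
    return layer_term + branch_term
```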
7. The method of claim 2, wherein before the inputting the transformation training image into the teacher model for attribute identification to obtain a first prediction result and obtaining a first prediction loss based on the first prediction result and the attribute label, the method further comprises:
acquiring the original training image, wherein original training images are preprocessed to correspond to a same training size;
performing affine transformation on the original training image relative to a standard image, and adjusting the transformed image to the training size, so as to obtain the transformation training image corresponding to the original training image.
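A preprocessing sketch of claim 7 using OpenCV: the original image is aligned to a standard image via an affine transform estimated from three corresponding landmarks (hypothetical inputs, e.g. eye and mouth keypoints for faces), then resized to the common training size; the 224x224 size is an assumption:

```python
import cv2
import numpy as np

def make_transformation_training_image(image: np.ndarray,
                                       src_points: np.ndarray,
                                       std_points: np.ndarray,
                                       training_size=(224, 224)) -> np.ndarray:
    """src_points / std_points: three (x, y) landmarks in the original
    image and the standard image, respectively."""
    matrix = cv2.getAffineTransform(src_points.astype(np.float32),
                                    std_points.astype(np.float32))
    warped = cv2.warpAffine(image, matrix, (image.shape[1], image.shape[0]))
    return cv2.resize(warped, training_size)
```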
8. An attribute identification system, comprising:
an acquisition module, configured to acquire an image to be identified, wherein the image to be identified belongs to a preset object; and
an identification module, configured to input the image to be identified into a trained student model for attribute identification to obtain an attribute identification result of the preset object in the image to be identified; wherein both the student model and its corresponding teacher model comprise an attribute prediction network, the student model further comprises a spatial transformation network, the teacher model is obtained through training based on transformation training images, the transformation training images are obtained through affine transformation of original training images, and the trained teacher model is used for guiding the student model to perform attribute identification on the original training images, so as to obtain the trained student model.
9. An electronic device, comprising a memory and a processor coupled to each other, wherein the memory stores program instructions, and the processor is configured to execute the program instructions to implement the attribute identification method of any one of claims 1-7.
10. A computer-readable storage medium having stored thereon program instructions which, when executed by a processor, implement the attribute identification method of any one of claims 1-7.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311149747.9A CN117373076A (en) | 2023-09-06 | 2023-09-06 | Attribute identification method, attribute identification system and related device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117373076A (en) | 2024-01-09
Family
ID=89401164
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311149747.9A Pending CN117373076A (en) | 2023-09-06 | 2023-09-06 | Attribute identification method, attribute identification system and related device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117373076A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN119313992A (en) * | 2024-12-18 | 2025-01-14 | 浙江大华技术股份有限公司 | Attribute identification method, electronic device and computer-readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||