CN115880483A - Neural network model distillation method, device and storage medium - Google Patents
- Publication number: CN115880483A (application CN202111132667.3A)
- Authority: CN (China)
- Prior art keywords: feature, model, teacher model, teacher, feature map
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The present disclosure relates to a neural network model distillation method, device, and storage medium. The neural network model distillation method includes: determining a target image and obtaining an object condition feature of the target image, the object condition feature being manually input feature information characterizing a target object; performing feature detection on the target image based on a teacher model to obtain a teacher model feature map, and performing feature detection on the target image based on an initial student model to obtain a student model feature map; determining the correlation between the object feature at each pixel position in the teacher model feature map and the object condition feature, and determining the attention distribution of the teacher model based on the correlation; and training the initial student model based on the attention distribution of the teacher model and the degree of feature difference between the teacher model feature map and the student model feature map, to obtain a target student model. Through the present disclosure, a student model with higher object detection accuracy can be obtained.
Description
Technical Field
The present disclosure relates to the technical field of image recognition, and in particular to a neural network model distillation method, device, and storage medium.
Background
Knowledge distillation is a method for improving network performance: a pre-trained large model (the teacher model) is used to teach a model to be trained (the student model), thereby improving the latter's performance. In the related art, knowledge distillation is also widely applied in the field of image recognition to improve image sampling and detection.
In the related art, the easily identifiable regions learned by a model network do not necessarily match human observation and cognition, which limits the model's detection performance.
Summary
To overcome the problems in the related art, the present disclosure provides a neural network model distillation method, device, and storage medium.
According to a first aspect of the embodiments of the present disclosure, a neural network model distillation method is provided, including:
determining a target image and obtaining an object condition feature of the target image, the object condition feature being manually input feature information characterizing a target object; performing feature detection on the target image based on a teacher model to obtain a teacher model feature map, and performing feature detection on the target image based on an initial student model to obtain a student model feature map; determining the correlation between the object feature at each pixel position in the teacher model feature map and the object condition feature, and determining the attention distribution of the teacher model based on the correlation; and training the initial student model based on the attention distribution of the teacher model and the degree of feature difference between the teacher model feature map and the student model feature map, to obtain a target student model.
In one embodiment, determining the correlation between the object feature at each pixel position in the teacher model feature map and the object condition feature includes: decomposing the object feature at each pixel position in the teacher model feature map into a plurality of first sub-feature spaces; for each pixel position in the teacher model feature map, weighting the plurality of first sub-feature spaces with different weights to obtain feature-weighted object features; determining the correlation between the feature-weighted object features and the object condition feature, and obtaining from that correlation a first sub-feature-space correlation for each of the plurality of first sub-feature spaces at each pixel position; and, for each pixel position in the teacher model feature map, normalizing the first sub-feature-space correlations of the plurality of first sub-feature spaces to obtain the correlation between the object feature at that pixel position and the object condition feature.
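The decompose-weight-normalize procedure described in this embodiment can be sketched as follows. The patent fixes neither the decomposition (equal channel groups are assumed here), the normalization (a softmax over the per-sub-space correlations is assumed), nor how the normalized correlations are combined into one scalar (a softmax-weighted sum is assumed); all function and variable names are illustrative.

```python
import numpy as np

def pixel_correlation(feat, cond, sub_weights, num_heads=4):
    """Correlation between one pixel's object feature and the object
    condition feature via weighted sub-feature spaces (illustrative)."""
    c = feat.shape[0]
    # Decompose the C-dim feature into num_heads first sub-feature spaces.
    sub = feat.reshape(num_heads, c // num_heads)
    cond_sub = cond.reshape(num_heads, c // num_heads)
    # Weight each sub-feature space with its own weight.
    weighted = sub * sub_weights[:, None]
    # Per-sub-space correlation with the condition feature (dot product).
    corr = (weighted * cond_sub).sum(axis=1)
    # Normalize across the sub-feature spaces (softmax), then aggregate
    # into a single per-pixel correlation (softmax-weighted sum).
    norm = np.exp(corr - corr.max())
    norm /= norm.sum()
    return float((norm * corr).sum())
```

The same computation would be repeated for every pixel position of the teacher model feature map.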
In one embodiment, training the initial student model based on the attention distribution of the teacher model and the degree of feature difference between the teacher model feature map and the student model feature map, to obtain a target student model, includes: optimizing the attention distribution of the teacher model; and training the initial student model based on the optimized attention distribution and the degree of feature difference between the teacher model feature map and the student model feature map, to obtain the target student model.
In one embodiment, optimizing the attention distribution of the teacher model includes: decomposing the object feature at each pixel position in the teacher model feature map into a plurality of second sub-feature spaces; weighting the plurality of second sub-feature spaces based on the attention distribution of the teacher model; and controlling the teacher model to perform object detection based on the weighted features, and adjusting the attention distribution of the teacher model based on the object detection result and the object condition feature, so that the teacher model obtains an object detection result consistent with the object condition feature.
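The patent leaves the concrete adjustment rule open. As a minimal numerical illustration, one plausible stand-in is gradient descent on the squared error between the attention-weighted feature response (standing in for the detection result) and the condition feature; the update rule and all names here are hypothetical.

```python
import numpy as np

def refine_attention(attn_logits, pixel_feats, cond, lr=0.5, steps=20):
    """Adjust attention logits so the attention-weighted feature response
    moves toward the object condition feature (illustrative update rule)."""
    for _ in range(steps):
        # Softmax over pixel positions gives the current attention.
        attn = np.exp(attn_logits - attn_logits.max())
        attn /= attn.sum()
        response = attn @ pixel_feats        # stand-in detection result
        err = response - cond                # mismatch with the condition
        # Gradient of 0.5 * ||response - cond||^2 w.r.t. the logits.
        grad = attn * ((pixel_feats - response) @ err)
        attn_logits = attn_logits - lr * grad
    return attn_logits
```

With two pixels whose features are [1, 0] and [0, 1] and a condition feature of [1, 0], the loop shifts attention toward the first pixel, i.e., toward the region that matches the condition.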
In one embodiment, the first sub-feature spaces and the second sub-feature spaces use different decomposition manners, the decomposition manners including a key-value decomposition manner and a content decomposition manner.
In one embodiment, obtaining an object detection result consistent with the object condition feature includes: obtaining an object detection result consistent with one or a combination of a category feature, a position feature, and a scale feature corresponding to the object condition feature.
In one embodiment, determining the attention distribution of the teacher model based on the correlation includes: assigning a weight to each pixel position in the teacher model feature map based on the magnitude relationship of the correlations between the object features at the pixel positions and the object condition feature, and obtaining the attention distribution of the teacher model from the assigned weights, where a pixel position with a larger correlation is assigned a larger weight.
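The claim only fixes the monotonicity (larger correlation, larger weight). One weighting that satisfies it, assumed here for illustration, is a softmax over the per-pixel correlations, which also yields a proper distribution summing to one.

```python
import numpy as np

def attention_from_correlation(corr_map):
    """Map per-pixel correlations to attention weights: a softmax gives
    larger weights to larger correlations and sums to one (assumption)."""
    flat = corr_map.ravel()
    w = np.exp(flat - flat.max())   # subtract max for numerical stability
    w /= w.sum()
    return w.reshape(corr_map.shape)
```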
In one embodiment, performing feature detection on the target image based on the teacher model to obtain a teacher model feature map, and performing feature detection on the target image based on the initial student model to obtain a student model feature map, includes: performing multi-scale feature detection on the target image with the teacher model to obtain a teacher model feature map including multi-scale features; and performing multi-scale feature detection on the target image with the student model to obtain a student model feature map including multi-scale features.
According to a second aspect of the embodiments of the present disclosure, a neural network model distillation device is provided, including:
a determination unit, configured to determine a target image, to determine the correlation between the object feature at each pixel position in a teacher model feature map and an object condition feature, and to determine the attention distribution of the teacher model based on the correlation; an acquisition unit, configured to acquire the object condition feature of the target image, the object condition feature being manually input feature information characterizing a target object; a detection unit, configured to perform feature detection on the target image based on the teacher model to obtain the teacher model feature map, and to perform feature detection on the target image based on an initial student model to obtain a student model feature map; and a processing unit, configured to train the initial student model based on the attention distribution of the teacher model and the degree of feature difference between the teacher model feature map and the student model feature map, to obtain a target student model.
In one embodiment, the determination unit determines the correlation between the object feature at each pixel position in the teacher model feature map and the object condition feature as follows: decomposing the object feature at each pixel position in the teacher model feature map into a plurality of first sub-feature spaces; for each pixel position in the teacher model feature map, weighting the plurality of first sub-feature spaces with different weights to obtain feature-weighted object features; determining the correlation between the feature-weighted object features and the object condition feature, and obtaining from that correlation a first sub-feature-space correlation for each of the plurality of first sub-feature spaces at each pixel position; and, for each pixel position in the teacher model feature map, normalizing the first sub-feature-space correlations of the plurality of first sub-feature spaces to obtain the correlation between the object feature at that pixel position and the object condition feature.
In one embodiment, the processing unit trains the initial student model, based on the attention distribution of the teacher model and the degree of feature difference between the teacher model feature map and the student model feature map, to obtain the target student model as follows: optimizing the attention distribution of the teacher model; and training the initial student model based on the optimized attention distribution and the degree of feature difference between the teacher model feature map and the student model feature map, to obtain the target student model.
In one embodiment, the processing unit optimizes the attention distribution of the teacher model as follows: decomposing the object feature at each pixel position in the teacher model feature map into a plurality of second sub-feature spaces; weighting the plurality of second sub-feature spaces based on the attention distribution of the teacher model; and controlling the teacher model to perform object detection based on the weighted features, and adjusting the attention distribution of the teacher model based on the object detection result and the object condition feature, so that the teacher model obtains an object detection result consistent with the object condition feature.
In one embodiment, the first sub-feature spaces and the second sub-feature spaces use different decomposition manners, the decomposition manners including a key-value decomposition manner and a content decomposition manner.
In one embodiment, the processing unit obtains an object detection result consistent with the object condition feature as follows: obtaining an object detection result consistent with one or a combination of a category feature, a position feature, and a scale feature corresponding to the object condition feature.
In one embodiment, the determination unit determines the attention distribution of the teacher model based on the correlation as follows: assigning a weight to each pixel position in the teacher model feature map based on the magnitude relationship of the correlations between the object features at the pixel positions and the object condition feature, and obtaining the attention distribution of the teacher model from the assigned weights, where a first weight corresponding to a first correlation is greater than a second weight corresponding to a second correlation, the first correlation being greater than the second correlation.
In one embodiment, the detection unit performs feature detection on the target image based on the teacher model to obtain the teacher model feature map, and performs feature detection on the target image based on the initial student model to obtain the student model feature map, as follows: controlling the teacher model to perform multi-scale feature detection on the target image to obtain a teacher model feature map including multi-scale features; and controlling the student model to perform multi-scale feature detection on the target image to obtain a student model feature map including multi-scale features.
According to a third aspect of the embodiments of the present disclosure, a neural network model distillation device is provided, including:
a processor; and a memory for storing processor-executable instructions;
wherein the processor is configured to execute the neural network model distillation method described in the first aspect or any implementation of the first aspect.
According to a fourth aspect of the embodiments of the present disclosure, a storage medium is provided, the storage medium storing instructions that, when executed by a processor, enable the processor to execute the neural network model distillation method described in the first aspect or any implementation of the first aspect.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects: a target image is determined, and an object condition feature of the target image is acquired. Because the object condition feature is manually input feature information characterizing the target object, the attention distribution of the teacher model, which is determined from the correlation between the object feature at each pixel position in the teacher model feature map and the object condition feature, is derived from manually input object condition features. Performing object detection with this attention distribution therefore yields detection results that better match the human observation perspective, improving the object detection accuracy of the student model to a certain extent.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
Fig. 1 is a flowchart of a neural network model distillation method according to an exemplary embodiment.
Fig. 2 is a flowchart of another neural network model distillation method according to an exemplary embodiment.
Fig. 3 is a flowchart of a method for determining the correlation between the object feature at each pixel position in a teacher model feature map and an object condition feature, according to an exemplary embodiment.
Fig. 4 is a flowchart of yet another neural network model distillation method according to an exemplary embodiment.
Fig. 5 is a flowchart of a method for training an initial student model according to an exemplary embodiment.
Fig. 6 is a flowchart of a method for optimizing the attention distribution of a teacher model according to an exemplary embodiment.
Fig. 7 is a flowchart of another method for optimizing the attention distribution of a teacher model according to an exemplary embodiment.
Fig. 8 is a schematic flowchart of training an initial student model according to an exemplary embodiment.
Fig. 9 is a block diagram of a neural network model distillation device according to an exemplary embodiment.
Fig. 10 is a schematic diagram of an electronic device according to an exemplary embodiment.
Detailed Description
Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
In the drawings, the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The described embodiments are some, but not all, of the embodiments of the present disclosure. The embodiments described below with reference to the drawings are exemplary, intended to explain the present disclosure, and should not be construed as limiting it. Based on the embodiments of the present disclosure, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present disclosure. The embodiments of the present disclosure are described in detail below with reference to the drawings.
In recent years, research on artificial-intelligence-based computer vision, deep learning, machine learning, image processing, and image recognition has made significant progress. Artificial intelligence (AI) is an emerging science and technology that researches and develops theories, methods, technologies, and application systems for simulating and extending human intelligence. AI is a comprehensive discipline involving chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning, neural networks, and many other technologies. Computer vision, an important branch of AI, aims to let machines perceive the world; computer vision technologies typically include face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, object detection, pedestrian recognition, image processing, image recognition, image semantic understanding, image retrieval, text recognition, video processing, video content recognition, behavior recognition, 3D reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, and robot navigation and positioning. With the research and progress of AI technology, it has been applied in many fields, such as security, urban management, traffic management, building management, park management, face-based access, face-based attendance, logistics management, warehouse management, robotics, intelligent marketing, computational photography, mobile-phone imaging, cloud services, smart home, wearable devices, unmanned driving, autonomous driving, intelligent healthcare, face payment, face unlock, fingerprint unlock, identity verification, smart screens, smart TVs, cameras, the mobile Internet, live streaming, beauty and cosmetics applications, medical aesthetics, and intelligent temperature measurement.
With the development of machine learning, using models for image processing or recognition has become increasingly common. In general, larger, deeper, and more complex models fit better and predict better, but their low computational efficiency, long runtime, and large parameter counts hinder deployment on mobile devices, chips, and other application layers. Simple models, although weaker at fitting, are more computationally efficient and have fewer parameters, making them easier to deploy. Knowledge distillation, an important model compression technique, transfers the knowledge in a complex model (the teacher model) to a simple model (the student model) so that the student model's fitting ability approaches or even exceeds the teacher model's, achieving similar prediction performance with less time and space complexity.
In the related art, knowledge distillation schemes are mostly designed for classification problems and do not transfer well to object detection and instance segmentation. Moreover, related-art distillation schemes rely on the network's own prediction results, so the easily identifiable regions learned by the network do not necessarily match human observation and cognition, which in turn affects the model's object detection results to a certain extent.
The embodiments of the present disclosure provide a neural network model distillation method in which object condition features characterizing the target object are manually input, so that the attention distribution obtained by the teacher model better matches how humans observe. Training the initial student model with this attention distribution, together with the degree of feature difference between the teacher model feature map and the student model feature map, improves the training of the student model and thus its object detection ability.
Fig. 1 is a flowchart of a neural network model distillation method according to an exemplary embodiment. As shown in Fig. 1, the method includes the following steps.
In step S11, a target image is determined, and an object condition feature of the target image is acquired.
Here, the object condition feature is understood as manually input feature information characterizing the target object, and the target object is an object present in the target image.
In step S12, feature detection is performed on the target image based on the teacher model to obtain a teacher model feature map, and feature detection is performed on the target image based on the initial student model to obtain a student model feature map.
In the embodiments of the present disclosure, the initial student model is the student model to be trained. In one example, the teacher model feature map includes the object features obtained by the teacher model's feature detection at each pixel position of the target image, and the student model feature map includes the object features obtained by the initial student model's feature detection at each pixel position of the target image.
In step S13, the correlation between the object feature at each pixel position in the teacher model feature map and the object condition feature is determined, and the attention distribution of the teacher model is determined based on the correlation.
For example, the correlation between an object feature and the object condition feature can be understood as the degree of similarity between the feature sequence corresponding to the object feature and the feature sequence corresponding to the object condition feature. The attention distribution of the teacher model can be understood as the detection weight the teacher model assigns to each pixel position during object detection.
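As a concrete instance of this similarity, cosine similarity between the two feature vectors is one common choice; the patent does not fix the metric, so this is an assumption.

```python
import numpy as np

def feature_similarity(obj_feat, cond_feat):
    """Cosine similarity between an object feature vector and the object
    condition feature vector (one plausible reading of 'correlation')."""
    num = float(obj_feat @ cond_feat)
    den = float(np.linalg.norm(obj_feat) * np.linalg.norm(cond_feat)) + 1e-12
    return num / den
```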
In step S14, the initial student model is trained based on the attention distribution of the teacher model and the degree of feature difference between the teacher model feature map and the student model feature map, to obtain a target student model.
Here, the target student model is the trained student model.
In the embodiments of the present disclosure, manually inputting a prior feature (i.e., the object condition feature) provides the teacher model with an indexing direction for determining its attention distribution. The attention distribution the teacher model determines in this way better matches the human observation perspective. Further, training the initial student model with this attention distribution improves the recognition accuracy of the student model.
In one example, training the initial student model to obtain the target student model may involve determining the degree of feature difference at each pixel position between the teacher model feature map and the student model feature map. Further, the attention distribution of the teacher model is used as the per-position weight when calculating the distillation loss between the teacher model and the initial student model. For example, calculating the distillation loss between the teacher model and the initial student model may amount to calculating the mean squared error between the teacher model feature map and the student model feature map. Introducing this distillation loss into the initial student model, as an auxiliary parameter for the initial student model's object detection, can make the object detection results of the initial student model close or identical to those of the teacher model; that is, a fully trained target student model is obtained.
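The attention-weighted distillation loss described above can be sketched as follows. This is a minimal NumPy sketch; the function name, the array shapes, and the exact way the attention weights enter the mean squared error are illustrative assumptions, not the patent's fixed formulation.

```python
import numpy as np

def attention_weighted_distill_loss(teacher_feat, student_feat, attention):
    """Distillation loss: per-pixel squared error between the teacher and
    student feature maps, weighted by the teacher's attention distribution.

    teacher_feat, student_feat: (H, W, C) feature maps
    attention: (H, W) per-pixel weights, assumed to sum to 1
    """
    per_pixel_mse = ((teacher_feat - student_feat) ** 2).mean(axis=-1)  # (H, W)
    return float((attention * per_pixel_mse).sum())
```

Pixels the teacher attends to contribute more to the loss, so the student is pushed hardest to match the teacher exactly where the teacher looks.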
In one example, a Feature Pyramid Network (FPN) may be introduced into both the teacher model and the student model. On this basis, multi-scale feature detection can be performed on the target image through the teacher model to obtain a teacher model feature map containing multi-scale features, and through the student model to obtain a student model feature map containing multi-scale features.
Fig. 2 is a flowchart of another neural network model distillation method according to an exemplary embodiment. As shown in Fig. 2, steps S21, S23 and S24 in this embodiment of the present disclosure are performed in a manner similar to steps S11, S13 and S14 shown in Fig. 1, and are not repeated here.
In step S22, multi-scale feature detection is performed on the target image through the teacher model to obtain a teacher model feature map containing multi-scale features, and multi-scale feature detection is performed on the target image through the student model to obtain a student model feature map containing multi-scale features.
The neural network model distillation method provided by the embodiments of the present disclosure can perform feature detection on the target image at multiple different scales (resolutions) through a feature pyramid network. For large-scale (lower-resolution) images, the relevant features of small objects in the target image can be identified relatively accurately, and for small-scale (higher-resolution) images, the relevant features of large objects in the target image can be identified relatively accurately. This approach satisfies the feature detection requirements for objects of different scales in the target image, thereby improving the object detection accuracy of the teacher model and/or the student model.
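The multi-scale idea can be illustrated with a toy pyramid. This sketch only performs repeated 2x2 average pooling; a real FPN also adds lateral and top-down connections, so treat it as an assumption-laden stand-in for the multi-resolution feature maps, not the patent's network.

```python
import numpy as np

def feature_pyramid(image, levels=3):
    """Build a simple multi-scale pyramid by repeated 2x2 average pooling,
    standing in for the multi-resolution feature maps an FPN would produce."""
    pyramid = [image]
    for _ in range(levels - 1):
        h, w = pyramid[-1].shape[:2]
        x = pyramid[-1][: h - h % 2, : w - w % 2]  # crop to even size
        x = (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2]) / 4.0
        pyramid.append(x)
    return pyramid
```

Each level halves the spatial resolution, giving detectors at different levels different effective receptive fields.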
In general, for a given image there are multiple kinds of feature regions that contribute to object detection, such as foreground regions, object edge regions, overlapping objects, and relationships between objects.
The object detection distillation method provided by the embodiments of the present disclosure can decompose the teacher model feature map and then, through a multi-head attention network, map the decomposed teacher model feature map into multiple sub-feature spaces. Further, assigning a different weight to each sub-feature space enables the teacher model to identify, in different regions, the features that are helpful for object detection.
For ease of description, the present disclosure refers to the sub-feature spaces obtained by decomposing the teacher model feature map during the computation of the correlation between the object feature at each pixel position of the target image and the object condition feature as the first sub-feature spaces.
Fig. 3 is a flowchart of a method for determining the correlation between the object feature at each pixel position in the teacher model feature map and the object condition feature, according to an exemplary embodiment. As shown in Fig. 3, the method includes the following steps.
In step S31, the object feature at each pixel position in the teacher model feature map is decomposed into multiple first sub-feature spaces.
In step S32, for each pixel position in the teacher model feature map, the multiple first sub-feature spaces are feature-weighted with different weights, yielding feature-weighted object features.
In step S33, the correlation between the feature-weighted object features and the object condition feature is determined, and from this correlation, a first sub-feature-space correlation is obtained for each of the multiple first sub-feature spaces at every pixel position.
In step S34, for each pixel position in the teacher model feature map, the first sub-feature-space correlations of the multiple first sub-feature spaces are normalized, yielding the correlation between the object feature and the object condition feature at each pixel position.
As an example, assume the teacher model feature map has W pixel rows, H pixel columns, and a feature dimension of C at each pixel position. When decomposing the teacher model feature map into multiple first sub-feature spaces and weighting them with different weights, the feature map can be decomposed into N first sub-feature spaces, each with W pixel rows, H pixel columns, and a feature dimension of M at each pixel position (M*N=C). The number N of sub-feature spaces can be any value; that is, the teacher model feature map can be decomposed into an arbitrary number of first sub-feature spaces. Further, each of the N first sub-feature spaces is weighted with a different weight. For example, in the first sub-feature space A, the image foreground region may be assigned a higher weight than all other regions, while in the first sub-feature space B (distinct from A), the regions other than the foreground may be assigned a higher weight than the foreground region, thereby setting different weights for different first sub-feature spaces. Of course, many other schemes can be used to weight different first sub-feature spaces, and the present disclosure does not specifically limit this.
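Steps S31 to S34 can be sketched end to end as follows, assuming a teacher feature map with C channels split into N sub-feature spaces of dimension M. Summing the per-sub-space correlations and normalizing with a softmax are illustrative choices; the patent does not fix the aggregation or normalization operator.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def teacher_attention(feat_map, cond_feat, space_weights):
    """S31: split channels into N first sub-feature spaces (M * N == C).
    S32: weight each sub-feature space differently.
    S33: correlate each weighted sub-space with the object condition feature.
    S34: normalize into a per-pixel attention distribution."""
    H, W, C = feat_map.shape
    N = space_weights.shape[0]
    M = C // N
    heads = feat_map.reshape(H, W, N, M)                     # S31
    weighted = heads * space_weights[None, None, :, None]    # S32
    cond_heads = cond_feat.reshape(N, M)
    corr = np.einsum('hwnm,nm->hwn', weighted, cond_heads)   # S33: dot-product correlation
    per_pixel = corr.sum(axis=-1)                            # aggregate sub-spaces
    return softmax(per_pixel.reshape(-1)).reshape(H, W)      # S34
```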
In the neural network model distillation method provided by the embodiments of the present disclosure, assigning different weights to different first sub-feature spaces causes the teacher model to treat different regions of the target image, in turn, as the primary recognition region. Further, normalizing the attention distributions obtained for the different sub-feature spaces allows the teacher model to identify the object features in each region of the target image that contribute to object detection, thereby improving the recognition accuracy of the teacher model.
In one example, weights can be assigned to the correlations at the pixel positions of the teacher model feature map according to the relative magnitudes of those correlations, and the attention distribution of the teacher model can be determined accordingly.
Fig. 4 is a flowchart of yet another neural network model distillation method according to an exemplary embodiment. As shown in Fig. 4, steps S41, S42 and S44 in this embodiment of the present disclosure are performed in a manner similar to steps S11, S12 and S14 shown in Fig. 1, and are not repeated here.
In step S43, the correlation between the object feature at each pixel position in the teacher model feature map and the object condition feature is determined; based on the relative magnitudes of these correlations, a weight is assigned to each pixel position in the teacher model feature map, and the attention distribution of the teacher model is determined from the assigned weights.
Here, among the pixel positions of the teacher model feature map, a pixel position with a larger correlation receives a larger weight. For example, when assigning weights to the correlations of the pixel positions based on their relative magnitudes, pixel positions with larger correlation values may be assigned higher weights, and pixel positions with smaller correlation values may be assigned lower weights.
In one example, if two different correlations among the pixel positions are called the first correlation and the second correlation, where the first correlation is greater than the second correlation, then a corresponding first weight may be assigned to the first correlation and a corresponding second weight to the second correlation, with the first weight corresponding to the first correlation being greater than the second weight corresponding to the second correlation.
In one implementation, when weights are assigned to the correlations of the pixel positions in the teacher model feature map according to their relative magnitudes, the weights assigned over all pixel positions sum to 1.
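One order-preserving scheme that satisfies both constraints above (larger correlation gets a larger weight; the weights sum to 1) is a softmax over the per-pixel correlation values. This is one possible choice, since the patent does not fix the mapping from correlations to weights.

```python
import numpy as np

def correlation_to_weights(corr):
    """Map per-pixel correlations to weights that preserve order
    (higher correlation -> higher weight) and sum to 1."""
    e = np.exp(corr - corr.max())   # subtract max for numerical stability
    return e / e.sum()
```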
In one example, the attention distribution of the teacher model can be optimized, and the initial student model can be trained based on the optimized attention distribution to obtain the target student model, further improving the recognition accuracy of the target student model.
Fig. 5 is a flowchart of a method for training the initial student model according to an exemplary embodiment. As shown in Fig. 5, the method includes the following steps.
In step S51, the attention distribution of the teacher model is optimized.
In step S52, the initial student model is trained based on the optimized attention distribution and the degree of feature difference between the teacher model feature map and the student model feature map, thereby obtaining the target student model.
The neural network model distillation method provided by the embodiments of the present disclosure can optimize the attention distribution of the teacher model. Training the initial student model with the optimized attention distribution to obtain the target student model can further improve the recognition accuracy of the target student model.
In one example, the teacher model can be made to perform object detection on the target image, and its attention distribution can then be adjusted using the teacher model's detection results together with the object condition features, thereby optimizing the attention distribution.
For ease of description, the present disclosure refers to the sub-feature spaces obtained by decomposing the teacher model feature map during the optimization of the teacher model's attention distribution as the second sub-feature spaces.
Fig. 6 is a flowchart of a method for optimizing the attention distribution of the teacher model according to an exemplary embodiment. As shown in Fig. 6, the method includes the following steps.
In step S61, the object feature at each pixel position in the teacher model feature map is decomposed into multiple second sub-feature spaces.
In step S62, the multiple second sub-feature spaces are feature-weighted based on the attention distribution of the teacher model.
In step S63, the teacher model is made to perform object detection based on the weighted features, and its attention distribution is adjusted based on the detection results and the object condition features, so that the teacher model produces detection results consistent with the object condition features.
In one example, performing object detection based on the weighted features may involve aggregating the multiple second sub-feature spaces into a feature vector characterizing the object features of the target image; the teacher model can then perform object detection on the target image using this aggregated feature vector, yielding the detection results.
In one implementation, a Multi-Layer Perceptron (MLP) may be used to perform object detection on the weighted and aggregated feature vector, yielding the detection results.
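A minimal MLP detection head over the aggregated feature vector might look as follows. The shapes, the single hidden ReLU layer, and the scalar sigmoid "object present" output are all illustrative assumptions; the patent only names an MLP.

```python
import numpy as np

def mlp_detect(agg_feat, w1, b1, w2, b2):
    """One-hidden-layer MLP head: ReLU hidden layer, then a sigmoid
    squashing a scalar logit into an 'object present' score in (0, 1)."""
    h = np.maximum(agg_feat @ w1 + b1, 0.0)  # hidden ReLU layer
    logit = h @ w2 + b2                      # scalar logit
    return 1.0 / (1.0 + np.exp(-logit))      # sigmoid score
```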
In the embodiments of the present disclosure, the first sub-feature spaces and the second sub-feature spaces may be obtained by decomposing the teacher model feature map in different ways. As an example, the decomposition methods used may include a key-value decomposition and a content decomposition.
In one implementation, the teacher model feature map may be decomposed by key-value decomposition into multiple first sub-feature spaces, and by content decomposition into multiple second sub-feature spaces. Because different decomposition methods applied to the same teacher model feature map yield different features for object detection, this approach enriches the feature samples and thereby improves the recognition accuracy of the teacher model. Of course, the present disclosure may also decompose the teacher model feature map in other ways, and the decomposition method used is not specifically limited.
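The patent leaves the decomposition operators unspecified. One common realization, assumed here purely for illustration, is applying two different learned linear projections to the same feature map, one projection giving a "key-value" view (first sub-feature spaces) and the other a "content" view (second sub-feature spaces).

```python
import numpy as np

def decompose(feat_map, proj, n_spaces):
    """Project the (H, W, C) feature map with a C x C matrix and split the
    channel axis into n_spaces sub-feature spaces. Two different projection
    matrices yield two distinct decompositions of the same map."""
    H, W, C = feat_map.shape
    projected = feat_map @ proj
    return projected.reshape(H, W, n_spaces, C // n_spaces)
```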
In the embodiments of the present disclosure, the object condition features manually input into the teacher model may include one or a combination of the category feature, the position feature, and the scale feature of the target object. In one example, obtaining detection results consistent with the object condition features may mean obtaining detection results consistent with one or a combination of the category feature, the position feature, and the scale feature corresponding to the object condition features.
Fig. 7 is a flowchart of another method for optimizing the attention distribution of the teacher model according to an exemplary embodiment. Steps S71 and S72 in this embodiment of the present disclosure are performed in a manner similar to steps S61 and S62 shown in Fig. 6, and are not repeated here.
In step S73, the teacher model is made to perform object detection based on the weighted features, and its attention distribution is adjusted based on the detection results and the object condition features, so that the teacher model produces detection results consistent with one or a combination of the category feature, the position feature, and the scale feature corresponding to the object condition features.
In the embodiments of the present disclosure, the position feature of the target object may be encoded via position embedding and input into the teacher model. The category feature and/or the scale feature of the target object may be encoded as one-hot vectors and input into the teacher model.
In one implementation, inputting the position feature of the target object may mean inputting the coordinates of the center point of the target object in the target image. Of course, the coordinates of any point within the region occupied by the target object may also be input.
In another implementation, inputting the category feature of the target object may involve determining the object category of the target object through human observation. Further, a feature sequence matching that object category can be looked up in the category dataset preset for the teacher model, and this feature sequence can be input into the teacher model as the category feature of the target object.
In yet another implementation, the scale feature of the target object to be input may be defined against predefined object scale ranges for the teacher model. For example, a target object whose scale is smaller than 64 pixels may be defined as a small object, an object whose scale lies in the range of 64 to 256 pixels as a medium object, and an object whose scale is larger than 256 pixels as a large object. Further, the scale of the target object is determined through human observation, and the feature sequence characterizing the scale range it falls into is input into the teacher model.
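The three encodings named in the text (one-hot category, one-hot scale bucket, position embedding) can be sketched as follows. The scale buckets follow the 64/256-pixel thresholds above; the sinusoidal form and dimension of the position embedding are assumptions, as the patent only names "position embedding" without fixing a formula.

```python
import numpy as np

def one_hot(index, length):
    """One-hot vector for a category or scale-bucket index."""
    v = np.zeros(length)
    v[index] = 1.0
    return v

def scale_bucket(size_px):
    """Scale buckets from the text: <64 px small, 64-256 px medium, >256 px large."""
    if size_px < 64:
        return 0          # small object
    if size_px <= 256:
        return 1          # medium object
    return 2              # large object

def position_embedding(x, y, dim=8):
    """Sinusoidal embedding of a 2-D point (one common 'position embedding')."""
    freqs = 1.0 / (100.0 ** (np.arange(dim // 4) / (dim // 4)))
    return np.concatenate([np.sin(x * freqs), np.cos(x * freqs),
                           np.sin(y * freqs), np.cos(y * freqs)])
```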
In the embodiments of the present disclosure, the relevant parameters of the teacher model can be adjusted so that the teacher model outputs detection results consistent with the object condition features. In one implementation, a corresponding loss function can be set for the teacher model, through which the teacher model adaptively adjusts its attention distribution.
In the neural network model distillation method provided by the embodiments of the present disclosure, adjusting the attention distribution of the teacher model through a loss function can be understood as assigning auxiliary tasks to the teacher model. As an example, the following auxiliary tasks can be understood as being assigned to the teacher model.
Auxiliary task 1: determine whether the target object is present in the target image.
Auxiliary task 2: determine whether the location region annotated by the teacher model matches the region occupied by the target object in the target image.
In one implementation, the teacher model can be made to perform object detection and determine whether the target image contains a target object matching the category feature input into the teacher model. For example, the output of the teacher model may be 1 if such an object is present and 0 if it is not. Of course, the output of the teacher model may also be a floating-point number between 0 and 1; in that case, an output value approaching 1 can be interpreted as the object being present, and an output value approaching 0 as the object being absent. For example, if the output does not match the result characterized by the category feature, the attention distribution of the teacher model is adjusted in the direction that decreases the output value of the teacher model's loss function, until the teacher model's detection result agrees with what the category feature characterizes.
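Auxiliary task 1 reduces to a binary classification target. A binary cross-entropy between the teacher's floating-point "object present" score and the label implied by the category condition feature is one natural loss for it; this is an assumption, as the patent does not name the loss function.

```python
import numpy as np

def existence_loss(pred_score, target_exists):
    """Binary cross-entropy between the teacher's 'object present' score
    (float in (0, 1)) and the label implied by the category condition
    feature (1 = present, 0 = absent). Decreasing this loss drives the
    attention adjustment of auxiliary task 1."""
    p = np.clip(pred_score, 1e-7, 1.0 - 1e-7)   # avoid log(0)
    return float(-(target_exists * np.log(p)
                   + (1 - target_exists) * np.log(1 - p)))
```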
In another implementation, when it is determined that the target object is present in the target image, it can be judged whether the location region of the target object annotated by the teacher model in the target image matches the location region characterized by the position feature and the scale feature. For example, the image position characterized by the position feature of the target object is taken as the coordinates of a random point within the region occupied by the target object in the target image, and the teacher model is made to annotate the location region of the target object accordingly. Further, the distances between the position coordinates characterized by the position feature and the coordinates of each edge vertex of the region annotated by the teacher model are determined, and the resulting distance values are normalized and converted into floating-point numbers between 0 and 1. An output value approaching 1 is then interpreted as meaning that the two annotated location regions in the target image match, and an output value approaching 0 as meaning that they do not. If the output does not match the result characterized by the position feature and the scale feature, the attention distribution of the teacher model is adjusted in the direction that decreases the output value of the teacher model's loss function, until the teacher model's detection result agrees with what the position feature and the scale feature characterize.
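Auxiliary task 2 can be sketched as a score built from the distances between the condition-feature point and the vertices of the teacher-annotated box, squashed into (0, 1) so that values near 1 indicate agreement. Scaling by the box diagonal and the exponential squashing are illustrative assumptions; the patent only states that the distances are normalized.

```python
import numpy as np

def location_match_score(point, box):
    """Score in (0, 1): near 1 when the condition-feature point lies close
    to the teacher-annotated box, near 0 when it lies far away.

    point: (x, y) coordinates from the position condition feature
    box: (x0, y0, x1, y1) region annotated by the teacher model
    """
    x0, y0, x1, y1 = box
    corners = np.array([[x0, y0], [x0, y1], [x1, y0], [x1, y1]])
    dists = np.linalg.norm(corners - np.asarray(point, dtype=float), axis=1)
    diag = np.hypot(x1 - x0, y1 - y0) + 1e-9    # box diagonal as the scale
    return float(np.exp(-dists.mean() / diag))
```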
In the embodiments of the present disclosure, the teacher model can be made to perform object detection according to the previously obtained attention distribution. Further, the object condition features input into the teacher model are treated as the detection results the teacher model is expected to produce, so that the teacher model adaptively adjusts its attention distribution and can then produce detection results consistent with the object condition features. This approach improves the object detection accuracy of the teacher model, and training the initial student model with the adjusted attention distribution improves the object detection accuracy of the target student model.
Fig. 8 is a schematic flowchart of training the initial student model according to an exemplary embodiment. As shown in Fig. 8, for a single target image, object condition feature 1 (characterizing the features of the "person" in the target image) and object condition feature 2 (characterizing the features of the "horse" in the target image), both characterizing target objects in the image, can be input into the teacher model. As an example, feature detection can be performed on the target image to obtain the teacher model feature map. Further, the teacher model feature map can be decomposed by key-value decomposition into multiple first sub-feature spaces, and by content decomposition into multiple second sub-feature spaces. In this case, for example, object condition feature 1 (object condition feature 2 could equally be chosen) can serve as the index feature: the correlation between the weighted first sub-feature spaces and object condition feature 1 is computed, from which the teacher model's attention distribution with respect to object condition feature 1 is determined. Moreover, once the attention distribution of the teacher model has been determined, the teacher model can be made to perform object detection by weighting the multiple second sub-feature spaces with the obtained attention distribution. Further, the characterization of the object condition features input into the teacher model (object condition feature 1 and/or object condition feature 2) is taken as the teacher model's target output, and the teacher model's attention distribution is optimized accordingly. The initial student model is then trained with the optimized attention distribution, yielding the trained target student model. As an example, once training of the initial student model is complete, only the feature extraction network and the final detection head of the target student model need be retained for object-detection testing and deployment. In addition, the target student model trained by the present disclosure achieves clearly measurable accuracy gains on datasets commonly used for object detection distillation.
With the neural network model distillation method provided by this disclosure, the teacher model feature map obtained by running the teacher model on the target image can be divided into a plurality of first sub-feature spaces by key-value decomposition, and different weights can be assigned to the different first sub-feature spaces, so that the teacher model can recognize the various types of features that are helpful for object detection. Further, the object condition feature characterizing the target object can be input into the teacher model as an auxiliary parameter; because this feature is manually calibrated and supplied, the attention distribution derived from it better matches the human observation perspective, which improves the object detection accuracy of the teacher model. In addition, since the teacher model feature map can be a multi-scale feature map obtained through a feature pyramid network, good detection results can be obtained for large-scale and small-scale objects alike. On this basis, the method introduces a dedicated loss function to adjust the attention distribution produced by the teacher model, optimizing that distribution and significantly improving the teacher model's object detection accuracy. Furthermore, training the initial student model with the optimized attention distribution yields a target student model that also has high object detection accuracy. Performing object detection with a target student model trained in this way produces good detection results and meets the user's object detection needs.
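The patent characterizes student training only as minimizing a "feature difference degree" between the teacher and student feature maps under the teacher's attention distribution, without giving a formula. The numpy sketch below shows one common instantiation, per-pixel squared error weighted by the attention; the function name and the squared-error choice are assumptions, not taken from the patent.

```python
import numpy as np

def attention_distill_loss(teacher_map, student_map, attention):
    """Feature-difference loss between teacher and student feature maps,
    weighted per pixel by the teacher's attention distribution.

    teacher_map, student_map: (H, W, C) feature maps from the two models
    attention:                (H, W) non-negative weights summing to 1
    """
    # Per-pixel feature difference (squared error summed over channels).
    diff = ((teacher_map - student_map) ** 2).sum(-1)
    # Pixels the teacher attends to contribute more to the distillation loss.
    return float((attention * diff).sum())
```

Pixels with near-zero attention then contribute almost nothing, so the student is pushed hardest to imitate the teacher where the teacher "looks".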
Based on the same concept, an embodiment of the present disclosure further provides a neural network model distillation apparatus.
It can be understood that, in order to implement the above functions, the neural network model distillation apparatus provided by the embodiments of the present disclosure includes corresponding hardware structures and/or software modules for performing each function. In combination with the units and algorithm steps of the examples disclosed herein, the embodiments of the present disclosure can be implemented in hardware or in a combination of hardware and computer software. Whether a given function is performed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Those skilled in the art may implement the described functions differently for each particular application, but such implementations should not be regarded as going beyond the scope of the technical solutions of the embodiments of the present disclosure.
Fig. 9 is a block diagram of a neural network model distillation apparatus 100 according to an exemplary embodiment. Referring to Fig. 9, the apparatus 100 includes a determining unit 101, an acquiring unit 102, a detecting unit 103, and a processing unit 104.
The determining unit 101 is configured to determine the target image, to determine the correlation between the object feature at each pixel position of the teacher model feature map and the object condition feature, and to determine the attention distribution of the teacher model based on the correlation. The acquiring unit 102 is configured to acquire the object condition feature of the target image, the object condition feature being manually input feature information characterizing the target object. The detecting unit 103 is configured to perform feature detection on the target image based on the teacher model to obtain the teacher model feature map, and to perform feature detection on the target image based on the initial student model to obtain the student model feature map. The processing unit 104 is configured to train the initial student model based on the attention distribution of the teacher model and on the feature difference between the teacher model feature map and the student model feature map, to obtain the target student model.
In one embodiment, the determining unit 101 determines the correlation between the object feature at each pixel position of the teacher model feature map and the object condition feature as follows: the object feature at each pixel position of the teacher model feature map is decomposed into a plurality of first sub-feature spaces; for each pixel position, the plurality of first sub-feature spaces are feature-weighted with different weights to obtain a weighted object feature; the correlation between the weighted object feature and the object condition feature is determined, yielding, for each pixel position, a first sub-feature space correlation for each of the plurality of first sub-feature spaces; and for each pixel position, the first sub-feature space correlations of the plurality of first sub-feature spaces are normalized to obtain the correlation between the object feature at that pixel position and the object condition feature.
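The embodiment describes this computation only as a sequence of steps (decompose, weight, correlate, normalize) and gives no formulas. The numpy sketch below fills the gaps with assumptions: equal channel groups as the first sub-feature spaces, dot products as the correlation, and a softmax-based combination of the sub-space correlations.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pixel_condition_correlation(feature_map, condition_feat, subspace_weights):
    """Correlation between the object feature at each pixel position and the
    manually supplied object condition feature.

    feature_map:      (H, W, C) teacher model feature map
    condition_feat:   (C,) object condition feature
    subspace_weights: (K,) one weight per first sub-feature space; C must be
                      divisible by K
    """
    H, W, C = feature_map.shape
    K = subspace_weights.shape[0]
    # Decompose each pixel's feature into K first sub-feature spaces
    # (equal channel groups -- an assumed decomposition).
    sub = feature_map.reshape(H * W, K, C // K)
    weighted = sub * subspace_weights[None, :, None]      # feature weighting
    cond = condition_feat.reshape(K, C // K)
    # Per-sub-feature-space correlation via dot product: (H*W, K).
    corr = (weighted * cond[None]).sum(-1)
    # Normalize the K sub-space correlations and combine them into one
    # correlation score per pixel position.
    return (softmax(corr) * corr).sum(-1).reshape(H, W)
```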
In one embodiment, the processing unit 104 trains the initial student model based on the attention distribution of the teacher model and on the feature difference between the teacher model feature map and the student model feature map as follows: the attention distribution of the teacher model is optimized, and the initial student model is then trained based on the optimized attention distribution and on the feature difference between the teacher model feature map and the student model feature map, to obtain the target student model.
In one embodiment, the processing unit 104 optimizes the attention distribution of the teacher model as follows: the object feature at each pixel position of the teacher model feature map is decomposed into a plurality of second sub-feature spaces; the plurality of second sub-feature spaces are feature-weighted based on the attention distribution of the teacher model; and the teacher model is controlled to perform object detection based on the weighted features, with the attention distribution adjusted according to the detection result and the object condition feature, so that the teacher model produces a detection result consistent with the object condition feature.
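This optimization step is described only qualitatively. The sketch below shows the quantity such an adjustment would minimize, under three assumptions not fixed by the patent: the attention is spatial (per pixel), detection is abstracted to pooling the attention-weighted features into a single descriptor, and squared error measures the mismatch with the condition feature.

```python
import numpy as np

def attention_alignment_loss(feature_map, attention, condition_feat):
    """How far the teacher's attention-weighted output is from the manually
    supplied object condition feature; adjusting the attention distribution
    to reduce this value drives the teacher's detection result toward
    consistency with the condition feature.

    feature_map:    (H, W, C) teacher model feature map
    attention:      (H, W) attention distribution (non-negative, sums to 1)
    condition_feat: (C,) object condition feature
    """
    # Weight the features at each pixel by the attention distribution.
    attended = feature_map * attention[..., None]
    # Pool into a single descriptor (attention-weighted sum over pixels).
    pooled = attended.sum(axis=(0, 1))
    return float(((pooled - condition_feat) ** 2).sum())
```

If the attention concentrates on pixels whose features match the condition feature, this loss vanishes; mass placed elsewhere raises it.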
In one embodiment, the first sub-feature spaces and the second sub-feature spaces use different decomposition manners, the decomposition manners including a key-value decomposition manner and a content decomposition manner.
In one embodiment, the processing unit 104 obtains a detection result consistent with the object condition feature as follows: a detection result is obtained that is consistent with one of, or a combination of, the category feature, the position feature, and the scale feature corresponding to the object condition feature.
In one embodiment, the determining unit 101 determines the attention distribution of the teacher model based on the correlation as follows: weights are assigned to the pixel positions of the teacher model feature map according to the magnitude of the correlation between the object feature at each pixel position and the object condition feature, and the attention distribution of the teacher model is obtained from the assigned weights. Here, a first weight corresponding to a first correlation is greater than a second weight corresponding to a second correlation, the first correlation being greater than the second correlation.
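The embodiment constrains only the ordering: a pixel position with higher correlation must receive a higher weight. A softmax over pixel positions, sketched below, is one assignment that satisfies this monotonicity and yields a proper distribution; the softmax itself is an assumption, not taken from the patent.

```python
import numpy as np

def attention_from_correlation(correlation):
    """Turn per-pixel correlations (H, W) into an attention distribution:
    higher correlation -> strictly higher weight, and weights sum to 1."""
    flat = correlation.reshape(-1)
    e = np.exp(flat - flat.max())      # shift for numerical stability
    return (e / e.sum()).reshape(correlation.shape)
```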
In one embodiment, the detecting unit 103 obtains the teacher model feature map and the student model feature map as follows: the teacher model is controlled to perform multi-scale feature detection on the target image to obtain a teacher model feature map including multi-scale features, and the student model is controlled to perform multi-scale feature detection on the target image to obtain a student model feature map including multi-scale features.
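A feature pyramid network cannot be reconstructed from the patent text alone; the toy sketch below only illustrates the multi-scale idea, the same feature map represented at several resolutions, so that large objects are captured by coarse levels and small objects by fine ones. A real FPN would add top-down and lateral connections; the 2x2 average pooling here is purely illustrative.

```python
import numpy as np

def feature_pyramid(feature_map, levels=3):
    """Build a crude multi-scale pyramid by repeated 2x2 average pooling.

    feature_map: (H, W, C) base feature map
    Returns a list of `levels` maps, each half the spatial size of the last.
    """
    maps = [feature_map]
    for _ in range(levels - 1):
        m = maps[-1]
        H, W, C = m.shape
        m = m[:H - H % 2, :W - W % 2]  # crop odd edges so 2x2 blocks tile exactly
        maps.append(m.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3)))
    return maps
```

Matching teacher and student maps level by level then lets the distillation cover objects at every scale.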
As shown in Fig. 10, an embodiment of the present disclosure provides an electronic device 200. The electronic device 200 includes a memory 201, a processor 202, and an input/output (I/O) interface 203. The memory 201 is configured to store instructions, and the processor 202 is configured to invoke the instructions stored in the memory 201 to execute the neural network model distillation method of the embodiments of the present disclosure. The processor 202 is connected to the memory 201 and the I/O interface 203, for example via a bus system and/or another form of connection mechanism (not shown). The memory 201 may store programs and data, including a program implementing the neural network model distillation method described herein; the processor 202 performs the various functional applications and data processing of the electronic device 200 by running the programs stored in the memory 201.
In the embodiments of the present disclosure, the processor 202 may be implemented in at least one hardware form among a digital signal processor (DSP), a field-programmable gate array (FPGA), and a programmable logic array (PLA), and may be one of, or a combination of, a central processing unit (CPU) and other processing units with data processing and/or instruction execution capabilities.
The memory 201 in the embodiments of the present disclosure may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD).
In the embodiments of the present disclosure, the I/O interface 203 may be used to receive input instructions (for example, numeric or character information, and key signal inputs related to user settings and function control of the electronic device 200), and may also output various information (for example, images or sounds). The I/O interface 203 may include one or more of a physical keyboard, function keys (such as volume control keys and power keys), a mouse, a joystick, a trackball, a microphone, a speaker, and a touch panel.
In some embodiments, the present disclosure provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, perform any of the methods described above.
Although operations are depicted in the drawings in a particular order, this should not be understood as requiring that they be performed in the particular order shown or in serial order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous.
The methods and apparatus of the present disclosure can be implemented using standard programming techniques, with rule-based logic or other logic used to implement the various method steps. It should also be noted that the words "apparatus" and "module" as used herein and in the claims are intended to cover implementations using one or more lines of software code, and/or hardware implementations, and/or devices for receiving input.
Any of the steps, operations, or procedures described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, executable by a computer processor to perform any or all of the described steps, operations, or procedures.
The foregoing description of implementations of the disclosure has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise forms disclosed; modifications and variations are possible in light of the above teachings or may be acquired from practice of the disclosure. The embodiments were chosen and described in order to explain the principles of the disclosure and its practical application, so as to enable others skilled in the art to utilize the disclosure in various embodiments and with various modifications suited to the particular use contemplated.
As for the apparatus in the foregoing embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method, and will not be elaborated here.
It can be understood that in the present disclosure, "a plurality of" means two or more, and other quantifiers are to be understood similarly. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the adjacent objects. The singular forms "a", "said", and "the" are also intended to include the plural forms, unless the context clearly indicates otherwise.
It can further be understood that the terms "first", "second", and the like are used to describe various kinds of information, but that information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another, and do not imply a particular order or degree of importance; indeed, expressions such as "first" and "second" are fully interchangeable. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information.
It can further be understood that, unless otherwise specified, "connection" includes both a direct connection with no other component between the two parts and an indirect connection with other elements between them.
It can further be understood that although the operations in the embodiments of the present disclosure are described in a particular order in the drawings, this should not be understood as requiring that they be performed in the particular order shown or in serial order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous.
Other embodiments of the disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be considered exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
Claims (11)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111132667.3A CN115880483A (en) | 2021-09-26 | 2021-09-26 | Neural network model distillation method, device and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN115880483A true CN115880483A (en) | 2023-03-31 |
Family
ID=85762724
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202111132667.3A (CN115880483A, pending) | Neural network model distillation method, device and storage medium | 2021-09-26 | 2021-09-26 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115880483A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118762248A (en) * | 2024-06-07 | 2024-10-11 | 盐城工学院 | Image classification method based on attention mechanism and knowledge distillation |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180365564A1 (en) * | 2017-06-15 | 2018-12-20 | TuSimple | Method and device for training neural network |
| CN111126599A (en) * | 2019-12-20 | 2020-05-08 | 复旦大学 | A neural network weight initialization method based on transfer learning |
| CN112164054A (en) * | 2020-09-30 | 2021-01-01 | 交叉信息核心技术研究院(西安)有限公司 | Knowledge distillation-based image target detection method and detector and training method thereof |
| CN112464981A (en) * | 2020-10-27 | 2021-03-09 | 中科视语(句容)科技有限公司 | Self-adaptive knowledge distillation method based on space attention mechanism |
| WO2021047286A1 (en) * | 2019-09-12 | 2021-03-18 | 华为技术有限公司 | Text processing model training method, and text processing method and apparatus |
| CN112613273A (en) * | 2020-12-16 | 2021-04-06 | 上海交通大学 | Compression method and system of multi-language BERT sequence labeling model |
Non-Patent Citations (1)
| Title |
|---|
| Mingi Ji et al.: "Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching", arXiv, 5 February 2021 (2021-02-05) * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12154188B2 (en) | Training neural networks for vehicle re-identification | |
| CN115082855B (en) | Pedestrian shielding detection method based on improved YOLOX algorithm | |
| US11106903B1 (en) | Object detection in image data | |
| Arietta et al. | City forensics: Using visual elements to predict non-visual city attributes | |
| CN113039555B (en) | Method, system and storage medium for classifying actions in video clips | |
| CN111325318B (en) | Neural network training method, neural network training device and electronic equipment | |
| CN113673505B (en) | Training method, device, system and storage medium for instance segmentation model | |
| CN111738231A (en) | Target object detection method and device, computer equipment and storage medium | |
| KR102548732B1 (en) | Apparatus and Method for learning a neural network | |
| CN111368672A (en) | Construction method and device for genetic disease facial recognition model | |
| CN113781519A (en) | Target tracking method and target tracking device | |
| CN114462479B (en) | Model training methods, retrieval methods, models, devices, and media | |
| Abbas et al. | Unmanned aerial vehicles for human detection and recognition using neural-network model | |
| Arun Prasath et al. | Prediction of sign language recognition based on multi layered CNN | |
| CN115984093A (en) | Infrared image-based depth estimation method, electronic device and storage medium | |
| CN115841605A (en) | Target detection network training and target detection method, electronic device and storage medium | |
| CN113192085A (en) | Three-dimensional organ image segmentation method and device and computer equipment | |
| CN113762331A (en) | Relational self-distillation method, device and system and storage medium | |
| CN115984808A (en) | Training method of target detection model, electronic device and storage medium | |
| CN115880483A (en) | Neural network model distillation method, device and storage medium | |
| CN111027434B (en) | Training method and device of pedestrian recognition model and electronic equipment | |
| CN114913271B (en) | Image coloring method, electronic device, storage medium and product | |
| WO2023226783A1 (en) | Data processing method and apparatus | |
| CN115705750A (en) | A face key point detection, makeup processing method, device and storage medium | |
| CN114647826B (en) | Identity verification method, electronic device, storage medium and computer program product |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
