CN112016595A

CN112016595A - Image classification method and device, electronic equipment and readable storage medium

Info

Publication number: CN112016595A
Application number: CN202010779364.XA
Authority: CN
Inventors: 钱立辉; 李马丁; 王斌; 于冰
Original assignee: Tsinghua University; Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Tsinghua University; Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2020-08-05
Filing date: 2020-08-05
Publication date: 2020-12-01

Abstract

The disclosure relates to an image classification method and apparatus, an electronic device, and a readable storage medium. The method comprises: acquiring an initial image to be classified; acquiring a depth heat map of the initial image to be classified, where the depth heat map is an image that represents, through different color gamuts, the depth-of-field information of objects in the initial image to be classified; combining the RGB channel images of the initial image to be classified with the depth heat map to obtain a 4-channel target image to be classified; and classifying the initial image to be classified according to the target image to be classified, to determine whether the image to be classified is a large-aperture image. In this embodiment, adding the depth heat map, which carries depth-of-field information, as one feature dimension of the image to be classified can improve the accuracy of image classification.

Description

Image classification method and apparatus, electronic device, and readable storage medium

Technical Field

The present disclosure relates to the technical field of image processing, and in particular to an image classification method and apparatus, an electronic device, and a readable storage medium.

Background

At present, more and more users like to shoot images and videos with cameras such as SLRs. While shooting, the photographer may use a depth of field (DOF) effect to obtain large-aperture images or large-aperture videos that are high-definition, attractive, and have a certain stereoscopic visual effect.

In real life, users can also upload such large-aperture images or videos to a video platform, such as Kuaishou, for sharing. After a user uploads a video, an existing video platform may use a no-reference video quality assessment (NR-VQA) algorithm, such as a currently popular machine learning algorithm, to estimate a video quality score and thereby screen out such large-aperture images or videos as the processing objects of subsequent applications such as video recommendation and video super-resolution.

An important feature of large-aperture images and videos is that an object in the focal plane (that is, within the depth of field) is sharp, whereas in front of and behind the focal plane the incident light converges and diverges during shooting, so that the image becomes blurred and forms a circular region, often called the "circle of confusion". However, when the above machine learning algorithm is applied to large-aperture images or videos, the presence of the circle of confusion prevents it from accurately distinguishing whether the object to be predicted is a large-aperture image or an image distorted by compression, so the screening results have a high error rate.

Summary

The present disclosure provides an image classification method and apparatus, an electronic device, and a readable storage medium, to solve at least the problems in the related art.

The technical solutions of the present disclosure are as follows:

According to a first aspect of the embodiments of the present disclosure, an image classification method is provided, including:

acquiring an initial image to be classified;

acquiring a depth heat map of the initial image to be classified, where the depth heat map is an image that represents, through different color gamuts, the depth-of-field information of objects in the initial image to be classified;

combining the RGB channel images of the initial image to be classified with the depth heat map to obtain a 4-channel target image to be classified; and

classifying the initial image to be classified according to the target image to be classified, to determine whether the image to be classified is a large-aperture image.

Optionally, when the image to be classified is a frame in the video frame sequence of a video to be classified, the method further includes:

obtaining, based on the predicted classification of each initial image to be classified in the video frame sequence, a predicted classification indicating whether the video to be classified is of the large-aperture type.

Optionally, the value of the predicted classification of the video to be classified is the average of the predicted classification values of the initial images to be classified in the video frame sequence.

Optionally, classifying the initial image to be classified according to the target image to be classified, to determine whether the image to be classified is a large-aperture image, includes:

inputting the target image to be classified into an image classification prediction model, and determining, according to the predicted classification information output by the image classification prediction model, whether the image to be classified is a large-aperture image, where the image classification prediction model is trained on the depth heat maps of sample images and the RGB channel images of those sample images.

Optionally, the output layer of the image classification prediction model includes a Sigmoid function; the Sigmoid function outputs a continuous value representing the predicted classification, and the closer the value is to 1, the higher the probability that the initial image to be classified is of the large-aperture type.

According to a second aspect of the embodiments of the present disclosure, an image classification apparatus is provided, the apparatus including an input module, an acquisition module, and a classification module;

the input module is configured to acquire an initial image to be classified and to send the initial image to be classified to both the acquisition module and the classification module;

the acquisition module is configured to acquire a depth heat map of the initial image to be classified, where the depth heat map is an image that represents, through different color gamuts, the depth-of-field information of objects in the initial image to be classified;

the classification module is configured to combine the RGB channel images of the initial image to be classified with the depth heat map to obtain a 4-channel target image to be classified, and to classify the initial image to be classified according to the target image to be classified, to determine whether the image to be classified is a large-aperture image.

Optionally, when the image to be classified is a frame in the video frame sequence of a video to be classified, the apparatus further includes an output module;

the output module is configured to obtain, based on the predicted classification of each initial image to be classified in the video frame sequence, a predicted classification indicating whether the video to be classified is of the large-aperture type.

Optionally, the classification module includes:

an image input unit configured to input the target image to be classified into an image classification prediction model; and

a classification determination unit configured to determine, according to the predicted classification information output by the image classification prediction model, whether the image to be classified is a large-aperture image, where the image classification prediction model is trained on the depth heat maps of sample images and the RGB channel images of those sample images.

According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, including:

a processor; and

a memory for storing a program executable by the processor, where the processor is configured to execute the executable program in the memory to implement the steps of the above method.

According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided; when an executable program in the storage medium is executed, the steps of the above method can be performed.

According to a fifth aspect of the embodiments of the present disclosure, a computer application program is provided that, when executed by a processor, can perform the steps of the above method.

The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects:

In this embodiment, an initial image to be classified is acquired; a depth heat map of the initial image to be classified is then acquired, where the depth heat map is an image that represents, through different color gamuts, the depth-of-field information of objects in the initial image to be classified; the RGB channel images of the initial image to be classified are then combined with the depth heat map to obtain a 4-channel target image to be classified; finally, the initial image to be classified is classified according to the target image to be classified, to determine whether the image to be classified is a large-aperture image. In this way, adding the depth heat map, which carries depth-of-field information, as one feature dimension of the image to be classified can improve the accuracy of image classification.

It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure; they do not unduly limit the present disclosure.

FIG. 1 is a schematic diagram of the optical path for capturing a large-aperture image, as shown in the related art.

FIG. 2 is an effect diagram of a large-aperture image, as shown in the related art.

FIG. 3 is a flowchart of an image classification method according to an exemplary embodiment.

FIG. 4 is a schematic diagram of a depth heat map according to an exemplary embodiment.

FIG. 5 is a block diagram of an image classification apparatus according to an exemplary embodiment.

FIG. 6 is a flowchart of acquiring a large-aperture video data set according to an exemplary embodiment.

FIG. 7 is a block diagram of an image classification apparatus according to an exemplary embodiment.

FIG. 8 is a block diagram of an electronic device according to an exemplary embodiment.

Detailed Description

To help those of ordinary skill in the art better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings.

The inventors analyzed large-aperture images using the principle shown in FIG. 1. Because of the depth of field effect, the scenes in a large-aperture image lie at different depths. Referring to FIG. 2, objects in the focal plane (inside the white frame) are relatively sharp, while objects outside the focal plane (inside the black frame) appear blurred. In other words, a large-aperture image contains a certain amount of object depth information: the real objects corresponding to the sharp region are at the same distance from the camera, while the objects corresponding to the blurred regions are at different distances from the camera.

To this end, the embodiments of the present disclosure provide an image classification method whose inventive concept is to acquire depth information of the image to be classified and to classify the image in combination with that depth information, thereby improving the accuracy of image classification. In practice, the depth information may be depth data, a depth heat map, a depth histogram, or the like of the image to be classified. The embodiments below take a depth heat map as the example, where the depth heat map is an image that represents, through different color gamuts, the depth-of-field information of objects in the initial image to be classified.

FIG. 3 is a flowchart of an image classification method according to an exemplary embodiment; the method is applicable to electronic devices such as smartphones, servers, and personal computers. Referring to FIG. 3, the image classification method includes steps 31 to 34:

In step 31, an initial image to be classified is acquired.

In this embodiment, the electronic device may acquire a file to be classified, which may be an image or a video. When the input is an image, the image can be used directly as the image to be classified. When the input is a video, the electronic device may sample at least one video frame from the video stream at a preset interval, for example 1 frame every 10 frames, and use each sampled frame as an image to be classified in the subsequent processing. Each video frame is classified in the same way as a single image; the difference is that the predicted classification of the video is derived from the predicted classifications of the sampled frames, as described in detail later.
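The frame sampling described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `sample_frames` is hypothetical, and it assumes that "1 frame every 10 frames" means keeping frames 0, 10, 20, and so on.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def sample_frames(frames: Sequence[Frame], interval: int = 10) -> List[Frame]:
    """Keep one frame out of every `interval` frames, starting with the first."""
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    # Slicing with a step keeps indices 0, interval, 2 * interval, ...
    return list(frames[::interval])


# Usage: a 95-frame video represented by its frame indices.
video = list(range(95))
key_frames = sample_frames(video, interval=10)
print(key_frames)  # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```

A single image can be treated as a one-frame sequence, so the same downstream classification path serves both inputs.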

In step 32, a depth heat map of the initial image to be classified is acquired.

In this embodiment, the electronic device may acquire a depth heat map of the initial image to be classified. The depth heat map is an image that represents, through different color gamuts, the depth-of-field information of objects in the initial image to be classified. The effect is shown in FIG. 4, where the left image is the initial image to be classified and the right image is the depth heat map. In a specific implementation, the electronic device may acquire depth information of the initial image to be classified such as depth data, a depth heat map, or a depth histogram; any algorithm capable of obtaining such depth information falls within the protection scope of the present disclosure. In one example, the electronic device may use the DeepLens algorithm, which includes an encoder and a decoder. The encoder may use a pretrained ResNet50 architecture. The decoder is a series of upsampling modules and also receives skip connections from the encoder. The two layers joined by a skip connection have the same resolution; the purpose is to reduce the feature loss caused by downsampling during the forward pass and to prevent vanishing gradients during gradient propagation. The last layer of the decoder directly outputs the predicted depth heat map of the image to be classified.

In step 33, the RGB channel images of the initial image to be classified are combined with the depth heat map to obtain a 4-channel target image to be classified.

In this embodiment, the electronic device may combine the initial image to be classified with the depth heat map. Because the initial image to be classified consists of RGB channel images, the electronic device can obtain the red, green, and blue channel images and resize each color channel image to the same size, 224*224*1. The electronic device can likewise resize the depth heat map to 224*224*1. It can then add the depth heat map to the RGB channel images as the fourth channel of the initial image to be classified, forming a target image to be classified with four channels (red, green, blue, and depth), that is, 224*224*4.
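The channel combination can be illustrated with NumPy. The sketch below assumes the RGB image and the depth heat map have already been resized to the same height and width (224*224 in the text); the helper name `combine_rgb_and_depth` is hypothetical and not taken from the disclosure.

```python
import numpy as np


def combine_rgb_and_depth(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Append a depth map to an RGB image as a fourth channel."""
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    if depth.ndim == 2:
        depth = depth[:, :, np.newaxis]  # (H, W) -> (H, W, 1)
    if depth.shape != (rgb.shape[0], rgb.shape[1], 1):
        raise ValueError("depth must have shape (H, W) or (H, W, 1) matching rgb")
    # Concatenate along the channel axis: (H, W, 3) + (H, W, 1) -> (H, W, 4).
    return np.concatenate(
        [rgb.astype(np.float32), depth.astype(np.float32)], axis=2
    )


# Usage with placeholder data of the sizes named in the text.
rgb = np.zeros((224, 224, 3), dtype=np.uint8)
depth = np.ones((224, 224, 1), dtype=np.float32)
target = combine_rgb_and_depth(rgb, depth)
print(target.shape)  # (224, 224, 4)
```

The same function would apply unchanged to HSI, YUV, or YCbCr inputs, since it only relies on the image having three channels.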

It can be understood that, besides the RGB color system, the initial image to be classified may also use the HSI, YUV, or YCbCr color system; in that case the RGB channels are replaced in turn by the images of the corresponding channels. Solutions that obtain the target image to be classified by replacing channels in this way also fall within the protection scope of the present disclosure.

In a specific implementation, the electronic device may also normalize each channel image of the target image to be classified. Taking the depth heat map as an example, the normalization is as follows:

d_i'(x, y) = d_i(x, y) / max(d_i); (1)

In formula (1), (x, y) denotes the coordinates of a pixel in the depth heat map, and the max() function computes the maximum value in the original depth heat map d_i; applying formula (1) pixel by pixel yields the normalized depth heat map d_i'. In this example, normalization places the depth heat value of every pixel in the range [0, 1], which simplifies computation.

It should be noted that the other channel images can be normalized in the same way as the depth heat map, which is not repeated here. In the end, the electronic device obtains a target image to be classified in which every channel is normalized.
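The normalization can be sketched as dividing each channel by its own maximum, which matches the description of the max() function and the [0, 1] output range given above. The function name `normalize_by_max` is hypothetical, the input is assumed to be non-negative, and the handling of an all-zero channel is an added assumption that the text does not cover.

```python
import numpy as np


def normalize_by_max(channel: np.ndarray) -> np.ndarray:
    """Divide every pixel by the channel maximum so values fall in [0, 1]."""
    channel = channel.astype(np.float32)
    peak = float(channel.max())
    if peak <= 0.0:
        # Assumption: a channel with no positive value maps to all zeros,
        # which avoids dividing by zero.
        return np.zeros_like(channel)
    return channel / peak


# Usage: a tiny 2x2 depth map.
depth = np.array([[0.0, 2.0], [4.0, 8.0]])
print(normalize_by_max(depth))  # values 0, 0.25, 0.5 and 1
```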

In step 34, the initial image to be classified is classified according to the target image to be classified, to determine whether the image to be classified is a large-aperture image.

In this embodiment, the electronic device can obtain the predicted classification of the target image to be classified. Because the target image to be classified essentially embodies the feature information of the initial image to be classified, this predicted classification can also serve as the predicted classification of the initial image to be classified.

The predicted classification may be a continuous value in the range (0, 1), where a larger value indicates a higher probability that the target image to be classified is of the large-aperture type. In practice, a classification threshold (for example 0.4, adjustable) can be set in advance: when the predicted classification is greater than or equal to the threshold, the image is predicted to be of the large-aperture type; otherwise it is not. In this case the predicted classification takes one of two values: large-aperture type or not large-aperture type. A skilled person can choose a suitable scheme for the specific scenario, and such schemes fall within the protection scope of the present disclosure.
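The thresholding rule can be written in a few lines. In this sketch the function name `is_large_aperture` is hypothetical; the default of 0.4 is the example value from the text and is meant to be tuned.

```python
def is_large_aperture(score: float, threshold: float = 0.4) -> bool:
    """Map a continuous prediction to a binary large-aperture decision."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score is expected to lie between 0 and 1")
    # The text treats "greater than or equal to the threshold" as positive.
    return score >= threshold


# Usage with the example threshold of 0.4.
print(is_large_aperture(0.73))  # True
print(is_large_aperture(0.12))  # False
```

Raising the threshold trades recall for precision, which may matter when the selected videos feed later stages such as recommendation.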

In this example, the electronic device may use the ResNet18 algorithm to classify the target image to be classified; this requires adjusting the input layer and the output layer of ResNet18, as shown in Table 1.

In Table 1, Conv2d, 7×7 denotes a convolution with a kernel size of 7; Max pooling, 3×3 denotes max pooling with a kernel size of 3; Ave pooling, 3×3 denotes average pooling with a kernel size of 3; and Bottleneck denotes a residual module.

Referring to Table 1, the input layer of ResNet18 in this example has 4 input channels; that is, the input dimension of the original ResNet18, 224×224×3, is changed to 224×224×4 to match the RGB channels and the depth channel of the target image to be classified.

Table 1 Comparison of the ResNet18 algorithm before and after adjustment

Continuing with Table 1, the output layer of ResNet18 in this example has 1 output channel; that is, the dimension of the fully connected layer of the original ResNet18 is changed from 1000 to 1. A Sigmoid function is also added:

Sigmoid(x) = 1 / (1 + e^(-x)); (2)

In formula (2), the output range of the Sigmoid function is (0, 1).
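As an illustration of the Sigmoid output stage, the standard logistic function can be implemented as below. This is a generic, numerically stable sketch in plain Python rather than code from the disclosure.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is small, so this form cannot overflow.
    z = math.exp(x)
    return z / (1.0 + z)


# Usage: the raw output of the final fully connected layer (a "logit")
# is squashed into a probability-like score.
print(sigmoid(0.0))  # 0.5
```

Mathematically the output lies strictly between 0 and 1; in floating point, extreme inputs may round to exactly 0.0 or 1.0.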

It should be noted that the predicted classification output by ResNet18 is a continuous value; a further advantage of this is that the loss value of each training iteration is easy to compute during training.

In this example, the electronic device may also train the above ResNet18 algorithm, using the cross-entropy loss function during training:

Loss = -y log Y - (1 - y) log(1 - Y); (3)

In formula (3), y denotes the label of the training sample: 0 means the training sample is not of the large-aperture type, and 1 means it is. Y denotes the predicted classification output by the classification module.
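Formula (3) can be evaluated directly for a single sample. In the sketch below, the function name `binary_cross_entropy` and the clipping constant `eps` are assumptions added for numerical safety; they are not part of the disclosure.

```python
import math


def binary_cross_entropy(y: int, y_pred: float, eps: float = 1e-7) -> float:
    """Loss = -y log Y - (1 - y) log(1 - Y) for one training sample."""
    # Clip the prediction so that log() never receives 0.
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return -(y * math.log(y_pred) + (1 - y) * math.log(1.0 - y_pred))


# Usage: a confident correct prediction costs less than a wrong one.
print(binary_cross_entropy(1, 0.9) < binary_cross_entropy(1, 0.1))  # True
```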

In this example, when the initial image to be classified is a frame in the video frame sequence of a video to be classified, the electronic device may also obtain, based on the predicted classification of each initial image to be classified in the video frame sequence, a predicted classification indicating whether the video to be classified is of the large-aperture type. The predicted classification of the video to be classified can be obtained by the following formula:

R = (1/N) Σ_{i=1}^{N} M(f_i); (4)

In formula (4), R denotes the predicted score of whether the video to be classified is of the large-aperture type; its range is (0, 1), and the closer it is to 1, the higher the probability that the video is of the large-aperture type. N denotes the number of video frames sampled from the video to be classified, and M(f_i) denotes the predicted classification of the i-th video frame.
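The video-level score is the mean of the per-frame predictions, which the summary section also states explicitly. A minimal sketch follows; the function name `video_score` is hypothetical.

```python
from typing import Sequence


def video_score(frame_scores: Sequence[float]) -> float:
    """Average the per-frame predictions M(f_i) into a video-level score R."""
    if not frame_scores:
        raise ValueError("at least one sampled frame is required")
    return sum(frame_scores) / len(frame_scores)


# Usage: three sampled frames with their predicted scores.
scores = [0.2, 0.4, 0.9]
print(round(video_score(scores), 3))  # 0.5
```

Because each frame score lies in (0, 1), their mean does as well, so the same threshold logic used for images can be reused for videos.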

Thus, in this embodiment, an initial image to be classified is acquired; a depth heat map of the initial image to be classified is then acquired, where the depth heat map is an image that represents, through different color gamuts, the depth-of-field information of objects in the initial image to be classified; the RGB channel images of the initial image to be classified are then combined with the depth heat map to obtain a 4-channel target image to be classified; finally, the initial image to be classified is classified according to the target image to be classified, to determine whether the image to be classified is a large-aperture image. In this way, adding the depth heat map, which carries depth-of-field information, as one feature dimension of the image to be classified can improve the accuracy of image classification.

Building on the above, an embodiment of the present disclosure provides a network architecture of an image classification model for classifying a video to be classified. Referring to FIG. 5, the working process includes:

First, the video to be classified is sampled: 1 frame is sampled every 10 frames, and the N sampled frames form the frame sequence S = {f_1, f_2, …, f_N}, which reduces the amount of computation. Each video frame in the sequence is cropped to 224*224*3, where 224*224 is length*width and 3 is the three RGB color channels, to meet the input requirements of the subsequent ResNet18.

Then, each video frame f_i is input in turn into the DeepLens algorithm to obtain its depth heat map d_i, whose dimension is 224*224*1 and which is normalized according to formula (1):

d_i'(x, y) = d_i(x, y) / max(d_i); (1)

In formula (1), (x, y) denotes the coordinates of a pixel in the depth heat map, and the max() function computes the maximum value in the original depth heat map d_i; applying formula (1) pixel by pixel yields the normalized depth heat map d_i'.

Similarly, each video frame is normalized in the same way as the depth heat map.

Next, the normalized video frame f_i and the depth heat map d_i' are combined along the channel dimension to obtain the target video frame to be classified, whose dimension is 224*224*4; the 4 channels are the three RGB channels of the video frame plus the one channel of the heat map.

Next, the target video frame to be classified is input into the ResNet18 algorithm shown in Table 1. ResNet18 has 18 convolutional layers in total and contains skip connections and residual structures, which give it excellent feature extraction performance. ResNet18 thus outputs a predicted classification M(f_i), which is a continuous value.

Finally, the average of the predicted classification values of the video frame sequence is computed according to formula (4) to obtain the predicted classification R of the video to be classified. R lies in the range (0, 1), and the closer it is to 1, the more likely the video is of the large-aperture type.

It can be understood that, in practice, the image classification model shown in FIG. 5 is trained on a preset training set, and training stops when the loss value output by the loss function is less than or equal to a preset loss threshold, when the gradient values no longer change, or after a preset number of training iterations. The trained image classification model can then be applied to online classification on the video platform.
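The three stopping conditions can be combined into a single check inside a training loop. The sketch below is illustrative only: the function name `should_stop`, the way gradient change is measured, and all default values are assumptions, since the text does not give concrete numbers.

```python
def should_stop(
    loss: float,
    grad_change: float,
    step: int,
    loss_threshold: float = 0.05,   # hypothetical preset loss threshold
    grad_tolerance: float = 1e-6,   # hypothetical "gradient unchanged" tolerance
    max_steps: int = 10_000,        # hypothetical preset number of iterations
) -> bool:
    """Return True when any of the three stopping conditions is met."""
    loss_small_enough = loss <= loss_threshold
    gradient_stalled = abs(grad_change) <= grad_tolerance
    budget_exhausted = step >= max_steps
    return loss_small_enough or gradient_stalled or budget_exhausted


# Usage: early in training, none of the conditions holds yet.
print(should_stop(loss=0.62, grad_change=0.3, step=120))  # False
```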

The training samples in the training set follow a certain type distribution. For example, landscape samples may predominate, so that the trained image classification model classifies landscape-type large-aperture images or videos fairly accurately but performs worse on portrait-type large-aperture images or videos. In other words, the performance of the image classification model degrades when it is transferred to a different dataset. For this reason, this example also uses the above image classification model to obtain a training set, which is essentially a large-aperture dataset. Referring to FIG. 6, this includes:

(1) Obtain a training set composed of large-aperture videos of other types. The training set includes multiple training samples, each of which may be a large-aperture image or video from another domain (for example, the landscape type).

(2) Train the image classification model on the above training set, and stop training when the loss function converges. Model M1 is obtained once training is complete.

(3) Randomly select a number of videos on the video platform to form a test set in which portrait-type videos predominate. Use model M1 to classify each video in the test set, obtaining at least two video sets: a large-aperture video set and a non-large-aperture video set. Because the accuracy of the predicted classification for the large-aperture video set may be low, that set can be screened to select the large-aperture images that satisfy preset conditions. The preset conditions may include: the predicted classification value exceeds a predicted classification threshold (e.g., 0.8), and manual screening.

(4) Add the selected large-aperture images to the training set.

(5) Iterate steps (2) to (4) until the performance of the image classification model is stable. The resulting training set, which contains large-aperture training samples, is the large-aperture video dataset. Here, stable performance means that the number of training samples in the training set exceeds a preset sample-count threshold, or that the proportion of selected large-aperture images in the test set exceeds a preset proportion, or that the accuracy of the large-aperture images selected from a test set dominated by landscape types equals the accuracy of those selected from a test set dominated by portrait types.
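Steps (1) to (5) amount to an iterative self-training loop, which can be sketched as follows. This is an illustrative outline under stated assumptions, not the patented procedure: `train` and `draw_candidates` are hypothetical callables supplied by the caller, samples are represented by string identifiers, only the score-threshold screening is modeled (manual screening is omitted), and only the sample-count stopping condition is implemented, together with an assumed cap on the number of rounds.

```python
from typing import Callable, List, Sequence, Tuple

Sample = str                      # hypothetical: an identifier for an image or video
Model = Callable[[Sample], float]  # maps a sample to a predicted classification value


def grow_training_set(
    seed_set: Sequence[Sample],
    draw_candidates: Callable[[], Sequence[Sample]],
    train: Callable[[Sequence[Sample]], Model],
    score_threshold: float = 0.8,
    target_size: int = 1000,
    max_rounds: int = 10,
) -> Tuple[List[Sample], Model]:
    """Alternate between training and harvesting confident predictions."""
    training_set = list(seed_set)                 # step (1): initial training set
    model = train(training_set)                   # step (2): train model M1
    for _ in range(max_rounds):
        if len(training_set) > target_size:       # step (5): sample-count condition
            break
        known = set(training_set)
        candidates = dict.fromkeys(draw_candidates())  # step (3): de-duplicated test set
        accepted = [
            c for c in candidates
            if c not in known and model(c) > score_threshold
        ]
        if not accepted:                          # nothing new to add, so stop early
            break
        training_set.extend(accepted)             # step (4): enlarge the training set
        model = train(training_set)               # step (2) again on the larger set
    return training_set, model
```

The other two stability conditions named in step (5) could be added as further checks at the top of the loop.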

In this example, labor costs are greatly reduced compared with manually screening large-aperture videos. Moreover, two tasks are accomplished at low cost: improving the classification performance of the image classification model, and obtaining a large large-aperture dataset.

Fig. 7 is a block diagram of an image classification apparatus according to an exemplary embodiment, which is applicable to electronic devices such as smartphones, servers, and personal computers. Referring to FIG. 7, the image classification apparatus includes an input module 71, an acquisition module 72, and a classification module 73, wherein:

The input module 71 is configured to acquire an initial image to be classified and to send the initial image to be classified to the acquisition module 72 and the classification module 73, respectively.

The acquisition module 72 is configured to acquire a depth heat map of the initial image to be classified and to send the depth heat map to the classification module 73. The depth heat map is an image that represents, through different color gamuts, the depth-of-field information of objects in the initial image to be classified.

The classification module 73 is configured to combine the RGB channel image of the initial image to be classified with the depth heat map to obtain a 4-channel target image to be classified, and to classify the initial image to be classified according to the target image to be classified, so as to determine whether the image to be classified is a large-aperture-type image.

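When the image to be classified is a frame in the video frame sequence of a video to be classified, the apparatus further includes an output module. The output module is configured to obtain, based on the predicted classification of each initial image to be classified in the video frame sequence, a predicted classification indicating whether the video to be classified is of the large-aperture type.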

In one embodiment, the value of the predicted classification of the video to be classified is the average of the predicted classification values of the initial images to be classified in the video frame sequence.

In one embodiment, the classification module includes:

In one embodiment, the output layer of the image classification prediction model includes a Sigmoid function. The Sigmoid function outputs a continuous value representing the predicted classification, and the closer the value is to 1, the greater the probability that the initial image to be classified is of the large-aperture type.
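As a small illustration of why a Sigmoid output layer yields a continuous value between 0 and 1, the sketch below applies the standard logistic function to a raw network output. The function name and the numerically stable two-branch form are illustrative choices, not details taken from the disclosure.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw network output to a value between 0 and 1."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    # For negative inputs, use the equivalent form that avoids overflow in exp().
    e = math.exp(logit)
    return e / (1.0 + e)
```

A raw output of 0 maps to exactly 0.5, and larger raw outputs map to values closer to 1.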

The above apparatus embodiments have already been described in detail in connection with the working principle of the image classification method, and are not elaborated further here.

Fig. 8 is a block diagram of an electronic device according to an exemplary embodiment. For example, the electronic device 800 may be a server, a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.

Referring to FIG. 8, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, phone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 that execute instructions to perform all or some of the steps of the methods described above. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operation of the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.

The power supply component 806 provides power to the various components of the electronic device 800. The power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.

The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 808 includes a front-facing camera and/or a rear-facing camera. When the electronic device 800 is in an operating mode, such as a shooting mode or a video mode, the front-facing camera and/or the rear-facing camera may receive external multimedia data. Each front-facing or rear-facing camera may be a fixed optical lens system or may have focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) that is configured to receive external audio signals when the electronic device 800 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, or the like. The buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.

The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor component 814 can detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor component 814 can also detect a change in position of the electronic device 800 or of one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and changes in its temperature. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, a carrier network (e.g., 2G, 3G, 4G, or 5G), or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an embodiment of the present disclosure, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.

In an embodiment of the present disclosure, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 804 including instructions, where the instructions can be executed by the processor 820 of the electronic device 800 to complete the steps of the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

In an embodiment of the present disclosure, a computer application program is also provided. When executed by a processor, the computer application program can perform the steps of the above method and achieve the same technical effect.

In an embodiment of the present disclosure, a computer program product is also provided. When executed by a processor of an electronic device, the computer program product enables the electronic device to perform the steps of the above method and achieve the same technical effect.

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article, or device that comprises the element.

The embodiments in this specification are described in a related manner; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus, electronic device, and storage medium embodiments are substantially similar to the method embodiments, they are described more briefly; for the relevant details, refer to the description of the method embodiments.

Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the above examples that follow the general principles of the present disclosure and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.

It should be understood that the present disclosure is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims

1. An image classification method, characterized by comprising:

acquiring an initial image to be classified;

obtaining a depth heat map of the initial image to be classified; the depth heat map refers to an image representing depth of field depth information of an object in the initial image to be classified through different color gamuts;

combining the RGB channel image of the initial image to be classified with the depth heat map to obtain a 4-channel target image to be classified;

classifying the initial image to be classified according to the target image to be classified, to determine whether the image to be classified is a large-aperture-type image.

2. The image classification method according to claim 1, wherein when the image to be classified is a frame in the video frame sequence of the video to be classified, the method further comprises:

obtaining, based on the predicted classification of each initial image to be classified in the video frame sequence, a predicted classification indicating whether the video to be classified is of the large-aperture type.

3. The image classification method according to claim 2, wherein the value of the predicted classification of the video to be classified is the average of the predicted classification values of the initial images to be classified in the video frame sequence.

4. The image classification method according to claim 1, wherein classifying the initial image to be classified according to the target image to be classified, to determine whether the image to be classified is a large-aperture-type image, comprises:

inputting the target image to be classified into an image classification prediction model, and determining whether the image to be classified is a large-aperture-type image according to the predicted classification information output by the image classification prediction model, wherein the image classification prediction model is obtained by training on the depth heat maps of sample images and the RGB channel images of the sample images.

5. The image classification method according to claim 4, wherein the output layer of the image classification prediction model comprises a Sigmoid function; the Sigmoid function is used to output a continuous value representing the predicted classification, and the closer the value is to 1, the greater the probability that the initial image to be classified is of the large-aperture type.

6. An image classification device, characterized in that the device comprises: an input module, an acquisition module and a classification module;

The input module is configured to perform acquisition of an initial image to be classified, and send the initial image to be classified to the acquisition module and the classification module respectively;

The acquisition module is configured to perform acquisition of a depth heat map of the initial image to be classified; the depth heat map refers to an image representing depth of field depth information of an object in the initial image to be classified through different color gamuts;

the classification module is configured to combine the RGB channel image of the initial image to be classified with the depth heat map to obtain a 4-channel target image to be classified, and to classify the initial image to be classified according to the target image to be classified, to determine whether the image to be classified is a large-aperture-type image.

7. The image classification device according to claim 6, wherein when the image to be classified is a frame in the video frame sequence of the video to be classified, the device further comprises an output module;

the output module is configured to obtain, based on the predicted classification of each initial image to be classified in the video frame sequence, a predicted classification indicating whether the video to be classified is of the large-aperture type.

8. An electronic device, characterized in that, comprising:

a processor;

a memory for storing an executable program of the processor; wherein the processor is configured to execute the executable program in the memory to implement the steps of the method according to any one of claims 1-5.

9. A non-transitory computer-readable storage medium, wherein when an executable program in the storage medium is executed, the steps of the method according to any one of claims 1 to 5 can be performed.

10. A computer application program, characterized in that, when the computer application program is executed by a processor of a server, the server can implement the steps of the method according to any one of claims 1 to 5.