CN110728255A - Image processing method, device, electronic device and storage medium - Google Patents

Image processing method, device, electronic device and storage medium

Info

Publication number
CN110728255A
CN110728255A
Authority
CN
China
Prior art keywords
image data
network
attribute
specific
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911007790.5A
Other languages
Chinese (zh)
Other versions
CN110728255B (en)
Inventor
孙莹莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201911007790.5A priority Critical patent/CN110728255B/en
Publication of CN110728255A publication Critical patent/CN110728255A/en
Priority to PCT/CN2020/122506 priority patent/WO2021078157A1/en
Application granted granted Critical
Publication of CN110728255B publication Critical patent/CN110728255B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image processing method, an image processing apparatus, an electronic device, and a storage medium, relating to the technical field of image processing. The method includes: acquiring image data to be processed; inputting the image data to be processed into a plurality of pre-trained specific networks to obtain attribute labels corresponding to the image data, where each specific network is used to determine attribute labels corresponding to the image data, and the attribute labels determined by the respective specific networks are different from one another; inputting the attribute labels determined by each specific network into a pre-trained shared network to obtain an image recognition result; and outputting the image recognition result. In this way, the multiple specific networks jointly analyze the image data to obtain multiple attribute labels, which speeds up label acquisition, while the shared network combines the correlations among the attribute labels to obtain the image recognition result, improving the accuracy of the recognition result and the overall performance.

Description

Image processing method, device, electronic device and storage medium

Technical Field

The present application relates to the technical field of image processing, and more particularly, to an image processing method, apparatus, electronic device, and storage medium.

Background

Existing image attribute recognition schemes are mainly attribute recognition schemes based on traditional machine learning and schemes based on convolutional neural network models. However, the most common existing techniques rely on a single model that performs a single attribute judgment, which is inefficient for multi-attribute recognition.

Summary of the Invention

The present application proposes an image processing method, apparatus, electronic device, and storage medium to remedy the above-mentioned defects.

In a first aspect, an embodiment of the present application provides an image processing method, including: acquiring image data to be processed; inputting the image data to be processed into a plurality of pre-trained specific networks to obtain attribute labels corresponding to the image data, where each specific network is used to determine attribute labels corresponding to the image data, and the attribute labels determined by the respective specific networks are different from one another; inputting the attribute labels determined by each specific network into a pre-trained shared network to obtain an image recognition result, where the shared network determines the image recognition result according to the attribute labels and their correlations; and outputting the image recognition result.
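
The first-aspect pipeline can be sketched end to end. The code below is a minimal, framework-free illustration, not the patent's implementation: each "specific network" is a stand-in function returning a 0/1 attribute label, the "shared network" is a stand-in that combines the labels, and all names are hypothetical.

```python
def specific_network_hair(image):
    # Stand-in: returns 1 if the "hair is black" attribute is present.
    return 1 if image.get("hair") == "black" else 0

def specific_network_glasses(image):
    # Stand-in: returns 1 if the "wearing glasses" attribute is present.
    return 1 if image.get("glasses") else 0

def shared_network(labels):
    # Stand-in: combines the attribute labels (in the real method, also
    # their learned correlations) into a single recognition result.
    return {"labels": labels, "match": all(labels.values())}

def recognize(image):
    # Run every specific network on the same input, then hand all
    # attribute labels to the shared network (steps S101 to S104).
    labels = {
        "black_hair": specific_network_hair(image),
        "glasses": specific_network_glasses(image),
    }
    return shared_network(labels)

result = recognize({"hair": "black", "glasses": True})
```

Note that the specific networks are independent of one another, so in a real implementation they could run in parallel, as the description later emphasizes.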

In a second aspect, an embodiment of the present application further provides an image processing method, including: acquiring a plurality of sample image data, each corresponding to a plurality of attribute labels; setting up a shared network and a plurality of specific networks, where each specific network can identify at least one attribute label and the attribute labels identifiable by the respective specific networks are different from one another; inputting the plurality of sample image data into the shared network and the plurality of specific networks for training, to obtain a trained shared network and trained specific networks; and acquiring image data to be processed and processing it with the trained shared network and specific networks to obtain an image recognition result.
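
The second-aspect training flow can be sketched in a deliberately simplified form. Real specific networks would be neural networks updated by gradient descent; here each "network" is a stub that learns, per attribute label, the majority value seen in the samples. All names and the data layout are illustrative, not from the patent.

```python
def train(samples, labels_per_network):
    # samples: list of {label: 0/1} target dicts, one per sample image.
    # labels_per_network: which labels each specific network handles;
    # the label sets must not overlap (each label is learned once).
    networks = {}
    for name, labels in labels_per_network.items():
        networks[name] = {
            label: 1 if 2 * sum(s[label] for s in samples) >= len(samples) else 0
            for label in labels
        }
    return networks

# Three samples, each annotated with multiple attribute labels.
samples = [
    {"black_hair": 1, "glasses": 0, "smiling": 1},
    {"black_hair": 1, "glasses": 1, "smiling": 1},
    {"black_hair": 0, "glasses": 0, "smiling": 1},
]
nets = train(samples, {"net_a": ["black_hair"], "net_b": ["glasses", "smiling"]})
```

The point of the sketch is the data shape: every sample carries multiple attribute labels, but each specific network is trained only on the subset of labels assigned to it.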

In a third aspect, an embodiment of the present application further provides an image processing apparatus, including: a data acquisition unit, an attribute determination unit, a result acquisition unit, and an output unit. The data acquisition unit is configured to acquire image data to be processed. The attribute determination unit is configured to input the image data to be processed into a plurality of pre-trained specific networks to obtain attribute labels corresponding to the image data, where each specific network is used to determine attribute labels corresponding to the image data and the attribute labels determined by the respective specific networks are different from one another. The result acquisition unit is configured to input the attribute labels determined by each specific network into a pre-trained shared network to obtain an image recognition result, where the shared network determines the image recognition result according to the attribute labels and their correlations. The output unit is configured to output the image recognition result.

In a fourth aspect, an embodiment of the present application further provides an image processing apparatus, including: a sample acquisition unit, a setting unit, a training unit, and a recognition unit. The sample acquisition unit is configured to acquire a plurality of sample image data, each corresponding to a plurality of attribute labels. The setting unit is configured to set up a shared network and a plurality of specific networks, where each specific network can identify at least one attribute label and the attribute labels identifiable by the respective specific networks are different from one another. The training unit is configured to input the plurality of sample image data into the shared network and the plurality of specific networks for training, to obtain a trained shared network and trained specific networks. The recognition unit is configured to acquire image data to be processed and process it with the trained shared network and specific networks to obtain an image recognition result.

In a fifth aspect, an embodiment of the present application further provides an electronic device, including: one or more processors; a memory; and one or more application programs, where the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to perform the above method.

In a sixth aspect, an embodiment of the present application further provides a computer-readable medium storing processor-executable program code, where the instructions in the program code, when executed by the processor, cause the processor to perform the above method.

With the image processing method, apparatus, electronic device, and storage medium provided by the present application, a shared network and a plurality of specific networks are trained in advance, where each specific network is used to determine attribute labels corresponding to image data and the attribute labels determined by the respective specific networks are different from one another. When image data to be processed is acquired, it is input into the plurality of specific networks, and each specific network identifies the attributes it is able to recognize, so that the multiple attribute labels corresponding to the image data to be processed are identified by the multiple specific networks respectively, improving the overall identification of the multiple attribute labels of the image data and yielding the attribute labels corresponding to the image data. The attribute labels corresponding to the image data are then input into the shared network, which determines the image recognition result according to the attribute labels and their correlations and outputs the result. Therefore, the multiple specific networks jointly analyze the image data to obtain multiple attribute labels, which speeds up label acquisition, while the shared network combines the correlations among the attribute labels to obtain the image recognition result, improving the accuracy of the recognition result and the overall performance.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1 is a flowchart of an image processing method provided by an embodiment of the present application;

FIG. 2 is a flowchart of an image processing method provided by another embodiment of the present application;

FIG. 3 is a flowchart of an image processing method provided by yet another embodiment of the present application;

FIG. 4 is a flowchart of step S310 of the image processing method shown in FIG. 3, provided by an embodiment of the present application;

FIG. 5 is a flowchart of step S310 of the image processing method shown in FIG. 3, provided by another embodiment of the present application;

FIG. 6 is a schematic diagram of a measurement area provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of the connection between the specific networks and the shared network provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of sub-image data provided by an embodiment of the present application;

FIG. 9 is a schematic diagram of a face orientation provided by an embodiment of the present application;

FIG. 10 is a flowchart of an image processing method provided by still another embodiment of the present application;

FIG. 11 is a block diagram of an image processing apparatus provided by an embodiment of the present application;

FIG. 12 is a block diagram of an image processing apparatus provided by another embodiment of the present application;

FIG. 13 is a block diagram of an image processing apparatus provided by yet another embodiment of the present application;

FIG. 14 is a block diagram of an electronic device provided by an embodiment of the present application;

FIG. 15 shows a storage unit provided by an embodiment of the present application for storing or carrying program code implementing the image processing method according to the embodiments of the present application.

Detailed Description

To enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings in the embodiments.

Face recognition is a technology that identifies different people based on facial appearance features. It has a wide range of application scenarios and has been researched and applied for decades. With the development of big data, deep learning, and other related technologies in recent years, the effectiveness of face recognition has improved by leaps and bounds, and it is increasingly used in scenarios such as identity authentication, video surveillance, and beauty and entertainment. Among them, the ID-photo comparison problem, i.e., face recognition between a standard ID photo and a daily-life photo, is receiving more and more attention, because only the target person's ID photo needs to be deployed in the database, sparing the target person the trouble of registering daily-life photos in the system.

Existing face attribute recognition schemes are mainly attribute recognition schemes based on traditional machine learning and schemes based on convolutional neural network (CNN) models.

Some face recognition techniques borrow the concept of multi-task learning: after a face detection algorithm extracts the face region from an image or video, a convolutional neural network is used to learn the convolutional layers of the analysis tasks preset in a face database, obtaining a face analysis model and completing the prediction of facial emotion.

Other face recognition techniques, based on the idea of multi-task learning, achieve multi-task cascade learning by adding auxiliary information such as gender, whether the subject is smiling, whether glasses are worn, and pose during training; however, these techniques use multiple face attributes as labels and achieve face alignment through cascading.

Multi-task learning methods not only perform single face attribute recognition but can also achieve multi-attribute prediction. For example, multi-task learning can be introduced into race and gender recognition of face images, treating different semantics as different tasks and applying semantics-based multi-task feature selection to race and gender recognition. However, when the network structure is constructed, race and gender are still solved separately as two tasks, so the model contains a great deal of redundancy and cannot predict in real time.

Therefore, the most common existing face attribute recognition techniques are based on a single model and perform a single attribute judgment, i.e., a unified model learns only one task at a time, first decomposing a complex problem into theoretically independent sub-problems; in each sub-problem, the samples in the training set reflect the information of only a single task. However, face images contain a wide variety of attribute information such as race, gender, and age, the recognition tasks corresponding to different information are correlated, and the tasks share certain relevant information during learning. Introducing multi-task learning into race, gender, and age recognition of face images, treating different semantics as different tasks, and applying semantics-based multi-task feature selection to multi-attribute recognition can significantly improve the generalization ability and recognition effect of the learning system.

Although, in response to the above problems, face image race and gender recognition methods based on multi-task learning have emerged, which introduce multi-task learning into race and gender recognition, treat different semantics as different tasks, and propose semantics-based multi-task feature selection, significantly improving the generalization ability and recognition effect of the learning system, they adopt traditional machine learning and thus suffer greatly in efficiency.

Existing patents have also proposed combining deep learning with attribute recognition tasks, but they only adopt multi-task learning and propose a three-stage training process that learns three features on convolutional networks, namely face parts, facial action units, and emotion space values, to complete the task of facial emotion analysis, without achieving multi-attribute output results.

Therefore, to remedy the above defects, an embodiment of the present application provides an image processing method. As shown in FIG. 1, the method includes steps S101 to S104.

S101: Acquire image data to be processed.

The image data may be an offline image file previously downloaded to the electronic device, or an online image file. In this embodiment of the present application, the image data may be online image data, for example, an image acquired in real time.

The online image data corresponds to one or more frames of a video file, and is the data of that video file that has already been sent to the electronic device. For example, if the video file is a certain movie and the electronic device has received the data for playback minutes 0 to 10 of that movie, the online image data corresponding to the movie is the data for playback minutes 0 to 10. After acquiring each piece of online image data, the client can decode each piece separately to obtain the corresponding layers to be rendered, and then merge and display them, so that multiple video pictures can be displayed on the screen.

As one implementation, the electronic device includes multiple clients capable of playing video files. When a client of the electronic device plays a video, the electronic device can obtain the video file to be played and then decode it; specifically, the aforementioned soft decoding or hard decoding can be used. After decoding, the multiple frames of image data to be rendered corresponding to the video file can be obtained, and these frames must then be rendered before they can be displayed on the screen.

As another implementation, the image data may also be an image captured, through the camera of the electronic device, by a designated application in the electronic device. Specifically, when the designated application performs a certain function, it calls the camera to capture an image and requests the electronic device to determine the image recognition result through the method of the present application and send the result to the designated application, which performs a corresponding operation according to the result.

S102: Input the image data to be processed into a plurality of pre-trained specific networks to obtain attribute labels corresponding to the image data.

Each of the specific networks is used to determine attribute labels corresponding to the image data, and the attribute labels determined by the respective specific networks are different from one another.

Specifically, when the specific networks are trained in advance, the sample image data input into a specific network includes multiple attribute labels, for example, black hair in a face image, or a white car in a vehicle image. The value of each attribute label is 0 or 1, where 0 indicates that the image does not have the attribute and 1 indicates that it does. These attribute labels are image feature values preset for obtaining the image recognition result, and the role of a specific network is to determine whether the image includes the preset attribute labels.

Specifically, each specific network can determine attribute labels corresponding to the image data. In some embodiments, each specific network can determine at least one attribute label. For example, suppose the multiple specific networks include a first specific network and a second specific network, and the attribute labels include label 1, label 2, and label 3. The first specific network is used to recognize label 1 for the image data: if the image data includes label 1, the first specific network associates the image data with label 1, or outputs a recognition result of 1 for label 1; if label 1 is absent, it outputs 0. The second specific network is used to determine labels 2 and 3. Separating label 1 from labels 2 and 3 across different specific networks improves recognition efficiency and avoids the excessive computation that would result from having the same specific network recognize label 1, label 2, and label 3. Moreover, since the first specific network only recognizes label 1, it does not need to learn and be trained on the recognition of labels 2 and 3, which also reduces the training cost.
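
The label split in the example above implies a constraint: the label sets assigned to the specific networks must not overlap, so that each label is recognized by exactly one network. A small sketch of that constraint, with illustrative names not taken from the patent:

```python
# Hypothetical assignment mirroring the example: label 1 goes to the
# first specific network, labels 2 and 3 to the second.
NETWORK_LABELS = {
    "first_specific_network": ["label1"],
    "second_specific_network": ["label2", "label3"],
}

def check_partition(network_labels):
    # Every attribute label must be handled by exactly one specific
    # network; a duplicate assignment is an error.
    seen = []
    for labels in network_labels.values():
        for label in labels:
            if label in seen:
                raise ValueError(f"{label} assigned to two networks")
            seen.append(label)
    return seen

all_labels = check_partition(NETWORK_LABELS)
```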

In addition, it should be noted that the multiple specific networks can run simultaneously, i.e., operate concurrently under multiple threads, rather than in a cascade; that is, the output of one specific network does not require the input of another. The structure of the specific networks is described in subsequent embodiments.

In the embodiments of the present application, the main role of a specific network is to segment the target object from the image and recognize it; that is, a specific network can also be a target detection network, which evidently combines segmentation and recognition of the target object into one. Commonly used target detection networks include the GOTURN network, the MobileNet-SSD deep convolutional neural network, the Faster R-CNN neural network, the YOLO neural network, and the SPP-Net (Spatial Pyramid Pooling) neural network. The GOTURN neural network is a target detection algorithm that trains a convolutional neural network offline, using a CNN classification network pre-trained on existing large-scale classification datasets to extract features and recognize them.

S103: Input the attribute labels determined by each specific network into a pre-trained shared network to obtain an image recognition result.

The shared network is used to determine the image recognition result according to the attribute labels and their correlations. Specifically, the shared network focuses on learning the information shared by all attribute labels. For example, when the attribute label for raised mouth corners appears together with the attribute label for rolled eyes, the expressed emotion is thinking; the correlation between these two attribute labels is recognized by the shared network, and the recognition result is obtained according to that correlation. In other words, after pre-training, the shared network can recognize the correlations among the attribute labels and obtain the image recognition result from them.
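
The correlation idea above can be illustrated with a toy lookup. The rule mapping raised mouth corners plus rolled eyes to "thinking" follows the example in the text; everything else is a hypothetical stand-in for a trained shared network, which would learn these correlations rather than store them in a table.

```python
# Co-occurrence rules: a set of active attribute labels -> result.
CORRELATION_RULES = {
    frozenset({"mouth_corners_raised", "eyes_rolled"}): "thinking",
    frozenset({"mouth_corners_raised"}): "smiling",
}

def shared_network(attribute_labels):
    # attribute_labels: label name -> 0/1, as output by specific networks.
    active = frozenset(k for k, v in attribute_labels.items() if v == 1)
    # Pick the most specific matching rule (largest co-occurring set),
    # so the two-label correlation overrides the single-label one.
    best = None
    for labels, result in CORRELATION_RULES.items():
        if labels <= active and (best is None or len(labels) > len(best[0])):
            best = (labels, result)
    return best[1] if best else "unknown"

emotion = shared_network({"mouth_corners_raised": 1, "eyes_rolled": 1})
```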

S104: Output the image recognition result.

The image recognition result may be output by displaying it on the screen, or by sending it to the requesting end that requested it. The requesting end may be a server communicating with the electronic device, another electronic device, or an application installed in the electronic device; correspondingly, the executor of the above method may be an image-recognition-capable application in the electronic device or the operating system of the electronic device. After the image recognition result is obtained, it is sent to the requesting end, which performs an operation according to the result, for example, transaction payment or screen unlocking.

请参阅图2,示出了本申请实施例提供的图像处理方法,该方法包括:S201至S207。Referring to FIG. 2, an image processing method provided by an embodiment of the present application is shown, and the method includes: S201 to S207.

S201:获取原始图像数据。S201: Acquire original image data.

其中，原始图像数据可以是图像对应的灰度值，也就是说，图像内的每个像素的数值为[0,255]区间的数值，即灰度值。则作为一种实施方式，在电子设备获取到图像的时候，该图像可以是一个彩色图像，则对该彩色图像做灰度化处理，得到灰度图，则该灰度图内的每个像素的灰度值构成了该原始图像的数据。The original image data may be the grayscale values corresponding to the image, that is, the value of each pixel in the image is a value in the [0, 255] interval, i.e., a grayscale value. As an implementation manner, when the electronic device obtains an image, the image may be a color image; the color image is then converted to grayscale to obtain a grayscale map, and the grayscale value of each pixel in the grayscale map constitutes the data of the original image.
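As an illustrative aside (not part of the patent text), the grayscale conversion described above can be sketched as follows; the sample pixel values and the ITU-R BT.601 luma weights are assumptions for demonstration:

```python
# Illustrative sketch: convert RGB pixels to grayscale values in [0, 255].
# The BT.601 luma weights are an assumed, common choice.
def to_grayscale(rgb_pixels):
    """rgb_pixels: list of (R, G, B) tuples with values in [0, 255]."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in rgb_pixels]

pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (128, 128, 128)]
gray = to_grayscale(pixels)  # one grayscale value per pixel
```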

另外，需要说明的是，该原始图像数据可以是电子设备的摄像头采集的数据，例如，该图像处理方法应用于该摄像头采集的图像数据的实时分析，并且，该分析是针对人脸属性识别的分析。In addition, it should be noted that the original image data may be data collected by a camera of the electronic device. For example, the image processing method may be applied to real-time analysis of the image data collected by the camera, where the analysis targets face attribute recognition.

具体地，该图像数据还可以是电子设备内的指定应用程序通过电子设备的摄像头采集的图像。具体地，指定应用程序执行某个功能的时候，调用摄像头拍摄图像，并且请求电子设备通过本申请的方法确定图像识别结果，并且将该图像识别结果发送至指定应用程序，由该指定应用程序根据该图像识别结果执行对应的操作。其中，该指定应用程序可以是电子设备内的屏幕解锁APP，也可以是支付APP。例如，该屏幕解锁APP通过摄像头采集到的人脸图像进行人脸识别从而确定出身份信息，判断所述人脸图像是否与预设人脸图像匹配，如果匹配，则判定成功解锁，如果不匹配，则判定未成功解锁。Specifically, the image data may also be an image collected by a designated application in the electronic device through the camera of the electronic device. When the designated application performs a certain function, it calls the camera to capture an image and requests the electronic device to determine an image recognition result through the method of this application; the image recognition result is then sent to the designated application, and the designated application performs a corresponding operation according to the result. The designated application may be a screen unlocking APP in the electronic device, or a payment APP. For example, the screen unlocking APP performs face recognition on the face image collected by the camera to determine identity information, and judges whether the face image matches a preset face image; if they match, unlocking is determined to be successful, otherwise unlocking is determined to be unsuccessful.

其中，预设人脸图像可以是用户预先设定的人脸图像，可以是存储在移动终端内，也可以是存储在某个服务器或者存储器内，移动终端能够由该服务器或者存储器内获取该预设人脸图像。具体地，可以是预设人脸图像的预设特征信息，则如果人脸图像为二维图像，则预设特征信息为用户预先录入的人脸图像的五官特征点信息，如果人脸图像为三维图像，则预设特征信息为用户预先录入的人脸图像的人脸三维信息。则判断所述人脸图像是否满足预设条件的方式为，获取人脸图像的特征点信息，将所采集的人脸图像的特征信息与用户预先录入的预设特征信息比对，如果匹配，则判定人脸图像满足预设条件，则确定人脸图像有权限将移动终端的屏幕解锁，如果不匹配，则判定人脸图像不满足预设条件，没有权限将屏幕解锁。The preset face image may be a face image preset by the user, which may be stored in the mobile terminal, or in a certain server or memory, and the mobile terminal can obtain the preset face image from the server or memory. Specifically, what is stored may be preset feature information of the preset face image: if the face image is a two-dimensional image, the preset feature information is the facial feature point information of the face image pre-entered by the user; if the face image is a three-dimensional image, the preset feature information is the three-dimensional face information of the face image pre-entered by the user. Whether the face image satisfies the preset condition is then judged by obtaining the feature point information of the collected face image and comparing it with the preset feature information pre-entered by the user. If they match, it is determined that the face image satisfies the preset condition and has the authority to unlock the screen of the mobile terminal; if not, it is determined that the face image does not satisfy the preset condition and has no authority to unlock the screen.
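The feature comparison described above can be sketched roughly as follows; the feature vectors, the tolerance value, and the helper name `matches_preset` are illustrative assumptions, not the patent's actual matching algorithm:

```python
# Hypothetical sketch: compare collected face feature points against the
# pre-enrolled preset feature information. Tolerance and data are assumed.
def matches_preset(features, preset_features, tolerance=0.05):
    """Return True when every feature coordinate is within `tolerance` of the preset."""
    if len(features) != len(preset_features):
        return False
    return all(abs(a - b) <= tolerance for a, b in zip(features, preset_features))

collected = [0.12, 0.48, 0.33]   # features extracted from the captured face image
enrolled = [0.10, 0.50, 0.35]    # preset feature information entered in advance
unlocked = matches_preset(collected, enrolled)
```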

于本申请实施例中，图像数据包含人脸图像，而针对该图像数据的识别为人脸识别，则在执行获取待处理的图像数据时，可以先判断该图像数据内是否包括人脸，如果包括，则执行后续操作。具体地，摄像头采集的图像为二维图像，通过查找该图像内是否有人脸五官特征点，能够确定是否采集到人脸图像，如果采集到，则将所采集的人脸图像发送给移动终端的处理器，以使处理器能够对人脸图像进行分析并执行屏幕解锁操作。作为另一种实施方式，摄像头包括结构光，则根据结构光所采集的三维信息，确定是否存在人脸三维信息，如果存在，则将所采集的图像发送给移动终端的处理器。In the embodiment of the present application, the image data includes a face image, and the recognition performed on the image data is face recognition. When acquiring the image data to be processed, it can first be judged whether the image data includes a human face; if so, subsequent operations are performed. Specifically, the image collected by the camera is a two-dimensional image; by checking whether the image contains facial feature points, it can be determined whether a face image has been collected. If so, the collected face image is sent to the processor of the mobile terminal so that the processor can analyze the face image and perform the screen unlocking operation. As another embodiment, if the camera includes structured light, whether three-dimensional face information exists is determined according to the three-dimensional information collected by the structured light; if so, the collected image is sent to the processor of the mobile terminal.

另外，如果所述摄像头采集的图像内不包括人脸图像，则返回继续执行判断所述摄像头采集的图像内是否包括人脸图像的操作，还可以是，发出人脸采集提醒信息，以提醒用户使用所述摄像头采集人脸图像。具体地，该人脸采集提醒信息可以是在电子设备的当前界面显示。In addition, if the image collected by the camera does not include a face image, the operation of judging whether the image collected by the camera includes a face image is performed again; alternatively, a face collection reminder message may be issued to remind the user to collect a face image using the camera. Specifically, the face collection reminder information may be displayed on the current interface of the electronic device.

S202:对所述原始图像数据归一化处理,以得到待处理的图像数据。S202: Normalize the original image data to obtain image data to be processed.

对原始图像内的每个像素值做归一化处理，即原来的0-255的数值变为0-1区间内的数值，从而能够提高后续的特异性网络和共享网络的计算速度，提高整体图像处理的速度。具体地，可以采用均值方差归一化或灰度变换归一化的方式将原始图像数据归一化处理。Each pixel value in the original image is normalized, that is, the original value in 0-255 becomes a value in the 0-1 interval, which can improve the calculation speed of the subsequent specific networks and the shared network and thus the overall speed of image processing. Specifically, the original image data may be normalized by mean-variance normalization or grayscale-transform normalization.

另外，还可以在对所述原始图像数据归一化处理之后，去除冗余信息，该冗余信息是指归一化所压缩的各分布之间的差距。In addition, after the original image data is normalized, redundant information may be removed, where the redundant information refers to the gap between distributions that has been compressed by the normalization.

则经过归一化处理之后的原始图像数据作为待处理的图像数据。The original image data after normalization processing is regarded as the image data to be processed.
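The normalization step above can be sketched minimally as follows (grayscale-transform normalization by dividing by 255; the helper name and sample values are illustrative):

```python
# Minimal sketch of grayscale-transform normalization: map each pixel value
# from [0, 255] to [0, 1] by dividing by 255, as described in S202.
def normalize(gray_values):
    return [v / 255.0 for v in gray_values]

data = [0, 51, 255]
normalized = normalize(data)
```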

S203:确定每个所述特异性网络对应的属性标签。S203: Determine the attribute label corresponding to each specific network.

具体地,所述特异性网络能够识别的属性标签为该特异性网络对应的属性标签。所述特异性网络能够识别的属性标签是在训练该特异性网络的时候就已经设置好的,具体地,可参考后续实施例。Specifically, the attribute label that can be identified by the specific network is the attribute label corresponding to the specific network. The attribute labels that can be identified by the specific network are already set when the specific network is trained. For details, reference may be made to subsequent embodiments.

S204:根据每个所述特异性网络对应的属性标签将所述图像数据划分为多个子图像数据。S204: Divide the image data into multiple sub-image data according to the attribute label corresponding to each specific network.

由于图像数据内的每个属性标签都对应图像内的一个位置，以人脸图像为例，头发颜色的属性标签所对应的图像内的位置为头发位置，眼睛颜色的属性标签所对应的图像数据内的位置为眼睛位置，从而每个属性标签所对应的位置可以预先确定，例如，在训练特异性网络的时候，就设定好每个属性标签所对应的区域。由于图像数据内的每个像素值都对应一个像素坐标，则每个属性标签对应的区域在像素坐标内的位置能够确定，进而就能够确定每个特异性网络所能识别的属性标签在图像内的位置。进而，就能够将图像数据划分为多个子图像数据，而每个子图像数据对应图像内的一个区域，并且该区域内的属性标签都对应同一个特异性网络，即每个特异性网络所能够识别的属性标签位于该子图像数据对应的区域内。Since each attribute tag in the image data corresponds to a position in the image (taking a face image as an example, the position corresponding to the hair-color attribute tag is the hair position, and the position corresponding to the eye-color attribute tag is the eye position), the position corresponding to each attribute tag can be predetermined. For example, when training the specific networks, the area corresponding to each attribute tag is set; since each pixel value in the image data corresponds to a pixel coordinate, the position of the area corresponding to each attribute tag in the pixel coordinate system can be determined, and then the position in the image of the attribute tags each specific network can identify can be determined. Furthermore, the image data can be divided into multiple sub-image data, each sub-image data corresponding to an area in the image, where the attribute tags in that area all correspond to the same specific network, that is, the attribute tags each specific network can identify are located in the area corresponding to its sub-image data.

从而就能够得到每个特异性网络对应的图像子区域，即每个特异性网络对应的子图像数据。例如，图像被划分为第一区域、第二区域和第三区域，第一特异性网络所能够识别的属性标签分布在第一区域内，第二特异性网络所能够识别的属性标签分布在第二区域内，第三特异性网络所能够识别的属性标签分布在第三区域内，则图像数据被划分为三个子图像数据，分别为第一子图像数据、第二子图像数据和第三子图像数据，则第一子图像数据对应第一区域，第二子图像数据对应第二区域，第三子图像数据对应第三区域。Thus, the image sub-region corresponding to each specific network can be obtained, that is, the sub-image data corresponding to each specific network. For example, the image is divided into a first area, a second area and a third area; the attribute tags that can be recognized by the first specific network are distributed in the first area, those recognized by the second specific network in the second area, and those recognized by the third specific network in the third area. The image data is then divided into three sub-image data, namely first sub-image data, second sub-image data and third sub-image data, where the first sub-image data corresponds to the first area, the second to the second area, and the third to the third area.
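The division into per-network sub-image data can be sketched as follows; the region boundaries (row ranges) and the toy image are illustrative assumptions:

```python
# Sketch of dividing image data into sub-image data by region, following the
# three-region example above. Each named region feeds one specific network.
def split_into_subimages(image, regions):
    """image: 2D list of pixel rows; regions: dict name -> (row_start, row_end)."""
    return {name: image[start:end] for name, (start, end) in regions.items()}

image = [[i] * 4 for i in range(6)]  # toy 6-row "image data"
regions = {"first": (0, 2), "second": (2, 4), "third": (4, 6)}
subs = split_into_subimages(image, regions)
```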

另外，为了更好地根据像素坐标划分多个区域，可以将图像统一按照一个方向调整，从而使指定区域位于同一个位置。以人脸图像为例，可以在获取到原始图像的时候，截取原始图像内的人脸区域，并且按照一定方向调整图像旋转，使得人脸固定朝向某个方向，例如，始终保持人脸的额头位于图像的上部，下巴位于图像的下部。In addition, in order to better divide multiple areas according to pixel coordinates, the image can be uniformly adjusted toward one direction, so that a designated area is always located at the same position. Taking a face image as an example, when the original image is acquired, the face area in the original image can be cropped out, and the image rotation can be adjusted according to a certain direction so that the face always faces a fixed direction, for example, the forehead of the face is always kept in the upper part of the image and the chin in the lower part.

S205:将所述子图像数据输入与该子图像数据对应的特异性网络。S205: Input the sub-image data into a specific network corresponding to the sub-image data.

作为一种实施方式，可以是在确定了每个特异性网络对应的属性标签位于图像内的区域之后，将图像划分为多个区域，而每个区域对应一个特异性网络，按照所确定的多个区域将图像划分为多个子图像，而每个子图像对应的像素数据作为子图像数据，即为经过S202处理之后的像素数据。As an implementation manner, after the region in the image where the attribute tags corresponding to each specific network are located is determined, the image can be divided into multiple regions, each corresponding to one specific network; according to the determined multiple regions, the image is divided into multiple sub-images, and the pixel data corresponding to each sub-image serves as the sub-image data, that is, the pixel data processed in S202.

例如，对于上述的第一区域、第二区域和第三区域，将图像分为三个子图像，分别为第一子图像、第二子图像和第三子图像，然后，将第一子图像内的各个像素值对应的像素数据输入第一特异性网络，将第二子图像内的各个像素值对应的像素数据输入第二特异性网络，将第三子图像内的各个像素值对应的像素数据输入第三特异性网络。For example, for the above-mentioned first area, second area and third area, the image is divided into three sub-images, namely a first sub-image, a second sub-image and a third sub-image. Then, the pixel data corresponding to each pixel value in the first sub-image is input into the first specific network, the pixel data corresponding to each pixel value in the second sub-image is input into the second specific network, and the pixel data corresponding to each pixel value in the third sub-image is input into the third specific network.

从而可以不必将整个图像数据分别输入到各个特异性网络内,而只将特异性网络所能够识别的属性标签对应的子图像数据输入至该特异性网络内,减少了特异性网络的计算量,提高了整体的识别速度。Therefore, it is not necessary to input the entire image data into each specific network, but only the sub-image data corresponding to the attribute label that can be recognized by the specific network is input into the specific network, which reduces the calculation amount of the specific network. Improved overall recognition speed.

S206:将每个所述特异性网络所确定的属性标签输入预先训练好的共享网络,以获取图像识别结果。S206: Input the attribute label determined by each specific network into the pre-trained shared network to obtain an image recognition result.

S207:输出所述图像识别结果。S207: Output the image recognition result.

需要说明的是,上述步骤中未详细描述的部分可以参考前述实施例,在此不再赘述。It should be noted that, for parts not described in detail in the above steps, reference may be made to the foregoing embodiments, and details are not described herein again.

另外,在执行S102或S204之前,还需要对特异性网络和共享网络训练,具体地,该训练过程可以是在S101之后以及S102之前或者S201之后以及S204之前,也可以是在S101和S201之前,于本申请实施例中,可以是在执行本图像识别方法之前先对特异性网络和共享网络训练。In addition, before executing S102 or S204, it is also necessary to train the specific network and the shared network. Specifically, the training process can be after S101 and before S102 or after S201 and before S204, or before S101 and S201, In this embodiment of the present application, the specific network and the shared network may be trained before executing the image recognition method.

具体地,请参阅图3,示出了本申请实施例提供的图像处理方法中的特异性网络和共享网络训练的训练过程,具体地,图3所示,该方法包括:S310至S370。Specifically, please refer to FIG. 3 , which shows the training process of the specific network and the shared network training in the image processing method provided by the embodiment of the present application. Specifically, as shown in FIG. 3 , the method includes: S310 to S370 .

S310:获取多个样本图像数据,每个所述样本图像数据对应多个属性标签。S310: Acquire a plurality of sample image data, each of which corresponds to a plurality of attribute tags.

具体地，该样本图像数据为已经被标记的图像数据，可以是在预先获取图像之后，人工对该图像进行标记，而每个标记点对应一个属性标签。例如，该样本图像数据可以是CelebA人脸属性数据集作为实验数据集。该数据集包含约20万张人脸图像，每张图片提供了40个人脸属性标注和5个人脸关键点的位置信息。依据CelebA官方的标准，取其中的约10万张人脸图像用于网络模型的训练，约1万张图像用于验证，1万张图像用于测试网络模型。Specifically, the sample image data is image data that has already been labeled; it may be that, after images are acquired in advance, the images are manually labeled, and each labeled point corresponds to an attribute tag. For example, the sample image data may use the CelebA face attribute data set as the experimental data set. This data set contains about 200,000 face images; each image provides 40 face attribute annotations and the position information of 5 face key points. According to the official CelebA standard, about 100,000 of the face images are used for training the network model, about 10,000 images for validation, and 10,000 images for testing the network model.

针对该公开的人脸属性数据集，可得每张人脸图片对应40个属性标签，每个标签的取值为0或者1，0表示不具备这项属性，1表示具备这项属性。For this public face attribute data set, each face image corresponds to 40 attribute tags, and the value of each tag is 0 or 1, where 0 means the image does not have this attribute and 1 means it does.

需要说明的是,样本图像数据和待处理的图像数据均包含人脸图像,也就是说本申请实施例所提供的图像处理方法应用于人脸属性识别,而该特异性网络和共享网络的训练也是针对人脸属性识别而训练的。It should be noted that the sample image data and the image data to be processed both contain face images, that is to say, the image processing method provided by the embodiment of the present application is applied to face attribute recognition, and the training of the specific network and the shared network It is also trained for face attribute recognition.

进一步地,为了提高图像识别的准确度可以将人脸对齐,具体地,请参阅图4,S310可以包括:S311至S314。Further, in order to improve the accuracy of image recognition, the faces may be aligned. Specifically, please refer to FIG. 4 , and S310 may include: S311 to S314.

S311:获取多个样本图像数据。S311: Acquire multiple sample image data.

具体地,该步骤可参考上述描述,在此不再赘述。Specifically, for this step, reference may be made to the above description, which will not be repeated here.

S312:识别每个所述样本图像数据内的人脸关键点在所述样本图像内的位置信息。S312: Identify the position information of the face key points in the sample image in each of the sample image data.

具体地,该人脸关键点可以是人脸图像内的五官,例如,该人脸关键点可以是眼睛、鼻子、嘴巴等。而具体的识别方式可以是,通过人脸图像的五官识别方式,例如,PCA(principal component analysis)分析方法确定人脸图像内的五官,从而确定人脸关键点在人脸图像内的位置信息,即像素坐标。Specifically, the face key points may be facial features in the face image, for example, the face key points may be eyes, nose, mouth, and the like. The specific recognition method can be, through the facial features recognition method of the face image, for example, the PCA (principal component analysis) analysis method to determine the facial features in the face image, so as to determine the position information of the key points of the face in the face image, i.e. pixel coordinates.

具体地,可以是在获取到样本图像之后,获取人脸区域图像,裁剪面部区域,对所述待训练样本数据进行人脸矫正,确定人脸关键点的位置信息,如眼睛、鼻子、嘴巴等。Specifically, after obtaining the sample image, obtain the face area image, crop the face area, perform face correction on the sample data to be trained, and determine the position information of the key points of the face, such as eyes, nose, mouth, etc. .

S313:根据每个所述样本图像数据的人脸关键点的位置信息,调整每个所述样本图像数据内的人脸朝向符合预设朝向。S313: Adjust the orientation of the face in each of the sample image data to conform to the preset orientation according to the position information of the key points of the face in each of the sample image data.

具体地，预设朝向可以是人脸朝向正前方，具体地，人脸朝向正前方的含义是人脸的额头部分在图像的上部，人脸的下巴部分在图像的下部。具体地，可以通过人脸关键点的位置信息确定图像中人脸朝向。具体地，为每个样本图像设定相同的像素坐标系，即都可以是以样本图像的左侧顶部的顶点为原点，建立像素坐标系，从而人脸图像内的人脸关键点的像素坐标就能够获取到，而通过人脸关键点的位置信息能够确定人的额头和下巴的位置关系。具体地，假如确定眼睛在图像的左侧，而嘴巴在图像的右侧，且二者的纵坐标之间的差距小于指定数值，则可以确定眼睛和嘴巴在同一个水平线上，则可以通过顺时针旋转90°的方式使得样本图像数据内的人脸朝向符合预设朝向。作为一种实施方式，预设朝向可以是人脸朝向正前方15度之内。Specifically, the preset orientation may be that the face faces straight ahead, meaning that the forehead part of the face is in the upper part of the image and the chin part is in the lower part. The face orientation in the image can be determined from the position information of the face key points. Specifically, the same pixel coordinate system is set for each sample image, i.e., a pixel coordinate system established with the top-left vertex of the sample image as the origin, so that the pixel coordinates of the face key points in the face image can be obtained, and the positional relationship between the forehead and the chin can be determined from the key point positions. For example, if it is determined that the eyes are on the left side of the image and the mouth is on the right side, and the difference between their vertical coordinates is less than a specified value, it can be determined that the eyes and the mouth are on the same horizontal line, and the face orientation in the sample image data can be made to conform to the preset orientation by rotating the image 90° clockwise. As an implementation manner, the preset orientation may be that the face faces within 15 degrees of straight ahead.

另外，在根据每个所述样本图像数据的人脸关键点的位置信息，调整每个所述样本图像数据内的人脸朝向符合预设朝向之后，为了减少计算量，还可以调整样本图像的尺寸。具体地，通过人脸关键点的定位，如眼睛、鼻子、嘴巴等，按预设方向标准调整所述待预测对象的方向，保证每个待预测对象的人脸朝向正前方15度之内，实现人脸对齐，并向面部区域添加预定比例的边距，同时为了减少计算量，设置图像大小为指定尺寸，例如，该指定尺寸可以是112*112。具体地，可以是将整个图像的尺寸压缩到指定尺寸，也可以是以指定尺寸大小的窗口剪裁样本图像，具体地，可以是以样本图像的中心点为该窗口的中心点，截取该窗口大小对应的图像区域内的图像，作为尺寸调整后的图像。作为一种实施方式，该窗口的大小可以是112*112。In addition, after the face orientation in each sample image data is adjusted to conform to the preset orientation according to the position information of the face key points, the size of the sample image can also be adjusted in order to reduce the amount of calculation. Specifically, through the positioning of face key points such as eyes, nose and mouth, the direction of the object to be predicted is adjusted according to a preset direction standard to ensure that each face is within 15 degrees of facing straight ahead, realizing face alignment, and a margin of a predetermined proportion is added to the face area. Meanwhile, to reduce the amount of calculation, the image size is set to a specified size, for example, 112*112. Specifically, the entire image may be compressed to the specified size, or the sample image may be cropped with a window of the specified size, for example, with the center point of the sample image as the center point of the window, taking the image within the window as the resized image. As an implementation manner, the size of the window may be 112*112.
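The orientation and size adjustment above can be sketched as follows; the toy image, the 90° clockwise rotation case, and the tiny crop size standing in for 112*112 are illustrative assumptions:

```python
# Sketch of orientation adjustment (90-degree clockwise rotation) followed by a
# center crop to a fixed square size. Plain 2D lists stand in for image data.
def rotate_90_clockwise(image):
    """Rotate a 2D list of pixels 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def center_crop(image, size):
    """Crop a size*size window around the image center."""
    h, w = len(image), len(image[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
rotated = rotate_90_clockwise(img)
cropped = center_crop(rotated, 2)
```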

S314:将调整后的样本图像数据作为本次用于训练所述共享网络和多个初始特异性网络的样本图像数据。S314: Use the adjusted sample image data as the sample image data used for training the shared network and multiple initial specificity networks this time.

具体地，如果上述调整为根据每个所述样本图像数据的人脸关键点的位置信息，调整每个所述样本图像数据内的人脸朝向符合预设朝向，则将经过人脸朝向的调整之后的样本图像数据作为本次用于训练所述共享网络和多个初始特异性网络的样本图像数据；如果上述调整包括根据每个所述样本图像数据的人脸关键点的位置信息，调整每个所述样本图像数据内的人脸朝向符合预设朝向以及调整样本图像的尺寸为指定尺寸，则将经过人脸朝向的调整和尺寸调整之后的样本图像数据作为本次用于训练所述共享网络和多个初始特异性网络的样本图像数据。Specifically, if the above adjustment is to adjust the face orientation in each sample image data to conform to the preset orientation according to the position information of the face key points, then the sample image data after the face orientation adjustment is used as the sample image data for training the shared network and the multiple initial specificity networks this time. If the above adjustment includes both adjusting the face orientation to conform to the preset orientation and adjusting the size of the sample image to the specified size, then the sample image data after both the face orientation adjustment and the size adjustment is used as the sample image data for training the shared network and the multiple initial specificity networks this time.

进一步地，为了增加数据样本并且提高训练后的图像处理模型（即人脸识别模型，也即共享网络和多个特异性网络）的泛化性，具体地，请参阅图5，S310可以包括：S311、S315、S316和S317。Further, in order to increase the data samples and improve the generalization of the trained image processing model, that is, the face recognition model consisting of the shared network and the multiple specific networks, referring to FIG. 5, S310 may include: S311, S315, S316 and S317.

S311:获取多个样本图像数据。S311: Acquire multiple sample image data.

具体地,该步骤可参考上述描述,在此不再赘述。Specifically, for this step, reference may be made to the above description, which will not be repeated here.

S315:对所述多个样本图像数据进行数据增强处理,以使每个所述样本图像数据的光照强度和对比度在预设区间内随机分布。S315: Perform data enhancement processing on the plurality of sample image data, so that the illumination intensity and contrast of each of the sample image data are randomly distributed within a preset interval.

具体地,按预设光照强度区间变换所述待训练对象的光照强度,得到每个待训练对象的光照强度在预设光照强度区间随机分布的数据;按预设对比度区间变换所述待训练对象的对比度,得到每个待训练对象的对比度在预设对比度区间随机分布的数据。Specifically, the illumination intensity of the object to be trained is transformed according to a preset illumination intensity interval, and data in which the illumination intensity of each object to be trained is randomly distributed in the preset illumination intensity interval is obtained; the object to be trained is transformed according to a preset contrast interval The contrast of each object to be trained is randomly distributed in the preset contrast interval.

具体地，预设光照强度区间可以是预先设定的光照强度区间，而在获取样本图像内的每个像素点的光照强度之后，可以将该像素点的光照强度调整到预设光照强度区间内。作为一种实施方式，可以是统计样本图像内的每个像素点的光照强度的分布情况，使得光照强度较高的像素点在预设光照强度区间内也位于光照强度较高的数值，光照强度较低的像素点在预设光照强度区间内也位于光照强度较低的数值，另外，还可以增加各个像素点在预设光照强度区间内的光照强度的分布的连续性，即多个像素值的光照强度的分布子区域内，相邻的两个子区域之间的强度值相差不大于指定数值，使得光照强度在预设光照强度区间内随机分布，从而能够增加数据的多样性。具体地，每个样本图像数据内的像素点的光照强度均可以在其对应的预设光照强度区间内随机分布，并且每个样本图像数据所对应的预设光照强度区间可以不全相同，从而进一步增加数据的多样性，进而提高后期模型训练的泛化性。Specifically, the preset illumination intensity interval may be a preset range of illumination intensity; after the illumination intensity of each pixel in the sample image is obtained, the illumination intensity of the pixel can be adjusted into the preset interval. As an implementation manner, the distribution of the illumination intensity of the pixels in the sample image can be counted, so that pixels with higher illumination intensity are also located at higher intensity values within the preset interval, and pixels with lower illumination intensity at lower values. In addition, the continuity of the distribution of illumination intensity within the preset interval can be increased, that is, within the distribution sub-intervals of the illumination intensity of multiple pixel values, the difference between the intensity values of two adjacent sub-intervals is no greater than a specified value, so that the illumination intensity is randomly distributed within the preset interval, thereby increasing the diversity of the data. Specifically, the illumination intensity of the pixels in each sample image data can be randomly distributed within its corresponding preset illumination intensity interval, and the preset illumination intensity intervals corresponding to the sample image data need not all be the same, thereby further increasing the diversity of the data and improving the generalization of later model training.

同理，按预设对比度区间变换所述待训练对象的对比度，得到每个待训练对象的对比度在预设对比度区间随机分布的数据，可以参考上述光照强度的调整过程。从而使得每个样本图像数据内的像素点的对比度均可以在其对应的预设对比度区间内随机分布，并且每个样本图像数据所对应的预设对比度区间可以不全相同，从而进一步增加数据的多样性，进而提高后期模型训练的泛化性。Similarly, transforming the contrast of the objects to be trained according to the preset contrast interval, so that the contrast of each object is randomly distributed within the preset contrast interval, may refer to the above adjustment process for illumination intensity. In this way, the contrast of the pixels in each sample image data can be randomly distributed within its corresponding preset contrast interval, and the preset contrast intervals corresponding to the sample image data need not all be the same, thereby further increasing the diversity of the data and improving the generalization of later model training.
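A rough sketch of the illumination and contrast augmentation described above, assuming pixel values already normalized to [0, 1]; the interval bounds and the linear scale-around-mean formula are assumptions, not the patent's exact transformation:

```python
import random

# Hedged sketch: jitter brightness (illumination) and contrast within preset
# intervals so augmented samples spread randomly across those intervals.
def augment(pixels, brightness_range=(-0.1, 0.1), contrast_range=(0.8, 1.2)):
    """pixels: values already normalized to [0, 1]."""
    b = random.uniform(*brightness_range)   # random shift within the preset interval
    c = random.uniform(*contrast_range)     # random scale within the preset interval
    mean = sum(pixels) / len(pixels)
    # Scale around the mean (contrast), then shift (brightness); clamp to [0, 1].
    return [min(1.0, max(0.0, (p - mean) * c + mean + b)) for p in pixels]

random.seed(0)
augmented = augment([0.2, 0.5, 0.8])
```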

另外,在执行上述的增强处理之前,可以先对处理得到的样本数据进行归一化处理,将其像素值从[0,255]归一化到[0,1],去除样本数据中包含的冗余信息。In addition, before performing the above-mentioned enhancement processing, the sample data obtained by the processing can be normalized, and the pixel values of the sample data can be normalized from [0, 255] to [0, 1] to remove the redundancy contained in the sample data. information.

S316：对增强处理后的每个所述样本图像数据按照预设随机剪裁比例剪裁，并且每个剪裁后的样本图像数据的尺寸均为预设尺寸。S316: Crop each of the enhanced sample image data according to a preset random cropping ratio, where the size of each cropped sample image data is a preset size.

按预设随机裁剪比例裁剪所述待训练对象,并将其调整为预设尺寸,其中,预设尺寸可以是112*112;按水平方向翻转所述待训练对象。Crop the object to be trained according to a preset random cropping ratio, and adjust it to a preset size, where the preset size may be 112*112; flip the object to be trained in a horizontal direction.

需要说明的是,上述按预设随机裁剪比例裁剪所述待训练对象,并将其调整为预设尺寸可以参考前述的调整为指定尺寸的剪裁方式,则预设尺寸与指定尺寸相同。It should be noted that, for the above-mentioned cropping of the object to be trained according to the preset random cropping ratio, and adjusting it to a preset size, reference may be made to the aforementioned cropping method of adjusting to a specified size, and the preset size is the same as the specified size.
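The random-ratio crop and horizontal flip above can be sketched as follows; the crop-ratio interval and the toy image replacing the 112*112 preset size are assumptions for illustration:

```python
import random

# Sketch of the augmentation steps above: crop the sample by a random ratio,
# then flip it horizontally. A tiny 2D list stands in for real image data.
def random_crop_and_flip(image, ratio_range=(0.5, 1.0)):
    """Crop a square region of a random ratio, then flip it horizontally."""
    h, w = len(image), len(image[0])
    ratio = random.uniform(*ratio_range)
    size = max(1, int(min(h, w) * ratio))
    top = random.randint(0, h - size)
    left = random.randint(0, w - size)
    cropped = [row[left:left + size] for row in image[top:top + size]]
    return [row[::-1] for row in cropped]  # horizontal flip

random.seed(7)
img = [[r * 4 + c for c in range(4)] for r in range(4)]
out = random_crop_and_flip(img)
```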

S317:将剪裁后的样本图像数据作为本次用于训练所述共享网络和多个初始特异性网络的样本图像数据。S317: Use the cropped sample image data as the sample image data used for training the shared network and multiple initial specificity networks this time.

需要说明的是，上述的步骤S311至S314可以替换S310，即在S311、S312、S313和S314之后执行S320；也可以是步骤S311、S315、S316和S317替换S310，即在S311、S315、S316和S317之后执行S320；还可以是S311至S317一起替换S310，即在S311、S312、S313、S314、S315、S316和S317之后再执行S320。It should be noted that the above steps S311 to S314 may replace S310, that is, S320 is executed after S311, S312, S313 and S314; alternatively, steps S311, S315, S316 and S317 may replace S310, that is, S320 is executed after S311, S315, S316 and S317; or S311 to S317 together may replace S310, that is, S320 is executed after S311, S312, S313, S314, S315, S316 and S317.

S320:设置共享网络和多个特异性网络。S320: Setting a shared network and multiple specific networks.

每个所述特异性网络能够识别至少一个属性标签，且每个所述特异性网络能够识别的属性标签互不相同。如果为每个属性标签配置一个特异性网络，会存在巨大的计算开销。例如，以人脸图像识别为例，假设属性标签一共包括40个，直接将40个人脸属性视为40个独立的任务存在巨大的计算开销，而且忽视了人脸属性之间的显式的位置相关性。因此，可以为人脸划分多个区域，而每个区域对应一个特异性网络，具体地，设置共享网络和多个特异性网络的具体实施方式可以是：划分多个测量区域，每个所述测量区域对应人脸的不同区域；根据所述多个测量区域设置多个特异性网络，每个特异性网络对应一个测量区域，每个所述特异性网络用于确认所对应的测量区域内的属性标签。Each of the specific networks can identify at least one attribute tag, and the attribute tags each specific network can identify are different from each other. If a separate specific network were configured for every attribute tag, the computational overhead would be huge. Taking face image recognition as an example, assume there are 40 attribute tags in total; directly treating the 40 face attributes as 40 independent tasks incurs huge computational overhead and ignores the explicit positional correlation between face attributes. Therefore, the face can be divided into multiple areas, each corresponding to one specific network. Specifically, the implementation of setting the shared network and multiple specific networks may be: dividing multiple measurement areas, each measurement area corresponding to a different area of the face; and setting multiple specific networks according to the multiple measurement areas, where each specific network corresponds to one measurement area and is used to determine the attribute tags in its corresponding measurement area.

具体地，可以设置四个特异性网络，分别为上部特异性网络、中部特异性网络、下部特异性网络和全脸特异性网络。对应地，属性标签划分为四个组，分别为上部组、中部组、下部组和全脸组，每个组对应各自的属性标签，且各个组的属性标签不同，也就是说，每个特异性网络能够识别所对应的组的属性标签。根据它们的相应位置，可以将每个组的属性分类视为单独的属性学习任务。将属性分成4个组之后，将每个属性组的属性分类问题视为一个子任务，具体地，属性标签如下表所示：Specifically, four specific networks may be set: an upper specific network, a middle specific network, a lower specific network and a whole-face specific network. Correspondingly, the attribute tags are divided into four groups, namely an upper group, a middle group, a lower group and a whole-face group; each group has its own attribute tags, and the attribute tags of the groups are different, that is, each specific network can identify the attribute tags of its corresponding group. According to their respective positions, the attribute classification of each group can be treated as a separate attribute learning task. After the attributes are divided into 4 groups, the attribute classification problem of each group is treated as a subtask. Specifically, the attribute tags are shown in the following table:

如图6所示，样本图像被划分为四个区域，分别为上部区域m1、中部区域m2、下部区域m3和全脸区域m4。作为一种实施方式，上部区域m1为图像的顶部侧边至双眼中位置最靠下的眼睛的位置的横坐标之间的区域，具体地，如图6所示，在双眼中位置最靠下的眼睛的位置处设置有一条平行于横坐标轴的直线，记为第一直线，则第一直线与图像的顶部侧边之间的区域作为上部区域。在鼻子与上嘴唇之间的区域选中一个位置点，可以是中间位置点，设置有一条平行于横坐标轴的直线，记为第二直线，则第一直线和第二直线之间的区域作为中部区域，第二直线和图像的底部侧边之间的区域作为下部区域，将人脸的下巴末端至头发顶端之间的区域作为全脸区域，其中，该全脸区域能够圈住人脸以及头发。其中，上述的区域为测量区域，即上部区域m1、中部区域m2、下部区域m3和全脸区域m4为四个测量区域。As shown in FIG. 6, the sample image is divided into four areas, namely an upper area m1, a middle area m2, a lower area m3 and a whole-face area m4. As an implementation manner, the upper area m1 is the area between the top side of the image and the horizontal line at the position of the lower of the two eyes; specifically, as shown in FIG. 6, a straight line parallel to the horizontal axis is set at the position of the lower eye and denoted as the first straight line, and the area between the first straight line and the top side of the image serves as the upper area. A position point, which may be the middle point, is selected in the area between the nose and the upper lip, and a straight line parallel to the horizontal axis is set there and denoted as the second straight line; the area between the first straight line and the second straight line serves as the middle area, and the area between the second straight line and the bottom side of the image serves as the lower area. The area from the end of the chin to the top of the hair serves as the whole-face area, which can enclose the face and the hair. The above areas are the measurement areas, that is, the upper area m1, the middle area m2, the lower area m3 and the whole-face area m4 are the four measurement areas.

于本申请实施例中，一共包括四个特异性网络和一个共享网络，则每个任务拥有一个独立的特异性网络。与分支结构不同的是，为了更好地保留任务各自的特异性，任务间的特异性网络的参数是不共享的。共享网络作为一个独立的网络，并不对应某个特定的学习任务，而是为了学习任务之间的相关性，提取任务间的互补信息。通过一个简单的连接单元，将特异性网络和共享网络连接起来，从而达到最大化两者之间信息流的目的。具体地，特异性网络和共享网络的连接关系如图7所示，共享网络的每一层输入，除了包括其上一层的输出特征之外，还包括上一层所有的特异性网络的输出特征，这些特征串联在一起，组成了共享网络每一层的输入。同时，特异性网络的每一层输入，除了包括其上一层的输出特征之外，还包括上一层共享网络的输出特征，两者串联在一起形成了最后的输入。另外，图7仅示出了两个特异性网络和共享网络之间的连接关系，而针对四个特异性网络的连接关系也可以参考图7所示而合理得出。In the embodiment of the present application, a total of four specific networks and one shared network are included, so each task has an independent specific network. Unlike a branching structure, in order to better preserve the specificity of each task, the parameters of the specific networks are not shared between tasks. The shared network, as an independent network, does not correspond to a particular learning task; instead, it learns the correlation between tasks and extracts complementary information between them. The specific networks and the shared network are connected through a simple connection unit, so as to maximize the information flow between the two. Specifically, the connection between the specific networks and the shared network is shown in FIG. 7. The input of each layer of the shared network includes, in addition to the output features of its previous layer, the previous-layer output features of all the specific networks; these features are concatenated together to form the input of each layer of the shared network. Meanwhile, the input of each layer of a specific network includes, in addition to the output features of its own previous layer, the output features of the previous layer of the shared network; the two are concatenated together to form the final input. In addition, FIG. 7 only shows the connection between two specific networks and the shared network; the connection for four specific networks can be reasonably derived with reference to FIG. 7.
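上述连接单元的串联规则可以用如下示意代码表达（特征图形状与通道个数均为示意性假设）：The concatenation rule of the above connection unit can be sketched as follows (the feature-map shapes and channel counts are illustrative assumptions):

```python
import numpy as np

def shared_layer_input(prev_shared, prev_specifics):
    # 共享网络某层的输入: 上一层共享网络输出与上一层所有特异性网络的输出按通道串联
    # Shared-layer input: previous shared output concatenated (channel axis)
    # with the previous-layer outputs of all specific networks.
    return np.concatenate([prev_shared] + list(prev_specifics), axis=0)

def specific_layer_input(prev_specific, prev_shared):
    # 特异性网络某层的输入: 其自身上一层输出与上一层共享网络输出按通道串联
    # Specific-layer input: its own previous output concatenated with the
    # previous shared-network output.
    return np.concatenate([prev_specific, prev_shared], axis=0)

# 示例 / example: 四个特异性网络, 特征图形状为 (通道, 高, 宽) (hypothetical)
prev_specifics = [np.zeros((16, 8, 8)) for _ in range(4)]
prev_shared = np.zeros((32, 8, 8))
shared_in = shared_layer_input(prev_shared, prev_specifics)
specific_in = specific_layer_input(prev_specifics[0], prev_shared)
```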

本申请实施例所构建的多任务属性识别模型包括4个特异性网络和1个共享网络。特异性网络重点学习每一个特征组的特异性特征，而共享网络则重点学习所有特征组的共享信息。特异性网络和共享网络通过局部共享单元进行连接和信息交互，从而组成整个局部共享的多任务人脸多属性分类网络。每个特异性网络和共享网络都具有相同的网络结构，都含有5个卷积层和2个全连接层。同时，每个卷积层和全连接层的后面接上归一化层和ReLU(Rectified Linear Unit)激活层。特异性网络之间每一层的输出通道个数相同，而共享网络的输出通道个数则与特异性网络不同。The multi-task attribute recognition model constructed in the embodiment of the present application includes 4 specific networks and 1 shared network. The specific networks focus on learning the specific features of each feature group, while the shared network focuses on learning the information shared by all feature groups. The specific networks and the shared network are connected and exchange information through local sharing units, thus forming the whole locally shared multi-task face multi-attribute classification network. Each specific network and the shared network have the same network structure, each containing 5 convolutional layers and 2 fully connected layers. Meanwhile, each convolutional layer and fully connected layer is followed by a normalization layer and a ReLU (Rectified Linear Unit) activation layer. The number of output channels of each layer is the same across the specific networks, while the number of output channels of the shared network differs from that of the specific networks.
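上述"5个卷积层+2个全连接层，每层后接归一化层和ReLU激活层"的层结构可以用如下示意代码列出，其中的通道个数与维度均为示意性假设，本申请并未给出具体数值：The layer structure above (5 convolutional layers + 2 fully connected layers, each followed by a normalization layer and a ReLU activation layer) can be listed with the following sketch; the channel counts and dimensions are illustrative assumptions, as the application does not give concrete values:

```python
def build_layer_spec(conv_channels, fc_dims):
    """按上文结构生成层描述列表 / list the layer structure described above."""
    spec = []
    for c in conv_channels:            # 5 个卷积层, 每层后接归一化与 ReLU
        spec += [f"conv({c})", "batchnorm", "relu"]
    for d in fc_dims:                  # 2 个全连接层, 每层后接归一化与 ReLU
        spec += [f"fc({d})", "batchnorm", "relu"]
    return spec

# 通道个数为示意性假设 (hypothetical); 共享网络的通道个数与特异性网络不同
specific_spec = build_layer_spec([32, 64, 128, 128, 256], [512, 256])
shared_spec = build_layer_spec([48, 96, 192, 192, 384], [512, 256])
```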

S330:将所述多个样本图像数据输入所述共享网络和多个特异性网络进行训练,以得到训练后的共享网络和多个特异性网络。S330: Input the multiple sample image data into the shared network and multiple specific networks for training, so as to obtain the trained shared network and multiple specific networks.

作为一种实施方式，可以将整张样本图像数据输入至每个特异性网络进行训练，但是，每个特异性网络仅能够识别所对应的属性标签，例如，上部特异性网络仅能够识别上部组的属性标签，而无法识别其他组的属性标签。As an embodiment, the entire sample image data can be input into each specific network for training; however, each specific network can only identify its corresponding attribute labels. For example, the upper specific network can only identify the attribute labels of the upper group, and cannot identify the attribute labels of the other groups.

作为另一种实施方式，为了减少计算量并提高训练速度，可以为每个特异性网络输入样本图像数据中不同的部分，具体地，根据每个所述特异性网络对应的属性标签将所述样本图像数据划分为多个子样本图像数据，将所述子样本图像数据输入与该子样本图像数据对应的特异性网络。As another embodiment, in order to reduce the amount of computation and speed up training, a different part of the sample image data may be input into each specific network. Specifically, the sample image data is divided into a plurality of sub-sample image data according to the attribute labels corresponding to each specific network, and each sub-sample image data is input into the specific network corresponding to that sub-sample image data.

如图8所示，同一张样本图像被划分为四个子样本图像，左上的图像为如图6所示的样本图像中上部区域m1对应的子样本图像数据，右上的图像为如图6所示的样本图像中中部区域m2对应的子样本图像数据，左下的图像为如图6所示的样本图像中下部区域m3对应的子样本图像数据，右下的图像为如图6所示的样本图像中全脸区域m4对应的子样本图像数据。则将上部区域m1对应的子样本图像数据输入上部特异性网络，用于训练上部特异性网络；将中部区域m2对应的子样本图像数据输入中部特异性网络，用于训练中部特异性网络；将下部区域m3对应的子样本图像数据输入下部特异性网络，用于训练下部特异性网络；将全脸区域m4对应的子样本图像数据输入全脸特异性网络，用于训练全脸特异性网络。As shown in FIG. 8, the same sample image is divided into four sub-sample images: the upper-left image is the sub-sample image data corresponding to the upper area m1 of the sample image shown in FIG. 6, the upper-right image is the sub-sample image data corresponding to the middle area m2, the lower-left image is the sub-sample image data corresponding to the lower area m3, and the lower-right image is the sub-sample image data corresponding to the full-face area m4. The sub-sample image data corresponding to the upper area m1 is input into the upper specific network for training the upper specific network; the sub-sample image data corresponding to the middle area m2 is input into the middle specific network for training the middle specific network; the sub-sample image data corresponding to the lower area m3 is input into the lower specific network for training the lower specific network; and the sub-sample image data corresponding to the full-face area m4 is input into the full-face specific network for training the full-face specific network.
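上述子样本图像数据与特异性网络的对应关系可以用如下示意代码表达（网络名称为示意性命名）：The correspondence between the sub-sample image data and the specific networks above can be sketched as follows (the network names are illustrative):

```python
def route_subsamples(subsamples):
    """将四个测量区域的子样本图像数据路由到对应的特异性网络。
    Route each region's sub-sample image data to its specific network."""
    mapping = {"m1": "upper_net",      # 上部区域 -> 上部特异性网络
               "m2": "middle_net",     # 中部区域 -> 中部特异性网络
               "m3": "lower_net",      # 下部区域 -> 下部特异性网络
               "m4": "full_face_net"}  # 全脸区域 -> 全脸特异性网络
    return {mapping[region]: data for region, data in subsamples.items()}

routed = route_subsamples({"m1": "upper_crop", "m2": "middle_crop",
                           "m3": "lower_crop", "m4": "full_crop"})
```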

另外，需要说明的是，在S311步骤中获取到的样本图像数据一部分用于训练上述网络模型，另一部分用于测试上述网络模型。具体地，将所述待训练对象样本按比例随机划分为训练集和测试集两类，划分比例为8:2，其中训练集用于人脸多属性识别模型的训练，测试集用于人脸多属性识别模型的测试，并保证同一个人的数据仅出现在一个集合中。In addition, it should be noted that a part of the sample image data obtained in step S311 is used for training the above network model, and the other part is used for testing it. Specifically, the samples of the objects to be trained are randomly divided into a training set and a test set at a ratio of 8:2, where the training set is used for training the face multi-attribute recognition model and the test set is used for testing it, and it is guaranteed that the data of the same person appears in only one of the two sets.
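上述按8:2划分并保证同一个人的数据仅出现在一个集合中的做法，可以按"以人为单位划分"示意实现，其中数据的组织方式为示意性假设：The 8:2 split above that keeps one person's data in only one set can be sketched as a person-level split; the data layout is an illustrative assumption:

```python
import random

def split_by_person(samples, ratio=0.8, seed=0):
    """按人划分训练/测试集, 保证同一个人的数据仅出现在一个集合中。
    samples: {person_id: [image, ...]} — person-level 8:2 split."""
    persons = sorted(samples)
    random.Random(seed).shuffle(persons)
    cut = int(len(persons) * ratio)
    train = [img for p in persons[:cut] for img in samples[p]]
    test = [img for p in persons[cut:] for img in samples[p]]
    return train, test

# 示例 / example: 10 人, 每人 2 张图像 (hypothetical)
data = {f"p{i}": [f"p{i}_img{j}" for j in range(2)] for i in range(10)}
train, test = split_by_person(data)
```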

将测试数据集送入训练好的特异性网络和共享网络进行测试，验证网络模型的准确性，并且将上述测试数据集中判断错误的样本再次送入网络模型中进行精调，提高模型的泛化性。The test data set is fed into the trained specific networks and shared network for testing to verify the accuracy of the network model, and the samples that are wrongly judged in the above test data set are fed into the network model again for fine-tuning, so as to improve the generalization ability of the model.

于本申请实施例中，采用Adam梯度下降算法，Adam是一种高效计算方法，可以提高梯度下降收敛速度。训练过程中将训练集输入卷积神经网络模型并迭代预设次数epochs，本方法设置epochs为90次。每一次迭代计算过程中使用Adam梯度下降算法优化目标函数，本方法设置batch_size为64，即每轮训练送入64张输入图像。其中，特异性网络和共享网络均是基于卷积神经网络模型而训练完成的。In the embodiment of the present application, the Adam gradient descent algorithm is used; Adam is an efficient calculation method that can speed up the convergence of gradient descent. During training, the training set is input into the convolutional neural network model and iterated for a preset number of epochs, which is set to 90 in this method. In each iteration, the Adam gradient descent algorithm is used to optimize the objective function; the batch_size is set to 64 in this method, that is, 64 input images are fed in each round of training. Both the specific networks and the shared network are trained based on the convolutional neural network model.
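上述epochs=90、batch_size=64的迭代安排可以用如下示意代码表达，其中Adam对目标函数的具体更新从略，样本总数为示意性假设：The iteration schedule with epochs=90 and batch_size=64 can be sketched as follows; the actual Adam update of the objective function is elided, and the sample count is an illustrative assumption:

```python
def iterate_batches(num_samples, batch_size=64, epochs=90):
    """生成 (epoch, step) 迭代安排 / yield the (epoch, step) schedule.
    每一步对一个 batch 用 Adam 优化目标函数 (更新本身从略)。"""
    steps_per_epoch = (num_samples + batch_size - 1) // batch_size
    for epoch in range(epochs):
        for step in range(steps_per_epoch):
            yield epoch, step  # 此处对当前 batch 执行一次 Adam 梯度更新

# 示例 / example: 假设训练集含 6400 张图像 (hypothetical)
schedule = list(iterate_batches(6400))
```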

针对多属性问题,本方法使用交叉熵作为损失函数进行训练,该函数作为衡量目标和输出之间的交叉熵的标准,其公式如下:For multi-attribute problems, this method uses cross-entropy as a loss function for training, which is used as a standard to measure the cross-entropy between the target and the output. The formula is as follows:

$$L=-\sum_{i=1}^{m}\frac{1}{n_i}\sum_{j=1}^{n_i}\left[y_j^{(i)}\log\hat{y}_j^{(i)}+\left(1-y_j^{(i)}\right)\log\left(1-\hat{y}_j^{(i)}\right)\right]$$

上式中，$m$代表属性总数，$n_i$代表第i个属性的样本总数，$y_j^{(i)}$代表第i个属性第j个样本的标签值，而$\hat{y}_j^{(i)}$指第i个属性第j个样本的预测值。In the above formula, $m$ represents the total number of attributes, $n_i$ represents the total number of samples of the i-th attribute, $y_j^{(i)}$ represents the label value of the j-th sample of the i-th attribute, and $\hat{y}_j^{(i)}$ refers to the predicted value of the j-th sample of the i-th attribute.
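上述交叉熵损失可以用如下numpy示意代码计算（假设各属性为二分类属性，对每个属性先按样本取平均再对属性求和）：The cross-entropy loss above can be computed with the following numpy sketch, assuming each attribute is binary and averaging over samples per attribute before summing over attributes:

```python
import numpy as np

def multi_attribute_bce(y_true, y_pred, eps=1e-12):
    """多属性二分类交叉熵 / multi-attribute binary cross-entropy.
    y_true, y_pred: shape (m, n) — m 个属性, 每个属性 n 个样本."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # 数值稳定 / numerical stability
    per_attr = -(y_true * np.log(y_pred)
                 + (1 - y_true) * np.log(1 - y_pred)).mean(axis=1)
    return float(per_attr.sum())

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])  # 示例标签 (hypothetical)
```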

S340:获取待处理的图像数据。S340: Acquire image data to be processed.

需要说明的是，获取待处理的图像数据除了可以参考上述步骤之外，还可以增加人脸朝向的检测，具体地，可以是在确定当前图像内包含人脸图像之后，确定当前所采集的人脸图像内的人脸朝向是否满足预设朝向。具体地，以电子设备内的摄像头采集人脸图像为例，电子设备响应人脸识别请求调用摄像头采集人脸图像，识别人脸图像内的用户的人脸关键点的位置信息，从而能够确定出人脸的朝向是否为预设朝向。It should be noted that, in addition to obtaining the image data to be processed with reference to the above steps, detection of the face orientation can also be added. Specifically, after it is determined that the current image contains a face image, it can be determined whether the face orientation in the currently collected face image satisfies the preset orientation. Taking a camera in the electronic device collecting the face image as an example, the electronic device calls the camera to collect the face image in response to a face recognition request, and identifies the position information of the key points of the user's face in the face image, so as to determine whether the orientation of the face is the preset orientation.

如图9所示，左边的图像中人脸朝向偏向被采集人脸图像的用户的右侧，并且能够确定图像上的几个关键点，分别为左眼a1、右眼a2、鼻子a3和嘴唇a4。以图像的竖向对称线为参照，可以看出右眼a2、鼻子a3和嘴唇a4均位于对称线的左侧，从而能够确定用户的人脸朝向向右偏。具体地，可以在屏幕上显示一个采集框，用户需要通过移动的方式将自己的人脸位于该采集框内，而如果用户没有正对屏幕，或者人脸的朝向不符合预设朝向，则可以发出提醒信息提示用户针对摄像头调整脸部朝向。如图9所示，右侧的人脸图中，左眼b1、右眼b2、鼻子b3和嘴唇b4位于对称线附近，并且左眼b1、右眼b2分居对称线两侧，可以确定该人脸朝向正对屏幕，符合预设朝向，则可以正常采集人脸图像，并进行后期识别。As shown in FIG. 9, the face in the left image is oriented toward the right of the user whose face image is being collected, and several key points can be determined on the image, namely the left eye a1, the right eye a2, the nose a3 and the lips a4. Taking the vertical symmetry line of the image as a reference, it can be seen that the right eye a2, the nose a3 and the lips a4 are all located on the left side of the symmetry line, so it can be determined that the user's face is turned to the right. Specifically, a collection frame may be displayed on the screen, and the user needs to move so that his or her face is located within the collection frame; if the user is not facing the screen, or the orientation of the face does not conform to the preset orientation, a reminder message can be issued to prompt the user to adjust the face orientation toward the camera. As shown in FIG. 9, in the face image on the right, the left eye b1, the right eye b2, the nose b3 and the lips b4 are located near the symmetry line, with the left eye b1 and the right eye b2 on opposite sides of it, so it can be determined that the face is facing the screen, which conforms to the preset orientation; the face image can then be collected normally and recognized later.
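上述根据关键点相对竖向对称线的位置判断人脸朝向的做法，可以用如下示意代码表达，其中阈值与判定规则均为示意性假设：The approach above of judging face orientation from the key points relative to the vertical symmetry line can be sketched as follows; the threshold and decision rule are illustrative assumptions:

```python
def face_orientation(keypoints, image_width, tol=0.1):
    """根据鼻子与双眼相对竖向对称线的位置粗略判断人脸朝向。
    keypoints: {'left_eye': (x, y), 'right_eye': ..., 'nose': ..., 'mouth': ...}
    返回 'frontal' / 'left' / 'right' (示意性规则, illustrative rule)."""
    axis = image_width / 2  # 竖向对称线 / vertical symmetry line
    offset = (keypoints["nose"][0] - axis) / image_width
    frontal = (abs(offset) <= tol
               and keypoints["left_eye"][0] < axis < keypoints["right_eye"][0])
    if frontal:
        return "frontal"            # 符合预设朝向 / meets the preset orientation
    return "right" if offset < 0 else "left"  # 关键点偏左 -> 人脸向右偏

frontal_kp = {"left_eye": (40, 50), "right_eye": (88, 50),
              "nose": (64, 70), "mouth": (64, 90)}       # 假设坐标 (hypothetical)
turned_kp = {"left_eye": (20, 50), "right_eye": (55, 50),
             "nose": (40, 70), "mouth": (42, 90)}        # 关键点均在对称线左侧
```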

S350:将所述待处理的图像数据输入预先训练的多个特异性网络,以获取所述图像数据对应的属性标签。S350: Input the image data to be processed into a plurality of pre-trained specific networks to acquire attribute labels corresponding to the image data.

S360:将每个所述特异性网络所确定的属性标签输入预先训练好的共享网络,以获取图像识别结果。S360: Input the attribute label determined by each specific network into the pre-trained shared network to obtain an image recognition result.

S370:输出所述图像识别结果。S370: Output the image recognition result.

需要说明的是,人脸图像中不仅包含了面部特征、种族、性别、年龄、表情等人脸属性信息,而且还可以表达人的身份信息。因此,人脸属性识别在有关年龄的访问控制、人脸属性的人脸检索、安防和人机交互等领域中有广泛的应用前景。It should be noted that the face image not only includes face attribute information such as facial features, race, gender, age, expression, etc., but also can express the identity information of the person. Therefore, face attribute recognition has broad application prospects in the fields of age-related access control, face retrieval of face attributes, security and human-computer interaction.

请参阅图10,示出了本申请实施例提供的图像处理方法,该方法包括:S1001至S1004。Referring to FIG. 10 , an image processing method provided by an embodiment of the present application is shown, and the method includes: S1001 to S1004.

S1001:获取多个样本图像数据,每个所述样本图像数据对应多个属性标签。S1001: Acquire multiple sample image data, each of which corresponds to multiple attribute tags.

S1002:设置共享网络和多个特异性网络,每个所述特异性网络能够识别至少一个属性标签,且每个所述特异性网络能够识别的属性标签互不相同。S1002: Set a shared network and a plurality of specific networks, each of the specific networks can identify at least one attribute label, and the attribute labels that can be identified by each of the specific networks are different from each other.

S1003:将所述多个样本图像数据输入所述共享网络和多个特异性网络进行训练,以得到训练后的共享网络和多个特异性网络。S1003: Input the multiple sample image data into the shared network and multiple specific networks for training, so as to obtain a trained shared network and multiple specific networks.

其中,S1001至S1003为共享网络和多个特异性网络的训练过程,其具体的实施方式可以参考前述S310至S330,在此不再赘述。Among them, S1001 to S1003 are the training process of the shared network and multiple specific networks, and the specific implementation thereof may refer to the aforementioned S310 to S330, which will not be repeated here.

S1004:获取待处理的图像数据,根据所述训练后的共享网络和多个特异性网络对所述待处理的图像数据处理,得到图像识别结果。S1004: Acquire image data to be processed, and process the image data to be processed according to the trained shared network and multiple specific networks to obtain an image recognition result.

根据所述训练后的共享网络和多个特异性网络对所述待处理的图像数据处理,得到图像识别结果可以参考前述实施例,在此不再赘述。To obtain an image recognition result by processing the image data to be processed according to the trained shared network and multiple specific networks, reference may be made to the foregoing embodiments, and details are not repeated here.

请参阅图11,其示出了本申请实施例提供的一种图像处理装置1100的结构框图,该装置可以包括:数据获取单元1110、属性确定单元1120、结果获取单元1130和输出单元1140。Please refer to FIG. 11 , which shows a structural block diagram of an image processing apparatus 1100 provided by an embodiment of the present application. The apparatus may include: a data acquisition unit 1110 , an attribute determination unit 1120 , a result acquisition unit 1130 , and an output unit 1140 .

数据获取单元1110,用于获取待处理的图像数据。The data acquisition unit 1110 is used to acquire image data to be processed.

属性确定单元1120,用于将所述待处理的图像数据输入预先训练的多个特异性网络,以获取所述图像数据对应的属性标签,其中,每个所述特异性网络用于确定所述图像数据对应的属性标签,且各个所述特异性网络确定的属性标签互不相同。The attribute determination unit 1120 is configured to input the image data to be processed into a plurality of pre-trained specific networks to obtain attribute labels corresponding to the image data, wherein each specific network is used to determine the The attribute labels corresponding to the image data, and the attribute labels determined by each of the specific networks are different from each other.

结果获取单元1130,用于将每个所述特异性网络所确定的属性标签输入预先训练好的共享网络,以获取图像识别结果,其中,所述共享网络用于根据各属性标签和各属性标签的相关性确定图像识别结果。The result obtaining unit 1130 is configured to input the attribute labels determined by each of the specific networks into a pre-trained shared network to obtain image recognition results, wherein the shared network is used to The correlation determines the image recognition results.

输出单元1140,用于输出所述图像识别结果。The output unit 1140 is configured to output the image recognition result.

请参阅图12,其示出了本申请实施例提供的一种图像处理装置1200的结构框图,该装置可以包括:训练单元1210、数据获取单元1220、属性确定单元1230、结果获取单元1240和输出单元1250。Please refer to FIG. 12, which shows a structural block diagram of an image processing apparatus 1200 provided by an embodiment of the present application. The apparatus may include: a training unit 1210, a data acquisition unit 1220, an attribute determination unit 1230, a result acquisition unit 1240, and an output unit 1250.

训练单元1210用于对所述共享网络和多个特异性网络进行训练。The training unit 1210 is used for training the shared network and the plurality of specific networks.

具体地,训练单元1210包括样本获取子单元1211、设置子单元1212和训练子单元1213。Specifically, the training unit 1210 includes a sample acquisition subunit 1211 , a setting subunit 1212 and a training subunit 1213 .

获取子单元1211,用于获取多个样本图像数据,每个所述样本图像数据对应多个属性标签。The obtaining subunit 1211 is configured to obtain a plurality of sample image data, each of which corresponds to a plurality of attribute tags.

设置子单元1212,用于设置共享网络和多个特异性网络,每个所述特异性网络能够识别至少一个属性标签,且每个所述特异性网络能够识别的属性标签互不相同。The setting subunit 1212 is configured to set a shared network and a plurality of specific networks, each of the specific networks can identify at least one attribute label, and the attribute labels that can be identified by each of the specific networks are different from each other.

训练子单元1213,用于将所述多个样本图像数据输入所述共享网络和多个特异性网络进行训练,以得到训练后的共享网络和多个特异性网络。The training subunit 1213 is configured to input the plurality of sample image data into the shared network and the plurality of specific networks for training, so as to obtain the trained shared network and the plurality of specific networks.

进一步地，所述样本图像数据和待处理的图像数据均包含人脸图像，获取子单元1211还用于获取多个样本图像数据；识别每个所述样本图像数据内的人脸关键点在所述样本图像内的位置信息；根据每个所述样本图像数据的人脸关键点的位置信息，调整每个所述样本图像数据内的人脸朝向符合预设朝向；将调整后的样本图像数据作为本次用于训练所述共享网络和多个初始特异性网络的样本图像数据。Further, the sample image data and the image data to be processed both contain face images. The acquisition subunit 1211 is also used to acquire multiple sample image data; identify the position information, within each sample image, of the face key points in each of the sample image data; adjust the face orientation in each of the sample image data to conform to the preset orientation according to the position information of the face key points of each sample image data; and use the adjusted sample image data as the sample image data for training the shared network and the multiple initial specific networks this time.

进一步地，获取子单元1211还用于获取多个样本图像数据；对所述多个样本图像数据进行数据增强处理，以使每个所述样本图像数据的光照强度和对比度在预设区间内随机分布；对增强处理后的每个所述样本图像数据按照预设随机剪裁比例剪裁，并且每个剪裁后的样本图像数据的尺寸均为预设尺寸；将剪裁后的样本图像数据作为本次用于训练所述共享网络和多个初始特异性网络的样本图像数据。Further, the acquisition subunit 1211 is also used to acquire multiple sample image data; perform data enhancement processing on the multiple sample image data so that the illumination intensity and contrast of each sample image data are randomly distributed within a preset interval; crop each enhanced sample image data according to a preset random cropping ratio, with each cropped sample image data having a preset size; and use the cropped sample image data as the sample image data for training the shared network and the multiple initial specific networks this time.
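上述数据增强处理（光照强度与对比度在预设区间内随机、随机剪裁到预设尺寸）可以用如下示意代码表达，其中的区间与预设尺寸均为示意性假设：The data enhancement above (random illumination intensity and contrast within preset intervals, random cropping to a preset size) can be sketched as follows; the intervals and the preset size are illustrative assumptions:

```python
import numpy as np

def augment(image, rng, out_size=112):
    """数据增强草图 / augmentation sketch: 对比度与光照在区间内随机, 再随机裁剪。
    区间 [0.8, 1.2] 与 [-20, 20]、out_size=112 均为假设值 (hypothetical)."""
    alpha = rng.uniform(0.8, 1.2)   # 对比度因子 / contrast factor
    beta = rng.uniform(-20, 20)     # 光照强度偏移 / brightness offset
    img = np.clip(image.astype(np.float32) * alpha + beta, 0, 255)
    h, w = img.shape[:2]
    top = rng.integers(0, h - out_size + 1)   # 随机裁剪位置 / random crop origin
    left = rng.integers(0, w - out_size + 1)
    return img[top:top + out_size, left:left + out_size].astype(np.uint8)

rng = np.random.default_rng(0)
out = augment(np.full((128, 128, 3), 128, np.uint8), rng)
```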

进一步地，所述样本图像数据和待处理的图像数据均包含人脸图像，每个所述样本图像数据对应的多个属性标签对应于人脸的不同位置；设置子单元1212还用于划分多个测量区域，每个所述测量区域对应人脸的不同区域；根据所述多个测量区域设置多个特异性网络，每个特异性网络对应一个测量区域，每个所述特异性网络用于确定所对应的测量区域内的属性标签。Further, the sample image data and the image data to be processed both contain face images, and the multiple attribute labels corresponding to each sample image data correspond to different positions of the face. The setting subunit 1212 is also used to divide multiple measurement areas, each measurement area corresponding to a different area of the face, and to set up multiple specific networks according to the multiple measurement areas, each specific network corresponding to one measurement area and being used to determine the attribute labels in its corresponding measurement area.

数据获取单元1220,用于获取待处理的图像数据。The data acquisition unit 1220 is used for acquiring image data to be processed.

进一步地,数据获取单元1220还用于获取原始图像数据;对所述原始图像数据归一化处理,以得到待处理的图像数据。Further, the data acquisition unit 1220 is further configured to acquire original image data; and normalize the original image data to obtain image data to be processed.
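上述归一化处理的一种常见做法可以用如下示意代码表达；本申请并未限定具体的归一化方式，缩放到[0, 1]仅为示意性假设：A common choice for the normalization above can be sketched as follows; the application does not fix the exact scheme, and scaling to [0, 1] is an illustrative assumption:

```python
import numpy as np

def normalize(image):
    """将原始图像数据归一化到 [0, 1] / normalize raw image data to [0, 1].
    具体归一化方式为假设 (the exact scheme is an assumption)."""
    return image.astype(np.float32) / 255.0

normed = normalize(np.array([[0, 255]], dtype=np.uint8))
```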

属性确定单元1230,用于将所述待处理的图像数据输入预先训练的多个特异性网络,以获取所述图像数据对应的属性标签,其中,每个所述特异性网络用于确定所述图像数据对应的属性标签,且各个所述特异性网络确定的属性标签互不相同。The attribute determination unit 1230 is configured to input the image data to be processed into a plurality of pre-trained specific networks to obtain attribute labels corresponding to the image data, wherein each specific network is used to determine the The attribute labels corresponding to the image data, and the attribute labels determined by each of the specific networks are different from each other.

进一步地,属性确定单元1230还用于确定每个所述特异性网络对应的属性标签,其中,所述特异性网络能够识别的属性标签为该特异性网络对应的属性标签;根据每个所述特异性网络对应的属性标签将所述图像数据划分为多个子图像数据;将所述子图像数据输入与该子图像数据对应的特异性网络。Further, the attribute determining unit 1230 is further configured to determine the attribute label corresponding to each specific network, wherein the attribute label that can be identified by the specific network is the attribute label corresponding to the specific network; The attribute label corresponding to the specific network divides the image data into a plurality of sub-image data; the sub-image data is input into the specific network corresponding to the sub-image data.

结果获取单元1240,用于将每个所述特异性网络所确定的属性标签输入预先训练好的共享网络,以获取图像识别结果,其中,所述共享网络用于根据各属性标签和各属性标签的相关性确定图像识别结果。The result obtaining unit 1240 is configured to input the attribute labels determined by each specific network into a pre-trained shared network to obtain image recognition results, wherein the shared network is used to The correlation determines the image recognition results.

输出单元1250,用于输出所述图像识别结果。The output unit 1250 is configured to output the image recognition result.

请参阅图13,其示出了本申请实施例提供的一种图像处理装置1300的结构框图,该装置可以包括:样本获取单元1310、设置单元1320、网络训练单元1330和识别单元1340。Please refer to FIG. 13 , which shows a structural block diagram of an image processing apparatus 1300 provided by an embodiment of the present application. The apparatus may include: a sample acquisition unit 1310 , a setting unit 1320 , a network training unit 1330 , and an identification unit 1340 .

样本获取单元1310,用于获取多个样本图像数据,每个所述样本图像数据对应多个属性标签。The sample obtaining unit 1310 is configured to obtain a plurality of sample image data, each of which corresponds to a plurality of attribute tags.

设置单元1320,用于设置共享网络和多个特异性网络,每个所述特异性网络能够识别至少一个属性标签,且每个所述特异性网络能够识别的属性标签互不相同。The setting unit 1320 is configured to set a shared network and a plurality of specific networks, each of the specific networks can identify at least one attribute label, and the attribute labels that can be identified by each of the specific networks are different from each other.

网络训练单元1330,用于将所述多个样本图像数据输入所述共享网络和多个特异性网络进行训练,以得到训练后的共享网络和多个特异性网络。The network training unit 1330 is configured to input the plurality of sample image data into the shared network and the plurality of specific networks for training, so as to obtain the trained shared network and the plurality of specific networks.

其中,样本获取单元1310、设置单元1320、网络训练单元1330对应于上述的训练单元1210。作为一种实施方式,样本获取单元1310对应于获取子单元,样本获取单元1310的具体实施方式可以参考获取子单元,设置单元1320对应于设置子单元,设置单元1320的具体实施方式可以参考设置子单元,网络训练单元1330对应于训练子单元,网络训练单元1330的具体实施方式可以参考训练子单元。The sample acquisition unit 1310 , the setting unit 1320 , and the network training unit 1330 correspond to the above-mentioned training unit 1210 . As an embodiment, the sample acquisition unit 1310 corresponds to the acquisition subunit, the specific implementation of the sample acquisition unit 1310 may refer to the acquisition subunit, the setting unit 1320 corresponds to the setting subunit, and the specific implementation of the setting unit 1320 may refer to the setting subunit unit, the network training unit 1330 corresponds to the training subunit, and the specific implementation of the network training unit 1330 can refer to the training subunit.

识别单元1340，用于获取待处理的图像数据，根据所述训练后的共享网络和多个特异性网络对所述待处理的图像数据处理，得到图像识别结果。The identification unit 1340 is configured to acquire image data to be processed, and process the image data to be processed according to the trained shared network and multiple specific networks to obtain an image recognition result.

具体地，识别单元1340用于获取待处理的图像数据；将所述待处理的图像数据输入预先训练的多个特异性网络，以获取所述图像数据对应的属性标签，其中，每个所述特异性网络用于确定所述图像数据对应的属性标签，且各个所述特异性网络确定的属性标签互不相同；将每个所述特异性网络所确定的属性标签输入预先训练好的共享网络，以获取图像识别结果，其中，所述共享网络用于根据各属性标签和各属性标签的相关性确定图像识别结果；输出所述图像识别结果。Specifically, the identification unit 1340 is configured to acquire image data to be processed; input the image data to be processed into multiple pre-trained specific networks to acquire attribute labels corresponding to the image data, where each specific network is used to determine the attribute labels corresponding to the image data, and the attribute labels determined by the specific networks are different from one another; input the attribute labels determined by each specific network into the pre-trained shared network to obtain an image recognition result, where the shared network is used to determine the image recognition result according to the attribute labels and their correlations; and output the image recognition result.

作为一种实施方式，识别单元1340对应于数据获取单元、属性确定单元、结果获取单元和输出单元，具体实施方式可参考数据获取单元、属性确定单元、结果获取单元和输出单元。As an embodiment, the identification unit 1340 corresponds to the data acquisition unit, the attribute determination unit, the result acquisition unit and the output unit; for the specific implementation, reference may be made to those units.

所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述装置和模块的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。Those skilled in the art can clearly understand that, for the convenience and brevity of description, for the specific working process of the above-described devices and modules, reference may be made to the corresponding processes in the foregoing method embodiments, which will not be repeated here.

在本申请所提供的几个实施例中,模块相互之间的耦合可以是电性,机械或其它形式的耦合。In several embodiments provided in this application, the coupling between the modules may be electrical, mechanical or other forms of coupling.

另外,在本申请各个实施例中的各功能模块可以集成在一个处理模块中,也可以是各个模块单独物理存在,也可以两个或两个以上模块集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist physically alone, or two or more modules may be integrated into one module. The above-mentioned integrated modules can be implemented in the form of hardware, and can also be implemented in the form of software function modules.

请参考图14,其示出了本申请实施例提供的一种电子设备的结构框图。该电子设备100可以是智能手机、平板电脑、电子书等能够运行应用程序的电子设备。本申请中的电子设备100可以包括一个或多个如下部件:处理器110、存储器120、以及一个或多个应用程序,其中一个或多个应用程序可以被存储在存储器120中并被配置为由一个或多个处理器110执行,一个或多个程序配置用于执行如前述方法实施例所描述的方法。Please refer to FIG. 14 , which shows a structural block diagram of an electronic device provided by an embodiment of the present application. The electronic device 100 may be an electronic device capable of running an application program, such as a smart phone, a tablet computer, an electronic book, or the like. The electronic device 100 in the present application may include one or more of the following components: a processor 110, a memory 120, and one or more application programs, wherein the one or more application programs may be stored in the memory 120 and configured to be executed by One or more processors 110 execute, and one or more programs are configured to execute the methods described in the foregoing method embodiments.

处理器110可以包括一个或者多个处理核。处理器110利用各种接口和线路连接整个电子设备100内的各个部分，通过运行或执行存储在存储器120内的指令、程序、代码集或指令集，以及调用存储在存储器120内的数据，执行电子设备100的各种功能和处理数据。可选地，处理器110可以采用数字信号处理(Digital Signal Processing，DSP)、现场可编程门阵列(Field-Programmable Gate Array，FPGA)、可编程逻辑阵列(Programmable Logic Array，PLA)中的至少一种硬件形式来实现。处理器110可集成中央处理器(Central Processing Unit，CPU)、图像处理器(Graphics Processing Unit，GPU)和调制解调器等中的一种或几种的组合。其中，CPU主要处理操作系统、用户界面和应用程序等；GPU用于负责显示内容的渲染和绘制；调制解调器用于处理无线通信。可以理解的是，上述调制解调器也可以不集成到处理器110中，单独通过一块通信芯片进行实现。The processor 110 may include one or more processing cores. The processor 110 uses various interfaces and lines to connect various parts of the entire electronic device 100, and performs various functions of the electronic device 100 and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory 120 and calling the data stored in the memory 120. Optionally, the processor 110 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 110 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface and application programs; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It can be understood that the above-mentioned modem may also not be integrated into the processor 110, and may be implemented by a separate communication chip.

存储器120可以包括随机存储器(Random Access Memory,RAM),也可以包括只读存储器(Read-Only Memory)。存储器120可用于存储指令、程序、代码、代码集或指令集。存储器120可包括存储程序区和存储数据区,其中,存储程序区可存储用于实现操作系统的指令、用于实现至少一个功能的指令(比如触控功能、声音播放功能、图像播放功能等)、用于实现下述各个方法实施例的指令等。存储数据区还可以存储电子设备100在使用中所创建的数据(比如电话本、音视频数据、聊天记录数据)等。The memory 120 may include random access memory (Random Access Memory, RAM), or may include read-only memory (Read-Only Memory). Memory 120 may be used to store instructions, programs, codes, sets of codes, or sets of instructions. The memory 120 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, an image playback function, etc.) , instructions for implementing the following method embodiments, and the like. The storage data area may also store data (such as phone book, audio and video data, chat record data) created by the electronic device 100 during use.

请参考图15,其示出了本申请实施例提供的一种计算机可读存储介质的结构框图。该计算机可读介质1500中存储有程序代码,所述程序代码可被处理器调用执行上述方法实施例中所描述的方法。Please refer to FIG. 15 , which shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application. The computer-readable medium 1500 stores program codes, and the program codes can be invoked by the processor to execute the methods described in the above method embodiments.

计算机可读存储介质1500可以是诸如闪存、EEPROM(电可擦除可编程只读存储器)、EPROM、硬盘或者ROM之类的电子存储器。可选地,计算机可读存储介质1500包括非易失性计算机可读介质(non-transitory computer-readable storage medium)。计算机可读存储介质1500具有执行上述方法中的任何方法步骤的程序代码1510的存储空间。这些程序代码可以从一个或者多个计算机程序产品中读出或者写入到这一个或者多个计算机程序产品中。程序代码1510可以例如以适当形式进行压缩。The computer-readable storage medium 1500 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM. Optionally, the computer-readable storage medium 1500 includes a non-transitory computer-readable storage medium. Computer readable storage medium 1500 has storage space for program code 1510 to perform any of the method steps in the above-described methods. These program codes can be read from or written to one or more computer program products. Program code 1510 may be compressed, for example, in a suitable form.

综上所述，本申请提供的图像处理方法、装置、电子设备及存储介质，预先训练好共享网络和多个特异性网络，每个所述特异性网络用于确定所述图像数据对应的属性标签，且各个所述特异性网络确定的属性标签互不相同。当获取到待处理的图像数据时，将待处理的图像数据输入多个特异性网络，每个特异性网络识别其所能够识别的属性，从而待处理图像数据所对应的多个属性标签能够被多个特异性网络分别识别到，提高了对图像数据多个属性标签的整体识别效率，从而能够得到图像数据对应的属性标签；然后，再将该图像数据对应的属性标签输入至共享网络，共享网络根据各属性标签和各属性标签的相关性确定图像识别结果，并将图像识别结果输出。因此，多个特异性网络共同分析图像数据并得到多个属性标签，能够提高属性标签的获得速度，而共享网络能够结合各属性标签的相关性得到图像识别结果，提高了识别结果的准确度和整体性能。To sum up, in the image processing method, apparatus, electronic device and storage medium provided by the present application, a shared network and multiple specific networks are trained in advance; each specific network is used to determine attribute labels corresponding to the image data, and the attribute labels determined by the specific networks are different from one another. When image data to be processed is acquired, it is input into the multiple specific networks, and each specific network identifies the attributes it is able to identify, so that the multiple attribute labels corresponding to the image data to be processed can be identified by the multiple specific networks respectively, which improves the overall efficiency of identifying the multiple attribute labels of the image data and thus yields the attribute labels corresponding to the image data. The attribute labels corresponding to the image data are then input into the shared network, which determines the image recognition result according to the attribute labels and their correlations and outputs the result. Therefore, the multiple specific networks jointly analyze the image data and obtain multiple attribute labels, which can speed up obtaining the attribute labels, while the shared network can combine the correlations of the attribute labels to obtain the image recognition result, improving the accuracy of the recognition result and the overall performance.

The present application proposes a deep-learning-based face multi-attribute recognition algorithm and system. The method divides 40 face attributes into 4 face attribute groups according to the image positions to which the attributes correspond, thereby taking into account the spatial correlation among face attributes. Treating the attribute classification problem of each group as a subtask, it builds a model containing 4 specific networks and 1 shared network.

The specific networks are designed to learn the specificity of each task, so each attribute group is assigned its own specific network, while the shared network is designed to learn the complementary information between tasks and to promote inter-task interaction. The rich connections between the specific networks and the shared network facilitate the exchange of information between them, help mine the correlations among tasks, and improve overall performance.
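At a very high level, the described arrangement — four specific networks whose group-wise outputs are fused by one shared network — can be sketched with plain NumPy stand-ins for the networks. Everything below (the group names, the feature and label dimensions, and the random linear maps) is an illustrative assumption, not the patent's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative attribute groups: the patent says only that 40 face attributes
# are split into 4 groups by image position; these names and sizes are assumed.
GROUPS = {"eyes": 10, "nose": 10, "mouth": 10, "global": 10}

FEAT_DIM = 128  # assumed feature dimension

# Each "specific network" is a stand-in: one random linear map per group.
specific_nets = {g: rng.standard_normal((FEAT_DIM, n)) for g, n in GROUPS.items()}

# The "shared network" fuses the concatenated group predictions, which is where
# cross-group (inter-task) correlation would be modelled.
shared_net = rng.standard_normal((sum(GROUPS.values()), sum(GROUPS.values())))

def predict(image_feat):
    # Step 1: every specific network predicts its own group of attribute labels.
    group_logits = [image_feat @ w for w in specific_nets.values()]
    # Step 2: the shared network combines all groups into the final result.
    return np.concatenate(group_logits) @ shared_net

out = predict(rng.standard_normal(FEAT_DIM))
print(out.shape)  # prints (40,) — one score per face attribute
```

The point of the sketch is only the data flow: group-specific predictions first, then one fusion step over all groups, matching the "4 specific networks and 1 shared network" topology described above.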

Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (12)

1. An image processing method, comprising:
acquiring image data to be processed;
inputting the image data to be processed into a plurality of pre-trained specific networks to obtain attribute labels corresponding to the image data, wherein each specific network is used for determining the attribute labels corresponding to the image data, and the attribute labels determined by each specific network are different from each other;
inputting the attribute labels determined by each specific network into a pre-trained shared network to obtain an image recognition result, wherein the shared network is used for determining the image recognition result according to the attribute labels and the correlation of the attribute labels;
and outputting the image recognition result.
2. The method according to claim 1, wherein the inputting the image data to be processed into a plurality of pre-trained specific networks to obtain attribute labels corresponding to the image data comprises:
determining an attribute label corresponding to each specific network, wherein the attribute label which can be identified by the specific network is the attribute label corresponding to the specific network;
dividing the image data into a plurality of sub-image data according to the attribute label corresponding to each specific network;
and inputting the sub-image data into a specific network corresponding to the sub-image data.
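Claim 2's division of the image into sub-image data, one sub-image per specific network, might look like the following sketch. The region boundaries and group names are hypothetical; the claim specifies only that each specific network receives the sub-image matching the attribute labels it can identify:

```python
import numpy as np

# Hypothetical "measurement areas" as fractions of image height. The patent
# gives no concrete coordinates, only that regions map to specific networks.
REGIONS = {
    "upper": (0.0, 0.4),   # e.g. hair / eyes attribute group
    "middle": (0.3, 0.7),  # e.g. nose attribute group
    "lower": (0.6, 1.0),   # e.g. mouth / chin attribute group
    "full": (0.0, 1.0),    # e.g. global attributes (gender, age, ...)
}

def split_regions(img):
    # Slice one horizontal band per region; bands may overlap.
    h = img.shape[0]
    return {name: img[int(t * h):int(b * h)] for name, (t, b) in REGIONS.items()}

img = np.zeros((100, 100, 3))
subs = split_regions(img)  # each sub-image would feed its own specific network
```

Overlapping bands are a deliberate choice in this sketch: attributes near group boundaries (e.g. cheeks) plausibly belong to more than one measurement area.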
3. The method of claim 1, wherein the acquiring image data to be processed comprises:
acquiring original image data;
and normalizing the original image data to obtain image data to be processed.
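Claim 3 does not fix a particular normalization scheme. One common choice — scaling 8-bit pixel values to [0, 1] and then standardizing each channel per image — is sketched below purely as an assumption:

```python
import numpy as np

def normalize(raw):
    # Scale 8-bit pixels to [0, 1], then make each channel zero-mean and
    # unit-variance. This is one conventional scheme, not the claimed one.
    x = raw.astype(np.float32) / 255.0
    mean = x.mean(axis=(0, 1), keepdims=True)
    std = x.std(axis=(0, 1), keepdims=True) + 1e-6  # avoid division by zero
    return (x - mean) / std

raw = np.random.default_rng(1).integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
img = normalize(raw)  # image data to be processed, in the sense of claim 3
```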
4. The method of claim 1, wherein before inputting the image data to be processed into a plurality of pre-trained specific networks to obtain the attribute labels corresponding to the image data, the method further comprises:
obtaining a plurality of sample image data, wherein each sample image data corresponds to a plurality of attribute labels;
setting a shared network and a plurality of specific networks, wherein each specific network can identify at least one attribute label, and the attribute labels which can be identified by each specific network are different from each other;
and inputting the plurality of sample image data into the shared network and the plurality of specific networks for training so as to obtain the trained shared network and the plurality of specific networks.
5. The method according to claim 4, wherein the sample image data and the image data to be processed each contain a face image; the acquiring a plurality of sample image data includes:
acquiring a plurality of sample image data;
identifying position information of a face key point in each sample image data in the sample image;
adjusting the face orientation in each sample image data to accord with a preset orientation according to the position information of the face key point of each sample image data;
and taking the adjusted sample image data as the sample image data used for training the shared network and the plurality of initial specific networks.
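Claim 5's orientation adjustment from face key points can be illustrated with the inter-ocular angle alone. A real implementation would warp the whole image with the resulting transform; the sketch below only derives the 2-D rotation that levels two hypothetical eye coordinates, taking "preset orientation" to mean an upright face with horizontal eyes (an assumption):

```python
import numpy as np

def eye_leveling_rotation(left_eye, right_eye):
    # Angle of the inter-ocular line; rotating by its negative makes the
    # eyes horizontal, i.e. aligns the face to the assumed preset orientation.
    dx, dy = np.subtract(right_eye, left_eye)
    angle = np.arctan2(dy, dx)
    c, s = np.cos(-angle), np.sin(-angle)
    return np.array([[c, -s], [s, c]])

left, right = np.array([30.0, 52.0]), np.array([70.0, 44.0])
rot = eye_leveling_rotation(left, right)
l2, r2 = rot @ left, rot @ right
# After rotation the two eye key points lie on the same horizontal line.
```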
6. The method of claim 4, wherein said acquiring a plurality of sample image data comprises:
acquiring a plurality of sample image data;
performing data enhancement processing on the plurality of sample image data to ensure that the illumination intensity and the contrast of each sample image data are randomly distributed in a preset interval;
clipping each sample image data after enhancement processing according to a preset random clipping proportion, wherein the size of each clipped sample image data is a preset size;
and taking the clipped sample image data as the sample image data used for training the shared network and the plurality of initial specific networks.
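Claim 6's enhancement and random cropping could be sketched as follows. The contrast/brightness intervals and the 64-pixel output size are assumed values standing in for the claim's "preset interval" and "preset size":

```python
import numpy as np

rng = np.random.default_rng(2)

def augment(img, out_size=64, contrast_range=(0.8, 1.2)):
    # Random contrast and brightness drawn uniformly from preset intervals
    # (the concrete intervals here are assumptions).
    contrast = rng.uniform(*contrast_range)
    brightness = rng.uniform(-0.1, 0.1)
    img = np.clip(img * contrast + brightness, 0.0, 1.0)
    # Random crop to the preset output size.
    h, w = img.shape[:2]
    top = rng.integers(0, h - out_size + 1)
    left = rng.integers(0, w - out_size + 1)
    return img[top:top + out_size, left:left + out_size]

img = rng.random((80, 80, 3))   # a sample image in [0, 1]
crop = augment(img)             # enhanced, randomly cropped sample
```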
7. The method according to claim 4, wherein the sample image data and the image data to be processed each contain a face image, and the plurality of attribute labels corresponding to each sample image data correspond to different positions of a face; the setting a shared network and a plurality of specific networks comprises:
dividing a plurality of measurement areas, wherein each measurement area corresponds to a different area of the human face;
and setting a plurality of specific networks according to the plurality of measurement areas, wherein each specific network corresponds to one measurement area and is used for confirming the attribute labels in the corresponding measurement area.
8. An image processing method, comprising:
obtaining a plurality of sample image data, wherein each sample image data corresponds to a plurality of attribute labels;
setting a shared network and a plurality of specific networks, wherein each specific network can identify at least one attribute label, and the attribute labels which can be identified by each specific network are different from each other;
inputting the sample image data into the shared network and the specific networks for training to obtain a trained shared network and specific networks;
and acquiring image data to be processed, and processing the image data to be processed according to the trained shared network and the plurality of specific networks to obtain an image recognition result.
9. An image processing apparatus, characterized in that the apparatus comprises:
the data acquisition unit is used for acquiring image data to be processed;
the attribute determining unit is used for inputting the image data to be processed into a plurality of pre-trained specific networks to obtain attribute labels corresponding to the image data, wherein each specific network is used for determining the attribute labels corresponding to the image data, and the attribute labels determined by the specific networks are different from each other;
the result acquisition unit is used for inputting the attribute labels determined by each specific network into a pre-trained shared network to acquire an image recognition result, wherein the shared network is used for determining the image recognition result according to the attribute labels and the correlation of the attribute labels;
and the output unit is used for outputting the image recognition result.
10. An image processing apparatus, characterized in that the apparatus comprises:
the system comprises a sample acquisition unit, a data processing unit and a data processing unit, wherein the sample acquisition unit is used for acquiring a plurality of sample image data, and each sample image data corresponds to a plurality of attribute labels;
a setting unit, configured to set a shared network and a plurality of specific networks, where each specific network is capable of identifying at least one attribute tag, and the attribute tags that each specific network is capable of identifying are different from each other;
the network training unit is used for inputting the sample image data into the shared network and the specific networks for training so as to obtain the trained shared network and the specific networks;
and the identification unit is used for acquiring image data to be processed, and processing the image data to be processed according to the trained shared network and the plurality of specific networks to obtain an image identification result.
11. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-7.
12. A computer-readable storage medium storing program code executable by a processor, wherein a plurality of instructions in the program code, when executed by the processor, cause the processor to perform the method of any one of claims 1-7.
CN201911007790.5A 2019-10-22 2019-10-22 Image processing method, image processing device, electronic equipment and storage medium Active CN110728255B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911007790.5A CN110728255B (en) 2019-10-22 2019-10-22 Image processing method, image processing device, electronic equipment and storage medium
PCT/CN2020/122506 WO2021078157A1 (en) 2019-10-22 2020-10-21 Image processing method and apparatus, electronic device, and storage medium


Publications (2)

Publication Number Publication Date
CN110728255A true CN110728255A (en) 2020-01-24
CN110728255B CN110728255B (en) 2022-12-16

Family

Family ID: 69222737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911007790.5A Active CN110728255B (en) 2019-10-22 2019-10-22 Image processing method, image processing device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN110728255B (en)
WO (1) WO2021078157A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111400534A (en) * 2020-03-05 2020-07-10 杭州海康威视系统技术有限公司 Method and device for determining cover of image data and computer storage medium
CN111428671A (en) * 2020-03-31 2020-07-17 杭州博雅鸿图视频技术有限公司 Face structured information identification method, system, device and storage medium
CN111507263A (en) * 2020-04-17 2020-08-07 电子科技大学 A face multi-attribute recognition method based on multi-source data
CN111539452A (en) * 2020-03-26 2020-08-14 深圳云天励飞技术有限公司 Image recognition method and device for multitask attributes, electronic equipment and storage medium
CN111611805A (en) * 2020-04-24 2020-09-01 平安科技(深圳)有限公司 Auxiliary writing method, device, medium and equipment based on image
CN111738325A (en) * 2020-06-16 2020-10-02 北京百度网讯科技有限公司 Image recognition method, device, equipment and storage medium
CN112541517A (en) * 2020-03-10 2021-03-23 深圳莱尔托特科技有限公司 Clothing detail display method and system
WO2021078157A1 (en) * 2019-10-22 2021-04-29 Oppo广东移动通信有限公司 Image processing method and apparatus, electronic device, and storage medium
CN112861926A (en) * 2021-01-18 2021-05-28 平安科技(深圳)有限公司 Coupled multi-task feature extraction method and device, electronic equipment and storage medium
CN113407564A (en) * 2021-06-18 2021-09-17 浙江非线数联科技股份有限公司 Data processing method and system
CN114170484A (en) * 2022-02-11 2022-03-11 中科视语(北京)科技有限公司 Picture attribute prediction method and device, electronic equipment and storage medium
CN114581706A (en) * 2022-03-02 2022-06-03 平安科技(深圳)有限公司 Configuration method and device of certificate recognition model, electronic equipment and storage medium

Families Citing this family (6)

Publication number Priority date Publication date Assignee Title
CN113822185B (en) * 2021-09-09 2024-10-29 安徽农业大学 Method for detecting daily behaviors of group-raised pigs
CN113744161B (en) * 2021-09-16 2024-03-29 北京顺势兄弟科技有限公司 Enhanced data acquisition method and device, data enhancement method and electronic equipment
CN113947146A (en) * 2021-10-15 2022-01-18 北京百度网讯科技有限公司 Sample data generation method, model training method, image detection method and device
CN114004797B (en) * 2021-10-27 2025-03-07 上海电机学院 A method for detecting surface defects of wind turbine blades based on sharing of multiple feature maps
CN114494673A (en) * 2022-01-21 2022-05-13 南昌大学 Standard certificate photo collection method based on digital image processing and deep learning
CN118135664A (en) * 2024-04-11 2024-06-04 安徽大学 An abnormal emotion reasoning system based on posture feature alignment

Citations (5)

Publication number Priority date Publication date Assignee Title
JP2014102562A (en) * 2012-11-16 2014-06-05 Nintendo Co Ltd Program, information processing device, information processing system and information processing method
CN105825191A (en) * 2016-03-23 2016-08-03 厦门美图之家科技有限公司 Face multi-attribute information-based gender recognition method and system and shooting terminal
CN106503669A (en) * 2016-11-02 2017-03-15 重庆中科云丛科技有限公司 A kind of based on the training of multitask deep learning network, recognition methods and system
CN108596839A (en) * 2018-03-22 2018-09-28 中山大学 A kind of human-face cartoon generation method and its device based on deep learning
JP2019079135A (en) * 2017-10-20 2019-05-23 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America Information processing method and information processing apparatus

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN105426404A (en) * 2015-10-28 2016-03-23 广东欧珀移动通信有限公司 A music information recommendation method, device and terminal
JP6750854B2 (en) * 2016-05-25 2020-09-02 キヤノン株式会社 Information processing apparatus and information processing method
CN110728255B (en) * 2019-10-22 2022-12-16 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and storage medium

Cited By (19)

Publication number Priority date Publication date Assignee Title
WO2021078157A1 (en) * 2019-10-22 2021-04-29 Oppo广东移动通信有限公司 Image processing method and apparatus, electronic device, and storage medium
CN111400534A (en) * 2020-03-05 2020-07-10 杭州海康威视系统技术有限公司 Method and device for determining cover of image data and computer storage medium
CN111400534B (en) * 2020-03-05 2023-09-19 杭州海康威视系统技术有限公司 Cover determination method and device for image data and computer storage medium
CN112541517A (en) * 2020-03-10 2021-03-23 深圳莱尔托特科技有限公司 Clothing detail display method and system
CN111539452A (en) * 2020-03-26 2020-08-14 深圳云天励飞技术有限公司 Image recognition method and device for multitask attributes, electronic equipment and storage medium
CN111539452B (en) * 2020-03-26 2024-03-26 深圳云天励飞技术有限公司 Image recognition method and device for multi-task attribute, electronic equipment and storage medium
CN111428671A (en) * 2020-03-31 2020-07-17 杭州博雅鸿图视频技术有限公司 Face structured information identification method, system, device and storage medium
CN111507263B (en) * 2020-04-17 2022-08-05 电子科技大学 Face multi-attribute recognition method based on multi-source data
CN111507263A (en) * 2020-04-17 2020-08-07 电子科技大学 A face multi-attribute recognition method based on multi-source data
CN111611805A (en) * 2020-04-24 2020-09-01 平安科技(深圳)有限公司 Auxiliary writing method, device, medium and equipment based on image
CN111611805B (en) * 2020-04-24 2023-04-07 平安科技(深圳)有限公司 Auxiliary writing method, device, medium and equipment based on image
CN111738325A (en) * 2020-06-16 2020-10-02 北京百度网讯科技有限公司 Image recognition method, device, equipment and storage medium
CN111738325B (en) * 2020-06-16 2024-05-17 北京百度网讯科技有限公司 Image recognition method, device, equipment and storage medium
CN112861926A (en) * 2021-01-18 2021-05-28 平安科技(深圳)有限公司 Coupled multi-task feature extraction method and device, electronic equipment and storage medium
CN112861926B (en) * 2021-01-18 2023-10-31 平安科技(深圳)有限公司 Coupled multi-task feature extraction method and device, electronic equipment and storage medium
CN113407564A (en) * 2021-06-18 2021-09-17 浙江非线数联科技股份有限公司 Data processing method and system
CN114170484A (en) * 2022-02-11 2022-03-11 中科视语(北京)科技有限公司 Picture attribute prediction method and device, electronic equipment and storage medium
CN114581706A (en) * 2022-03-02 2022-06-03 平安科技(深圳)有限公司 Configuration method and device of certificate recognition model, electronic equipment and storage medium
CN114581706B (en) * 2022-03-02 2024-03-08 平安科技(深圳)有限公司 Method and device for configuring certificate recognition model, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant