WO2020034542A1 - Neural network model training method and apparatus, face recognition method and apparatus, device, and medium - Google Patents
- Publication number
- WO2020034542A1 (PCT/CN2018/123884)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- point cloud
- preset direction
- cloud data
- projection
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Definitions
- the present application relates to the field of computers, and in particular to a neural network model training method and apparatus, a face recognition method and apparatus, a device, and a medium.
- A convolutional neural network (CNN) is an efficient recognition method that has developed rapidly in recent years and attracted widespread attention. CNNs have become a research hotspot in many scientific fields, especially face recognition and image classification.
- VGG (Visual Geometry Group)
- The VGG neural network generalizes well to other data sets.
- Owing to its convolutional neural network architecture, the VGG neural network model can be used for two-dimensional face recognition.
- A traditionally trained VGG neural network model usually takes the R, G, and B channel data of a two-dimensional face image as input, and is therefore not suitable for recognition when the face to be recognized is three-dimensional.
- A three-dimensional face is a form of three-dimensional data.
- The traditional VGG convolutional neural network model is thus poorly suited to 3D face recognition and cannot effectively extract features from a three-dimensional face.
- a neural network model training method includes:
- acquiring first projection data of the point cloud data in a first preset direction and second projection data of the point cloud data in a second preset direction, where the first preset direction and the second preset direction are different projection directions;
- the VGG neural network model is trained through a training set composed of the training data corresponding to N human faces to obtain a convergent VGG neural network model, where N is greater than or equal to 2.
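As a minimal sketch of how the three data sources above could be stacked into one training sample, assuming a channel-last layout and the channel order (depth, first projection, second projection) — both assumptions on our part, not prescribed by the patent:

```python
def build_training_sample(depth_image, proj_first, proj_second):
    """Combine the three data sources into one three-channel sample.

    Hypothetical sketch: the channel order and channel-last layout are
    assumptions; the patent only says the three data types form the
    three input channels of the VGG neural network model.
    """
    h, w = len(depth_image), len(depth_image[0])
    assert all(len(ch) == h and len(ch[0]) == w
               for ch in (proj_first, proj_second)), "channel shapes must match"
    # sample[i][j] = (depth, first projection, second projection)
    return [[(depth_image[i][j], proj_first[i][j], proj_second[i][j])
             for j in range(w)] for i in range(h)]
```

A training set is then simply a list of such samples, one per face.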
- a face recognition method includes:
- acquiring first projection data of the point cloud data in a first preset direction and second projection data of the point cloud data in a second preset direction, where the first preset direction and the second preset direction are different projection directions;
- the input data is input into a convergent VGG neural network recognition model obtained by a neural network model training method to recognize a face to be recognized.
- a neural network model training device includes:
- a first acquisition module configured to acquire point cloud data corresponding to a human face and depth image data corresponding to a human face
- a second acquisition module configured to acquire first projection data of the point cloud data acquired by the first acquisition module in a first preset direction, and to acquire second projection data of that point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
- a determining module configured to use the depth image data obtained by the first acquisition module and the first projection data and second projection data obtained by the second acquisition module as the three-channel training data of the VGG neural network model;
- a training module configured to train the VGG neural network model through a training set composed of the training data determined by the determining module for N human faces, where N is greater than or equal to 2.
- a face recognition device includes:
- a first acquisition module configured to acquire point cloud data and depth image data of a face to be identified
- a second acquisition module configured to acquire first projection data of the point cloud data acquired by the first acquisition module in a first preset direction, and to acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
- a determining module configured to use the depth image data obtained by the first acquisition module and the first projection data and second projection data obtained by the second acquisition module as the three-channel input data of the VGG neural network recognition model;
- the recognition module is configured to input the input data determined by the determination module into a VGG neural network recognition model to recognize a face to be recognized.
- a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
- the processor executes the computer-readable instructions, the following steps are implemented:
- the VGG neural network model is trained to obtain a convergent VGG neural network model by using a training set composed of the training data corresponding to N faces, where N is greater than or equal to 2.
- a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
- the processor executes the computer-readable instructions, the following steps are implemented:
- the input data is input into the convergent VGG neural network recognition model according to claims 1-5 to recognize the face to be recognized.
- One or more non-volatile readable storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the following steps:
- the VGG neural network model is trained to obtain a convergent VGG neural network model by using a training set composed of the training data corresponding to N faces, where N is greater than or equal to 2.
- One or more non-volatile readable storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the following steps:
- the input data is input into the convergent VGG neural network recognition model according to claims 1-5 to recognize the face to be recognized.
- FIG. 1 is a schematic diagram of an application framework of a neural network model training method in the present application
- FIG. 2 is a schematic flowchart of an embodiment of a neural network model training method in the present application
- FIG. 3 is a schematic flowchart of an embodiment of a face recognition method in the present application.
- FIG. 4 is a schematic structural diagram of an embodiment of a neural network model training device in the present application.
- FIG. 5 is a schematic structural diagram of an embodiment of a face recognition device in the present application.
- FIG. 6 is a schematic structural diagram of an embodiment of a computer device in the present application.
- the neural network model training method provided in this application can be applied in the application environment shown in FIG. 1. A computer device acquires point cloud data corresponding to a human face and depth image data corresponding to the human face; acquires first projection data of the point cloud data in a first preset direction and second projection data of the point cloud data in a second preset direction, where the first preset direction and the second preset direction are different projection directions; and uses the depth image data, the first projection data, and the second projection data as the three-channel training data of the VGG neural network model.
- the training set composed of the training data corresponding to N different faces is then used to train the VGG neural network model until a convergent VGG neural network model is obtained, where N is greater than or equal to 2.
- the computer device is a device having a computing processing capability, and may be, but is not limited to, various personal computers, notebook computers, servers, and the like.
- FIG. 2 is a schematic flowchart of an embodiment of a neural network model training method of the present application, including the following steps:
- the point cloud data corresponding to the face and the depth image data corresponding to the face can be obtained, where the point cloud data records the discrete points of the face surface in point form, including the spatial position information of those points (the spatial coordinates of the discrete points on the face surface) and their color information (for example, RGB).
- the corresponding point cloud data of a human face can be obtained directly through a depth camera.
- the depth camera refers to an image sensor that can observe the position of a human face in space.
- the depth camera may be an active, passive, contact or non-contact depth camera, wherein the active camera emits an energy beam (such as a laser, an electromagnetic wave, or an ultrasonic wave) toward a human face to obtain point cloud data of the human face.
- the passive depth camera mainly uses the conditions of the surrounding environment of the object to obtain the point cloud data of the human face.
- a contact depth camera must touch or be close to the human face, whereas a non-contact camera requires no contact with the face.
- the depth camera may specifically refer to a TOF (time-of-flight) depth camera.
- it may also be a kinect depth camera, an XTion depth camera, or a RealSense depth camera, which is not specifically limited.
- depth image data, also called range image data, is image data whose pixel values are the distances (depths) from the image collector to points in the real scene; it reflects the geometry of the visible surface of the face. Depth image data can be converted into corresponding point cloud data through a coordinate transformation, and conversely point cloud data can be converted into depth image data. Therefore, in this application, after the point cloud data of a human face is obtained, it can be converted into the depth image data corresponding to the face.
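The conversion just described can be sketched with a standard pinhole-camera back-projection. This is one common choice of coordinate transformation, not necessarily the one the patent uses, and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical parameters:

```python
def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image into 3D points.

    depth[v][u] is the distance along the optical axis at pixel (u, v);
    fx, fy are focal lengths and (cx, cy) is the principal point.
    A sketch under pinhole-model assumptions, not the patent's method.
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0:                      # skip invalid / missing pixels
                continue
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            points.append((x, y, d))
    return points
```

The reverse direction (point cloud to depth image) projects each point back through the same model and records its depth at the hit pixel.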
- the depth image data and point cloud data of a face can be obtained directly through the depth camera, or one of the two can be obtained first and then converted into the other; this is not specifically limited.
- first projection data of the face's point cloud data in a first preset direction and second projection data in a second preset direction may then be obtained, where the first preset direction and the second preset direction are different projection directions; that is, projection data of the face's point cloud data on different planes can be obtained according to different projection directions.
- through steps S10-S30, three types of data for the face are obtained: the depth image data, the first projection data, and the second projection data.
- the depth image data, the first projection data, and the second projection data are used as the three-channel training data of the VGG neural network model; that is, a training sample corresponding to the face is formed.
- the VGG neural network model is trained through a training set composed of the training data corresponding to N human faces to obtain a convergent VGG neural network model, where N is greater than or equal to 2.
- each face corresponds to three types of data: depth image data, first projection data, and second projection data.
- the above-mentioned three kinds of data corresponding to a human face constitute a training sample
- the training samples corresponding to N human faces constitute a training sample set
- the VGG neural network model is trained through the training set until the VGG neural network model converges.
- the model is thus trained on three channels of data per face: the depth image data of different faces and the projection data of their point cloud data in different projection directions.
- the VGG neural network model obtained by training is therefore suitable for recognizing three-dimensional faces: because the projections of the point cloud data retain the three-dimensional characteristics of the face, features of the three-dimensional face to be recognized can be effectively extracted for recognition.
- obtaining the first projection data of the point cloud data in the first preset direction in step S20 includes:
- the point cloud data is projected in a first preset direction to generate first projection data.
- the point cloud data may be projected in the azimuth direction of the target coordinate system to obtain the first projection data, thereby obtaining the projection data of the point cloud data of the human face on one of the two-dimensional planes.
- the target coordinate system is a world coordinate system, which is a three-dimensional coordinate system.
- consider a point P of the point cloud data lying in the first octant of the target coordinate system.
- standing at the origin O and looking at P, rotate counterclockwise from the positive direction of the x-axis to the vertical projection line of P; the angle formed between the x-axis and the vertical projection line of P is the azimuth angle.
- projecting the point cloud data in the azimuth direction to generate the first projection data specifically includes: obtaining the coordinate value of each point of the point cloud data in the target coordinate system, and projecting the coordinate value of each point in the azimuth direction of the target coordinate system; the projections of the coordinate values of all points of the point cloud data in the azimuth direction together constitute the first projection data.
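The two angles described geometrically above can be written out under the usual spherical-coordinate conventions (the patent gives no explicit formulas, so this is a hedged sketch):

```python
import math

def azimuth_and_tilt(x, y, z):
    """Angles of a point P = (x, y, z) in the world coordinate system.

    azimuth: counterclockwise angle from the positive x-axis to the
    vertical projection line of P onto the x-y plane.
    tilt: angle between that projection line and the straight line
    from the origin O to P.
    """
    azimuth = math.atan2(y, x)
    tilt = math.atan2(z, math.hypot(x, y))
    return azimuth, tilt
```

For a point in the first octant both angles are non-negative, matching the patent's description of standing at the origin and looking toward P.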
- step S20 obtaining second projection data of the point cloud data in a second preset direction includes:
- the point cloud data can be projected in the tilt-angle direction of the target coordinate system to obtain the second projection data, thereby obtaining the projection of the face's point cloud data on another two-dimensional plane.
- the target coordinate system is the world coordinate system, a three-dimensional coordinate system.
- for a point P of the point cloud data in the first octant of the target coordinate system, the tilt angle is the angle between the vertical projection line of P and the straight line from the origin O to P.
- projecting the point cloud data in the tilt-angle direction to generate the second projection data specifically includes: obtaining the coordinate value of each point of the point cloud data in the target coordinate system, and projecting the coordinate value of each point in the tilt-angle direction of the target coordinate system; the projections of the coordinate values of all points of the point cloud data in the tilt-angle direction together constitute the second projection data.
- this application proposes a specific projection direction to obtain the first projection data and the second projection data corresponding to the point cloud data of the face, which improves the implementability of the solution.
- in step S40, the size of the convolution kernel of the VGG neural network model used is 7×7.
- the VGG neural network model in this application includes an input layer, convolution layers, activation functions, pooling layers, fully connected layers, and a normalization (softmax) layer.
- the convolution kernel size of the convolution layers is 7×7.
- the depth image data of the human face, the first projection data, and the second projection data are used as training data of the VGG neural network model to be substituted into the VGG neural network model for training.
- the input layer of the VGG neural network model is used to input three channels of data: the depth image data of the face, the first projection data corresponding to the point cloud data of the face, and the second projection data.
- the method further includes preprocessing the training data of the three channels, where the preprocessing includes de-meaning (centering each dimension of the input data at 0) and normalization (scaling the amplitudes of the three channels of the input data to the same range, thereby reducing the interference caused by differences in the value ranges of the channels).
- for example, suppose there are two features A and B, where A ranges from 0 to 10 and B ranges from 0 to 10000. Using these two features directly would cause problems, so good practice is to normalize them so that the data of both A and B lie in the range 0 to 1.
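The A/B example above amounts to min-max scaling, a hedged sketch of which is (the patent only says the channels are brought to the same range, not which scaling is used):

```python
def min_max_scale(values, lo, hi):
    """Map values from the range [lo, hi] into [0, 1]."""
    span = hi - lo
    return [(v - lo) / span for v in values]
```

With this, A in 0-10 and B in 0-10000 both land in 0-1, so neither channel dominates purely because of the size of its value range.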
- the convolution layer is used to perform a convolution operation on the above input data to obtain a feature map and use an activation function (such as the ReLU function) to perform a non-linear transformation.
- the feature map produced by the convolution layer is a linear mapping.
- because the expressive ability of a linear mapping alone is not enough, non-linear activation functions are added.
- introducing this non-linear part into the network enhances the expressive ability of the feature map.
- the activation function can also be a sigmoid or tanh activation function; this is not restricted.
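For concreteness, the elementwise ReLU non-linearity mentioned above can be sketched as:

```python
def relu(feature_map):
    """Apply max(0, v) to every element of a 2D feature map."""
    return [[max(0.0, v) for v in row] for row in feature_map]
```

Negative responses are zeroed while positive responses pass through unchanged, which is what makes the mapping non-linear.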
- the pooling layer is used to compress the feature maps.
- on the one hand, the feature maps are reduced to simplify the computational complexity of the VGG neural network; on the other hand, feature compression extracts the main features of the input data.
- the commonly used pooling layer can be max pooling or overlapping pooling, or another pooling layer such as spatial pyramid pooling; this is not specifically limited.
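A minimal sketch of one of the options listed above, non-overlapping 2×2 max pooling with stride 2:

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2: keep the largest value per window,
    halving each spatial dimension of the feature map."""
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]
```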
- the fully connected layer is used to connect all the features obtained by the pooling layer, and finally output to the normalization layer.
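The normalization (softmax) layer that receives the fully connected layer's output can be sketched as:

```python
import math

def softmax(logits):
    """Map a score vector to a probability distribution; subtracting the
    maximum first keeps the exponentials numerically stable."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The outputs are non-negative and sum to 1, so they can be read as class probabilities over the enrolled identities.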
- a large training set is constructed using the depth image data, first projection data, and second projection data of faces as training data; after training, the final VGG neural network model is obtained. The specific training process is not described in detail here.
- a convolution kernel of size 7×7 is used in the convolution layers of the VGG neural network model. It should be understood that because depth image data is smoother than a two-dimensional image, a 3×3 convolution kernel is no longer appropriate: over the relatively smooth depth data, its small coverage easily loses face depth image information. Therefore, in this application the size of the convolution kernel is enlarged; specifically, a 7×7 convolution kernel is used, which effectively reduces the loss of face depth image data and makes the trained VGG neural network model more accurate in identifying faces to be recognized.
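The effect of the kernel size can be seen in a naive "valid" convolution (really cross-correlation, as in most deep-learning frameworks): each output value of a 7×7 kernel aggregates a 49-pixel neighbourhood, versus 9 pixels for a 3×3 kernel, so it picks up more variation from a smooth depth image. A hedged sketch:

```python
def conv2d_valid(image, kernel):
    """Naive 2D cross-correlation with no padding: slide the kernel over
    the image and sum the elementwise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(ow)] for i in range(oh)]
```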
- obtaining point cloud data of a human face includes:
- the point cloud data of different frames are fused and matched, and the fused point cloud data unified into the same coordinate system is used as the point cloud data of the face.
- the scanning device may measure only one side of the human face during each scan. Therefore, in a specific implementation, in order to obtain complete point cloud data of the face, the face is scanned multiple times in different postures by the scanning device. One frame of point cloud data is obtained per scan; the point cloud data of different frames are then fused and matched, and the fused point cloud data, unified into the same coordinate system, is used as the point cloud data of the face. In some solutions, the point cloud data of different frames can be fused and matched by, for example, Iterative Closest Point (ICP) or Normal Distribution Transformation (NDT); this is not specifically limited.
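One simplified ICP iteration, restricted to translation only, illustrates the matching-and-alignment idea behind the fusion step (real ICP also estimates a rotation and iterates to convergence, and NDT works quite differently; this sketch is an assumption on our part, not the patent's procedure):

```python
def icp_translation_step(source, target):
    """One translation-only ICP step: match each source point to its
    nearest target point, then shift the whole source cloud by the
    mean residual between matched pairs."""
    def nearest(p):
        return min(target, key=lambda q: sum((a - b) ** 2 for a, b in zip(p, q)))
    matches = [nearest(p) for p in source]
    n = len(source)
    shift = tuple(sum(q[k] - p[k] for p, q in zip(source, matches)) / n
                  for k in range(3))
    return [tuple(c + s for c, s in zip(p, shift)) for p in source]
```

Repeating such steps until the residual stops shrinking brings the frames into a common coordinate system.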
- ICP Iterative Closest Point
- NDT Normal Distribution Transformation
- the VGG neural network model is trained using a training set composed of the training data corresponding to N faces to obtain a convergent VGG neural network model. The convergence condition can be configured; for example, the back-propagation (BP) algorithm performs iterative training on the training set until the VGG neural network model converges.
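The configurable convergence condition can be illustrated on a toy problem: full-batch gradient descent on a single linear unit that stops when the loss stops changing. The patent's model is of course a full VGG network trained by back-propagation; this shows only the stopping logic in miniature:

```python
def train_until_converged(samples, lr=0.1, tol=1e-9, max_epochs=100000):
    """Fit y = w*x + b by gradient descent on squared error, stopping
    once the change in loss between epochs falls below tol."""
    w = b = 0.0
    prev_loss = float('inf')
    for _ in range(max_epochs):
        n = len(samples)
        loss = gw = gb = 0.0
        for x, y in samples:
            err = w * x + b - y
            loss += err * err
            gw += 2 * err * x          # gradient of squared error w.r.t. w
            gb += 2 * err              # gradient w.r.t. b
        w -= lr * gw / n
        b -= lr * gb / n
        if abs(prev_loss - loss) < tol:    # configurable convergence condition
            break
        prev_loss = loss
    return w, b
```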
- FIG. 3 is a schematic flowchart of an embodiment of the face recognition method of the present application, including the following steps:
- point cloud data and depth image data of a face to be recognized can also be obtained directly through a depth camera, where the depth camera is an image sensor that can observe the position of objects or people in space.
- specifically, the depth camera may be an active, passive, contact, or non-contact depth camera, where an active camera emits an energy beam (such as a laser, an electromagnetic wave, or ultrasound) toward the face to be recognized to obtain its point cloud data.
- a passive depth camera mainly uses the conditions of the surroundings of the face to be recognized to obtain its point cloud data.
- a contact depth camera must touch or be close to the face to be recognized, whereas a non-contact camera requires no contact with the face.
- the depth camera may specifically refer to a TOF (time-of-flight) depth camera.
- it may also be a kinect depth camera, an XTion depth camera, or a RealSense depth camera, which is not specifically limited.
- the obtained point cloud data of the face to be identified may be converted into depth image data corresponding to the face to be identified.
- the depth image data and point cloud data of the face to be recognized can be obtained directly through the depth camera, or one of the two can be obtained first and then converted into the other; this is not specifically limited.
- first projection data of the point cloud data of the face to be recognized in a first preset direction and second projection data in a second preset direction may then be obtained, where the first preset direction and the second preset direction are different projection directions.
- the first preset direction is the azimuth direction of the point cloud data of the face to be recognized in the target coordinate system, and the second preset direction is the tilt-angle direction of that point cloud data in the target coordinate system.
- the target coordinate system is a world coordinate system, which is a three-dimensional coordinate system.
- consider a point P of the point cloud data of the face to be recognized lying in the first octant of the target coordinate system. Standing at the origin O and looking at P, rotate counterclockwise from the positive direction of the x-axis to the vertical projection line of P; the angle formed between the x-axis and the vertical projection line of P is the azimuth angle. The tilt angle is then the angle formed between the vertical projection line of P and the straight line from the origin to P.
- projecting the point cloud data in the tilt-angle direction to generate the second projection data specifically includes: obtaining the coordinate value of each point of the point cloud data of the face to be recognized in the target coordinate system, and projecting the coordinate value of each point in the tilt-angle direction of the target coordinate system; the projections of the coordinate values of all points of that point cloud data in the tilt-angle direction together constitute the second projection data.
- S40: Input the input data into a VGG neural network recognition model to recognize the face to be recognized.
- the VGG neural network recognition model is a deep convolutional neural network architecture.
- the VGG neural network recognition model of the present application refers to the VGG neural network model obtained in the foregoing model training method.
- the depth image data corresponding to the face to be recognized, the first projection data, and the second projection data are input into the VGG neural network model, thereby completing the recognition of the face to be recognized.
- the depth image data of the face to be recognized and the projection data of its point cloud data in different projection directions, three channels of data in total, are input into the trained VGG neural network model. Because this VGG neural network model is suited to recognizing three-dimensional faces, and because the projection directions applied to the point cloud data retain the three-dimensional characteristics of the face, features of the three-dimensional face to be recognized can be effectively extracted for recognition.
- a neural network model training device is provided, and the neural network model training device corresponds one-to-one with the model training method in the foregoing embodiment.
- the neural network model training device 40 includes a first acquisition module 401, a second acquisition module 402, a determination module 403, and a training module 404.
- the detailed description of each function module is as follows:
- a first acquisition module 401 configured to acquire point cloud data corresponding to a human face and depth image data corresponding to a human face;
- the second acquisition module 402 is configured to acquire first projection data of the point cloud data acquired by the first acquisition module 401 in a first preset direction, and to acquire second projection data of that point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
- a determining module 403, configured to use the depth image data obtained by the first acquisition module 401 and the first projection data and second projection data obtained by the second acquisition module 402 as training data of a VGG neural network model;
- a training module 404, configured to train the VGG neural network model through a training set composed of the training data determined by the determining module 403 for N faces, to obtain a convergent VGG neural network model, where N is greater than or equal to 2.
- the second acquisition module 402 is specifically configured to: project the point cloud data in the first preset direction to generate the first projection data; and project the point cloud data in the second preset direction to generate the second projection data.
- the convolution kernel size of the VGG neural network model is 7 ⁇ 7.
- the first obtaining module 401 is specifically configured to:
- the point cloud data of each frame is fused and matched, and the fused point cloud data unified into the same coordinate system is used as the point cloud data of the face.
- Each module in the model training device may be implemented in whole or in part by software, by hardware, or by a combination thereof.
- Each module may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
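The first acquisition module above registers and fuses per-frame point clouds into a single coordinate system. A minimal sketch of the fusion step, assuming the per-frame rigid registration transforms (for example, from an ICP-style matcher) are already available; the array shapes, sample values, and the function name `fuse_frames` are illustrative, not from the patent:

```python
import numpy as np

def fuse_frames(frames, transforms):
    """Bring each frame's points into a common coordinate system and concatenate.

    frames     : list of (M_k, 3) arrays of xyz points, one per captured frame
    transforms : list of (R, t) pairs mapping frame k's coordinates into the
                 reference frame (R is a 3x3 rotation, t is a 3-vector)
    """
    fused = [pts @ R.T + t for pts, (R, t) in zip(frames, transforms)]
    return np.vstack(fused)

# Two toy frames: the second is the first shifted by -1 on x in its own frame,
# so its registration transform shifts it back by +1 on x.
f0 = np.array([[0.0, 0.0, 1.0]])
f1 = np.array([[-1.0, 0.0, 1.0]])
identity = (np.eye(3), np.zeros(3))
shift_x = (np.eye(3), np.array([1.0, 0.0, 0.0]))
cloud = fuse_frames([f0, f1], [identity, shift_x])
print(cloud)  # both points coincide at [0, 0, 1] in the shared frame
```

After fusion, the stacked array plays the role of the single face point cloud used by the rest of the pipeline.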
- a face recognition device is provided, and the face recognition device corresponds one-to-one to the face recognition method in the foregoing embodiment.
- the face recognition device 50 includes a first acquisition module 501, a second acquisition module 502, a determination module 503, and a recognition module 504.
- the detailed description of each function module is as follows:
- a first acquisition module 501, configured to acquire point cloud data and depth image data of a face to be recognized;
- the second acquisition module 502 is configured to acquire first projection data of the point cloud data acquired by the first acquisition module 501 in a first preset direction, and to acquire second projection data of the point cloud data in a second preset direction;
- the first preset direction and the second preset direction are different projection directions;
- a determination module 503, configured to use the depth image data acquired by the first acquisition module 501, and the first projection data and second projection data acquired by the second acquisition module 502, as input data of a VGG neural network recognition model;
- the recognition module 504 is configured to input the input data determined by the determination module 503 into the VGG neural network recognition model to recognize the face to be recognized.
- Each module in the face recognition device may be implemented in whole or in part by software, by hardware, or by a combination thereof.
- Each module may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
- In one embodiment, a computer device is provided; its internal structure diagram may be as shown in FIG. 6.
- the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
- the processor of the computer device is used to provide computing and control capabilities.
- the memory of the computer device includes a non-volatile storage medium and an internal memory.
- the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
- the internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium.
- the database of the computer device is used to store the acquired image data.
- the network interface of the computer device is used to communicate with an external terminal through a network connection.
- the computer-readable instructions are executed by a processor to implement a model training method or a face recognition method.
- a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
- when the processor executes the computer-readable instructions, the following steps are implemented:
- acquire point cloud data corresponding to a human face, and depth image data corresponding to the human face;
- acquire first projection data of the point cloud data in a first preset direction, and acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
- use the depth image data, the first projection data, and the second projection data as training data of a VGG neural network model;
- train the VGG neural network model with a training set composed of the training data corresponding to N human faces to obtain a converged VGG neural network model, where N is greater than or equal to 2.
- a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
- when the processor executes the computer-readable instructions, the following steps are implemented:
- acquire point cloud data and depth image data of a face to be recognized;
- acquire first projection data of the point cloud data in a first preset direction, and acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
- use the depth image data, the first projection data, and the second projection data as input data of a VGG neural network recognition model;
- input the input data into the VGG neural network recognition model to recognize the face to be recognized.
- one or more non-volatile readable storage media storing computer-readable instructions are provided; when executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:
- acquire point cloud data corresponding to a human face, and depth image data corresponding to the human face;
- acquire first projection data of the point cloud data in a first preset direction, and acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
- use the depth image data, the first projection data, and the second projection data as training data of a VGG neural network model;
- train the VGG neural network model with a training set composed of the training data corresponding to N human faces, where N is greater than or equal to 2.
- one or more non-volatile readable storage media storing computer-readable instructions are provided; when executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:
- acquire point cloud data and depth image data of a face to be recognized;
- acquire first projection data of the point cloud data in a first preset direction, and acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
- use the depth image data, the first projection data, and the second projection data as input data of a VGG neural network recognition model;
- input the input data into the VGG neural network recognition model to recognize the face to be recognized.
- Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
- Volatile memory can include random access memory (RAM) or external cache memory.
- RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Abstract
A neural network model training method and apparatus, a face recognition method and apparatus, a device, and a medium, capable of effectively recognizing a face to be recognized. The neural network model training method comprises: obtaining point cloud data corresponding to a face and depth image data corresponding to the face (S10); obtaining first projection data of the point cloud data in a first preset direction, and obtaining second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions (S20); using the depth image data, the first projection data, and the second projection data as training data of a VGG neural network model (S30); and training the VGG neural network model by means of a training set constituted by the training data corresponding to N faces, where N is greater than or equal to 2 (S40), to obtain a converged VGG neural network model.
Description
This application is based on, and claims priority from, Chinese invention patent application No. 201810939556.5, filed on August 17, 2018 and entitled "Neural Network Model Training and Face Recognition Method, Apparatus, Device, and Medium".
The present application relates to the field of computers, and in particular to a neural network model training method, a face recognition method, and corresponding apparatuses, devices, and media.
The convolutional neural network (CNN) is an efficient recognition method that has been developed in recent years and has attracted widespread attention. CNN has become one of the research hotspots in many scientific fields, with particularly strong research prospects in areas such as face recognition and image classification. The VGG (Visual Geometry Group) neural network, proposed by the Visual Geometry Group at the University of Oxford, is one kind of convolutional neural network and generalizes well to other data sets.
However, owing to its inherent convolutional neural network architecture, the VGG neural network model is suited to two-dimensional face recognition: a traditionally trained VGG neural network model usually takes the R, G, and B channel data of a two-dimensional face image as its input. It is not suitable when the face to be recognized is three-dimensional. A three-dimensional face is a form of three-dimensional data, and when converted into depth image data it becomes single-channel image data. The traditional VGG convolutional neural network model is therefore not well suited to three-dimensional face recognition and cannot effectively extract features from and recognize a three-dimensional face.
Summary of the Invention
In view of this, in response to the above technical problem, it is necessary to provide a neural network model training method, a face recognition method, and corresponding apparatuses, devices, and media that can effectively recognize a three-dimensional face.
A neural network model training method includes:
acquiring point cloud data corresponding to a human face, and depth image data corresponding to the human face;
acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
using the depth image data, the first projection data, and the second projection data as training data of a VGG neural network model; and
training the VGG neural network model with a training set composed of the training data corresponding to N human faces to obtain a converged VGG neural network model, where N is greater than or equal to 2.
A face recognition method includes:
acquiring point cloud data and depth image data of a face to be recognized;
acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
using the depth image data, the first projection data, and the second projection data as input data of a VGG neural network recognition model; and
inputting the input data into the converged VGG neural network recognition model obtained by the neural network model training method, to recognize the face to be recognized.
A neural network model training apparatus includes:
a first acquisition module, configured to acquire point cloud data corresponding to a human face and depth image data corresponding to the human face;
a second acquisition module, configured to acquire first projection data of the point cloud data acquired by the first acquisition module in a first preset direction, and to acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
a determination module, configured to use the depth image data acquired by the first acquisition module, together with the first projection data and second projection data acquired by the second acquisition module, as three channels of training data of a VGG neural network model; and
a training module, configured to train the VGG neural network model with a training set composed of the training data determined by the determination module for N human faces, where N is greater than or equal to 2.
A face recognition apparatus includes:
a first acquisition module, configured to acquire point cloud data and depth image data of a face to be recognized;
a second acquisition module, configured to acquire first projection data of the point cloud data acquired by the first acquisition module in a first preset direction, and to acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
a determination module, configured to use the depth image data acquired by the first acquisition module, together with the first projection data and second projection data acquired by the second acquisition module, as three channels of input data of a VGG neural network recognition model; and
a recognition module, configured to input the input data determined by the determination module into the VGG neural network recognition model to recognize the face to be recognized.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the following steps are implemented:
acquiring point cloud data corresponding to a human face, and depth image data corresponding to the human face;
acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
using the depth image data, the first projection data, and the second projection data as training data of a VGG neural network model; and
training the VGG neural network model with a training set composed of the training data corresponding to N of the human faces to obtain a converged VGG neural network model, where N is greater than or equal to 2.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the following steps are implemented:
acquiring point cloud data and depth image data of a face to be recognized;
acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
using the depth image data, the first projection data, and the second projection data as input data; and
inputting the input data into the converged VGG neural network recognition model according to claims 1-5 to recognize the face to be recognized.
One or more non-volatile readable storage media storing computer-readable instructions are provided. When executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:
acquiring point cloud data corresponding to a human face, and depth image data corresponding to the human face;
acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
using the depth image data, the first projection data, and the second projection data as training data of a VGG neural network model; and
training the VGG neural network model with a training set composed of the training data corresponding to N of the human faces to obtain a converged VGG neural network model, where N is greater than or equal to 2.
One or more non-volatile readable storage media storing computer-readable instructions are provided. When executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:
acquiring point cloud data and depth image data of a face to be recognized;
acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
using the depth image data, the first projection data, and the second projection data as input data; and
inputting the input data into the converged VGG neural network recognition model according to claims 1-5 to recognize the face to be recognized.
Details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application framework of the neural network model training method in the present application;
FIG. 2 is a schematic flowchart of an embodiment of the neural network model training method in the present application;
FIG. 3 is a schematic flowchart of an embodiment of the face recognition method in the present application;
FIG. 4 is a schematic structural diagram of an embodiment of the neural network model training apparatus in the present application;
FIG. 5 is a schematic structural diagram of an embodiment of the face recognition apparatus in the present application;
FIG. 6 is a schematic structural diagram of an embodiment of the computer device in the present application.
The technical solutions in the embodiments of the present application will now be described clearly and completely with reference to the drawings in those embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
The neural network model training method provided in the present application can be applied in the application environment shown in FIG. 1. A computer device acquires point cloud data corresponding to a human face and depth image data corresponding to the face; acquires first projection data of the point cloud data in a first preset direction and second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions; uses the depth image data, the first projection data, and the second projection data as three channels of training data of a VGG neural network model; and trains the VGG neural network model with a training set composed of the training data corresponding to N different faces, where N is greater than or equal to 2, to obtain a converged VGG neural network model. Thus, in the present application, the training data consists of three channels of data (the depth image data of each face, plus the projections of that face's point cloud data in two different projection directions), and the converged VGG neural network model obtained by training is suitable for three-dimensional face recognition. Because the projection directions applied to the face's point cloud data preserve the three-dimensional characteristics of the face, a three-dimensional face can be recognized effectively. Here, the computer device is a device with computing and processing capability, and may be, but is not limited to, a personal computer, a notebook computer, a server, and the like.
In an embodiment, as shown in FIG. 2, which is a schematic flowchart of an embodiment of the neural network model training method of the present application, the method includes the following steps:
S10: acquire point cloud data corresponding to a human face, and depth image data corresponding to the face.
In this solution, the point cloud data corresponding to a face and the depth image data corresponding to the face can be acquired. Point cloud data records, point by point, information about discrete points on the surface of the face, including the spatial position information and the color information (for example, RGB) of those surface points; specifically, the spatial position information consists of the spatial coordinates of the discrete surface points. For example, the point cloud data can be expressed as U = {P_i = (x_i, y_i, z_i, r_i, g_i, b_i) | 1 ≤ i ≤ M}, where M is a positive integer equal to the number of points in the point cloud data U, the initial value of i is 1, the i-th point in U is denoted P_i, x_i, y_i, and z_i are the spatial coordinates of point P_i, and r_i, g_i, and b_i are the color information of point P_i, that is, the red, green, and blue primary color values.
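The set U = {P_i = (x_i, y_i, z_i, r_i, g_i, b_i) | 1 ≤ i ≤ M} described above maps naturally onto an M×6 array with one row per surface point. A toy sketch (the point count and the random coordinate and color values are placeholders):

```python
import numpy as np

# A point cloud U with M points, each P_i = (x_i, y_i, z_i, r_i, g_i, b_i):
# three spatial coordinates plus red/green/blue color values.
M = 4
rng = np.random.default_rng(0)
xyz = rng.uniform(-0.1, 0.1, size=(M, 3))    # spatial position information
rgb = rng.integers(0, 256, size=(M, 3))      # color information (RGB)
U = np.hstack([xyz, rgb.astype(float)])      # shape (M, 6)

print(U.shape)  # (4, 6)
```

Row i of `U` then holds exactly the tuple (x_i, y_i, z_i, r_i, g_i, b_i) from the formula above.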
In addition, in the present application, the point cloud data corresponding to a face can be acquired directly with a depth camera, that is, an image sensor able to observe the position of the face in space. Specifically, the depth camera may be an active, passive, contact, or non-contact depth camera. An active depth camera emits an energy beam (such as a laser, electromagnetic wave, or ultrasonic wave) toward the face to acquire its point cloud data, while a passive depth camera mainly uses the ambient conditions around the subject to acquire the point cloud data. A contact depth camera needs to touch or be relatively close to the face, whereas a non-contact one does not. As an example, the depth camera may be a TOF (time-of-flight) depth camera, and it may also be a Kinect, XTion, or RealSense depth camera, without specific limitation.
It should also be understood that depth image data, also called range image data, is image data in which the distance (depth) from the image collector to each point in the real scene is taken as the pixel value; it directly reflects the geometry of the visible surface of the face. Depth image data can be converted into corresponding point cloud data through a coordinate transformation, and conversely point cloud data can be back-calculated into depth image data. Therefore, in the present application, after the point cloud data of a face is obtained, it can be converted into the depth image data corresponding to that face. Of course, in some application scenarios, the depth image data and point cloud data of a face may be acquired directly from the depth camera, or one of the two may be acquired first and then converted into the other, without specific limitation.
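The point-cloud-to-depth-image conversion mentioned above can be sketched as bare orthographic binning: each point's (x, y) selects a pixel and its z becomes the pixel's depth value. This deliberately ignores camera intrinsics and collision handling, and the grid size is made up, so it illustrates only the idea:

```python
import numpy as np

def cloud_to_depth(xyz, grid=(8, 8)):
    """Toy point-cloud -> depth-image conversion by orthographic binning.

    Each point's (x, y) is mapped to a pixel index and its z (depth) becomes
    the pixel value; a real pipeline would use the camera's intrinsics.
    """
    h, w = grid
    depth = np.zeros((h, w))
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    # normalize x and y into [0, 1], then scale to pixel indices
    u = np.rint((x - x.min()) / (np.ptp(x) + 1e-9) * (w - 1)).astype(int)
    v = np.rint((y - y.min()) / (np.ptp(y) + 1e-9) * (h - 1)).astype(int)
    depth[v, u] = z        # later points overwrite earlier ones on collision
    return depth

pts = np.array([[0.0, 0.0, 0.5],
                [1.0, 0.0, 0.7],
                [0.0, 1.0, 0.9]])
d = cloud_to_depth(pts)
print(d.shape)  # (8, 8)
```

The inverse direction (depth image back to point cloud) would read each pixel's value as z and invert the same index mapping.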
S20: acquire first projection data of the point cloud data in a first preset direction, and acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions.
In the present application, after the point cloud data of the face has been acquired, the first projection data of the point cloud data in the first preset direction and the second projection data of the point cloud data in the second preset direction can further be acquired, the first preset direction and the second preset direction being different projection directions. That is, depending on the projection direction, projection data of the face's point cloud data on different planes can be obtained.
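As a toy illustration of obtaining projection data in two different preset directions, the simplest orthographic case drops one coordinate axis per direction; the specific axis choices here are an assumption, since the text leaves the preset directions abstract at this point:

```python
import numpy as np

def project(xyz, drop_axis):
    """Orthographically project points by discarding one coordinate axis.

    Dropping z projects onto the x-y plane, dropping x onto the y-z plane,
    and so on; two different drop axes give two different projection
    directions, hence projections onto two different planes.
    """
    keep = [a for a in range(3) if a != drop_axis]
    return xyz[:, keep]

pts = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])
first_projection = project(pts, drop_axis=2)   # first preset direction
second_projection = project(pts, drop_axis=0)  # second preset direction
print(first_projection.tolist())   # [[1.0, 2.0], [4.0, 5.0]]
print(second_projection.tolist())  # [[2.0, 3.0], [5.0, 6.0]]
```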
S30: use the depth image data, the first projection data, and the second projection data as training data of the VGG neural network model.
That is, after the preceding steps, three kinds of data are available for the face: the corresponding depth image data, the first projection data, and the second projection data. In this step, these three kinds of data are used as the three channels of training data of the VGG neural network model, thereby forming the training data of the model, that is, one training sample corresponding to the face.
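The three kinds of data can then be stacked as the three input channels of one training sample, in the same way the R, G, and B planes of a color image feed a VGG-style network. A sketch with placeholder resolutions and random stand-in maps:

```python
import numpy as np

# Three single-channel maps for one face: the depth image plus the two
# projection maps (random stand-ins here, rendered at a common resolution).
H = W = 32
rng = np.random.default_rng(1)
depth_map = rng.random((H, W))
proj_1 = rng.random((H, W))
proj_2 = rng.random((H, W))

# Stack them channel-first as the three input channels of the network.
sample = np.stack([depth_map, proj_1, proj_2], axis=0)
print(sample.shape)  # (3, 32, 32)
```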
S40: train the VGG neural network model with a training set composed of the training data corresponding to N human faces to obtain a converged VGG neural network model, where N is greater than or equal to 2.
That is, suppose there are faces {1, 2, ..., N}, with N greater than or equal to 2. Each face corresponds to three kinds of data (depth image data, first projection data, and second projection data), and the three kinds of data for one face constitute one training sample. The training samples corresponding to the N faces constitute the training set, and the VGG neural network model is trained on this training set until it converges.
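Assembling the training set of N faces can be sketched as follows; the shapes and label scheme are placeholders, and the loop only shows the per-epoch shuffle structure, not an actual VGG optimizer:

```python
import numpy as np

# A training set for N faces: each sample is the (3, H, W) stack of
# depth image + two projections, with the face identity as its label.
N, H, W = 5, 32, 32
rng = np.random.default_rng(2)
samples = rng.random((N, 3, H, W))   # one 3-channel sample per face
labels = np.arange(N)                # identity label per face

# The model would be trained on (samples, labels) until the loss converges;
# this loop sketches only the epoch/shuffle skeleton of that training.
for epoch in range(3):
    order = rng.permutation(N)       # reshuffle the training set each epoch
    batch, batch_labels = samples[order], labels[order]

print(samples.shape, labels.shape)  # (5, 3, 32, 32) (5,)
```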
It can thus be seen that, in the present application, training data formed from three channels (the depth image data of different faces, plus the projections of their point cloud data in different projection directions) is used to train the VGG neural network model, and the trained VGG neural network model is suitable for three-dimensional face recognition. Because the projection directions applied to the point cloud data preserve the three-dimensional characteristics of the face, features of the three-dimensional face to be recognized can be extracted effectively and the face can be recognized effectively.
在一实施例中,步骤S20中,获取点云数据在第一预设方向上的第一投影数据,包括:In an embodiment, obtaining the first projection data of the point cloud data in the first preset direction in step S20 includes:
S21、将点云数据在目标坐标系的方位角方向作为第一预设方向;S21. Use the azimuth direction of the point cloud data in the target coordinate system as the first preset direction;
S22、对点云数据在第一预设方向上进行投影以生成第一投影数据。S22. Project the point cloud data in the first preset direction to generate the first projection data.
也就是说,在本申请中,可以将点云数据在目标坐标系的方位角方向进行投影以得到第一投影数据,从而得到人脸的点云数据在其中一个二维平面上的投影数据。其中,目标坐标系为世界坐标系,是一种三维坐标系。已知点云数据的一个坐标点P,设为目标坐标系第一卦限内的一个点P,站在原点(O点)看这个点P,从x轴正方向沿逆时针旋转到P点在xOy平面上的垂直投影线,x轴与该垂直投影线之间的夹角就是方位角。其中,对上述点云数据在上述方位角方向上进行投影以生成第一投影数据,具体包括:获取点云数据在目标坐标系中每一个点的坐标值,将点云数据中每一个点的坐标值在该目标坐标系对应的方位角方向上进行投影,从而生成每一个点的坐标值在该方位角方向上的投影,点云数据中所有点的坐标值在方位角方向上的投影共同构成第一投影数据。That is, in this application, the point cloud data may be projected in the azimuth direction of the target coordinate system to obtain the first projection data, thereby obtaining the projection of the point cloud data of the human face onto one of the two-dimensional planes. The target coordinate system is the world coordinate system, a three-dimensional coordinate system. Take a coordinate point P of the point cloud data, assumed to lie in the first octant of the target coordinate system. Viewed from the origin (point O), the azimuth is the angle swept counter-clockwise from the positive x-axis to the vertical projection line of point P onto the x-y plane. Projecting the point cloud data in the azimuth direction to generate the first projection data specifically includes: obtaining the coordinate value of each point of the point cloud data in the target coordinate system, and projecting the coordinate value of each point in the azimuth direction of the target coordinate system; the projections of the coordinate values of all points of the point cloud data in the azimuth direction together constitute the first projection data.
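作为示意,上述方位角的计算可用如下NumPy代码勾勒。As an illustrative sketch only (function name and sample points are assumptions for illustration, not part of the application), the azimuth defined above can be computed with NumPy as follows:

```python
import numpy as np

def azimuth(points):
    """Azimuth of each 3-D point P: the counter-clockwise angle from the
    positive x-axis to P's vertical projection onto the x-y plane."""
    x, y = points[:, 0], points[:, 1]
    return np.arctan2(y, x)  # handles every quadrant, range (-pi, pi]

pts = np.array([[1.0, 0.0, 2.0],   # on the +x axis -> azimuth 0
                [0.0, 1.0, 2.0],   # on the +y axis -> azimuth pi/2
                [1.0, 1.0, 0.5]])  # first octant   -> azimuth pi/4
print(azimuth(pts))  # approximately [0, pi/2, pi/4]
```

Per-point azimuths such as these are what the application's "projection in the azimuth direction" operates on.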
在一实施例中,步骤S20中,获取点云数据在第二预设方向上的第二投影数据,包括:In an embodiment, in step S20, obtaining the second projection data of the point cloud data in the second preset direction includes:
S23、将点云数据在目标坐标系的倾斜角方向作为第二预设方向;S23. Use the tilt angle direction of the point cloud data in the target coordinate system as the second preset direction.
S24、对点云数据在第二预设方向上进行投影以生成第二投影数据。S24. Project the point cloud data in a second preset direction to generate second projection data.
也就是说,在本申请中,可以将点云数据在目标坐标系的倾斜角方向进行投影以得到第二投影数据,从而得到人脸的点云数据在另一个二维平面上的投影数据。目标坐标系为世界坐标系,是一种三维坐标系。已知点云数据的一个坐标点P,设为目标坐标系第一卦限内的一个点P,站在原点(O点)看这个点P,从x轴正方向沿逆时针旋转到P点在xOy平面上的垂直投影线,x轴与该垂直投影线之间的夹角就是方位角;再向高处看P点即得到倾斜角,也即P点的垂直投影线与原点到P点之间的直线所形成的夹角为倾斜角。其中,对上述点云数据在上述倾斜角方向上进行投影以生成第二投影数据,具体包括:获取点云数据在目标坐标系中每一个点的坐标值,将点云数据中每一个点的坐标值在该目标坐标系对应的倾斜角方向上进行投影,从而生成每一个点的坐标值在该倾斜角方向上的投影,点云数据中所有点的坐标值在倾斜角方向上的投影共同构成第二投影数据。That is, in this application, the point cloud data may be projected in the tilt-angle direction of the target coordinate system to obtain the second projection data, thereby obtaining the projection of the point cloud data of the human face onto another two-dimensional plane. The target coordinate system is the world coordinate system, a three-dimensional coordinate system. Take a coordinate point P of the point cloud data, assumed to lie in the first octant of the target coordinate system. Viewed from the origin (point O), the azimuth is the angle swept counter-clockwise from the positive x-axis to the vertical projection line of point P onto the x-y plane; looking further upward toward P gives the tilt angle, that is, the angle formed between the vertical projection line of point P and the straight line from the origin to point P. Projecting the point cloud data in the tilt-angle direction to generate the second projection data specifically includes: obtaining the coordinate value of each point of the point cloud data in the target coordinate system, and projecting the coordinate value of each point in the tilt-angle direction of the target coordinate system; the projections of the coordinate values of all points of the point cloud data in the tilt-angle direction together constitute the second projection data.
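作为示意,上述倾斜角的计算可用如下NumPy代码勾勒。As an illustrative sketch only (the function name and sample points are assumptions for illustration), the tilt angle defined above can be computed with NumPy as follows:

```python
import numpy as np

def tilt_angle(points):
    """Tilt (elevation) angle of each 3-D point P: the angle between the
    line origin->P and P's vertical projection onto the x-y plane."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.arctan2(z, np.hypot(x, y))

pts = np.array([[1.0, 0.0, 0.0],   # in the x-y plane -> tilt 0
                [0.0, 0.0, 1.0],   # straight above O -> tilt pi/2
                [1.0, 0.0, 1.0]])  # 45 degrees up    -> tilt pi/4
print(tilt_angle(pts))  # approximately [0, pi/2, pi/4]
```

Together with the azimuth, this gives the two angular directions in which the point cloud is projected to form the two projection-data channels.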
由此可得,本申请提出了具体的投影方向以获得人脸的点云数据对应的第一投影数据以及第二投影数据,提高了方案的可实施性。It can thus be seen that this application proposes specific projection directions for obtaining the first projection data and the second projection data corresponding to the point cloud data of the face, which improves the implementability of the solution.
在一实施例中,步骤S40中,所采用的VGG神经网络模型的卷积核大小为7x7。In one embodiment, in step S40, the size of the convolution kernel of the VGG neural network model used is 7x7.
其中,本申请中的VGG神经网络模型包括输入层、卷积层(convolution)、激活函数、池化层(pooling)、全连接层(fully connected)、以及归一化层(softmax),卷积层的卷积核大小为7x7。本申请中,将人脸的深度图像数据、第一投影数据以及第二投影数据作为VGG神经网络模型的训练数据代入VGG神经网络模型进行训练。其中,该VGG神经网络模型的输入层用于输入人脸的深度图像数据、人脸的点云数据对应的第一投影数据以及第二投影数据这三个通道的数据。其中,将上述训练数据输入上述VGG神经网络模型进行训练之前,该方法还包括:对上述三个通道的训练数据做预处理,其中该预处理包括:去均值处理,用于把输入数据各个维度都中心化为0;归一化处理:分别将输入数据中三个通道的数据的幅度归一化到同样的范围,从而减少各通道数据取值范围的差异而带来的干扰,例如,我们有两个维度的数据A和B,A范围是0到10,而B范围是0到10000,如果直接使用这两个特征会有问题,好的做法就是归一化处理,即A和B的数据都变为0到1的范围。卷积层用于对上述输入数据进行卷积操作以得到特征图并利用激活函数(如ReLU函数)进行非线性转换,应理解,由于经过卷积层卷积得到的特征图是一种线性映射,线性映射的表达能力不够,因此加入一些非线性的激活函数,整个网络中就引入了非线性部分,增强特征图的表达能力,另外,该激活函数具体还可以是sigmoid或tanh激活函数,具体不做限定。池化层用于对上述特征图进行压缩,一方面使特征图变小,简化VGG神经网络计算复杂度;一方面进行特征压缩,从而提取出输入数据的主要特征;其中,常用的池化层具体可以是max pooling或overlapping pooling,还可以是其他的池化层,例如spatial pyramid pooling等,具体不做限定。全连接层用于连接池化层得到的所有特征,最后输出至归一化层,在将人脸的深度图像数据、第一投影数据以及第二投影数据作为训练数据所构成的训练集进行大量训练后可得到最终的VGG神经网络模型,具体的训练过程不做一一赘述。The VGG neural network model in this application includes an input layer, convolution layers, activation functions, pooling layers, fully connected layers, and a normalization (softmax) layer; the convolution kernel size of the convolution layers is 7x7. In this application, the depth image data, the first projection data, and the second projection data of the face are fed into the VGG neural network model as training data. The input layer of the VGG neural network model receives three channels of data: the depth image data of the face, and the first and second projection data corresponding to the point cloud data of the face. Before the training data is input into the VGG neural network model for training, the method further includes preprocessing the training data of the three channels, where the preprocessing includes: mean subtraction, which centres every dimension of the input data at 0; and normalization, which scales the amplitudes of the three channels to the same range, thereby reducing the interference caused by differences in the value ranges of the channels. For example, given two dimensions of data A and B, where A ranges from 0 to 10 and B ranges from 0 to 10000, using these two features directly would cause problems; good practice is to normalize them so that both A and B fall in the range 0 to 1. The convolution layer performs a convolution operation on the input data to obtain feature maps and applies an activation function (such as ReLU) for a non-linear transformation. It should be understood that, because a feature map produced by convolution is a linear mapping whose expressive power is limited, non-linear activation functions are added; the non-linearity introduced throughout the network enhances the expressive power of the feature maps. The activation function may also be sigmoid or tanh, which is not specifically limited. The pooling layer compresses the feature maps: on the one hand it makes the feature maps smaller, simplifying the computational complexity of the VGG network; on the other hand it compresses features so as to extract the main features of the input data. The commonly used pooling layer may be max pooling or overlapping pooling, or another pooling layer such as spatial pyramid pooling, which is not specifically limited. The fully connected layer connects all the features obtained from the pooling layers and finally outputs to the normalization layer. After extensive training on the training set composed of the depth image data, the first projection data, and the second projection data of faces, the final VGG neural network model is obtained; the specific training process is not described in detail here.
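上述去均值与归一化预处理可用如下NumPy代码勾勒。The mean-subtraction and normalization preprocessing described above can be sketched as follows (an illustrative sketch; the function name and toy channels A and B are assumptions mirroring the example in the text):

```python
import numpy as np

def preprocess(channel):
    """Zero-centre a channel, then min-max scale it into [0, 1]."""
    centred = channel - channel.mean()
    lo, hi = centred.min(), centred.max()
    return (centred - lo) / (hi - lo)

a = np.array([0.0, 5.0, 10.0])        # channel A, original range 0..10
b = np.array([0.0, 5000.0, 10000.0])  # channel B, original range 0..10000
# After preprocessing, both channels share the same [0, 1] range,
# so neither dominates the other during training.
print(preprocess(a), preprocess(b))
```

Both channels map to [0, 0.5, 1], illustrating how normalization removes the disparity between value ranges such as 0..10 and 0..10000.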
在本申请中,该VGG神经网络模型的卷积层中,使用结构为7x7的卷积核。应理解,由于深度图像数据相对于二维图像更加平滑,不再适用3x3的卷积核:如果仍采用3x3的卷积核,由于其感受范围较窄,而深度图像数据又比较平滑,容易丢失人脸的深度图像数据的特征。因此在本申请中,可以扩大卷积核的大小,具体地,使用结构为7x7的卷积核,可以有效地减少人脸的深度图像数据的丢失,从而使得训练出来的VGG神经网络模型在对待识别人脸进行识别时更为准确。In the present application, convolution kernels with a 7x7 structure are used in the convolution layers of the VGG neural network model. It should be understood that, because depth image data is smoother than a two-dimensional image, a 3x3 convolution kernel is no longer suitable: a 3x3 kernel covers only a narrow range, and since the depth image data is relatively smooth, features of the face's depth image data are easily lost. Therefore, in this application, the size of the convolution kernel is enlarged; specifically, using a 7x7 convolution kernel effectively reduces the loss of the face's depth image data, making the trained VGG neural network model more accurate when recognizing the face to be recognized.
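卷积核大小的影响可用如下示意代码说明。As a minimal sketch (the naive convolution routine and the ramp image standing in for a depth map are illustrative assumptions, not the application's implementation), the difference in receptive field between a 7x7 and a 3x3 kernel can be shown directly:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Plain 'valid' 2-D convolution: no padding, stride 1."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# A smooth ramp standing in for a (very small) depth image
depth = np.fromfunction(lambda i, j: i + j, (32, 32))
resp7 = conv2d_valid(depth, np.ones((7, 7)) / 49.0)  # 7x7 kernel
resp3 = conv2d_valid(depth, np.ones((3, 3)) / 9.0)   # 3x3 kernel
# Each 7x7 response aggregates a 49-pixel neighbourhood versus 9 pixels
# for 3x3, which matters when the input varies as slowly as depth data.
print(resp7.shape, resp3.shape)  # (26, 26) (30, 30)
```

The wider 7x7 support is what lets each response summarize more of a smooth depth surface.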
在一实施例中,获取人脸的点云数据,包括:In an embodiment, obtaining point cloud data of a human face includes:
获取人脸不同姿态下的每一帧点云数据;Obtain point cloud data of each frame in different poses of the face;
将不同帧点云数据进行融合匹配,以统一到同一坐标系中的融合点云数据作为人脸的点云数据。The point cloud data of different frames are fused and matched, and the fused point cloud data unified into the same coordinate system is used as the point cloud data of the face.
需要说明的是,由于受人脸的大小、环境以及扫描设备等因素的限制,扫描设备在每次扫描时可能只能测量到人脸的一个侧面。因此,在具体实现上,为获得人脸完整的点云数据,通过扫描设备以不同姿态对人脸进行多次扫描。其中,每次扫描可以得到一帧点云数据,将不同帧点云数据进行融合匹配,以统一到同一坐标系中的融合点云数据作为人脸的点云数据。具体的,在一些方案中,将不同帧点云数据进行融合匹配,可以采用迭代最近点法(Iterative Closest Point,ICP)、正态分布变换法(Normal Distribution Transformation,NDT)等方式,具体不做限定。It should be noted that, limited by factors such as the size of the face, the environment, and the scanning device, the scanning device may only be able to measure one side of the face in each scan. Therefore, in a specific implementation, to obtain complete point cloud data of the face, the scanning device scans the face multiple times in different poses. Each scan yields one frame of point cloud data; the point cloud data of the different frames are then fused and matched, and the fused point cloud data unified into a single coordinate system serves as the point cloud data of the face. Specifically, in some solutions, the fusing and matching of different frames of point cloud data may use methods such as the Iterative Closest Point (ICP) method or the Normal Distribution Transformation (NDT) method, which is not specifically limited.
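ICP类方法的核心对齐步骤可用如下代码勾勒。As an illustrative sketch of the alignment step shared by ICP-style methods (this shows only the inner least-squares step with correspondences assumed known; full ICP alternates nearest-neighbour matching with this step, and the sample frames are made-up data):

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst
    (Kabsch/SVD): the inner alignment step of one ICP iteration."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # guard against an improper (reflecting) solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return R, t

# One "frame" of four points; a second frame with the same points rotated
# 90 degrees about z and shifted, as if the face were scanned in another pose.
src = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
Rz = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])
dst = src @ Rz.T + np.array([2.0, 1.0, 0.0])
R, t = rigid_align(src, dst)
print(np.allclose(src @ R.T + t, dst))  # True: both frames in one coordinate system
```

Applying the recovered (R, t) to every frame is what unifies the per-scan clouds into the single coordinate system described above.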
需要说明的是,通过由N张人脸对应的训练数据所构成的训练集对VGG神经网络模型进行训练以得到收敛的所述VGG神经网络模型,其收敛条件可进行配置,例如通过BP(Error Back Propagation,误差反向传播)算法对上述训练集进行迭代训练,直至VGG神经网络模型收敛。It should be noted that when the VGG neural network model is trained with the training set composed of the training data corresponding to the N faces until it converges, the convergence condition can be configured; for example, the BP (Error Back Propagation) algorithm may be used to iteratively train on the above training set until the VGG neural network model converges.
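"迭代训练直至收敛"的骨架可用如下玩具示例勾勒。The iterate-until-convergence loop can be sketched with a toy stand-in (gradient descent on a one-parameter quadratic loss; the loss, learning rate, and tolerance are illustrative assumptions, not the BP algorithm applied to the actual VGG model):

```python
def train(w, lr=0.1, tol=1e-6, max_iter=10_000):
    """Iterate gradient steps on the toy loss (w - 3)^2 until the update
    falls below a configurable convergence tolerance."""
    for step in range(max_iter):
        grad = 2.0 * (w - 3.0)      # d/dw of the toy loss
        new_w = w - lr * grad
        if abs(new_w - w) < tol:    # configurable convergence condition
            return new_w, step
        w = new_w
    return w, max_iter

w, steps = train(0.0)
print(round(w, 3))  # converges to the minimiser, w = 3.0
```

The real training replaces the toy gradient with backpropagated gradients of the network's loss over the N-face training set, but the stopping structure is the same.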
在一实施例中,如图3所示,图3为本申请人脸识别方法一实施例流程示意图,包括如下步骤:In an embodiment, as shown in FIG. 3, FIG. 3 is a schematic flowchart of an embodiment of the applicant's face recognition method, including the following steps:
S10`、获取待识别人脸的点云数据以及深度图像数据;S10`: Acquire point cloud data and depth image data of the face to be identified;
在本申请中,也可以直接通过深度相机获取待识别人脸的点云数据和深度图像数据,其中,深度相机指的是一种图像传感器,该图像传感器能够观察到物体或人物在空间中的位置。具体的,该深度相机可以是主动式、被动式、接触式或非接触式深度相机,其中,主动式是指向待识别人脸发射能量束(如激光、电磁波或超声波等)以获取待识别人脸的点云数据,被动式深度相机主要利用待识别人脸的周围环境的条件来获取待识别人脸的点云数据,接触式深度相机是指需与待识别人脸接触或比较靠近,非接触式是指不需要与待识别人脸接触。示例性的,上述深度相机具体可以是指TOF(time-of-flight)深度相机,除此之外,还可以是kinect深度相机、XTion深度相机或RealSense深度相机,具体不做限定。In this application, the point cloud data and depth image data of the face to be recognized may also be acquired directly by a depth camera, where a depth camera refers to an image sensor able to observe the position of an object or person in space. Specifically, the depth camera may be active, passive, contact, or non-contact. An active depth camera emits an energy beam (such as laser, electromagnetic wave, or ultrasound) toward the face to be recognized to acquire its point cloud data; a passive depth camera mainly relies on the ambient conditions around the face to be recognized to acquire its point cloud data; a contact depth camera needs to touch or be fairly close to the face to be recognized, whereas a non-contact one does not need to touch it. Illustratively, the depth camera may specifically be a TOF (time-of-flight) depth camera, or alternatively a kinect, XTion, or RealSense depth camera, which is not specifically limited.
在本申请中,在获得了待识别人脸的点云数据后,可以将得到的待识别人脸的点云数据转换为该待识别人脸对应的深度图像数据。简单点说,在一些应用场景中,可通过深度相机直接获取待识别人脸的深度图像数据和点云数据,也可先获取待识别人脸的深度图像数据或点云数据,再转换为点云数据或深度图像数据,具体不做限定。In this application, after the point cloud data of the face to be recognized is obtained, it may be converted into the depth image data corresponding to that face. Put simply, in some application scenarios, the depth image data and point cloud data of the face to be recognized can be acquired directly by a depth camera; alternatively, the depth image data or the point cloud data may be acquired first and then converted into the other, which is not specifically limited.
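点云转深度图像的一种做法可用如下代码勾勒。One way such a point-cloud-to-depth-image conversion can work is sketched below (an illustrative sketch; the grid size, the min-max scaling of x and y into pixels, and the nearest-point-wins rule are assumptions, not the application's prescribed conversion):

```python
import numpy as np

def cloud_to_depth(points, size=8):
    """Rasterise a point cloud into a depth image: x and y choose the
    pixel, z is the stored depth (the nearest point wins on collisions)."""
    depth = np.full((size, size), np.inf)
    xy = points[:, :2]
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    ij = ((xy - lo) / (hi - lo) * (size - 1)).astype(int)
    for (i, j), z in zip(ij, points[:, 2]):
        depth[j, i] = min(depth[j, i], z)
    depth[np.isinf(depth)] = 0.0  # pixels no point maps to become background
    return depth

pts = np.array([[0.0, 0.0, 1.0],
                [1.0, 1.0, 2.0],
                [0.0, 0.0, 0.5]])  # same pixel as the first point, but nearer
img = cloud_to_depth(pts, size=2)
print(img)  # the nearer depth 0.5 wins at pixel (0, 0)
```

The reverse direction (depth image back to a point cloud) simply reads each pixel's coordinates and depth value as a 3-D point.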
S20`、获取点云数据在第一预设方向上的第一投影数据,并获取点云数据在第二预设方向上的第二投影数据,第一预设方向和第二预设方向为不同的投影方向;S20'. Acquire the first projection data of the point cloud data in a first preset direction, and acquire the second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
本申请中,在获取了待识别人脸的点云数据后,可进一步获取待识别人脸的点云数据在第一预设方向上的第一投影数据,以及在第二预设方向上的第二投影数据,并且,第一预设方向和第二预设方向为不同的投影方向。在一些应用场景中,上述第一预设方向为待识别人脸的点云数据在目标坐标系的方位角方向,第二预设方向为待识别人脸的点云数据在目标坐标系的倾斜角方向。目标坐标系为世界坐标系,是一种三维坐标系。已知待识别人脸的点云数据的一个坐标点P,设为目标坐标系第一卦限内的一个点P,站在原点(O点)看这个点P,从x轴正方向沿逆时针旋转到P点在xOy平面上的垂直投影线,x轴与该垂直投影线之间的夹角就是方位角;再向高处看P点即得到倾斜角,也即P点的垂直投影线与原点到P点之间的直线所形成的夹角为倾斜角。其中,对上述点云数据在上述方位角方向上进行投影以生成第一投影数据,在上述倾斜角方向上进行投影以生成第二投影数据,具体包括:获取待识别人脸的点云数据在目标坐标系中每一个点的坐标值,将每一个点的坐标值分别在该目标坐标系对应的方位角方向和倾斜角方向上进行投影,所得投影分别构成第一投影数据和第二投影数据。In this application, after the point cloud data of the face to be recognized is obtained, the first projection data of that point cloud data in a first preset direction and its second projection data in a second preset direction may further be obtained, the first preset direction and the second preset direction being different projection directions. In some application scenarios, the first preset direction is the azimuth direction of the point cloud data of the face to be recognized in the target coordinate system, and the second preset direction is the tilt-angle direction of that point cloud data in the target coordinate system. The target coordinate system is the world coordinate system, a three-dimensional coordinate system. Take a coordinate point P of the point cloud data of the face to be recognized, assumed to lie in the first octant of the target coordinate system. Viewed from the origin (point O), the azimuth is the angle swept counter-clockwise from the positive x-axis to the vertical projection line of point P onto the x-y plane; looking further upward toward P gives the tilt angle, that is, the angle formed between the vertical projection line of point P and the straight line from the origin to point P. Projecting the point cloud data in the azimuth direction to generate the first projection data, and in the tilt-angle direction to generate the second projection data, specifically includes: obtaining the coordinate value of each point of the point cloud data of the face to be recognized in the target coordinate system, and projecting the coordinate value of each point in the azimuth direction and in the tilt-angle direction of the target coordinate system respectively; the resulting projections constitute the first projection data and the second projection data respectively.
S30`、将深度图像数据、第一投影数据以及第二投影数据,作为VGG神经网络识别模型的输入数据;S30 ', using the depth image data, the first projection data and the second projection data as input data of a VGG neural network recognition model;
S40`、将输入数据输入VGG神经网络识别模型以对待识别人脸进行识别。S40`: Input the input data into a VGG neural network recognition model to recognize the face to be recognized.
应理解,VGG神经网络识别模型是一种深度卷积神经网络架构,本申请的VGG神经网络识别模型是指前述模型训练方法中所得到的VGG神经网络模型。通过将待识别人脸对应的深度图像数据、第一投影数据以及第二投影数据输入VGG神经网络模型,从而完成对待识别人脸的识别。It should be understood that the VGG neural network recognition model is a deep convolutional neural network architecture. The VGG neural network recognition model of the present application refers to the VGG neural network model obtained in the foregoing model training method. The depth image data corresponding to the face to be recognized, the first projection data, and the second projection data are input into the VGG neural network model, thereby completing the recognition of the face to be recognized.
由此可见,在该人脸识别方法中,是将待识别人脸的深度图像数据,以及点云数据在不同投影方向上的投影数据,共三个通道的数据输入训练后得到的VGG神经网络模型。由于VGG神经网络模型适用于三维人脸的识别,且由于点云数据对应的投影方向保留了三维人脸的三维特性,因此能有效地对待识别的三维人脸进行特征提取与识别。It can thus be seen that, in this face recognition method, the depth image data of the face to be recognized, together with the projection data of its point cloud data in different projection directions (three channels of data in total), are input into the trained VGG neural network model. Since the VGG neural network model is suited to three-dimensional face recognition, and since the projection directions of the point cloud data preserve the three-dimensional characteristics of the face, features of the three-dimensional face to be recognized can be effectively extracted for recognition.
应理解,所述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。It should be understood that the size of the sequence numbers of the steps in the embodiment does not mean the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application. .
在一实施例中,提供一种神经网络模型训练装置,神经网络模型训练装置与实施例中模型训练方法一一对应。如图4所示,该神经网络模型训练装置40包括第一获取模块401、第二获取模块402、确定模块403和训练模块404。各功能模块详细说明如下:In one embodiment, a neural network model training device is provided, and the neural network model training device corresponds to the model training method in the embodiment one by one. As shown in FIG. 4, the neural network model training device 40 includes a first acquisition module 401, a second acquisition module 402, a determination module 403, and a training module 404. The detailed description of each function module is as follows:
第一获取模块401,用于获取人脸对应的点云数据,以及人脸对应的深度图像数据;A first acquisition module 401, configured to acquire point cloud data corresponding to a human face and depth image data corresponding to a human face;
第二获取模块402,用于获取第一获取模块401获取的点云数据在第一预设方向上的第一投影数据,并获取第一获取模块401获取的点云数据在第二预设方向上的第二投影数据,第一预设方向和第二预设方向为不同的投影方向;The second acquisition module 402 is configured to acquire the first projection data, in a first preset direction, of the point cloud data acquired by the first acquisition module 401, and to acquire the second projection data, in a second preset direction, of the point cloud data acquired by the first acquisition module 401, the first preset direction and the second preset direction being different projection directions;
确定模块403,用于将第一获取模块401获取的深度图像数据、第二获取模块402获取的第一投影数据以及第二投影数据,作为VGG神经网络模型的训练数据;The determining module 403 is configured to use the depth image data acquired by the first acquisition module 401, together with the first projection data and the second projection data acquired by the second acquisition module 402, as training data of the VGG neural network model;
训练模块404,用于通过由确定模块403确定的N个人脸对应的训练数据所构成的训练集对VGG神经网络模型进行训练,直到所述VGG神经网络模型收敛,N大于或等于2。The training module 404 is configured to train the VGG neural network model with a training set composed of the training data, determined by the determining module 403, corresponding to N human faces, until the VGG neural network model converges, where N is greater than or equal to 2.
在一些实施例中,第二获取模块402具体用于:In some embodiments, the second obtaining module 402 is specifically configured to:
将点云数据在目标坐标系的方位角方向作为第一预设方向;Use the azimuth direction of the point cloud data in the target coordinate system as the first preset direction;
对点云数据在第一预设方向上进行投影以生成第一投影数据。Project the point cloud data in a first preset direction to generate first projection data.
在一些实施例中,第二获取模块402具体用于:In some embodiments, the second obtaining module 402 is specifically configured to:
将点云数据在目标坐标系的倾斜角方向作为第二预设方向;Use the tilt angle direction of the point cloud data in the target coordinate system as the second preset direction;
对点云数据在第二预设方向上进行投影以生成第二投影数据。The point cloud data is projected in a second preset direction to generate second projection data.
在一实施例中,VGG神经网络模型的卷积核大小为7x7。In one embodiment, the convolution kernel size of the VGG neural network model is 7 × 7.
在一实施例中,第一获取模块401具体用于:In an embodiment, the first obtaining module 401 is specifically configured to:
获取人脸在不同姿态下的每一帧点云数据;Get point cloud data of each frame of the face in different poses;
将每一帧点云数据进行融合匹配,以统一到同一坐标系中的融合点云数据作为人脸的点云数据。The point cloud data of each frame is fused and matched, and the fused point cloud data unified into the same coordinate system is used as the point cloud data of the face.
关于神经网络模型训练装置的具体限定可以参见上文中对于模型训练方法的限定,在此不再赘述。模型训练装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。For the specific limitation of the neural network model training device, refer to the limitation on the model training method described above, which is not repeated here. Each module in the model training device may be implemented in whole or in part by software, hardware, and a combination thereof. Each module may be embedded in the hardware or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
在一实施例中,提供一种人脸识别装置,人脸识别装置与实施例中人脸识别方法一一对应。如图5所示,人脸识别装置50包括第一获取模块501、第二获取模块502、确定模块503和识别模块504。各功能模块详细说明如下:In one embodiment, a face recognition device is provided, and the face recognition device corresponds to the face recognition method in the embodiment one by one. As shown in FIG. 5, the face recognition device 50 includes a first acquisition module 501, a second acquisition module 502, a determination module 503, and a recognition module 504. The detailed description of each function module is as follows:
第一获取模块501,用于获取待识别人脸的点云数据以及深度图像数据;A first acquisition module 501, configured to acquire point cloud data and depth image data of a face to be identified;
第二获取模块502,用于获取第一获取模块501获取的点云数据在第一预设方向上的第一投影数据,并获取第一获取模块501获取的点云数据在第二预设方向上的第二投影数据,第一预设方向和第二预设方向为不同的投影方向;The second acquisition module 502 is configured to acquire first projection data of the point cloud data acquired by the first acquisition module 501 in a first preset direction, and acquire the point cloud data acquired by the first acquisition module 501 in a second preset direction. The second projection data on the first preset direction and the second preset direction are different projection directions;
确定模块503,用于将第一获取模块501获取的深度图像数据、第二获取模块502获取的第一投影数据以及第二投影数据,作为VGG神经网络识别模型的输入数据;A determining module 503, configured to use the depth image data obtained by the first obtaining module 501, the first projection data and the second projection data obtained by the second obtaining module 502 as input data of a VGG neural network recognition model;
识别模块504,用于将确定模块503确定的输入数据输入VGG神经网络识别模型以对待识别人脸进行识别。The recognition module 504 is configured to input the input data determined by the determination module 503 into a VGG neural network recognition model to recognize a face to be recognized.
关于人脸识别装置的具体限定可以参见上文中对于人脸识别方法的限定,在此不再赘述。人脸识别装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。For specific limitations on the face recognition device, reference may be made to the limitations on the face recognition method described above, and details are not described herein again. Each module in the face recognition device may be implemented in whole or in part by software, hardware, and a combination thereof. Each module may be embedded in the hardware or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
在一个实施例中,提供了一种计算机设备,其内部结构图可以如图6所示。所述计算机设备包括通过系统总线连接的处理器、存储器和数据库。其中,所述计算机设备的处理器用于提供计算和控制能力。所述计算机设备的存储器包括非易失性存储介质、内存储器。所述非易失性存储介质存储有操作系统、计算机可读指令和数据库。所述内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。所述计算机设备的数据库用于存储所获取的图像数据。所述计算机设备的网络接口用于与外部的终端通过网络连接通信。所述计算机可读指令被处理器执行时以实现一种模型训练方法或人脸识别方法。In one embodiment, a computer device is provided, and its internal structure diagram can be as shown in FIG. 6. The computer device includes a processor, a memory, and a database connected through a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running an operating system and computer-readable instructions in a non-volatile storage medium. The database of the computer equipment is used to store the acquired image data. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer-readable instructions are executed by a processor to implement a model training method or a face recognition method.
在一个实施例中,提供了一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机可读指令,处理器执行计算机可读指令时实现以下步骤:In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor. When the processor executes the computer-readable instructions, the following steps are implemented:
获取人脸对应的点云数据,以及人脸对应的深度图像数据;Obtaining point cloud data corresponding to the face, and depth image data corresponding to the face;
获取点云数据在第一预设方向上的第一投影数据,并获取点云数据在第二预设方向上的第二投影数据,第一预设方向和第二预设方向为不同的投影方向;Acquire the first projection data of the point cloud data in a first preset direction, and acquire the second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
将深度图像数据、第一投影数据以及第二投影数据,作为VGG神经网络模型的训练数据;Using the depth image data, the first projection data, and the second projection data as training data of the VGG neural network model;
通过由N个人脸对应的训练数据所构成的训练集对VGG神经网络模型进行训练以得到收敛的所述VGG神经网络模型,N大于或等于2。Train the VGG neural network model with a training set composed of the training data corresponding to N human faces until the VGG neural network model converges, where N is greater than or equal to 2.
在一个实施例中,提供了一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机可读指令,处理器执行计算机可读指令时实现以下步骤:In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor. When the processor executes the computer-readable instructions, the following steps are implemented:
获取待识别人脸的点云数据以及深度图像数据;Obtain point cloud data and depth image data of the face to be identified;
获取点云数据在第一预设方向上的第一投影数据,并获取点云数据在第二预设方向上的第二投影数据,第一预设方向和第二预设方向为不同的投影方向;Acquire the first projection data of the point cloud data in a first preset direction, and acquire the second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
将深度图像数据、第一投影数据以及第二投影数据,作为VGG神经网络识别模型的输入数据;Using the depth image data, the first projection data, and the second projection data as input data of a VGG neural network recognition model;
将输入数据输入VGG神经网络识别模型以对待识别人脸进行识别。The input data is input into a VGG neural network recognition model to recognize a face to be recognized.
在一个实施例中,提供了一个或多个存储有计算机可读指令的非易失性可读存储介质,该非易失性可读存储介质上存储有计算机可读指令,该计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器实现以下步骤:In one embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided; when the computer-readable instructions stored thereon are executed by one or more processors, they cause the one or more processors to implement the following steps:
获取人脸对应的点云数据,以及人脸对应的深度图像数据;Obtaining point cloud data corresponding to the face, and depth image data corresponding to the face;
获取点云数据在第一预设方向上的第一投影数据,并获取点云数据在第二预设方向上的第二投影数据,第一预设方向和第二预设方向为不同的投影方向;Acquire the first projection data of the point cloud data in a first preset direction, and acquire the second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
将深度图像数据、第一投影数据以及第二投影数据,作为VGG神经网络模型的训练数据;Using the depth image data, the first projection data, and the second projection data as training data of the VGG neural network model;
通过由N个人脸对应的训练数据所构成的训练集对VGG神经网络模型进行训练,N大于或等于2。The VGG neural network model is trained through a training set composed of training data corresponding to N human faces, where N is greater than or equal to 2.
在一个实施例中,提供了一个或多个存储有计算机可读指令的非易失性可读存储介质,该非易失性可读存储介质上存储有计算机可读指令,该计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器实现以下步骤:In one embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided; when the computer-readable instructions stored thereon are executed by one or more processors, they cause the one or more processors to implement the following steps:
获取待识别人脸的点云数据以及深度图像数据;Obtain point cloud data and depth image data of the face to be identified;
获取点云数据在第一预设方向上的第一投影数据,并获取点云数据在第二预设方向上的第二投影数据,第一预设方向和第二预设方向为不同的投影方向;Acquire first projection data of the point cloud data in a first preset direction, and acquire second projection data of the point cloud data in a second preset direction. The first preset direction and the second preset direction are different projections. direction;
将深度图像数据、第一投影数据以及第二投影数据,作为VGG神经网络识别模型的输入数据;Using the depth image data, the first projection data, and the second projection data as input data of a VGG neural network recognition model;
将输入数据输入VGG神经网络识别模型以对待识别人脸进行识别。The input data is input into a VGG neural network recognition model to recognize a face to be recognized.
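The recognition steps above end at the model's output. One common way to turn a network output into an identity decision is embedding matching against an enrolled gallery; the sketch below assumes the converged model yields one embedding vector per face, and the cosine-similarity rule and threshold are illustrative assumptions not specified by the text:

```python
import numpy as np

def recognize(embedding, gallery, threshold=0.5):
    """Match a query embedding (e.g. taken from the recognition model's
    penultimate layer -- an assumption) against enrolled identities by
    cosine similarity; reject the match if the best score is below the
    threshold."""
    names = list(gallery)
    mat = np.stack([gallery[n] for n in names])
    sims = mat @ embedding / (
        np.linalg.norm(mat, axis=1) * np.linalg.norm(embedding) + 1e-8)
    best = int(np.argmax(sims))
    name = names[best] if sims[best] >= threshold else None
    return name, float(sims[best])

# toy gallery of two enrolled identities
gallery = {"alice": np.array([1.0, 0.0, 0.0]),
           "bob":   np.array([0.0, 1.0, 0.0])}
match, score = recognize(np.array([0.9, 0.1, 0.0]), gallery)
```

Here the query is closest to "alice", so the function returns that name with its similarity score.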
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the foregoing embodiments may be implemented by computer-readable instructions instructing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium, and the computer-readable instructions, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art will clearly understand that, for convenience and brevity of description, only the division into the above functional units and modules is used as an example. In practical applications, the above functions may be allocated to different functional units and modules as required; that is, the internal structure of the apparatus is divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only intended to describe the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications and replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the scope of protection of the present application.
Claims (20)
- A neural network model training method, comprising: acquiring point cloud data corresponding to a human face, and depth image data corresponding to the human face; acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions; using the depth image data, the first projection data, and the second projection data as training data of a VGG neural network model; and training the VGG neural network model through a training set composed of the training data corresponding to N of the human faces to obtain a converged VGG neural network model, where N is greater than or equal to 2.
- The neural network model training method according to claim 1, wherein acquiring the first projection data of the point cloud data in the first preset direction comprises: taking the azimuth angle direction of the point cloud data in a target coordinate system as the first preset direction; and projecting the point cloud data in the first preset direction to generate the first projection data.
- The neural network model training method according to claim 2, wherein acquiring the second projection data of the point cloud data in the second preset direction comprises: taking the tilt angle direction of the point cloud data in the target coordinate system as the second preset direction; and projecting the point cloud data in the second preset direction to generate the second projection data.
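Claims 2 and 3 name the azimuth-angle and tilt-angle directions of a target coordinate system as the two projection directions, but do not fix how the projected image is formed. One plausible reading, sketched with spherical coordinates about the cloud centroid; the choice of centroid origin, the occupancy-histogram rasterization, and the bin count are all assumptions:

```python
import numpy as np

def project_point_cloud(points, direction="azimuth", bins=64):
    """Express each point in spherical coordinates about the centroid,
    drop the coordinate named by `direction`, and rasterize the two
    remaining coordinates into a 2D occupancy histogram. This is one
    hypothetical reading of the claimed projections, not the patent's
    definition."""
    p = points - points.mean(axis=0)
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    r = np.linalg.norm(p, axis=1)
    azimuth = np.arctan2(y, x)                                     # angle in the x-y plane
    tilt = np.arccos(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))  # inclination from +z
    if direction == "azimuth":   # collapse the azimuth coordinate
        u, v = tilt, r
    else:                        # collapse the tilt coordinate
        u, v = azimuth, r
    img, _, _ = np.histogram2d(u, v, bins=bins)
    return img

pts = np.random.rand(500, 3)                  # toy stand-in for a face point cloud
img1 = project_point_cloud(pts, "azimuth")    # "first projection data"
img2 = project_point_cloud(pts, "tilt")       # "second projection data"
```

Because the two directions collapse different angular coordinates, the two images carry complementary views of the same cloud, which is what makes them useful as separate input channels.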
- The neural network model training method according to claim 3, wherein the convolution kernel size of the VGG neural network model is 7x7.
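Claim 4 fixes the convolution kernel at 7x7, larger than the 3x3 kernels of the standard VGG design. A small sketch of the arithmetic consequences; the stride and padding values are assumptions, since the claim fixes only the kernel size:

```python
def conv_out_size(n, kernel=7, stride=1, padding=3):
    """Spatial output size of one convolution layer. With padding = 3,
    a 7x7 kernel is size-preserving, the same way standard VGG keeps
    feature maps fixed with 3x3 kernels and padding = 1."""
    return (n + 2 * padding - kernel) // stride + 1

def conv_params(c_in, c_out, kernel=7):
    """Weight count of one conv layer, bias omitted."""
    return kernel * kernel * c_in * c_out

print(conv_out_size(224))  # 224
# cost of 7x7 vs 3x3 at equal channel widths: ratio is 49/9
ratio = conv_params(3, 64) / conv_params(3, 64, kernel=3)
```

The trade-off this arithmetic shows: a 7x7 kernel sees a wider neighborhood per layer but costs 49/9 times the parameters of a 3x3 kernel at the same channel widths.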
- The neural network model training method according to claim 4, wherein acquiring the point cloud data of the human face comprises: acquiring each frame of point cloud data of the human face in different poses; and performing fusion matching on each frame of point cloud data, and taking the fused point cloud data unified into the same coordinate system as the point cloud data of the human face.
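Claim 5 fuses per-pose frames into one coordinate system. A minimal sketch under the assumption that the rigid transform of each frame into the reference frame is already known; in practice those transforms would be estimated by a registration algorithm such as ICP, which the text calls "fusion matching" without fixing the algorithm:

```python
import numpy as np

def fuse_frames(frames, transforms):
    """Map every frame into the first frame's coordinate system and
    concatenate. Each (R, t) pair is the rigid transform taking that
    frame into the reference frame -- assumed given here."""
    fused = [np.asarray(frames[0])]
    for pts, (R, t) in zip(frames[1:], transforms):
        fused.append(np.asarray(pts) @ R.T + t)   # rows are points
    return np.vstack(fused)

# toy check: a second frame that sees the same two points after a
# 90-degree rotation about z plus a shift along z
ref = np.array([[1.0, 0.0, 0.0],
                [0.0, 2.0, 0.5]])
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([0.0, 0.0, 1.0])
frame2 = ref @ R.T + t                       # the surface as captured in pose 2
fused = fuse_frames([ref, frame2], [(R.T, -(R.T @ t))])
```

After fusion the second frame's points land exactly on the reference points, so the merged cloud covers the face from both poses in one coordinate system.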
- A face recognition method, comprising: acquiring point cloud data and depth image data of a face to be recognized; acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions; using the depth image data, the first projection data, and the second projection data as input data; and inputting the input data into the converged VGG neural network recognition model according to any one of claims 1 to 5 to recognize the face to be recognized.
- A neural network model training apparatus, comprising: a first acquisition module, configured to acquire point cloud data corresponding to a human face, and depth image data corresponding to the human face; a second acquisition module, configured to acquire first projection data, in a first preset direction, of the point cloud data acquired by the first acquisition module, and acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions; a determining module, configured to use the depth image data acquired by the first acquisition module and the first projection data and the second projection data acquired by the second acquisition module as training data of a VGG neural network model; and a training module, configured to train the VGG neural network model through a training set composed of the training data determined by the determining module for N of the human faces, to obtain a converged VGG neural network model, where N is greater than or equal to 2.
- The neural network model training apparatus according to claim 7, wherein the second acquisition module is specifically configured to: take the azimuth angle direction of the point cloud data in a target coordinate system as the first preset direction; and project the point cloud data in the first preset direction to generate the first projection data.
- The neural network model training apparatus according to claim 8, wherein the second acquisition module is further specifically configured to: take the tilt angle direction of the point cloud data in the target coordinate system as the second preset direction; and project the point cloud data in the second preset direction to generate the second projection data.
- The neural network model training apparatus according to claim 9, wherein the convolution kernel size of the VGG neural network model is 7x7.
- The neural network model training apparatus according to claim 10, wherein the first acquisition module is specifically configured to: acquire each frame of point cloud data of the human face in different poses; and perform fusion matching on each frame of point cloud data, and take the fused point cloud data unified into the same coordinate system as the point cloud data of the human face.
- A face recognition apparatus, comprising: a first acquisition module, configured to acquire point cloud data and depth image data of a face to be recognized; a second acquisition module, configured to acquire first projection data, in a first preset direction, of the point cloud data acquired by the first acquisition module, and acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions; a determining module, configured to use the depth image data acquired by the first acquisition module and the first projection data and the second projection data acquired by the second acquisition module as input data of a VGG neural network recognition model; and a recognition module, configured to input the input data determined by the determining module into the converged VGG neural network recognition model according to any one of claims 1 to 5 to recognize the face to be recognized.
- A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps: acquiring point cloud data corresponding to a human face, and depth image data corresponding to the human face; acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions; using the depth image data, the first projection data, and the second projection data as training data of a VGG neural network model; and training the VGG neural network model through a training set composed of the training data corresponding to N of the human faces to obtain a converged VGG neural network model, where N is greater than or equal to 2.
- The computer device according to claim 13, wherein acquiring the first projection data of the point cloud data in the first preset direction comprises: taking the azimuth angle direction of the point cloud data in a target coordinate system as the first preset direction; and projecting the point cloud data in the first preset direction to generate the first projection data.
- The computer device according to claim 14, wherein acquiring the second projection data of the point cloud data in the second preset direction comprises: taking the tilt angle direction of the point cloud data in the target coordinate system as the second preset direction; and projecting the point cloud data in the second preset direction to generate the second projection data.
- A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps: acquiring point cloud data and depth image data of a face to be recognized; acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions; using the depth image data, the first projection data, and the second projection data as input data; and inputting the input data into the converged VGG neural network recognition model according to any one of claims 1 to 5 to recognize the face to be recognized.
- One or more non-volatile readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps: acquiring point cloud data corresponding to a human face, and depth image data corresponding to the human face; acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions; using the depth image data, the first projection data, and the second projection data as training data of a VGG neural network model; and training the VGG neural network model through a training set composed of the training data corresponding to N of the human faces to obtain a converged VGG neural network model, where N is greater than or equal to 2.
- The non-volatile readable storage medium according to claim 17, wherein acquiring the first projection data of the point cloud data in the first preset direction comprises: taking the azimuth angle direction of the point cloud data in a target coordinate system as the first preset direction; and projecting the point cloud data in the first preset direction to generate the first projection data.
- The non-volatile readable storage medium according to claim 18, wherein acquiring the second projection data of the point cloud data in the second preset direction comprises: taking the tilt angle direction of the point cloud data in the target coordinate system as the second preset direction; and projecting the point cloud data in the second preset direction to generate the second projection data.
- One or more non-volatile readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps: acquiring point cloud data and depth image data of a face to be recognized; acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions; using the depth image data, the first projection data, and the second projection data as input data; and inputting the input data into the converged VGG neural network recognition model according to any one of claims 1 to 5 to recognize the face to be recognized.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810939556.5 | 2018-08-17 | ||
CN201810939556.5A CN110197109B (en) | 2018-08-17 | 2018-08-17 | Neural network model training and face recognition method, device, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020034542A1 true WO2020034542A1 (en) | 2020-02-20 |
Family
ID=67751408
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/123884 WO2020034542A1 (en) | 2018-08-17 | 2018-12-26 | Neural network model training method and apparatus, face recognition method and apparatus, device, and medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110197109B (en) |
WO (1) | WO2020034542A1 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111462108A (en) * | 2020-04-13 | 2020-07-28 | 山西新华化工有限责任公司 | Machine learning-based head and face product design ergonomics assessment operation method |
CN111695497A (en) * | 2020-06-10 | 2020-09-22 | 上海有个机器人有限公司 | Pedestrian identification method, medium, terminal and device based on motion information |
CN111931694A (en) * | 2020-09-02 | 2020-11-13 | 北京嘀嘀无限科技发展有限公司 | Method and device for determining sight line orientation of person, electronic equipment and storage medium |
CN112149635A (en) * | 2020-10-23 | 2020-12-29 | 北京百度网讯科技有限公司 | Cross-modal face recognition model training method, device, device and storage medium |
CN113610172A (en) * | 2021-08-13 | 2021-11-05 | 北京地平线信息技术有限公司 | Neural network model training method and device, and sensing data fusion method and device |
CN113793295A (en) * | 2021-08-05 | 2021-12-14 | 西人马帝言(北京)科技有限公司 | Data processing method, device and equipment and readable storage medium |
WO2022266916A1 (en) * | 2021-06-24 | 2022-12-29 | 周宇 | Instantaneously adjustable electromagnetic suspension device |
WO2025000661A1 (en) * | 2023-06-30 | 2025-01-02 | 广东花至美容科技有限公司 | Beauty care device positioning method and apparatus, wearable device, and beauty care system |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111079700B (en) * | 2019-12-30 | 2023-04-07 | 陕西西图数联科技有限公司 | Three-dimensional face recognition method based on fusion of multiple data types |
CN112435331A (en) * | 2020-12-07 | 2021-03-02 | 上海眼控科技股份有限公司 | Model training method, point cloud generating method, device, equipment and storage medium |
CN112560669B (en) * | 2020-12-14 | 2024-07-26 | 杭州趣链科技有限公司 | Face pose estimation method and device and electronic equipment |
CN113902786B (en) * | 2021-09-23 | 2022-05-27 | 珠海视熙科技有限公司 | Depth image preprocessing method, system and related device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104091162A (en) * | 2014-07-17 | 2014-10-08 | 东南大学 | Three-dimensional face recognition method based on feature points |
CN107423678A (en) * | 2017-05-27 | 2017-12-01 | 电子科技大学 | A kind of training method and face identification method of the convolutional neural networks for extracting feature |
CN107844760A (en) * | 2017-10-24 | 2018-03-27 | 西安交通大学 | Three-dimensional face identification method based on curved surface normal direction component map Neural Networks Representation |
CN107944367A (en) * | 2017-11-16 | 2018-04-20 | 北京小米移动软件有限公司 | Face critical point detection method and device |
CN107944435A (en) * | 2017-12-27 | 2018-04-20 | 广州图语信息科技有限公司 | Three-dimensional face recognition method and device and processing terminal |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9846232B1 (en) * | 2012-01-05 | 2017-12-19 | Teledyne Reson A/S | Use of multi-beam sonar systems to generate point cloud data and models; data registration in underwater metrology applications |
EP3549102B1 (en) * | 2016-12-02 | 2021-05-26 | Google LLC | Determining structure and motion in images using neural networks |
CN107392944A (en) * | 2017-08-07 | 2017-11-24 | 广东电网有限责任公司机巡作业中心 | Full-view image and the method for registering and device for putting cloud |
CN108038474B (en) * | 2017-12-28 | 2020-04-14 | 深圳励飞科技有限公司 | Face detection method, convolutional neural network parameter training method, device and medium |
- 2018-08-17 CN CN201810939556.5A patent/CN110197109B/en active Active
- 2018-12-26 WO PCT/CN2018/123884 patent/WO2020034542A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
GE, LIUHAO ET AL.: "Robust 3D Hand Pose Estimation in Single Depth Images: From Single-View CNN to Multi-View CNNs", 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 30 June 2016 (2016-06-30), XP033021543, ISSN: 1063-6919 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111462108A (en) * | 2020-04-13 | 2020-07-28 | 山西新华化工有限责任公司 | Machine learning-based head and face product design ergonomics assessment operation method |
CN111462108B (en) * | 2020-04-13 | 2023-05-02 | 山西新华防化装备研究院有限公司 | Machine learning-based head-face product design ergonomics evaluation operation method |
CN111695497A (en) * | 2020-06-10 | 2020-09-22 | 上海有个机器人有限公司 | Pedestrian identification method, medium, terminal and device based on motion information |
CN111695497B (en) * | 2020-06-10 | 2024-04-09 | 上海有个机器人有限公司 | Pedestrian recognition method, medium, terminal and device based on motion information |
CN111931694A (en) * | 2020-09-02 | 2020-11-13 | 北京嘀嘀无限科技发展有限公司 | Method and device for determining sight line orientation of person, electronic equipment and storage medium |
CN112149635A (en) * | 2020-10-23 | 2020-12-29 | 北京百度网讯科技有限公司 | Cross-modal face recognition model training method, device, device and storage medium |
WO2022266916A1 (en) * | 2021-06-24 | 2022-12-29 | 周宇 | Instantaneously adjustable electromagnetic suspension device |
CN113793295A (en) * | 2021-08-05 | 2021-12-14 | 西人马帝言(北京)科技有限公司 | Data processing method, device and equipment and readable storage medium |
CN113610172A (en) * | 2021-08-13 | 2021-11-05 | 北京地平线信息技术有限公司 | Neural network model training method and device, and sensing data fusion method and device |
CN113610172B (en) * | 2021-08-13 | 2023-08-18 | 北京地平线信息技术有限公司 | Neural network model training method and device and sensing data fusion method and device |
WO2025000661A1 (en) * | 2023-06-30 | 2025-01-02 | 广东花至美容科技有限公司 | Beauty care device positioning method and apparatus, wearable device, and beauty care system |
Also Published As
Publication number | Publication date |
---|---|
CN110197109A (en) | 2019-09-03 |
CN110197109B (en) | 2023-11-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020034542A1 (en) | Neural network model training method and apparatus, face recognition method and apparatus, device, and medium | |
CN111091075B (en) | Face recognition method, device, electronic device and storage medium | |
Tian et al. | Robust 6d object pose estimation by learning rgb-d features | |
US20210182537A1 (en) | Method and apparatus for detecting facial key points, computer device, and storage medium | |
WO2020125623A1 (en) | Method and device for live body detection, storage medium, and electronic device | |
WO2021051543A1 (en) | Method for generating face rotation model, apparatus, computer device and storage medium | |
US20190080455A1 (en) | Method and device for three-dimensional feature-embedded image object component-level semantic segmentation | |
US20150009214A1 (en) | Real-time 3d computer vision processing engine for object recognition, reconstruction, and analysis | |
Valle et al. | Face alignment using a 3D deeply-initialized ensemble of regression trees | |
WO2022252642A1 (en) | Behavior posture detection method and apparatus based on video image, and device and medium | |
KR102161359B1 (en) | Apparatus for Extracting Face Image Based on Deep Learning | |
CN111353489A (en) | Text image processing method, device, computer equipment and storage medium | |
CN109948467A (en) | Method, device, computer equipment and storage medium for face recognition | |
CN113469092B (en) | Character recognition model generation method, device, computer equipment and storage medium | |
CN112634152B (en) | Face sample data enhancement method and system based on image depth information | |
CN116883466A (en) | Optical and SAR image registration method, device and equipment based on position sensing | |
CN114677588A (en) | Method, device, robot and storage medium for obstacle detection | |
CN115443483A (en) | Depth estimation based on neural network model | |
JP2022548027A (en) | A method for obtaining data from an image of a user's object having biometric characteristics of the user | |
CN114638891A (en) | Target detection positioning method and system based on image and point cloud fusion | |
US20240383695A1 (en) | Method for determining material-cage stacking, computer device, and storage medium | |
KR102382883B1 (en) | 3d hand posture recognition apparatus and method using the same | |
CN111813984B (en) | Method and device for realizing indoor positioning by using homography matrix and electronic equipment | |
Zhao et al. | Cy-CNN: cylinder convolution based rotation-invariant neural network for point cloud registration | |
WO2020000696A1 (en) | Image processing method and apparatus, computer device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18929997 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 18929997 Country of ref document: EP Kind code of ref document: A1 |