WO2020034542A1 - Neural network model training method and apparatus, face recognition method and apparatus, device, and medium - Google Patents
- Publication number
- WO2020034542A1 (PCT/CN2018/123884)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- point cloud
- preset direction
- cloud data
- projection
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Definitions
- the present application relates to the field of computers, and in particular to a neural network model training method and apparatus, a face recognition method and apparatus, a device, and a medium.
- A convolutional neural network (CNN) is an efficient recognition method that has developed rapidly in recent years and attracted widespread attention. CNNs have become a research hotspot in many scientific fields, especially face recognition and image classification.
- VGG (Visual Geometry Group)
- The VGG neural network generalizes well to other data sets.
- Owing to its convolutional neural network architecture, the VGG neural network model can be used for two-dimensional face recognition.
- A traditionally trained VGG neural network model usually takes the R, G, and B channel data of a two-dimensional face image as input, and is therefore not suitable for recognition when the face to be recognized is three-dimensional.
- A three-dimensional face is a form of three-dimensional data.
- The traditional VGG convolutional neural network model is thus poorly suited to 3D face recognition and cannot effectively extract features from a three-dimensional face.
- a neural network model training method includes:
- acquiring first projection data of the point cloud data in a first preset direction and second projection data of the point cloud data in a second preset direction, where the first preset direction and the second preset direction are different projection directions;
- the VGG neural network model is trained through a training set composed of the training data corresponding to N human faces to obtain a convergent VGG neural network model, where N is greater than or equal to 2.
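As a minimal sketch of how the three data sources above could be stacked into one training sample, assuming a channel-last layout and the channel order (depth, first projection, second projection) — both assumptions on our part, not prescribed by the patent:

```python
def build_training_sample(depth_image, proj_first, proj_second):
    """Combine the three data sources into one three-channel sample.

    Hypothetical sketch: the channel order and channel-last layout are
    assumptions; the patent only says the three data types form the
    three input channels of the VGG neural network model.
    """
    h, w = len(depth_image), len(depth_image[0])
    assert all(len(ch) == h and len(ch[0]) == w
               for ch in (proj_first, proj_second)), "channel shapes must match"
    # sample[i][j] = (depth, first projection, second projection)
    return [[(depth_image[i][j], proj_first[i][j], proj_second[i][j])
             for j in range(w)] for i in range(h)]
```

A training set is then simply a list of such samples, one per face.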
- a face recognition method includes:
- acquiring first projection data of the point cloud data in a first preset direction and second projection data of the point cloud data in a second preset direction, where the first preset direction and the second preset direction are different projection directions;
- the input data is input into a convergent VGG neural network recognition model obtained by a neural network model training method to recognize a face to be recognized.
- a neural network model training device includes:
- a first acquisition module configured to acquire point cloud data corresponding to a human face and depth image data corresponding to a human face
- a second acquisition module configured to acquire first projection data of the point cloud data acquired by the first acquisition module in a first preset direction, and to acquire second projection data of that point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
- a determining module configured to use the depth image data obtained by the first acquisition module and the first projection data and second projection data obtained by the second acquisition module as the three-channel training data of the VGG neural network model;
- a training module configured to train the VGG neural network model through a training set composed of the training data determined by the determining module for N human faces, where N is greater than or equal to 2.
- a face recognition device includes:
- a first acquisition module configured to acquire point cloud data and depth image data of a face to be identified
- a second acquisition module configured to acquire first projection data of the point cloud data acquired by the first acquisition module in a first preset direction, and to acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
- a determining module configured to use the depth image data obtained by the first acquisition module and the first projection data and second projection data obtained by the second acquisition module as the three-channel input data of the VGG neural network recognition model;
- the recognition module is configured to input the input data determined by the determination module into a VGG neural network recognition model to recognize a face to be recognized.
- a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
- the processor executes the computer-readable instructions, the following steps are implemented:
- the VGG neural network model is trained to obtain a convergent VGG neural network model by using a training set composed of the training data corresponding to N faces, where N is greater than or equal to 2.
- a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
- the processor executes the computer-readable instructions, the following steps are implemented:
- the input data is input into the convergent VGG neural network recognition model according to claims 1-5 to recognize the face to be recognized.
- One or more non-volatile readable storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the following steps:
- the VGG neural network model is trained to obtain a convergent VGG neural network model by using a training set composed of the training data corresponding to N faces, where N is greater than or equal to 2.
- One or more non-volatile readable storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the following steps:
- the input data is input into the convergent VGG neural network recognition model according to claims 1-5 to recognize the face to be recognized.
- FIG. 1 is a schematic diagram of an application framework of a neural network model training method in the present application
- FIG. 2 is a schematic flowchart of an embodiment of a neural network model training method in the present application
- FIG. 3 is a schematic flowchart of an embodiment of a face recognition method in the present application.
- FIG. 4 is a schematic structural diagram of an embodiment of a neural network model training device in the present application.
- FIG. 5 is a schematic structural diagram of an embodiment of a face recognition device in the present application.
- FIG. 6 is a schematic structural diagram of an embodiment of a computer device in the present application.
- the neural network model training method provided in this application can be applied in the application environment shown in FIG. 1. A computer device acquires point cloud data corresponding to a human face and depth image data corresponding to the human face; acquires first projection data of the point cloud data in a first preset direction and second projection data of the point cloud data in a second preset direction, where the first preset direction and the second preset direction are different projection directions; and uses the depth image data, the first projection data, and the second projection data as the three-channel training data of the VGG neural network model.
- the training set composed of the training data corresponding to N different faces is then used to train the VGG neural network model until a convergent VGG neural network model is obtained, where N is greater than or equal to 2.
- the computer device is a device having a computing processing capability, and may be, but is not limited to, various personal computers, notebook computers, servers, and the like.
- FIG. 2 is a schematic flowchart of an embodiment of a neural network model training method of the present application, including the following steps:
- the point cloud data corresponding to the face and the depth image data corresponding to the face can be obtained, where the point cloud data records the discrete points of the face surface in point form, including the spatial position information of those points (the spatial coordinates of the discrete points on the face surface) and their color information (for example, RGB).
- the corresponding point cloud data of a human face can be obtained directly through a depth camera.
- the depth camera refers to an image sensor that can observe the position of a human face in space.
- the depth camera may be an active, passive, contact or non-contact depth camera, wherein the active camera emits an energy beam (such as a laser, an electromagnetic wave, or an ultrasonic wave) toward a human face to obtain point cloud data of the human face.
- the passive depth camera mainly uses the conditions of the surrounding environment of the object to obtain the point cloud data of the human face.
- a contact depth camera must touch or be close to the human face, whereas a non-contact camera requires no contact with the face.
- the depth camera may specifically refer to a TOF (time-of-flight) depth camera.
- it may also be a kinect depth camera, an XTion depth camera, or a RealSense depth camera, which is not specifically limited.
- depth image data, also called range image data, is image data whose pixel values are the distances (depths) from the image collector to points in the real scene; it reflects the geometry of the visible surface of the face. Depth image data can be converted into corresponding point cloud data through a coordinate transformation, and conversely point cloud data can be converted into depth image data. Therefore, in this application, after the point cloud data of a human face is obtained, it can be converted into the depth image data corresponding to the face.
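The conversion just described can be sketched with a standard pinhole-camera back-projection. This is one common choice of coordinate transformation, not necessarily the one the patent uses, and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical parameters:

```python
def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image into 3D points.

    depth[v][u] is the distance along the optical axis at pixel (u, v);
    fx, fy are focal lengths and (cx, cy) is the principal point.
    A sketch under pinhole-model assumptions, not the patent's method.
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0:                      # skip invalid / missing pixels
                continue
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            points.append((x, y, d))
    return points
```

The reverse direction (point cloud to depth image) projects each point back through the same model and records its depth at the hit pixel.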
- the depth image data and point cloud data of a face can be obtained directly through the depth camera, or one of the two can be obtained first and then converted into the other; this is not specifically limited.
- first projection data of the face's point cloud data in a first preset direction and second projection data in a second preset direction may then be obtained, where the first preset direction and the second preset direction are different projection directions; that is, projection data of the face's point cloud data on different planes can be obtained according to different projection directions.
- through steps S10-S30, three types of data for the face are obtained: the depth image data, the first projection data, and the second projection data.
- the depth image data, the first projection data, and the second projection data are used as the three-channel training data of the VGG neural network model; that is, a training sample corresponding to the face is formed.
- the VGG neural network model is trained through a training set composed of the training data corresponding to N human faces to obtain a convergent VGG neural network model, where N is greater than or equal to 2.
- each face corresponds to three types of data: depth image data, first projection data, and second projection data.
- the above-mentioned three kinds of data corresponding to a human face constitute a training sample
- the training samples corresponding to N human faces constitute a training sample set
- the VGG neural network model is trained through the training set until the VGG neural network model converges.
- the model is thus trained on three channels of data per face: the depth image data of different faces and the projection data of their point cloud data in different projection directions.
- the VGG neural network model obtained by training is therefore suitable for recognizing three-dimensional faces: because the projections of the point cloud data retain the three-dimensional characteristics of the face, features of the three-dimensional face to be recognized can be effectively extracted for recognition.
- obtaining the first projection data of the point cloud data in the first preset direction in step S20 includes:
- the point cloud data is projected in a first preset direction to generate first projection data.
- the point cloud data may be projected in the azimuth direction of the target coordinate system to obtain the first projection data, thereby obtaining the projection data of the point cloud data of the human face on one of the two-dimensional planes.
- the target coordinate system is a world coordinate system, which is a three-dimensional coordinate system.
- consider a point P of the point cloud data lying in the first octant of the target coordinate system.
- standing at the origin O and looking at P, rotate counterclockwise from the positive direction of the x-axis to the vertical projection line of P; the angle formed between the x-axis and the vertical projection line of P is the azimuth angle.
- projecting the point cloud data in the azimuth direction to generate the first projection data specifically includes: obtaining the coordinate value of each point of the point cloud data in the target coordinate system, and projecting the coordinate value of each point in the azimuth direction of the target coordinate system; the projections of the coordinate values of all points of the point cloud data in the azimuth direction together constitute the first projection data.
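The two angles described geometrically above can be written out under the usual spherical-coordinate conventions (the patent gives no explicit formulas, so this is a hedged sketch):

```python
import math

def azimuth_and_tilt(x, y, z):
    """Angles of a point P = (x, y, z) in the world coordinate system.

    azimuth: counterclockwise angle from the positive x-axis to the
    vertical projection line of P onto the x-y plane.
    tilt: angle between that projection line and the straight line
    from the origin O to P.
    """
    azimuth = math.atan2(y, x)
    tilt = math.atan2(z, math.hypot(x, y))
    return azimuth, tilt
```

For a point in the first octant both angles are non-negative, matching the patent's description of standing at the origin and looking toward P.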
- step S20 obtaining second projection data of the point cloud data in a second preset direction includes:
- the point cloud data can be projected in the tilt-angle direction of the target coordinate system to obtain the second projection data, thereby obtaining the projection of the face's point cloud data on another two-dimensional plane.
- the target coordinate system is the world coordinate system, a three-dimensional coordinate system.
- for a point P of the point cloud data in the first octant of the target coordinate system, the tilt angle is the angle between the vertical projection line of P and the straight line from the origin O to P.
- projecting the point cloud data in the tilt-angle direction to generate the second projection data specifically includes: obtaining the coordinate value of each point of the point cloud data in the target coordinate system, and projecting the coordinate value of each point in the tilt-angle direction of the target coordinate system; the projections of the coordinate values of all points of the point cloud data in the tilt-angle direction together constitute the second projection data.
- this application proposes a specific projection direction to obtain the first projection data and the second projection data corresponding to the point cloud data of the face, which improves the implementability of the solution.
- in step S40, the size of the convolution kernel of the VGG neural network model used is 7×7.
- the VGG neural network model in this application includes an input layer, convolution layers, activation functions, pooling layers, fully connected layers, and a normalization (softmax) layer.
- the convolution kernel size of the convolution layers is 7×7.
- the depth image data of the human face, the first projection data, and the second projection data are used as training data of the VGG neural network model to be substituted into the VGG neural network model for training.
- the input layer of the VGG neural network model is used to input three channels of data: the depth image data of the face, the first projection data corresponding to the point cloud data of the face, and the second projection data.
- the method further includes preprocessing the training data of the three channels, where the preprocessing includes de-meaning (centering each dimension of the input data at 0) and normalization (scaling the amplitudes of the three channels of the input data to the same range, thereby reducing the interference caused by differences in the value ranges of the channels).
- for example, suppose there are two features A and B, where A ranges from 0 to 10 and B ranges from 0 to 10000. Using these two features directly would cause problems, so good practice is to normalize them so that the data of both A and B lie in the range 0 to 1.
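The A/B example above amounts to min-max scaling, a hedged sketch of which is (the patent only says the channels are brought to the same range, not which scaling is used):

```python
def min_max_scale(values, lo, hi):
    """Map values from the range [lo, hi] into [0, 1]."""
    span = hi - lo
    return [(v - lo) / span for v in values]
```

With this, A in 0-10 and B in 0-10000 both land in 0-1, so neither channel dominates purely because of the size of its value range.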
- the convolution layer is used to perform a convolution operation on the above input data to obtain a feature map and use an activation function (such as the ReLU function) to perform a non-linear transformation.
- the feature map produced by the convolution layer is a linear mapping.
- because the expressive ability of a linear mapping alone is not enough, non-linear activation functions are added.
- introducing this non-linear part into the network enhances the expressive ability of the feature map.
- the activation function can also be a sigmoid or tanh activation function; this is not restricted.
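For concreteness, the elementwise ReLU non-linearity mentioned above can be sketched as:

```python
def relu(feature_map):
    """Apply max(0, v) to every element of a 2D feature map."""
    return [[max(0.0, v) for v in row] for row in feature_map]
```

Negative responses are zeroed while positive responses pass through unchanged, which is what makes the mapping non-linear.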
- the pooling layer is used to compress the feature maps.
- on the one hand, the feature maps are reduced to simplify the computational complexity of the VGG neural network; on the other hand, feature compression extracts the main features of the input data.
- the commonly used pooling layer can be max pooling or overlapping pooling, or another pooling layer such as spatial pyramid pooling; this is not specifically limited.
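A minimal sketch of one of the options listed above, non-overlapping 2×2 max pooling with stride 2:

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2: keep the largest value per window,
    halving each spatial dimension of the feature map."""
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]
```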
- the fully connected layer is used to connect all the features obtained by the pooling layer, and finally output to the normalization layer.
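The normalization (softmax) layer that receives the fully connected layer's output can be sketched as:

```python
import math

def softmax(logits):
    """Map a score vector to a probability distribution; subtracting the
    maximum first keeps the exponentials numerically stable."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The outputs are non-negative and sum to 1, so they can be read as class probabilities over the enrolled identities.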
- a large training set is constructed using the depth image data, first projection data, and second projection data of faces as training data; after training, the final VGG neural network model is obtained. The specific training process is not described in detail here.
- a convolution kernel of size 7×7 is used in the convolution layers of the VGG neural network model. It should be understood that because depth image data is smoother than a two-dimensional image, a 3×3 convolution kernel is no longer appropriate: over the relatively smooth depth data, its small coverage easily loses face depth image information. Therefore, in this application the size of the convolution kernel is enlarged; specifically, a 7×7 convolution kernel is used, which effectively reduces the loss of face depth image data and makes the trained VGG neural network model more accurate in identifying faces to be recognized.
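The effect of the kernel size can be seen in a naive "valid" convolution (really cross-correlation, as in most deep-learning frameworks): each output value of a 7×7 kernel aggregates a 49-pixel neighbourhood, versus 9 pixels for a 3×3 kernel, so it picks up more variation from a smooth depth image. A hedged sketch:

```python
def conv2d_valid(image, kernel):
    """Naive 2D cross-correlation with no padding: slide the kernel over
    the image and sum the elementwise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(ow)] for i in range(oh)]
```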
- obtaining point cloud data of a human face includes:
- the point cloud data of different frames are fused and matched, and the fused point cloud data unified into the same coordinate system is used as the point cloud data of the face.
- the scanning device may measure only one side of the human face during each scan. Therefore, in a specific implementation, in order to obtain complete point cloud data of the face, the face is scanned multiple times in different postures by the scanning device. One frame of point cloud data is obtained per scan; the point cloud data of different frames are then fused and matched, and the fused point cloud data, unified into the same coordinate system, is used as the point cloud data of the face. In some solutions, the point cloud data of different frames can be fused and matched by, for example, Iterative Closest Point (ICP) or Normal Distribution Transformation (NDT); this is not specifically limited.
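One simplified ICP iteration, restricted to translation only, illustrates the matching-and-alignment idea behind the fusion step (real ICP also estimates a rotation and iterates to convergence, and NDT works quite differently; this sketch is an assumption on our part, not the patent's procedure):

```python
def icp_translation_step(source, target):
    """One translation-only ICP step: match each source point to its
    nearest target point, then shift the whole source cloud by the
    mean residual between matched pairs."""
    def nearest(p):
        return min(target, key=lambda q: sum((a - b) ** 2 for a, b in zip(p, q)))
    matches = [nearest(p) for p in source]
    n = len(source)
    shift = tuple(sum(q[k] - p[k] for p, q in zip(source, matches)) / n
                  for k in range(3))
    return [tuple(c + s for c, s in zip(p, shift)) for p in source]
```

Repeating such steps until the residual stops shrinking brings the frames into a common coordinate system.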
- ICP Iterative Closest Point
- NDT Normal Distribution Transformation
- the VGG neural network model is trained using a training set composed of the training data corresponding to N faces to obtain a convergent VGG neural network model. The convergence condition can be configured; for example, the back-propagation (BP) algorithm performs iterative training on the training set until the VGG neural network model converges.
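The configurable convergence condition can be illustrated on a toy problem: full-batch gradient descent on a single linear unit that stops when the loss stops changing. The patent's model is of course a full VGG network trained by back-propagation; this shows only the stopping logic in miniature:

```python
def train_until_converged(samples, lr=0.1, tol=1e-9, max_epochs=100000):
    """Fit y = w*x + b by gradient descent on squared error, stopping
    once the change in loss between epochs falls below tol."""
    w = b = 0.0
    prev_loss = float('inf')
    for _ in range(max_epochs):
        n = len(samples)
        loss = gw = gb = 0.0
        for x, y in samples:
            err = w * x + b - y
            loss += err * err
            gw += 2 * err * x          # gradient of squared error w.r.t. w
            gb += 2 * err              # gradient w.r.t. b
        w -= lr * gw / n
        b -= lr * gb / n
        if abs(prev_loss - loss) < tol:    # configurable convergence condition
            break
        prev_loss = loss
    return w, b
```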
- FIG. 3 is a schematic flowchart of an embodiment of the face recognition method of the present application, including the following steps:
- point cloud data and depth image data of a face to be recognized can also be obtained directly through a depth camera, where the depth camera is an image sensor that can observe the position of objects or people in space.
- specifically, the depth camera may be an active, passive, contact, or non-contact depth camera, where an active camera emits an energy beam (such as a laser, an electromagnetic wave, or ultrasound) toward the face to be recognized to obtain its point cloud data.
- a passive depth camera mainly uses the conditions of the surroundings of the face to be recognized to obtain its point cloud data.
- a contact depth camera must touch or be close to the face to be recognized, whereas a non-contact camera requires no contact with the face.
- the depth camera may specifically refer to a TOF (time-of-flight) depth camera.
- it may also be a kinect depth camera, an XTion depth camera, or a RealSense depth camera, which is not specifically limited.
- the obtained point cloud data of the face to be identified may be converted into depth image data corresponding to the face to be identified.
- the depth image data and point cloud data of the face to be recognized can be obtained directly through the depth camera, or one of the two can be obtained first and then converted into the other; this is not specifically limited.
- first projection data of the point cloud data of the face to be recognized in a first preset direction and second projection data in a second preset direction may then be obtained, where the first preset direction and the second preset direction are different projection directions.
- the first preset direction is the azimuth direction of the point cloud data of the face to be recognized in the target coordinate system, and the second preset direction is the tilt-angle direction of that point cloud data in the target coordinate system.
- the target coordinate system is a world coordinate system, which is a three-dimensional coordinate system.
- consider a point P of the point cloud data of the face to be recognized lying in the first octant of the target coordinate system. Standing at the origin O and looking at P, rotate counterclockwise from the positive direction of the x-axis to the vertical projection line of P; the angle formed between the x-axis and the vertical projection line of P is the azimuth angle. The tilt angle is then the angle formed between the vertical projection line of P and the straight line from the origin to P.
- projecting the point cloud data in the tilt-angle direction to generate the second projection data specifically includes: obtaining the coordinate value of each point of the point cloud data of the face to be recognized in the target coordinate system, and projecting the coordinate value of each point in the tilt-angle direction of the target coordinate system; the projections of the coordinate values of all points of that point cloud data in the tilt-angle direction together constitute the second projection data.
- S40: Input the input data into a VGG neural network recognition model to recognize the face to be recognized.
- the VGG neural network recognition model is a deep convolutional neural network architecture.
- the VGG neural network recognition model of the present application refers to the VGG neural network model obtained in the foregoing model training method.
- the depth image data corresponding to the face to be recognized, the first projection data, and the second projection data are input into the VGG neural network model, thereby completing the recognition of the face to be recognized.
- the depth image data of the face to be recognized and the projection data of its point cloud data in different projection directions, three channels of data in total, are input into the trained VGG neural network model. Because this VGG neural network model is suited to recognizing three-dimensional faces, and because the projection directions applied to the point cloud data retain the three-dimensional characteristics of the face, features of the three-dimensional face to be recognized can be effectively extracted for recognition.
- a neural network model training device is provided, and the neural network model training device corresponds one-to-one with the model training method in the foregoing embodiment.
- the neural network model training device 40 includes a first acquisition module 401, a second acquisition module 402, a determination module 403, and a training module 404.
- the detailed description of each function module is as follows:
- a first acquisition module 401 configured to acquire point cloud data corresponding to a human face and depth image data corresponding to a human face;
- the second acquisition module 402 is configured to acquire first projection data of the point cloud data acquired by the first acquisition module 401 in a first preset direction, and to acquire second projection data of that point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
- a determining module 403, configured to use the depth image data obtained by the first acquisition module 401 and the first projection data and second projection data obtained by the second acquisition module 402 as training data of a VGG neural network model;
- a training module 404, configured to train the VGG neural network model through a training set composed of the training data determined by the determining module 403 for N faces, to obtain a convergent VGG neural network model, where N is greater than or equal to 2.
- the second acquisition module 402 is specifically configured to: project the point cloud data in the first preset direction to generate the first projection data; and project the point cloud data in the second preset direction to generate the second projection data.
- the convolution kernel size of the VGG neural network model is 7 ⁇ 7.
- the first obtaining module 401 is specifically configured to:
- the point cloud data of each frame is fused and matched, and the fused point cloud data unified into the same coordinate system is used as the point cloud data of the face.
- Each module in the model training device may be implemented in whole or in part by software, by hardware, or by a combination thereof.
- Each module may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
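The first acquisition module above registers and fuses per-frame point clouds into a single coordinate system. A minimal sketch of the fusion step, assuming the per-frame rigid registration transforms (for example, from an ICP-style matcher) are already available; the array shapes, sample values, and the function name `fuse_frames` are illustrative, not from the patent:

```python
import numpy as np

def fuse_frames(frames, transforms):
    """Bring each frame's points into a common coordinate system and concatenate.

    frames     : list of (M_k, 3) arrays of xyz points, one per captured frame
    transforms : list of (R, t) pairs mapping frame k's coordinates into the
                 reference frame (R is a 3x3 rotation, t is a 3-vector)
    """
    fused = [pts @ R.T + t for pts, (R, t) in zip(frames, transforms)]
    return np.vstack(fused)

# Two toy frames: the second is the first shifted by -1 on x in its own frame,
# so its registration transform shifts it back by +1 on x.
f0 = np.array([[0.0, 0.0, 1.0]])
f1 = np.array([[-1.0, 0.0, 1.0]])
identity = (np.eye(3), np.zeros(3))
shift_x = (np.eye(3), np.array([1.0, 0.0, 0.0]))
cloud = fuse_frames([f0, f1], [identity, shift_x])
print(cloud)  # both points coincide at [0, 0, 1] in the shared frame
```

After fusion, the stacked array plays the role of the single face point cloud used by the rest of the pipeline.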
- a face recognition device is provided, and the face recognition device corresponds one-to-one to the face recognition method in the foregoing embodiment.
- the face recognition device 50 includes a first acquisition module 501, a second acquisition module 502, a determination module 503, and a recognition module 504.
- the detailed description of each function module is as follows:
- a first acquisition module 501, configured to acquire point cloud data and depth image data of a face to be recognized;
- the second acquisition module 502 is configured to acquire first projection data of the point cloud data acquired by the first acquisition module 501 in a first preset direction, and to acquire second projection data of the point cloud data in a second preset direction;
- the first preset direction and the second preset direction are different projection directions;
- a determination module 503, configured to use the depth image data acquired by the first acquisition module 501, and the first projection data and second projection data acquired by the second acquisition module 502, as input data of a VGG neural network recognition model;
- the recognition module 504 is configured to input the input data determined by the determination module 503 into the VGG neural network recognition model to recognize the face to be recognized.
- Each module in the face recognition device may be implemented in whole or in part by software, by hardware, or by a combination thereof.
- Each module may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
- In one embodiment, a computer device is provided; its internal structure diagram may be as shown in FIG. 6.
- the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
- the processor of the computer device is used to provide computing and control capabilities.
- the memory of the computer device includes a non-volatile storage medium and an internal memory.
- the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
- the internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium.
- the database of the computer device is used to store the acquired image data.
- the network interface of the computer device is used to communicate with an external terminal through a network connection.
- the computer-readable instructions are executed by a processor to implement a model training method or a face recognition method.
- a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
- when the processor executes the computer-readable instructions, the following steps are implemented:
- acquire point cloud data corresponding to a human face, and depth image data corresponding to the human face;
- acquire first projection data of the point cloud data in a first preset direction, and acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
- use the depth image data, the first projection data, and the second projection data as training data of a VGG neural network model;
- train the VGG neural network model with a training set composed of the training data corresponding to N human faces to obtain a converged VGG neural network model, where N is greater than or equal to 2.
- a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
- when the processor executes the computer-readable instructions, the following steps are implemented:
- acquire point cloud data and depth image data of a face to be recognized;
- acquire first projection data of the point cloud data in a first preset direction, and acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
- use the depth image data, the first projection data, and the second projection data as input data of a VGG neural network recognition model;
- input the input data into the VGG neural network recognition model to recognize the face to be recognized.
- one or more non-volatile readable storage media storing computer-readable instructions are provided; when executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:
- acquire point cloud data corresponding to a human face, and depth image data corresponding to the human face;
- acquire first projection data of the point cloud data in a first preset direction, and acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
- use the depth image data, the first projection data, and the second projection data as training data of a VGG neural network model;
- train the VGG neural network model with a training set composed of the training data corresponding to N human faces, where N is greater than or equal to 2.
- one or more non-volatile readable storage media storing computer-readable instructions are provided; when executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:
- acquire point cloud data and depth image data of a face to be recognized;
- acquire first projection data of the point cloud data in a first preset direction, and acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
- use the depth image data, the first projection data, and the second projection data as input data of a VGG neural network recognition model;
- input the input data into the VGG neural network recognition model to recognize the face to be recognized.
- Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
- Volatile memory can include random access memory (RAM) or external cache memory.
- RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Abstract
A neural network model training method and apparatus, a face recognition method and apparatus, a device, and a medium, capable of effectively recognizing a face to be recognized. The neural network model training method comprises: obtaining point cloud data corresponding to a face and depth image data corresponding to the face (S10); obtaining first projection data of the point cloud data in a first preset direction, and obtaining second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions (S20); using the depth image data, the first projection data, and the second projection data as training data of a VGG neural network model (S30); and training the VGG neural network model by means of a training set constituted by the training data corresponding to N faces, where N is greater than or equal to 2 (S40), to obtain a converged VGG neural network model.
Description
This application is based on, and claims priority from, Chinese invention patent application No. 201810939556.5, filed on August 17, 2018 and entitled "Neural Network Model Training and Face Recognition Method, Apparatus, Device, and Medium".
The present application relates to the field of computers, and in particular to a neural network model training method, a face recognition method, and corresponding apparatuses, devices, and media.
The convolutional neural network (CNN) is an efficient recognition method that has been developed in recent years and has attracted widespread attention. CNN has become one of the research hotspots in many scientific fields, with particularly strong research prospects in areas such as face recognition and image classification. The VGG (Visual Geometry Group) neural network, proposed by the Visual Geometry Group at the University of Oxford, is one kind of convolutional neural network and generalizes well to other data sets.
However, owing to its inherent convolutional neural network architecture, the VGG neural network model is suited to two-dimensional face recognition: a traditionally trained VGG neural network model usually takes the R, G, and B channel data of a two-dimensional face image as its input. It is not suitable when the face to be recognized is three-dimensional. A three-dimensional face is a form of three-dimensional data, and when converted into depth image data it becomes single-channel image data. The traditional VGG convolutional neural network model is therefore not well suited to three-dimensional face recognition and cannot effectively extract features from and recognize a three-dimensional face.
Summary of the Invention
In view of this, in response to the above technical problem, it is necessary to provide a neural network model training method, a face recognition method, and corresponding apparatuses, devices, and media that can effectively recognize a three-dimensional face.
A neural network model training method includes:
acquiring point cloud data corresponding to a human face, and depth image data corresponding to the human face;
acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
using the depth image data, the first projection data, and the second projection data as training data of a VGG neural network model; and
training the VGG neural network model with a training set composed of the training data corresponding to N human faces to obtain a converged VGG neural network model, where N is greater than or equal to 2.
A face recognition method includes:
acquiring point cloud data and depth image data of a face to be recognized;
acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
using the depth image data, the first projection data, and the second projection data as input data of a VGG neural network recognition model; and
inputting the input data into the converged VGG neural network recognition model obtained by the neural network model training method, to recognize the face to be recognized.
A neural network model training apparatus includes:
a first acquisition module, configured to acquire point cloud data corresponding to a human face and depth image data corresponding to the human face;
a second acquisition module, configured to acquire first projection data of the point cloud data acquired by the first acquisition module in a first preset direction, and to acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
a determination module, configured to use the depth image data acquired by the first acquisition module, together with the first projection data and second projection data acquired by the second acquisition module, as three channels of training data of a VGG neural network model; and
a training module, configured to train the VGG neural network model with a training set composed of the training data determined by the determination module for N human faces, where N is greater than or equal to 2.
A face recognition apparatus includes:
a first acquisition module, configured to acquire point cloud data and depth image data of a face to be recognized;
a second acquisition module, configured to acquire first projection data of the point cloud data acquired by the first acquisition module in a first preset direction, and to acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
a determination module, configured to use the depth image data acquired by the first acquisition module, together with the first projection data and second projection data acquired by the second acquisition module, as three channels of input data of a VGG neural network recognition model; and
a recognition module, configured to input the input data determined by the determination module into the VGG neural network recognition model to recognize the face to be recognized.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the following steps are implemented:
acquiring point cloud data corresponding to a human face, and depth image data corresponding to the human face;
acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
using the depth image data, the first projection data, and the second projection data as training data of a VGG neural network model; and
training the VGG neural network model with a training set composed of the training data corresponding to N of the human faces to obtain a converged VGG neural network model, where N is greater than or equal to 2.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the following steps are implemented:
acquiring point cloud data and depth image data of a face to be recognized;
acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
using the depth image data, the first projection data, and the second projection data as input data; and
inputting the input data into the converged VGG neural network recognition model according to claims 1-5 to recognize the face to be recognized.
One or more non-volatile readable storage media storing computer-readable instructions are provided. When executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:
acquiring point cloud data corresponding to a human face, and depth image data corresponding to the human face;
acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
using the depth image data, the first projection data, and the second projection data as training data of a VGG neural network model; and
training the VGG neural network model with a training set composed of the training data corresponding to N of the human faces to obtain a converged VGG neural network model, where N is greater than or equal to 2.
One or more non-volatile readable storage media storing computer-readable instructions are provided. When executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:
acquiring point cloud data and depth image data of a face to be recognized;
acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
using the depth image data, the first projection data, and the second projection data as input data; and
inputting the input data into the converged VGG neural network recognition model according to claims 1-5 to recognize the face to be recognized.
Details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application framework of the neural network model training method in the present application;
FIG. 2 is a schematic flowchart of an embodiment of the neural network model training method in the present application;
FIG. 3 is a schematic flowchart of an embodiment of the face recognition method in the present application;
FIG. 4 is a schematic structural diagram of an embodiment of the neural network model training apparatus in the present application;
FIG. 5 is a schematic structural diagram of an embodiment of the face recognition apparatus in the present application;
FIG. 6 is a schematic structural diagram of an embodiment of the computer device in the present application.
The technical solutions in the embodiments of the present application will now be described clearly and completely with reference to the drawings in those embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
The neural network model training method provided in the present application can be applied in the application environment shown in FIG. 1. A computer device acquires point cloud data corresponding to a human face and depth image data corresponding to the face; acquires first projection data of the point cloud data in a first preset direction and second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions; uses the depth image data, the first projection data, and the second projection data as three channels of training data of a VGG neural network model; and trains the VGG neural network model with a training set composed of the training data corresponding to N different faces, where N is greater than or equal to 2, to obtain a converged VGG neural network model. Thus, in the present application, the training data consists of three channels of data (the depth image data of each face, plus the projections of that face's point cloud data in two different projection directions), and the converged VGG neural network model obtained by training is suitable for three-dimensional face recognition. Because the projection directions applied to the face's point cloud data preserve the three-dimensional characteristics of the face, a three-dimensional face can be recognized effectively. Here, the computer device is a device with computing and processing capability, and may be, but is not limited to, a personal computer, a notebook computer, a server, and the like.
In an embodiment, as shown in FIG. 2, which is a schematic flowchart of an embodiment of the neural network model training method of the present application, the method includes the following steps:
S10: acquire point cloud data corresponding to a human face, and depth image data corresponding to the face.
In this solution, the point cloud data corresponding to a face and the depth image data corresponding to the face can be acquired. Point cloud data records, point by point, information about discrete points on the surface of the face, including the spatial position information and the color information (for example, RGB) of those surface points; specifically, the spatial position information consists of the spatial coordinates of the discrete surface points. For example, the point cloud data can be expressed as U = {P_i = (x_i, y_i, z_i, r_i, g_i, b_i) | 1 ≤ i ≤ M}, where M is a positive integer equal to the number of points in the point cloud data U, the initial value of i is 1, the i-th point in U is denoted P_i, x_i, y_i, and z_i are the spatial coordinates of point P_i, and r_i, g_i, and b_i are the color information of point P_i, that is, the red, green, and blue primary color values.
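The set U = {P_i = (x_i, y_i, z_i, r_i, g_i, b_i) | 1 ≤ i ≤ M} described above maps naturally onto an M×6 array with one row per surface point. A toy sketch (the point count and the random coordinate and color values are placeholders):

```python
import numpy as np

# A point cloud U with M points, each P_i = (x_i, y_i, z_i, r_i, g_i, b_i):
# three spatial coordinates plus red/green/blue color values.
M = 4
rng = np.random.default_rng(0)
xyz = rng.uniform(-0.1, 0.1, size=(M, 3))    # spatial position information
rgb = rng.integers(0, 256, size=(M, 3))      # color information (RGB)
U = np.hstack([xyz, rgb.astype(float)])      # shape (M, 6)

print(U.shape)  # (4, 6)
```

Row i of `U` then holds exactly the tuple (x_i, y_i, z_i, r_i, g_i, b_i) from the formula above.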
In addition, in the present application, the point cloud data corresponding to a face can be acquired directly with a depth camera, that is, an image sensor able to observe the position of the face in space. Specifically, the depth camera may be an active, passive, contact, or non-contact depth camera. An active depth camera emits an energy beam (such as a laser, electromagnetic wave, or ultrasonic wave) toward the face to acquire its point cloud data, while a passive depth camera mainly uses the ambient conditions around the subject to acquire the point cloud data. A contact depth camera needs to touch or be relatively close to the face, whereas a non-contact one does not. As an example, the depth camera may be a TOF (time-of-flight) depth camera, and it may also be a Kinect, XTion, or RealSense depth camera, without specific limitation.
It should also be understood that depth image data, also called range image data, is image data in which the distance (depth) from the image collector to each point in the real scene is taken as the pixel value; it directly reflects the geometry of the visible surface of the face. Depth image data can be converted into corresponding point cloud data through a coordinate transformation, and conversely point cloud data can be back-calculated into depth image data. Therefore, in the present application, after the point cloud data of a face is obtained, it can be converted into the depth image data corresponding to that face. Of course, in some application scenarios, the depth image data and point cloud data of a face may be acquired directly from the depth camera, or one of the two may be acquired first and then converted into the other, without specific limitation.
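The point-cloud-to-depth-image conversion mentioned above can be sketched as bare orthographic binning: each point's (x, y) selects a pixel and its z becomes the pixel's depth value. This deliberately ignores camera intrinsics and collision handling, and the grid size is made up, so it illustrates only the idea:

```python
import numpy as np

def cloud_to_depth(xyz, grid=(8, 8)):
    """Toy point-cloud -> depth-image conversion by orthographic binning.

    Each point's (x, y) is mapped to a pixel index and its z (depth) becomes
    the pixel value; a real pipeline would use the camera's intrinsics.
    """
    h, w = grid
    depth = np.zeros((h, w))
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    # normalize x and y into [0, 1], then scale to pixel indices
    u = np.rint((x - x.min()) / (np.ptp(x) + 1e-9) * (w - 1)).astype(int)
    v = np.rint((y - y.min()) / (np.ptp(y) + 1e-9) * (h - 1)).astype(int)
    depth[v, u] = z        # later points overwrite earlier ones on collision
    return depth

pts = np.array([[0.0, 0.0, 0.5],
                [1.0, 0.0, 0.7],
                [0.0, 1.0, 0.9]])
d = cloud_to_depth(pts)
print(d.shape)  # (8, 8)
```

The inverse direction (depth image back to point cloud) would read each pixel's value as z and invert the same index mapping.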
S20: acquire first projection data of the point cloud data in a first preset direction, and acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions.
In the present application, after the point cloud data of the face has been acquired, the first projection data of the point cloud data in the first preset direction and the second projection data of the point cloud data in the second preset direction can further be acquired, the first preset direction and the second preset direction being different projection directions. That is, depending on the projection direction, projection data of the face's point cloud data on different planes can be obtained.
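As a toy illustration of obtaining projection data in two different preset directions, the simplest orthographic case drops one coordinate axis per direction; the specific axis choices here are an assumption, since the text leaves the preset directions abstract at this point:

```python
import numpy as np

def project(xyz, drop_axis):
    """Orthographically project points by discarding one coordinate axis.

    Dropping z projects onto the x-y plane, dropping x onto the y-z plane,
    and so on; two different drop axes give two different projection
    directions, hence projections onto two different planes.
    """
    keep = [a for a in range(3) if a != drop_axis]
    return xyz[:, keep]

pts = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])
first_projection = project(pts, drop_axis=2)   # first preset direction
second_projection = project(pts, drop_axis=0)  # second preset direction
print(first_projection.tolist())   # [[1.0, 2.0], [4.0, 5.0]]
print(second_projection.tolist())  # [[2.0, 3.0], [5.0, 6.0]]
```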
S30: use the depth image data, the first projection data, and the second projection data as training data of the VGG neural network model.
That is, after the preceding steps, three kinds of data are available for the face: the corresponding depth image data, the first projection data, and the second projection data. In this step, these three kinds of data are used as the three channels of training data of the VGG neural network model, thereby forming the training data of the model, that is, one training sample corresponding to the face.
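The three kinds of data can then be stacked as the three input channels of one training sample, in the same way the R, G, and B planes of a color image feed a VGG-style network. A sketch with placeholder resolutions and random stand-in maps:

```python
import numpy as np

# Three single-channel maps for one face: the depth image plus the two
# projection maps (random stand-ins here, rendered at a common resolution).
H = W = 32
rng = np.random.default_rng(1)
depth_map = rng.random((H, W))
proj_1 = rng.random((H, W))
proj_2 = rng.random((H, W))

# Stack them channel-first as the three input channels of the network.
sample = np.stack([depth_map, proj_1, proj_2], axis=0)
print(sample.shape)  # (3, 32, 32)
```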
S40: train the VGG neural network model with a training set composed of the training data corresponding to N human faces to obtain a converged VGG neural network model, where N is greater than or equal to 2.
That is, suppose there are faces {1, 2, ..., N}, with N greater than or equal to 2. Each face corresponds to three kinds of data (depth image data, first projection data, and second projection data), and the three kinds of data for one face constitute one training sample. The training samples corresponding to the N faces constitute the training set, and the VGG neural network model is trained on this training set until it converges.
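Assembling the training set of N faces can be sketched as follows; the shapes and label scheme are placeholders, and the loop only shows the per-epoch shuffle structure, not an actual VGG optimizer:

```python
import numpy as np

# A training set for N faces: each sample is the (3, H, W) stack of
# depth image + two projections, with the face identity as its label.
N, H, W = 5, 32, 32
rng = np.random.default_rng(2)
samples = rng.random((N, 3, H, W))   # one 3-channel sample per face
labels = np.arange(N)                # identity label per face

# The model would be trained on (samples, labels) until the loss converges;
# this loop sketches only the epoch/shuffle skeleton of that training.
for epoch in range(3):
    order = rng.permutation(N)       # reshuffle the training set each epoch
    batch, batch_labels = samples[order], labels[order]

print(samples.shape, labels.shape)  # (5, 3, 32, 32) (5,)
```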
It can thus be seen that, in the present application, training data formed from three channels (the depth image data of different faces, plus the projections of their point cloud data in different projection directions) is used to train the VGG neural network model, and the trained VGG neural network model is suitable for three-dimensional face recognition. Because the projection directions applied to the point cloud data preserve the three-dimensional characteristics of the face, features of the three-dimensional face to be recognized can be extracted effectively and the face can be recognized effectively.
在一实施例中,步骤S20中,获取点云数据在第一预设方向上的第一投影数据,包括:In an embodiment, obtaining the first projection data of the point cloud data in the first preset direction in step S20 includes:
S21、将点云数据在目标坐标系的方位角方向作为第一预设方向;S21. Use the azimuth direction of the point cloud data in the target coordinate system as the first preset direction;
S22、对点云数据在第一预设方向上进行投影以生成第一投影数据。S22. Project the point cloud data in the first preset direction to generate the first projection data.
也就是说,在本申请中,可以将点云数据在目标坐标系的方位角方向进行投影以得到第一投影数据,从而得到人脸的点云数据在其中一个二维平面上的投影数据。其中,目标坐标系为世界坐标系,是一种三维坐标系。已知点云数据的一个坐标点P,设为目标坐标系第一卦限内的一个点P,站在原点(O点)看这个点P,从x轴正方向沿逆时针旋转到P点在xOy平面上的垂直投影线,x轴与该垂直投影线之间的夹角就是方位角。其中,对上述点云数据在上述方位角方向上进行投影以生成第一投影数据,具体包括:获取点云数据在目标坐标系中每一个点的坐标值,将点云数据中每一个点的坐标值在该目标坐标系对应的方位角方向上进行投影,从而生成每一个点的坐标值在该方位角方向上的投影,点云数据中所有点的坐标值在方位角方向上的投影共同构成第一投影数据。That is, in this application, the point cloud data may be projected in the azimuth direction of the target coordinate system to obtain the first projection data, thereby obtaining the projection of the point cloud data of the human face onto one of the two-dimensional planes. The target coordinate system is the world coordinate system, a three-dimensional coordinate system. Take a coordinate point P of the point cloud data, assumed to lie in the first octant of the target coordinate system. Viewed from the origin (point O), the azimuth is the angle swept counter-clockwise from the positive x-axis to the vertical projection line of point P onto the x-y plane. Projecting the point cloud data in the azimuth direction to generate the first projection data specifically includes: obtaining the coordinate value of each point of the point cloud data in the target coordinate system, and projecting the coordinate value of each point in the azimuth direction of the target coordinate system; the projections of the coordinate values of all points of the point cloud data in the azimuth direction together constitute the first projection data.
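作为示意,上述方位角的计算可用如下NumPy代码勾勒。As an illustrative sketch only (function name and sample points are assumptions for illustration, not part of the application), the azimuth defined above can be computed with NumPy as follows:

```python
import numpy as np

def azimuth(points):
    """Azimuth of each 3-D point P: the counter-clockwise angle from the
    positive x-axis to P's vertical projection onto the x-y plane."""
    x, y = points[:, 0], points[:, 1]
    return np.arctan2(y, x)  # handles every quadrant, range (-pi, pi]

pts = np.array([[1.0, 0.0, 2.0],   # on the +x axis -> azimuth 0
                [0.0, 1.0, 2.0],   # on the +y axis -> azimuth pi/2
                [1.0, 1.0, 0.5]])  # first octant   -> azimuth pi/4
print(azimuth(pts))  # approximately [0, pi/2, pi/4]
```

Per-point azimuths such as these are what the application's "projection in the azimuth direction" operates on.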
在一实施例中,步骤S20中,获取点云数据在第二预设方向上的第二投影数据,包括:In an embodiment, in step S20, obtaining the second projection data of the point cloud data in the second preset direction includes:
S23、将点云数据在目标坐标系的倾斜角方向作为第二预设方向;S23. Use the tilt angle direction of the point cloud data in the target coordinate system as the second preset direction.
S24、对点云数据在第二预设方向上进行投影以生成第二投影数据。S24. Project the point cloud data in a second preset direction to generate second projection data.
也就是说,在本申请中,可以将点云数据在目标坐标系的倾斜角方向进行投影以得到第二投影数据,从而得到人脸的点云数据在另一个二维平面上的投影数据。目标坐标系为世界坐标系,是一种三维坐标系。已知点云数据的一个坐标点P,设为目标坐标系第一卦限内的一个点P,站在原点(O点)看这个点P,从x轴正方向沿逆时针旋转到P点在xOy平面上的垂直投影线,x轴与该垂直投影线之间的夹角就是方位角;再向高处看P点即得到倾斜角,也即P点的垂直投影线与原点到P点之间的直线所形成的夹角为倾斜角。其中,对上述点云数据在上述倾斜角方向上进行投影以生成第二投影数据,具体包括:获取点云数据在目标坐标系中每一个点的坐标值,将点云数据中每一个点的坐标值在该目标坐标系对应的倾斜角方向上进行投影,从而生成每一个点的坐标值在该倾斜角方向上的投影,点云数据中所有点的坐标值在倾斜角方向上的投影共同构成第二投影数据。That is, in this application, the point cloud data may be projected in the tilt-angle direction of the target coordinate system to obtain the second projection data, thereby obtaining the projection of the point cloud data of the human face onto another two-dimensional plane. The target coordinate system is the world coordinate system, a three-dimensional coordinate system. Take a coordinate point P of the point cloud data, assumed to lie in the first octant of the target coordinate system. Viewed from the origin (point O), the azimuth is the angle swept counter-clockwise from the positive x-axis to the vertical projection line of point P onto the x-y plane; looking further upward toward P gives the tilt angle, that is, the angle formed between the vertical projection line of point P and the straight line from the origin to point P. Projecting the point cloud data in the tilt-angle direction to generate the second projection data specifically includes: obtaining the coordinate value of each point of the point cloud data in the target coordinate system, and projecting the coordinate value of each point in the tilt-angle direction of the target coordinate system; the projections of the coordinate values of all points of the point cloud data in the tilt-angle direction together constitute the second projection data.
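作为示意,上述倾斜角的计算可用如下NumPy代码勾勒。As an illustrative sketch only (the function name and sample points are assumptions for illustration), the tilt angle defined above can be computed with NumPy as follows:

```python
import numpy as np

def tilt_angle(points):
    """Tilt (elevation) angle of each 3-D point P: the angle between the
    line origin->P and P's vertical projection onto the x-y plane."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.arctan2(z, np.hypot(x, y))

pts = np.array([[1.0, 0.0, 0.0],   # in the x-y plane -> tilt 0
                [0.0, 0.0, 1.0],   # straight above O -> tilt pi/2
                [1.0, 0.0, 1.0]])  # 45 degrees up    -> tilt pi/4
print(tilt_angle(pts))  # approximately [0, pi/2, pi/4]
```

Together with the azimuth, this gives the two angular directions in which the point cloud is projected to form the two projection-data channels.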
由此可得,本申请提出了具体的投影方向以获得人脸的点云数据对应的第一投影数据以及第二投影数据,提高了方案的可实施性。It can thus be seen that this application proposes specific projection directions for obtaining the first projection data and the second projection data corresponding to the point cloud data of the face, which improves the implementability of the solution.
在一实施例中,步骤S40中,所采用的VGG神经网络模型的卷积核大小为7x7。In one embodiment, in step S40, the size of the convolution kernel of the VGG neural network model used is 7x7.
其中,本申请中的VGG神经网络模型包括输入层、卷积层(convolution)、激活函数、池化层(pooling)、全连接层(fully connected)、以及归一化层(softmax),卷积层的卷积核大小为7x7。本申请中,将人脸的深度图像数据、第一投影数据以及第二投影数据作为VGG神经网络模型的训练数据代入VGG神经网络模型进行训练。其中,该VGG神经网络模型的输入层用于输入人脸的深度图像数据、人脸的点云数据对应的第一投影数据以及第二投影数据这三个通道的数据。其中,将上述训练数据输入上述VGG神经网络模型进行训练之前,该方法还包括:对上述三个通道的训练数据做预处理,其中该预处理包括:去均值处理,用于把输入数据各个维度都中心化为0;归一化处理:分别将输入数据中三个通道的数据的幅度归一化到同样的范围,从而减少各通道数据取值范围的差异而带来的干扰,例如,我们有两个维度的数据A和B,A范围是0到10,而B范围是0到10000,如果直接使用这两个特征会有问题,好的做法就是归一化处理,即A和B的数据都变为0到1的范围。卷积层用于对上述输入数据进行卷积操作以得到特征图并利用激活函数(如ReLU函数)进行非线性转换,应理解,由于经过卷积层卷积得到的特征图是一种线性映射,线性映射的表达能力不够,因此加入一些非线性的激活函数,整个网络中就引入了非线性部分,增强特征图的表达能力,另外,该激活函数具体还可以是sigmoid或tanh激活函数,具体不做限定。池化层用于对上述特征图进行压缩,一方面使特征图变小,简化VGG神经网络计算复杂度;一方面进行特征压缩,从而提取出输入数据的主要特征;其中,常用的池化层具体可以是max pooling或overlapping pooling,还可以是其他的池化层,例如spatial pyramid pooling等,具体不做限定。全连接层用于连接池化层得到的所有特征,最后输出至归一化层,在将人脸的深度图像数据、第一投影数据以及第二投影数据作为训练数据所构成的训练集进行大量训练后可得到最终的VGG神经网络模型,具体的训练过程不做一一赘述。The VGG neural network model in this application includes an input layer, convolution layers, activation functions, pooling layers, fully connected layers, and a normalization (softmax) layer; the convolution kernel size of the convolution layers is 7x7. In this application, the depth image data, the first projection data, and the second projection data of the face are fed into the VGG neural network model as training data. The input layer of the VGG neural network model receives three channels of data: the depth image data of the face, and the first and second projection data corresponding to the point cloud data of the face. Before the training data is input into the VGG neural network model for training, the method further includes preprocessing the training data of the three channels, where the preprocessing includes: mean subtraction, which centres every dimension of the input data at 0; and normalization, which scales the amplitudes of the three channels to the same range, thereby reducing the interference caused by differences in the value ranges of the channels. For example, given two dimensions of data A and B, where A ranges from 0 to 10 and B ranges from 0 to 10000, using these two features directly would cause problems; good practice is to normalize them so that both A and B fall in the range 0 to 1. The convolution layer performs a convolution operation on the input data to obtain feature maps and applies an activation function (such as ReLU) for a non-linear transformation. It should be understood that, because a feature map produced by convolution is a linear mapping whose expressive power is limited, non-linear activation functions are added; the non-linearity introduced throughout the network enhances the expressive power of the feature maps. The activation function may also be sigmoid or tanh, which is not specifically limited. The pooling layer compresses the feature maps: on the one hand it makes the feature maps smaller, simplifying the computational complexity of the VGG network; on the other hand it compresses features so as to extract the main features of the input data. The commonly used pooling layer may be max pooling or overlapping pooling, or another pooling layer such as spatial pyramid pooling, which is not specifically limited. The fully connected layer connects all the features obtained from the pooling layers and finally outputs to the normalization layer. After extensive training on the training set composed of the depth image data, the first projection data, and the second projection data of faces, the final VGG neural network model is obtained; the specific training process is not described in detail here.
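上述去均值与归一化预处理可用如下NumPy代码勾勒。The mean-subtraction and normalization preprocessing described above can be sketched as follows (an illustrative sketch; the function name and toy channels A and B are assumptions mirroring the example in the text):

```python
import numpy as np

def preprocess(channel):
    """Zero-centre a channel, then min-max scale it into [0, 1]."""
    centred = channel - channel.mean()
    lo, hi = centred.min(), centred.max()
    return (centred - lo) / (hi - lo)

a = np.array([0.0, 5.0, 10.0])        # channel A, original range 0..10
b = np.array([0.0, 5000.0, 10000.0])  # channel B, original range 0..10000
# After preprocessing, both channels share the same [0, 1] range,
# so neither dominates the other during training.
print(preprocess(a), preprocess(b))
```

Both channels map to [0, 0.5, 1], illustrating how normalization removes the disparity between value ranges such as 0..10 and 0..10000.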
在本申请中,该VGG神经网络模型的卷积层中,使用结构为7x7的卷积核。应理解,由于深度图像数据相对于二维图像更加平滑,不再适用3x3的卷积核:如果仍采用3x3的卷积核,由于其感受范围较窄,而深度图像数据又比较平滑,容易丢失人脸的深度图像数据的特征。因此在本申请中,可以扩大卷积核的大小,具体地,使用结构为7x7的卷积核,可以有效地减少人脸的深度图像数据的丢失,从而使得训练出来的VGG神经网络模型在对待识别人脸进行识别时更为准确。In the present application, convolution kernels with a 7x7 structure are used in the convolution layers of the VGG neural network model. It should be understood that, because depth image data is smoother than a two-dimensional image, a 3x3 convolution kernel is no longer suitable: a 3x3 kernel covers only a narrow range, and since the depth image data is relatively smooth, features of the face's depth image data are easily lost. Therefore, in this application, the size of the convolution kernel is enlarged; specifically, using a 7x7 convolution kernel effectively reduces the loss of the face's depth image data, making the trained VGG neural network model more accurate when recognizing the face to be recognized.
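卷积核大小的影响可用如下示意代码说明。As a minimal sketch (the naive convolution routine and the ramp image standing in for a depth map are illustrative assumptions, not the application's implementation), the difference in receptive field between a 7x7 and a 3x3 kernel can be shown directly:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Plain 'valid' 2-D convolution: no padding, stride 1."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# A smooth ramp standing in for a (very small) depth image
depth = np.fromfunction(lambda i, j: i + j, (32, 32))
resp7 = conv2d_valid(depth, np.ones((7, 7)) / 49.0)  # 7x7 kernel
resp3 = conv2d_valid(depth, np.ones((3, 3)) / 9.0)   # 3x3 kernel
# Each 7x7 response aggregates a 49-pixel neighbourhood versus 9 pixels
# for 3x3, which matters when the input varies as slowly as depth data.
print(resp7.shape, resp3.shape)  # (26, 26) (30, 30)
```

The wider 7x7 support is what lets each response summarize more of a smooth depth surface.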
在一实施例中,获取人脸的点云数据,包括:In an embodiment, obtaining point cloud data of a human face includes:
获取人脸不同姿态下的每一帧点云数据;Obtain point cloud data of each frame in different poses of the face;
将不同帧点云数据进行融合匹配,以统一到同一坐标系中的融合点云数据作为人脸的点云数据。The point cloud data of different frames are fused and matched, and the fused point cloud data unified into the same coordinate system is used as the point cloud data of the face.
需要说明的是,由于受人脸的大小、环境以及扫描设备等因素的限制,扫描设备在每次扫描时可能只能测量到人脸的一个侧面。因此,在具体实现上,为获得人脸完整的点云数据,通过扫描设备以不同姿态对人脸进行多次扫描。其中,每次扫描可以得到一帧点云数据,将不同帧点云数据进行融合匹配,以统一到同一坐标系中的融合点云数据作为人脸的点云数据。具体的,在一些方案中,将不同帧点云数据进行融合匹配,可以采用迭代最近点法(Iterative Closest Point,ICP)、正态分布变换法(Normal Distribution Transformation,NDT)等方式,具体不做限定。It should be noted that, limited by factors such as the size of the face, the environment, and the scanning device, the scanning device may only be able to measure one side of the face in each scan. Therefore, in a specific implementation, to obtain complete point cloud data of the face, the scanning device scans the face multiple times in different poses. Each scan yields one frame of point cloud data; the point cloud data of the different frames are then fused and matched, and the fused point cloud data unified into a single coordinate system serves as the point cloud data of the face. Specifically, in some solutions, the fusing and matching of different frames of point cloud data may use methods such as the Iterative Closest Point (ICP) method or the Normal Distribution Transformation (NDT) method, which is not specifically limited.
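ICP类方法的核心对齐步骤可用如下代码勾勒。As an illustrative sketch of the alignment step shared by ICP-style methods (this shows only the inner least-squares step with correspondences assumed known; full ICP alternates nearest-neighbour matching with this step, and the sample frames are made-up data):

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst
    (Kabsch/SVD): the inner alignment step of one ICP iteration."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # guard against an improper (reflecting) solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return R, t

# One "frame" of four points; a second frame with the same points rotated
# 90 degrees about z and shifted, as if the face were scanned in another pose.
src = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
Rz = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])
dst = src @ Rz.T + np.array([2.0, 1.0, 0.0])
R, t = rigid_align(src, dst)
print(np.allclose(src @ R.T + t, dst))  # True: both frames in one coordinate system
```

Applying the recovered (R, t) to every frame is what unifies the per-scan clouds into the single coordinate system described above.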
需要说明的是,通过由N张人脸对应的训练数据所构成的训练集对VGG神经网络模型进行训练以得到收敛的所述VGG神经网络模型,其收敛条件可进行配置,例如通过BP(Error Back Propagation,误差反向传播)算法对上述训练集进行迭代训练,直至VGG神经网络模型收敛。It should be noted that when the VGG neural network model is trained with the training set composed of the training data corresponding to the N faces until it converges, the convergence condition can be configured; for example, the BP (Error Back Propagation) algorithm may be used to iteratively train on the above training set until the VGG neural network model converges.
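"迭代训练直至收敛"的骨架可用如下玩具示例勾勒。The iterate-until-convergence loop can be sketched with a toy stand-in (gradient descent on a one-parameter quadratic loss; the loss, learning rate, and tolerance are illustrative assumptions, not the BP algorithm applied to the actual VGG model):

```python
def train(w, lr=0.1, tol=1e-6, max_iter=10_000):
    """Iterate gradient steps on the toy loss (w - 3)^2 until the update
    falls below a configurable convergence tolerance."""
    for step in range(max_iter):
        grad = 2.0 * (w - 3.0)      # d/dw of the toy loss
        new_w = w - lr * grad
        if abs(new_w - w) < tol:    # configurable convergence condition
            return new_w, step
        w = new_w
    return w, max_iter

w, steps = train(0.0)
print(round(w, 3))  # converges to the minimiser, w = 3.0
```

The real training replaces the toy gradient with backpropagated gradients of the network's loss over the N-face training set, but the stopping structure is the same.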
在一实施例中,如图3所示,图3为本申请人脸识别方法一实施例流程示意图,包括如下步骤:In an embodiment, as shown in FIG. 3, FIG. 3 is a schematic flowchart of an embodiment of the applicant's face recognition method, including the following steps:
S10`、获取待识别人脸的点云数据以及深度图像数据;S10`: Acquire point cloud data and depth image data of the face to be identified;
在本申请中,也可以直接通过深度相机获取待识别人脸的点云数据和深度图像数据,其中,深度相机指的是一种图像传感器,该图像传感器能够观察到物体或人物在空间中的位置。具体的,该深度相机可以是主动式、被动式、接触式或非接触式深度相机,其中,主动式是指向待识别人脸发射能量束(如激光、电磁波或超声波等)以获取待识别人脸的点云数据,被动式深度相机主要利用待识别人脸的周围环境的条件来获取待识别人脸的点云数据,接触式深度相机是指需与待识别人脸接触或比较靠近,非接触式是指不需要与待识别人脸接触。示例性的,上述深度相机具体可以是指TOF(time-of-flight)深度相机,除此之外,还可以是kinect深度相机、XTion深度相机或RealSense深度相机,具体不做限定。In this application, the point cloud data and depth image data of the face to be recognized may also be acquired directly by a depth camera, where a depth camera refers to an image sensor able to observe the position of an object or person in space. Specifically, the depth camera may be active, passive, contact, or non-contact. An active depth camera emits an energy beam (such as laser, electromagnetic wave, or ultrasound) toward the face to be recognized to acquire its point cloud data; a passive depth camera mainly relies on the ambient conditions around the face to be recognized to acquire its point cloud data; a contact depth camera needs to touch or be fairly close to the face to be recognized, whereas a non-contact one does not need to touch it. Illustratively, the depth camera may specifically be a TOF (time-of-flight) depth camera, or alternatively a kinect, XTion, or RealSense depth camera, which is not specifically limited.
在本申请中,在获得了待识别人脸的点云数据后,可以将得到的待识别人脸的点云数据转换为该待识别人脸对应的深度图像数据。简单点说,在一些应用场景中,可通过深度相机直接获取待识别人脸的深度图像数据和点云数据,也可先获取待识别人脸的深度图像数据或点云数据,再转换为点云数据或深度图像数据,具体不做限定。In this application, after the point cloud data of the face to be recognized is obtained, it may be converted into the depth image data corresponding to that face. Put simply, in some application scenarios, the depth image data and point cloud data of the face to be recognized can be acquired directly by a depth camera; alternatively, the depth image data or the point cloud data may be acquired first and then converted into the other, which is not specifically limited.
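点云转深度图像的一种做法可用如下代码勾勒。One way such a point-cloud-to-depth-image conversion can work is sketched below (an illustrative sketch; the grid size, the min-max scaling of x and y into pixels, and the nearest-point-wins rule are assumptions, not the application's prescribed conversion):

```python
import numpy as np

def cloud_to_depth(points, size=8):
    """Rasterise a point cloud into a depth image: x and y choose the
    pixel, z is the stored depth (the nearest point wins on collisions)."""
    depth = np.full((size, size), np.inf)
    xy = points[:, :2]
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    ij = ((xy - lo) / (hi - lo) * (size - 1)).astype(int)
    for (i, j), z in zip(ij, points[:, 2]):
        depth[j, i] = min(depth[j, i], z)
    depth[np.isinf(depth)] = 0.0  # pixels no point maps to become background
    return depth

pts = np.array([[0.0, 0.0, 1.0],
                [1.0, 1.0, 2.0],
                [0.0, 0.0, 0.5]])  # same pixel as the first point, but nearer
img = cloud_to_depth(pts, size=2)
print(img)  # the nearer depth 0.5 wins at pixel (0, 0)
```

The reverse direction (depth image back to a point cloud) simply reads each pixel's coordinates and depth value as a 3-D point.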
S20`、获取点云数据在第一预设方向上的第一投影数据,并获取点云数据在第二预设方向上的第二投影数据,第一预设方向和第二预设方向为不同的投影方向;S20'. Acquire the first projection data of the point cloud data in a first preset direction, and acquire the second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
本申请中,在获取了待识别人脸的点云数据后,可进一步获取待识别人脸的点云数据在第一预设方向上的第一投影数据,以及在第二预设方向上的第二投影数据,并且,第一预设方向和第二预设方向为不同的投影方向。在一些应用场景中,上述第一预设方向为待识别人脸的点云数据在目标坐标系的方位角方向,第二预设方向为待识别人脸的点云数据在目标坐标系的倾斜角方向。目标坐标系为世界坐标系,是一种三维坐标系。已知待识别人脸的点云数据的一个坐标点P,设为目标坐标系第一卦限内的一个点P,站在原点(O点)看这个点P,从x轴正方向沿逆时针旋转到P点在xOy平面上的垂直投影线,x轴与该垂直投影线之间的夹角就是方位角;再向高处看P点即得到倾斜角,也即P点的垂直投影线与原点到P点之间的直线所形成的夹角为倾斜角。其中,对上述点云数据在上述方位角方向上进行投影以生成第一投影数据,在上述倾斜角方向上进行投影以生成第二投影数据,具体包括:获取待识别人脸的点云数据在目标坐标系中每一个点的坐标值,将每一个点的坐标值分别在该目标坐标系对应的方位角方向和倾斜角方向上进行投影,所得投影分别构成第一投影数据和第二投影数据。In this application, after the point cloud data of the face to be recognized is obtained, the first projection data of that point cloud data in a first preset direction and its second projection data in a second preset direction may further be obtained, the first preset direction and the second preset direction being different projection directions. In some application scenarios, the first preset direction is the azimuth direction of the point cloud data of the face to be recognized in the target coordinate system, and the second preset direction is the tilt-angle direction of that point cloud data in the target coordinate system. The target coordinate system is the world coordinate system, a three-dimensional coordinate system. Take a coordinate point P of the point cloud data of the face to be recognized, assumed to lie in the first octant of the target coordinate system. Viewed from the origin (point O), the azimuth is the angle swept counter-clockwise from the positive x-axis to the vertical projection line of point P onto the x-y plane; looking further upward toward P gives the tilt angle, that is, the angle formed between the vertical projection line of point P and the straight line from the origin to point P. Projecting the point cloud data in the azimuth direction to generate the first projection data, and in the tilt-angle direction to generate the second projection data, specifically includes: obtaining the coordinate value of each point of the point cloud data of the face to be recognized in the target coordinate system, and projecting the coordinate value of each point in the azimuth direction and in the tilt-angle direction of the target coordinate system respectively; the resulting projections constitute the first projection data and the second projection data respectively.
S30`、将深度图像数据、第一投影数据以及第二投影数据,作为VGG神经网络识别模型的输入数据;S30 ', using the depth image data, the first projection data and the second projection data as input data of a VGG neural network recognition model;
S40`、将输入数据输入VGG神经网络识别模型以对待识别人脸进行识别。S40`: Input the input data into a VGG neural network recognition model to recognize the face to be recognized.
应理解,VGG神经网络识别模型是一种深度卷积神经网络架构,本申请的VGG神经网络识别模型是指前述模型训练方法中所得到的VGG神经网络模型。通过将待识别人脸对应的深度图像数据、第一投影数据以及第二投影数据输入VGG神经网络模型,从而完成对待识别人脸的识别。It should be understood that the VGG neural network recognition model is a deep convolutional neural network architecture. The VGG neural network recognition model of the present application refers to the VGG neural network model obtained in the foregoing model training method. The depth image data corresponding to the face to be recognized, the first projection data, and the second projection data are input into the VGG neural network model, thereby completing the recognition of the face to be recognized.
由此可见,在该人脸识别方法中,是将待识别人脸的深度图像数据,以及点云数据在不同投影方向上的投影数据,共三个通道的数据输入训练后得到的VGG神经网络模型。由于VGG神经网络模型适用于三维人脸的识别,且由于点云数据对应的投影方向保留了三维人脸的三维特性,因此能有效地对待识别的三维人脸进行特征提取与识别。It can thus be seen that, in this face recognition method, the depth image data of the face to be recognized, together with the projection data of its point cloud data in different projection directions (three channels of data in total), are input into the trained VGG neural network model. Since the VGG neural network model is suited to three-dimensional face recognition, and since the projection directions of the point cloud data preserve the three-dimensional characteristics of the face, features of the three-dimensional face to be recognized can be effectively extracted for recognition.
应理解,所述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。It should be understood that the size of the sequence numbers of the steps in the embodiment does not mean the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application. .
在一实施例中,提供一种神经网络模型训练装置,神经网络模型训练装置与实施例中模型训练方法一一对应。如图4所示,该神经网络模型训练装置40包括第一获取模块401、第二获取模块402、确定模块403和训练模块404。各功能模块详细说明如下:In one embodiment, a neural network model training device is provided, and the neural network model training device corresponds to the model training method in the embodiment one by one. As shown in FIG. 4, the neural network model training device 40 includes a first acquisition module 401, a second acquisition module 402, a determination module 403, and a training module 404. The detailed description of each function module is as follows:
第一获取模块401,用于获取人脸对应的点云数据,以及人脸对应的深度图像数据;A first acquisition module 401, configured to acquire point cloud data corresponding to a human face and depth image data corresponding to a human face;
第二获取模块402,用于获取第一获取模块401获取的点云数据在第一预设方向上的第一投影数据,并获取第一获取模块401获取的点云数据在第二预设方向上的第二投影数据,第一预设方向和第二预设方向为不同的投影方向;The second acquisition module 402 is configured to acquire the first projection data, in a first preset direction, of the point cloud data acquired by the first acquisition module 401, and to acquire the second projection data, in a second preset direction, of the point cloud data acquired by the first acquisition module 401, the first preset direction and the second preset direction being different projection directions;
确定模块403,用于将第一获取模块401获取的深度图像数据、第二获取模块402获取的第一投影数据以及第二投影数据,作为VGG神经网络模型的训练数据;The determining module 403 is configured to use the depth image data acquired by the first acquisition module 401, together with the first projection data and the second projection data acquired by the second acquisition module 402, as training data of the VGG neural network model;
训练模块404,用于通过由确定模块403确定的N个人脸对应的训练数据所构成的训练集对VGG神经网络模型进行训练,直到所述VGG神经网络模型收敛,N大于或等于2。The training module 404 is configured to train the VGG neural network model with a training set composed of the training data, determined by the determining module 403, corresponding to N human faces, until the VGG neural network model converges, where N is greater than or equal to 2.
在一些实施例中,第二获取模块402具体用于:In some embodiments, the second obtaining module 402 is specifically configured to:
将点云数据在目标坐标系的方位角方向作为第一预设方向;Use the azimuth direction of the point cloud data in the target coordinate system as the first preset direction;
对点云数据在第一预设方向上进行投影以生成第一投影数据。Project the point cloud data in a first preset direction to generate first projection data.
在一些实施例中,第二获取模块402具体用于:In some embodiments, the second obtaining module 402 is specifically configured to:
将点云数据在目标坐标系的倾斜角方向作为第二预设方向;Use the tilt angle direction of the point cloud data in the target coordinate system as the second preset direction;
对点云数据在第二预设方向上进行投影以生成第二投影数据。The point cloud data is projected in a second preset direction to generate second projection data.
在一实施例中,VGG神经网络模型的卷积核大小为7x7。In one embodiment, the convolution kernel size of the VGG neural network model is 7 × 7.
在一实施例中,第一获取模块401具体用于:In an embodiment, the first obtaining module 401 is specifically configured to:
获取人脸在不同姿态下的每一帧点云数据;Get point cloud data of each frame of the face in different poses;
将每一帧点云数据进行融合匹配,以统一到同一坐标系中的融合点云数据作为人脸的点云数据。The point cloud data of each frame is fused and matched, and the fused point cloud data unified into the same coordinate system is used as the point cloud data of the face.
关于神经网络模型训练装置的具体限定可以参见上文中对于模型训练方法的限定,在此不再赘述。模型训练装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。For the specific limitation of the neural network model training device, refer to the limitation on the model training method described above, which is not repeated here. Each module in the model training device may be implemented in whole or in part by software, hardware, and a combination thereof. Each module may be embedded in the hardware or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
在一实施例中,提供一种人脸识别装置,人脸识别装置与实施例中人脸识别方法一一对应。如图5所示,人脸识别装置50包括第一获取模块501、第二获取模块502、确定模块503和识别模块504。各功能模块详细说明如下:In one embodiment, a face recognition device is provided, and the face recognition device corresponds to the face recognition method in the embodiment one by one. As shown in FIG. 5, the face recognition device 50 includes a first acquisition module 501, a second acquisition module 502, a determination module 503, and a recognition module 504. The detailed description of each function module is as follows:
第一获取模块501,用于获取待识别人脸的点云数据以及深度图像数据;A first acquisition module 501, configured to acquire point cloud data and depth image data of a face to be identified;
第二获取模块502,用于获取第一获取模块501获取的点云数据在第一预设方向上的第一投影数据,并获取第一获取模块501获取的点云数据在第二预设方向上的第二投影数据,第一预设方向和第二预设方向为不同的投影方向;The second acquisition module 502 is configured to acquire first projection data of the point cloud data acquired by the first acquisition module 501 in a first preset direction, and acquire the point cloud data acquired by the first acquisition module 501 in a second preset direction. The second projection data on the first preset direction and the second preset direction are different projection directions;
确定模块503,用于将第一获取模块501获取的深度图像数据、第二获取模块502获取的第一投影数据以及第二投影数据,作为VGG神经网络识别模型的输入数据;A determining module 503, configured to use the depth image data obtained by the first obtaining module 501, the first projection data and the second projection data obtained by the second obtaining module 502 as input data of a VGG neural network recognition model;
识别模块504,用于将确定模块503确定的输入数据输入VGG神经网络识别模型以对待识别人脸进行识别。The recognition module 504 is configured to input the input data determined by the determination module 503 into a VGG neural network recognition model to recognize a face to be recognized.
关于人脸识别装置的具体限定可以参见上文中对于人脸识别方法的限定,在此不再赘述。人脸识别装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。For specific limitations on the face recognition device, reference may be made to the limitations on the face recognition method described above, and details are not described herein again. Each module in the face recognition device may be implemented in whole or in part by software, hardware, and a combination thereof. Each module may be embedded in the hardware or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
在一个实施例中,提供了一种计算机设备,其内部结构图可以如图6所示。所述计算机设备包括通过系统总线连接的处理器、存储器和数据库。其中,所述计算机设备的处理器用于提供计算和控制能力。所述计算机设备的存储器包括非易失性存储介质、内存储器。所述非易失性存储介质存储有操作系统、计算机可读指令和数据库。所述内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。所述计算机设备的数据库用于存储所获取的图像数据。所述计算机设备的网络接口用于与外部的终端通过网络连接通信。所述计算机可读指令被处理器执行时以实现一种模型训练方法或人脸识别方法。In one embodiment, a computer device is provided, and its internal structure diagram can be as shown in FIG. 6. The computer device includes a processor, a memory, and a database connected through a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running an operating system and computer-readable instructions in a non-volatile storage medium. The database of the computer equipment is used to store the acquired image data. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer-readable instructions are executed by a processor to implement a model training method or a face recognition method.
在一个实施例中,提供了一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机可读指令,处理器执行计算机可读指令时实现以下步骤:In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor. When the processor executes the computer-readable instructions, the following steps are implemented:
获取人脸对应的点云数据,以及人脸对应的深度图像数据;Obtaining point cloud data corresponding to the face, and depth image data corresponding to the face;
获取点云数据在第一预设方向上的第一投影数据,并获取点云数据在第二预设方向上的第二投影数据,第一预设方向和第二预设方向为不同的投影方向;Acquire the first projection data of the point cloud data in a first preset direction, and acquire the second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
将深度图像数据、第一投影数据以及第二投影数据,作为VGG神经网络模型的训练数据;Using the depth image data, the first projection data, and the second projection data as training data of the VGG neural network model;
通过由N个人脸对应的训练数据所构成的训练集对VGG神经网络模型进行训练以得到收敛的所述VGG神经网络模型,N大于或等于2。Train the VGG neural network model with a training set composed of the training data corresponding to N human faces until the VGG neural network model converges, where N is greater than or equal to 2.
在一个实施例中,提供了一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机可读指令,处理器执行计算机可读指令时实现以下步骤:In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor. When the processor executes the computer-readable instructions, the following steps are implemented:
获取待识别人脸的点云数据以及深度图像数据;Obtain point cloud data and depth image data of the face to be identified;
获取点云数据在第一预设方向上的第一投影数据,并获取点云数据在第二预设方向上的第二投影数据,第一预设方向和第二预设方向为不同的投影方向;Acquire the first projection data of the point cloud data in a first preset direction, and acquire the second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
将深度图像数据、第一投影数据以及第二投影数据,作为VGG神经网络识别模型的输入数据;Using the depth image data, the first projection data, and the second projection data as input data of a VGG neural network recognition model;
将输入数据输入VGG神经网络识别模型以对待识别人脸进行识别。The input data is input into a VGG neural network recognition model to recognize a face to be recognized.
在一个实施例中,提供了一个或多个存储有计算机可读指令的非易失性可读存储介质,该非易失性可读存储介质上存储有计算机可读指令,该计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器实现以下步骤:In one embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided; when the computer-readable instructions stored thereon are executed by one or more processors, they cause the one or more processors to implement the following steps:
获取人脸对应的点云数据,以及人脸对应的深度图像数据;Obtaining point cloud data corresponding to the face, and depth image data corresponding to the face;
获取点云数据在第一预设方向上的第一投影数据,并获取点云数据在第二预设方向上的第二投影数据,第一预设方向和第二预设方向为不同的投影方向;Acquire the first projection data of the point cloud data in a first preset direction, and acquire the second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions;
将深度图像数据、第一投影数据以及第二投影数据,作为VGG神经网络模型的训练数据;Using the depth image data, the first projection data, and the second projection data as training data of the VGG neural network model;
通过由N个人脸对应的训练数据所构成的训练集对VGG神经网络模型进行训练,N大于或等于2。The VGG neural network model is trained through a training set composed of training data corresponding to N human faces, where N is greater than or equal to 2.
在一个实施例中,提供了一个或多个存储有计算机可读指令的非易失性可读存储介质,该非易失性可读存储介质上存储有计算机可读指令,该计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器实现以下步骤:In one embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided; when the computer-readable instructions stored thereon are executed by one or more processors, they cause the one or more processors to implement the following steps:
获取待识别人脸的点云数据以及深度图像数据;Obtain point cloud data and depth image data of the face to be identified;
获取点云数据在第一预设方向上的第一投影数据,并获取点云数据在第二预设方向上的第二投影数据,第一预设方向和第二预设方向为不同的投影方向;Acquire first projection data of the point cloud data in a first preset direction, and acquire second projection data of the point cloud data in a second preset direction. The first preset direction and the second preset direction are different projections. direction;
将深度图像数据、第一投影数据以及第二投影数据,作为VGG神经网络识别模型的输入数据;Using the depth image data, the first projection data, and the second projection data as input data of a VGG neural network recognition model;
将输入数据输入VGG神经网络识别模型以对待识别人脸进行识别。The input data is input into a VGG neural network recognition model to recognize a face to be recognized.
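The recognition steps above end at the model's output. One common way to turn a network output into an identity decision is embedding matching against an enrolled gallery; the sketch below assumes the converged model yields one embedding vector per face, and the cosine-similarity rule and threshold are illustrative assumptions not specified by the text:

```python
import numpy as np

def recognize(embedding, gallery, threshold=0.5):
    """Match a query embedding (e.g. taken from the recognition model's
    penultimate layer -- an assumption) against enrolled identities by
    cosine similarity; reject the match if the best score is below the
    threshold."""
    names = list(gallery)
    mat = np.stack([gallery[n] for n in names])
    sims = mat @ embedding / (
        np.linalg.norm(mat, axis=1) * np.linalg.norm(embedding) + 1e-8)
    best = int(np.argmax(sims))
    name = names[best] if sims[best] >= threshold else None
    return name, float(sims[best])

# toy gallery of two enrolled identities
gallery = {"alice": np.array([1.0, 0.0, 0.0]),
           "bob":   np.array([0.0, 1.0, 0.0])}
match, score = recognize(np.array([0.9, 0.1, 0.0]), gallery)
```

Here the query is closest to "alice", so the function returns that name with its similarity score.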
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the foregoing embodiments may be implemented by computer-readable instructions instructing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium, and the computer-readable instructions, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art will clearly understand that, for convenience and brevity of description, only the division into the above functional units and modules is used as an example. In practical applications, the above functions may be allocated to different functional units and modules as required; that is, the internal structure of the apparatus is divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only intended to describe the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications and replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the scope of protection of the present application.
Claims (20)
- A neural network model training method, comprising: acquiring point cloud data corresponding to a human face, and depth image data corresponding to the human face; acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions; using the depth image data, the first projection data, and the second projection data as training data of a VGG neural network model; and training the VGG neural network model through a training set composed of the training data corresponding to N of the human faces to obtain a converged VGG neural network model, where N is greater than or equal to 2.
- The neural network model training method according to claim 1, wherein acquiring the first projection data of the point cloud data in the first preset direction comprises: taking the azimuth angle direction of the point cloud data in a target coordinate system as the first preset direction; and projecting the point cloud data in the first preset direction to generate the first projection data.
- The neural network model training method according to claim 2, wherein acquiring the second projection data of the point cloud data in the second preset direction comprises: taking the tilt angle direction of the point cloud data in the target coordinate system as the second preset direction; and projecting the point cloud data in the second preset direction to generate the second projection data.
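Claims 2 and 3 name the azimuth-angle and tilt-angle directions of a target coordinate system as the two projection directions, but do not fix how the projected image is formed. One plausible reading, sketched with spherical coordinates about the cloud centroid; the choice of centroid origin, the occupancy-histogram rasterization, and the bin count are all assumptions:

```python
import numpy as np

def project_point_cloud(points, direction="azimuth", bins=64):
    """Express each point in spherical coordinates about the centroid,
    drop the coordinate named by `direction`, and rasterize the two
    remaining coordinates into a 2D occupancy histogram. This is one
    hypothetical reading of the claimed projections, not the patent's
    definition."""
    p = points - points.mean(axis=0)
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    r = np.linalg.norm(p, axis=1)
    azimuth = np.arctan2(y, x)                                     # angle in the x-y plane
    tilt = np.arccos(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))  # inclination from +z
    if direction == "azimuth":   # collapse the azimuth coordinate
        u, v = tilt, r
    else:                        # collapse the tilt coordinate
        u, v = azimuth, r
    img, _, _ = np.histogram2d(u, v, bins=bins)
    return img

pts = np.random.rand(500, 3)                  # toy stand-in for a face point cloud
img1 = project_point_cloud(pts, "azimuth")    # "first projection data"
img2 = project_point_cloud(pts, "tilt")       # "second projection data"
```

Because the two directions collapse different angular coordinates, the two images carry complementary views of the same cloud, which is what makes them useful as separate input channels.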
- The neural network model training method according to claim 3, wherein the convolution kernel size of the VGG neural network model is 7x7.
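Claim 4 fixes the convolution kernel at 7x7, larger than the 3x3 kernels of the standard VGG design. A small sketch of the arithmetic consequences; the stride and padding values are assumptions, since the claim fixes only the kernel size:

```python
def conv_out_size(n, kernel=7, stride=1, padding=3):
    """Spatial output size of one convolution layer. With padding = 3,
    a 7x7 kernel is size-preserving, the same way standard VGG keeps
    feature maps fixed with 3x3 kernels and padding = 1."""
    return (n + 2 * padding - kernel) // stride + 1

def conv_params(c_in, c_out, kernel=7):
    """Weight count of one conv layer, bias omitted."""
    return kernel * kernel * c_in * c_out

print(conv_out_size(224))  # 224
# cost of 7x7 vs 3x3 at equal channel widths: ratio is 49/9
ratio = conv_params(3, 64) / conv_params(3, 64, kernel=3)
```

The trade-off this arithmetic shows: a 7x7 kernel sees a wider neighborhood per layer but costs 49/9 times the parameters of a 3x3 kernel at the same channel widths.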
- The neural network model training method according to claim 4, wherein acquiring the point cloud data of the human face comprises: acquiring each frame of point cloud data of the human face in different poses; and performing fusion matching on each frame of point cloud data, and taking the fused point cloud data unified into the same coordinate system as the point cloud data of the human face.
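Claim 5 fuses per-pose frames into one coordinate system. A minimal sketch under the assumption that the rigid transform of each frame into the reference frame is already known; in practice those transforms would be estimated by a registration algorithm such as ICP, which the text calls "fusion matching" without fixing the algorithm:

```python
import numpy as np

def fuse_frames(frames, transforms):
    """Map every frame into the first frame's coordinate system and
    concatenate. Each (R, t) pair is the rigid transform taking that
    frame into the reference frame -- assumed given here."""
    fused = [np.asarray(frames[0])]
    for pts, (R, t) in zip(frames[1:], transforms):
        fused.append(np.asarray(pts) @ R.T + t)   # rows are points
    return np.vstack(fused)

# toy check: a second frame that sees the same two points after a
# 90-degree rotation about z plus a shift along z
ref = np.array([[1.0, 0.0, 0.0],
                [0.0, 2.0, 0.5]])
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([0.0, 0.0, 1.0])
frame2 = ref @ R.T + t                       # the surface as captured in pose 2
fused = fuse_frames([ref, frame2], [(R.T, -(R.T @ t))])
```

After fusion the second frame's points land exactly on the reference points, so the merged cloud covers the face from both poses in one coordinate system.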
- A face recognition method, comprising: acquiring point cloud data and depth image data of a face to be recognized; acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions; using the depth image data, the first projection data, and the second projection data as input data; and inputting the input data into the converged VGG neural network recognition model according to any one of claims 1 to 5 to recognize the face to be recognized.
- A neural network model training apparatus, comprising: a first acquisition module, configured to acquire point cloud data corresponding to a human face, and depth image data corresponding to the human face; a second acquisition module, configured to acquire first projection data, in a first preset direction, of the point cloud data acquired by the first acquisition module, and acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions; a determining module, configured to use the depth image data acquired by the first acquisition module and the first projection data and the second projection data acquired by the second acquisition module as training data of a VGG neural network model; and a training module, configured to train the VGG neural network model through a training set composed of the training data determined by the determining module for N of the human faces, to obtain a converged VGG neural network model, where N is greater than or equal to 2.
- The neural network model training apparatus according to claim 7, wherein the second acquisition module is specifically configured to: take the azimuth angle direction of the point cloud data in a target coordinate system as the first preset direction; and project the point cloud data in the first preset direction to generate the first projection data.
- The neural network model training apparatus according to claim 8, wherein the second acquisition module is further specifically configured to: take the tilt angle direction of the point cloud data in the target coordinate system as the second preset direction; and project the point cloud data in the second preset direction to generate the second projection data.
- The neural network model training apparatus according to claim 9, wherein the convolution kernel size of the VGG neural network model is 7x7.
- The neural network model training apparatus according to claim 10, wherein the first acquisition module is specifically configured to: acquire each frame of point cloud data of the human face in different poses; and perform fusion matching on each frame of point cloud data, and take the fused point cloud data unified into the same coordinate system as the point cloud data of the human face.
- A face recognition apparatus, comprising: a first acquisition module, configured to acquire point cloud data and depth image data of a face to be recognized; a second acquisition module, configured to acquire first projection data, in a first preset direction, of the point cloud data acquired by the first acquisition module, and acquire second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions; a determining module, configured to use the depth image data acquired by the first acquisition module and the first projection data and the second projection data acquired by the second acquisition module as input data of a VGG neural network recognition model; and a recognition module, configured to input the input data determined by the determining module into the converged VGG neural network recognition model according to any one of claims 1 to 5 to recognize the face to be recognized.
- A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps: acquiring point cloud data corresponding to a human face, and depth image data corresponding to the human face; acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions; using the depth image data, the first projection data, and the second projection data as training data of a VGG neural network model; and training the VGG neural network model through a training set composed of the training data corresponding to N of the human faces to obtain a converged VGG neural network model, where N is greater than or equal to 2.
- The computer device according to claim 13, wherein acquiring the first projection data of the point cloud data in the first preset direction comprises: taking the azimuth angle direction of the point cloud data in a target coordinate system as the first preset direction; and projecting the point cloud data in the first preset direction to generate the first projection data.
- The computer device according to claim 14, wherein acquiring the second projection data of the point cloud data in the second preset direction comprises: taking the tilt angle direction of the point cloud data in the target coordinate system as the second preset direction; and projecting the point cloud data in the second preset direction to generate the second projection data.
- A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps: acquiring point cloud data and depth image data of a face to be recognized; acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions; using the depth image data, the first projection data, and the second projection data as input data; and inputting the input data into the converged VGG neural network recognition model according to any one of claims 1 to 5 to recognize the face to be recognized.
- One or more non-volatile readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps: acquiring point cloud data corresponding to a human face, and depth image data corresponding to the human face; acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions; using the depth image data, the first projection data, and the second projection data as training data of a VGG neural network model; and training the VGG neural network model through a training set composed of the training data corresponding to N of the human faces to obtain a converged VGG neural network model, where N is greater than or equal to 2.
- The non-volatile readable storage medium according to claim 17, wherein acquiring the first projection data of the point cloud data in the first preset direction comprises: taking the azimuth angle direction of the point cloud data in a target coordinate system as the first preset direction; and projecting the point cloud data in the first preset direction to generate the first projection data.
- The non-volatile readable storage medium according to claim 18, wherein acquiring the second projection data of the point cloud data in the second preset direction comprises: taking the tilt angle direction of the point cloud data in the target coordinate system as the second preset direction; and projecting the point cloud data in the second preset direction to generate the second projection data.
- One or more non-volatile readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps: acquiring point cloud data and depth image data of a face to be recognized; acquiring first projection data of the point cloud data in a first preset direction, and acquiring second projection data of the point cloud data in a second preset direction, the first preset direction and the second preset direction being different projection directions; using the depth image data, the first projection data, and the second projection data as input data; and inputting the input data into the converged VGG neural network recognition model according to any one of claims 1 to 5 to recognize the face to be recognized.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810939556.5 | 2018-08-17 | ||
CN201810939556.5A CN110197109B (en) | 2018-08-17 | 2018-08-17 | Neural network model training and face recognition method, device, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020034542A1 true WO2020034542A1 (en) | 2020-02-20 |
Family
ID=67751408
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/123884 WO2020034542A1 (en) | 2018-08-17 | 2018-12-26 | Neural network model training method and apparatus, face recognition method and apparatus, device, and medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110197109B (en) |
WO (1) | WO2020034542A1 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111462108A (en) * | 2020-04-13 | 2020-07-28 | 山西新华化工有限责任公司 | Machine learning-based head and face product design ergonomics assessment operation method |
CN111695497A (en) * | 2020-06-10 | 2020-09-22 | 上海有个机器人有限公司 | Pedestrian identification method, medium, terminal and device based on motion information |
CN111931694A (en) * | 2020-09-02 | 2020-11-13 | 北京嘀嘀无限科技发展有限公司 | Method and device for determining sight line orientation of person, electronic equipment and storage medium |
CN112149635A (en) * | 2020-10-23 | 2020-12-29 | 北京百度网讯科技有限公司 | Cross-modal face recognition model training method, device, device and storage medium |
CN113610172A (en) * | 2021-08-13 | 2021-11-05 | 北京地平线信息技术有限公司 | Neural network model training method and device, and sensing data fusion method and device |
CN113793295A (en) * | 2021-08-05 | 2021-12-14 | 西人马帝言(北京)科技有限公司 | Data processing method, device and equipment and readable storage medium |
WO2022266916A1 (en) * | 2021-06-24 | 2022-12-29 | 周宇 | Instantaneously adjustable electromagnetic suspension device |
WO2025000661A1 (en) * | 2023-06-30 | 2025-01-02 | 广东花至美容科技有限公司 | Beauty care device positioning method and apparatus, wearable device, and beauty care system |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111079700B (en) * | 2019-12-30 | 2023-04-07 | 陕西西图数联科技有限公司 | Three-dimensional face recognition method based on fusion of multiple data types |
CN112435331A (en) * | 2020-12-07 | 2021-03-02 | 上海眼控科技股份有限公司 | Model training method, point cloud generating method, device, equipment and storage medium |
CN112560669B (en) * | 2020-12-14 | 2024-07-26 | 杭州趣链科技有限公司 | Face pose estimation method and device and electronic equipment |
CN113902786B (en) * | 2021-09-23 | 2022-05-27 | 珠海视熙科技有限公司 | Depth image preprocessing method, system and related device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104091162A (en) * | 2014-07-17 | 2014-10-08 | 东南大学 | Three-dimensional face recognition method based on feature points |
CN107423678A (en) * | 2017-05-27 | 2017-12-01 | 电子科技大学 | A kind of training method and face identification method of the convolutional neural networks for extracting feature |
CN107844760A (en) * | 2017-10-24 | 2018-03-27 | 西安交通大学 | Three-dimensional face identification method based on curved surface normal direction component map Neural Networks Representation |
CN107944367A (en) * | 2017-11-16 | 2018-04-20 | 北京小米移动软件有限公司 | Face critical point detection method and device |
CN107944435A (en) * | 2017-12-27 | 2018-04-20 | 广州图语信息科技有限公司 | Three-dimensional face recognition method and device and processing terminal |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9846232B1 (en) * | 2012-01-05 | 2017-12-19 | Teledyne Reson A/S | Use of multi-beam sonar systems to generate point cloud data and models; data registration in underwater metrology applications |
EP3549102B1 (en) * | 2016-12-02 | 2021-05-26 | Google LLC | Determining structure and motion in images using neural networks |
CN107392944A (en) * | 2017-08-07 | 2017-11-24 | 广东电网有限责任公司机巡作业中心 | Full-view image and the method for registering and device for putting cloud |
CN108038474B (en) * | 2017-12-28 | 2020-04-14 | 深圳励飞科技有限公司 | Face detection method, convolutional neural network parameter training method, device and medium |
- 2018-08-17 CN CN201810939556.5A patent/CN110197109B/en active Active
- 2018-12-26 WO PCT/CN2018/123884 patent/WO2020034542A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
GE, LIUHAO ET AL.: "Robust 3D Hand Pose Estimation in Single Depth Images: From Single-View CNN to Multi-View CNNs", 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 30 June 2016 (2016-06-30), XP033021543, ISSN: 1063-6919 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111462108A (en) * | 2020-04-13 | 2020-07-28 | 山西新华化工有限责任公司 | Machine learning-based head and face product design ergonomics assessment operation method |
CN111462108B (en) * | 2020-04-13 | 2023-05-02 | 山西新华防化装备研究院有限公司 | Machine learning-based head-face product design ergonomics evaluation operation method |
CN111695497A (en) * | 2020-06-10 | 2020-09-22 | 上海有个机器人有限公司 | Pedestrian identification method, medium, terminal and device based on motion information |
CN111695497B (en) * | 2020-06-10 | 2024-04-09 | 上海有个机器人有限公司 | Pedestrian recognition method, medium, terminal and device based on motion information |
CN111931694A (en) * | 2020-09-02 | 2020-11-13 | 北京嘀嘀无限科技发展有限公司 | Method and device for determining sight line orientation of person, electronic equipment and storage medium |
CN112149635A (en) * | 2020-10-23 | 2020-12-29 | 北京百度网讯科技有限公司 | Cross-modal face recognition model training method, device, device and storage medium |
WO2022266916A1 (en) * | 2021-06-24 | 2022-12-29 | 周宇 | Instantaneously adjustable electromagnetic suspension device |
CN113793295A (en) * | 2021-08-05 | 2021-12-14 | 西人马帝言(北京)科技有限公司 | Data processing method, device and equipment and readable storage medium |
CN113610172A (en) * | 2021-08-13 | 2021-11-05 | 北京地平线信息技术有限公司 | Neural network model training method and device, and sensing data fusion method and device |
CN113610172B (en) * | 2021-08-13 | 2023-08-18 | 北京地平线信息技术有限公司 | Neural network model training method and device and sensing data fusion method and device |
WO2025000661A1 (en) * | 2023-06-30 | 2025-01-02 | 广东花至美容科技有限公司 | Beauty care device positioning method and apparatus, wearable device, and beauty care system |
Also Published As
Publication number | Publication date |
---|---|
CN110197109A (en) | 2019-09-03 |
CN110197109B (en) | 2023-11-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020034542A1 (en) | Neural network model training method and apparatus, face recognition method and apparatus, device, and medium | |
CN111091075B (en) | Face recognition method, device, electronic device and storage medium | |
Tian et al. | Robust 6d object pose estimation by learning rgb-d features | |
US20210182537A1 (en) | Method and apparatus for detecting facial key points, computer device, and storage medium | |
WO2020125623A1 (en) | Method and device for live body detection, storage medium, and electronic device | |
WO2021051543A1 (en) | Method for generating face rotation model, apparatus, computer device and storage medium | |
US20190080455A1 (en) | Method and device for three-dimensional feature-embedded image object component-level semantic segmentation | |
US20150009214A1 (en) | Real-time 3d computer vision processing engine for object recognition, reconstruction, and analysis | |
Valle et al. | Face alignment using a 3D deeply-initialized ensemble of regression trees | |
WO2022252642A1 (en) | Behavior posture detection method and apparatus based on video image, and device and medium | |
KR102161359B1 (en) | Apparatus for Extracting Face Image Based on Deep Learning | |
CN111353489A (en) | Text image processing method, device, computer equipment and storage medium | |
CN109948467A (en) | Method, device, computer equipment and storage medium for face recognition | |
CN113469092B (en) | Character recognition model generation method, device, computer equipment and storage medium | |
CN112634152B (en) | Face sample data enhancement method and system based on image depth information | |
CN116883466A (en) | Optical and SAR image registration method, device and equipment based on position sensing | |
CN114677588A (en) | Method, device, robot and storage medium for obstacle detection | |
CN115443483A (en) | Depth estimation based on neural network model | |
JP2022548027A (en) | A method for obtaining data from an image of a user's object having biometric characteristics of the user | |
CN114638891A (en) | Target detection positioning method and system based on image and point cloud fusion | |
US20240383695A1 (en) | Method for determining material-cage stacking, computer device, and storage medium | |
KR102382883B1 (en) | 3d hand posture recognition apparatus and method using the same | |
CN111813984B (en) | Method and device for realizing indoor positioning by using homography matrix and electronic equipment | |
Zhao et al. | Cy-CNN: cylinder convolution based rotation-invariant neural network for point cloud registration | |
WO2020000696A1 (en) | Image processing method and apparatus, computer device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18929997 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 18929997 Country of ref document: EP Kind code of ref document: A1 |