CN109151387B - webRTC-based low-delay solution for face recognition of mobile camera - Google Patents
- Publication number
- CN109151387B (application CN201810980968.3A)
- Authority
- CN
- China
- Prior art keywords
- face
- transcoder
- mobile terminal
- room
- webrtc
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
- H04L67/025—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/14—Session management
- H04L67/141—Setup of application sessions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/44—Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/268—Signal distribution or switching
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Computer Networks & Wireless Communication (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Studio Devices (AREA)
- Telephonic Communication Services (AREA)
Abstract
The invention discloses a webRTC-based low-delay solution for face recognition with a mobile camera. The method comprises the following steps: the mobile terminal initiates a face detection request; the monitoring server initiates a transcoding task to a transcoder; the transcoder requests the RTC server to establish a chat room; the RTC server returns the room number to the transcoder; the transcoder passes the room number to the monitoring server; the monitoring server passes the room number to the mobile terminal; the mobile terminal connects to the RTC server with the room number and joins the room; the RTC server and the communication cloud establish a data transmission node in the same room and carry out real-time low-delay data transmission based on webRTC; the mobile terminal starts sending data to the transcoder through the communication cloud; the transcoder establishes a one-in, two-out task to perform face snapshot and real-time transparent transmission. The beneficial effects of the invention are that the picture-delay problem is effectively solved: the measured delay is approximately 200 ms to 300 ms, and it can theoretically be reduced to within 100 ms.
Description
Technical Field
The invention relates to the technical field of video coding and decoding, and in particular to a webRTC-based low-delay solution for face recognition with a mobile camera.
Background
During development of a face-monitoring project for the mobile-phone side, it was found that when the phone pushes an RTMP stream to the server for face recognition, the picture delay becomes excessive; the farther the phone is from the server, the higher the public-network delay, which can exceed ten seconds.
Disclosure of Invention
The invention aims to overcome this defect in the prior art by providing a webRTC-based low-delay solution for face recognition with a mobile camera that effectively shortens the delay.
To this end, the invention adopts the following technical solution:
a webRTC-based low-delay solution for face recognition of a mobile camera specifically comprises the following steps:
(1) the mobile terminal initiates a face detection request;
(2) initiating a transcoding task to a transcoder by a monitoring server;
(3) the transcoder initiates a request to the RTC server to establish a chat room;
(4) the RTC server returns the room number to the transcoder;
(5) the transcoder tells the monitoring server the room number;
(6) the monitoring server tells the mobile terminal the room number;
(7) the mobile terminal is connected with the RTC server through the room number and joins the room;
(8) the RTC server and the communication cloud establish a data transmission node in the same room, and real-time low-delay data transmission is carried out based on webRTC;
(9) the mobile terminal starts to send data to the transcoder through the communication cloud;
(10) the transcoder establishes a one-in, two-out task to realize the face snapshot and real-time transparent transmission tasks.
By adopting this webRTC-based low-delay solution for face recognition with a mobile camera, the picture-delay problem is effectively solved: the decoded video is converted to RGB24 and displayed with OpenCV, the measured delay is approximately 200 ms to 300 ms, and the theoretical delay can be reduced to within 100 ms, almost the same delay as a mobile phone experiences on a 4G network.
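The room-setup signalling in steps (1)-(7) can be sketched as a small simulation. All class and method names here are illustrative, not from the patent; the point is the relay of the room number from the RTC server back through the transcoder and monitoring server to the mobile terminal.

```python
# Minimal sketch of the room-setup signalling in steps (1)-(7).
# Class/method names are illustrative assumptions, not from the patent.
import itertools

class RTCServer:
    """Issues a session id (room number) per chat-room request."""
    def __init__(self):
        self._ids = itertools.count(1000)
        self.rooms = {}  # session_id -> list of members
    def create_room(self):
        sid = next(self._ids)
        self.rooms[sid] = []
        return sid
    def join(self, sid, member):
        self.rooms[sid].append(member)

class Transcoder:
    def start_task(self, rtc):
        # (3) ask the RTC server for a chat room; (4) receive the room number
        return rtc.create_room()

class MonitorServer:
    def __init__(self, transcoder):
        self.transcoder = transcoder
    def handle_face_request(self, rtc):
        # (2) start a transcoding task; (5)/(6) relay the room number back
        return self.transcoder.start_task(rtc)

class MobileTerminal:
    def request_detection(self, monitor, rtc):
        # (1) initiate the face-detection request
        sid = monitor.handle_face_request(rtc)
        # (7) connect to the RTC server with the room number and join
        rtc.join(sid, "mobile")
        return sid

rtc = RTCServer()
sid = MobileTerminal().request_detection(MonitorServer(Transcoder()), rtc)
```

Once the mobile terminal has joined, steps (8)-(10) proceed over the established room.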
Preferably, in step (8), the webRTC-based mobile terminal specifically comprises RtcMessage, communication, the communication cloud and hardware, where RtcMessage serves as the signalling set with which the mobile terminal requests the communication cloud to create or join a room; after the communication cloud creates the room successfully, the mobile terminal establishes a communication connection with it, and the hardware collects audio and video data and transmits them to, or receives data from, the communication cloud.
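The RtcMessage signalling set can be sketched as two message types sent to the communication cloud. The field names and JSON encoding are assumptions for illustration; the patent does not specify a wire format.

```python
# Hypothetical sketch of the RtcMessage signalling set in step (8):
# create-room and join-room requests to the communication cloud.
import json

def rtc_message(msg_type, session_id=None):
    """Build a signalling message; field names are assumptions."""
    if msg_type not in ("create_room", "join_room"):
        raise ValueError("unknown RtcMessage type")
    msg = {"type": msg_type}
    if msg_type == "join_room":
        msg["session_id"] = session_id  # the room number returned earlier
    return json.dumps(msg)

create = rtc_message("create_room")
join = rtc_message("join_room", session_id=1000)
```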
Preferably, in step (10), when the transcoder establishes the one-in, two-out task it uses a low-level transcoding technique implemented by inheriting the DirectShow (dshow) framework, as follows: first, a Source module connects to the RTC server to obtain the video data of the mobile terminal; an infTee module then distributes the data to a video decoder and to a frame-wrapper module (framewrapper). On the first branch, the video decoder parses the code-stream data and passes it on to be converted into an RGB24 image, which is handed to the face recognition module for feature comparison, thereby capturing the face. On the second branch, the frame-wrapper module feeds the FLVmux module to generate an RTMP live stream, and an audio mute packet is added for real-time transparent transmission.
Preferably, in step (10), the video data is received and parsed into raw H.264 data, the H.264 video data is decoded and converted into an RGB24 image, and the image is continuously refreshed and displayed with OpenCV's cv::imshow method, achieving real-time viewing.
Preferably, in step (10), the face snapshot comprises four parts: face detection, face tracking, face recognition and living-body verification. Face detection means detecting faces in a static picture and returning face-frame coordinates, landmark coordinates and quality-score information. Face tracking means millisecond-level face tracking on surveillance or dynamic video in complex scenes, obtaining in real time the face-frame coordinates, landmark coordinates and quality-score information of every face in each frame, unaffected by occlusion, blur or side-face factors. Face recognition covers both 1:1 and 1:N comparison: the 1:1 comparison achieves a false-recognition rate below one in ten thousand at a 96% recall rate, and the 1:N comparison achieves millisecond-level retrieval on a large-scale portrait database regardless of race or age. Living-body verification checks whether a real person is operating in front of the mobile-terminal camera.
The beneficial effects of the invention are that the picture-delay problem is effectively solved: the decoded video is converted to RGB24 and displayed with OpenCV, the measured delay is approximately 200 ms to 300 ms, and it can theoretically be reduced to within 100 ms.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a schematic diagram based on a webRTC;
fig. 3 is a schematic diagram of an underlying transcoding technique.
Detailed Description
The invention is further described below with reference to the figures and a detailed embodiment.
In the embodiment shown in fig. 1, a webRTC-based low-latency solution for face recognition of a mobile camera specifically includes the following steps:
(1) a Mobile terminal (Mobile App) initiates a face detection request;
(2) initiating a transcoding task to a transcoder (transcoder) by a monitor server (monitor server);
(3) a transcoder (transcoder) initiates a request to an RTC server to establish a chat room;
(4) the RTC server returns a room number (session id) to a transcoder (transcoder);
(5) the transcoder (transcoder) tells the monitoring server (monitor server) the room number (sessionid);
(6) the monitoring server (monitor server) tells the Mobile terminal (Mobile App) the room number (sessionid);
(7) the Mobile terminal (Mobile App) is connected with an RTC server through a room number (session id) to join a room;
(8) the RTC server and the communication cloud establish a data transmission node in the same room, and real-time low-delay data transmission is carried out based on webRTC;
As shown in fig. 2, the webRTC-based mobile terminal specifically comprises RtcMessage, communication, the communication cloud and hardware. RtcMessage serves as the signalling set with which the mobile terminal requests the communication cloud to create or join a room; after the communication cloud creates the room successfully, it establishes a communication connection with the mobile terminal, and the hardware collects audio and video data and transmits them to, or receives data from, the communication cloud.
(9) The mobile terminal starts to send data to the transcoder through the communication cloud;
(10) the transcoder establishes a one-in, two-out task to realize the face snapshot and real-time transparent transmission tasks;
When the transcoder establishes the one-in, two-out task it uses a low-level transcoding technique implemented by inheriting the DirectShow (dshow) framework, as shown in fig. 3. Specifically: a Source module connects to the RTC server to obtain the video data of the mobile terminal; an infTee module then distributes the data to a video decoder and to a frame-wrapper module. On the first branch, the video decoder parses the code-stream data and passes it on to be converted into an RGB24 image, which is handed to the face recognition module for feature comparison, thereby capturing the face. On the second branch, the frame-wrapper module feeds the FLVmux module to generate an RTMP live stream; an audio mute packet is added (since a pure-video transmission mechanism is used, the time otherwise spent on A/V synchronization is eliminated) so that the stream suits RTMP players that require audio, enabling real-time transparent transmission.
DirectShow is a streaming-media framework on the Windows platform (this method inherits the framework and is implemented under Linux) that provides high-quality multimedia-stream capture and playback. It supports a wide variety of media file formats, including ASF, MPEG, AVI, MP3 and WAV, and supports capturing multimedia streams with WDM drivers or the older VFW drivers. DirectShow integrates with other DirectX technologies, automatically detecting and using available audio-video hardware acceleration while still supporting systems without it. It greatly simplifies media playback, format conversion and capture, and at the same time exposes a low-level stream-control framework for user-defined solutions, allowing users to create their own DirectShow components that support new file formats or other uses. Typical applications written with DirectShow include DVD players, video-editing applications, AVI-to-ASF converters, MP3 players and digital video-capture applications.
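The one-in, two-out filter graph described above can be sketched in Python rather than DirectShow, purely to illustrate the tee topology: one source, with every sample distributed to a face-snapshot branch and a passthrough branch. The branch functions only tag samples; the real decoding, encoding and muxing are omitted.

```python
# Sketch of the one-in, two-out graph: Source -> infTee -> two branches.
# Branch internals are simulated; module names mirror the description.
def source(frames):
    # Source module: yields video samples obtained from the RTC server.
    for f in frames:
        yield f

def inf_tee(stream, *sinks):
    # infTee module: distribute every sample to each downstream branch.
    for sample in stream:
        for sink in sinks:
            sink(sample)

snapshots, rtmp_out = [], []

def face_branch(sample):
    # decoder -> RGB24 -> face feature comparison (simulated as a tag)
    snapshots.append(("rgb24", sample))

def passthrough_branch(sample):
    # frame wrapper -> FLVmux -> RTMP live stream (simulated as a tag)
    rtmp_out.append(("flv", sample))

inf_tee(source(["frame0", "frame1"]), face_branch, passthrough_branch)
```

Both branches see every frame, which is what lets the snapshot and the live passthrough run from a single input stream.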
The video data is received and parsed into raw H.264 data, the H.264 video data is decoded and converted into an RGB24 image, and the image is continuously refreshed and displayed with OpenCV's cv::imshow method, achieving real-time viewing.
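Decoded H.264 frames are in a YUV format, so the RGB24 step is a color-space conversion. A per-pixel sketch with the full-range BT.601 formula illustrates it (the display via cv::imshow is omitted so the sketch stays dependency-free; whether the patent's decoder uses BT.601 is an assumption).

```python
# Full-range BT.601 YUV -> RGB24 conversion for a single pixel.
def yuv_to_rgb24(y, u, v):
    """Convert one full-range BT.601 YUV pixel to an (R, G, B) byte triple."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    clamp = lambda x: max(0, min(255, int(round(x))))  # keep in byte range
    return clamp(r), clamp(g), clamp(b)

white = yuv_to_rgb24(255, 128, 128)  # neutral chroma, full luma
black = yuv_to_rgb24(0, 128, 128)    # neutral chroma, zero luma
```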
The face snapshot comprises four parts: face detection, face tracking, face recognition and living-body verification. Face detection means detecting faces in a static picture and returning face-frame coordinates, landmark coordinates and quality-score information; on the FDDB test set the detection reaches an industry-leading level. Face tracking means millisecond-level face tracking on surveillance or dynamic video in complex scenes, obtaining in real time the face-frame coordinates, landmark coordinates and quality-score information of every face in each frame, unaffected by occlusion, blur or side-face factors. Face recognition covers both 1:1 and 1:N comparison: the 1:1 comparison achieves a false-recognition rate below one in ten thousand at a 96% recall rate, and the 1:N comparison achieves millisecond-level retrieval on a large-scale portrait database regardless of race or age, enabling real-time identification and alarms for multiple video channels and multiple faces in dynamic, complex scenes, with an accuracy of 99.87% on the LFW test set. Living-body verification checks whether a real person is operating in front of the mobile-terminal camera, preventing spoofing with high-definition pictures, three-dimensional models, recorded video or face swapping, and meeting the security requirements that sensitive industries place on face recognition.
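The 1:1 comparison stage can be sketched as a cosine-similarity check between two face-feature vectors against a decision threshold. The threshold value, vector size and feature values are illustrative assumptions; in a real system the threshold is tuned so the false-accept rate stays below 1/10000 at the target recall.

```python
# Sketch of 1:1 face comparison as cosine similarity vs. a threshold.
# Threshold and feature vectors are placeholders, not from the patent.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def same_person(feat_a, feat_b, threshold=0.8):
    # 0.8 is a placeholder; real systems tune this on labelled pairs.
    return cosine_similarity(feat_a, feat_b) >= threshold

probe = [0.1, 0.9, 0.4]      # feature vector from the live frame
enrolled = [0.12, 0.88, 0.41]  # feature vector from the portrait database
match = same_person(probe, enrolled)
```

A 1:N search applies the same similarity against every enrolled vector and takes the best score, typically with an index structure for millisecond-level retrieval.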
By adopting this webRTC-based low-delay solution for face recognition with a mobile camera, the picture-delay problem is effectively solved: the decoded video is converted to RGB24 and displayed with OpenCV, the measured delay is approximately 200 ms to 300 ms, and the theoretical delay can be reduced to within 100 ms, almost the same delay as a mobile phone experiences on a 4G network, whereas the 4G delay of a mobile terminal over a long distance is somewhat higher, around 2 s.
Claims (4)
1. A webRTC-based low-delay solution for face recognition of a mobile camera is characterized by comprising the following steps:
(1) the mobile terminal initiates a face detection request to the mobile terminal camera;
(2) initiating a transcoding task to a transcoder by a monitoring server;
(3) the transcoder initiates a request to the RTC server to establish a chat room;
(4) the RTC server returns the room number to the transcoder;
(5) the transcoder tells the monitoring server the room number;
(6) the monitoring server tells the mobile terminal the room number;
(7) the mobile terminal is connected with the RTC server through the room number and joins the room;
(8) the RTC server and the communication cloud establish a data transmission node in the same room, and real-time low-delay data transmission is carried out based on webRTC;
(9) the mobile terminal starts to send data to the transcoder through the communication cloud;
(10) the transcoder establishes a one-in, two-out task and realizes the face snapshot and real-time transparent transmission tasks, displaying with OpenCV; when the transcoder establishes the one-in, two-out task it uses a low-level transcoding technique implemented by inheriting the DirectShow (dshow) framework, as follows: a Source module connects to the RTC server to obtain the video data of the mobile terminal; an infTee module then distributes the data to a video decoder and to a frame-wrapper module; on the first branch, the video decoder parses the code-stream data and passes it on to be converted into an RGB24 image, which is handed to the face recognition module for feature comparison, thereby capturing the face; and on the second branch, the frame-wrapper module feeds the FLVmux module to generate an RTMP live stream, with an audio mute packet added for real-time transparent transmission.
2. The webRTC-based low-delay solution for face recognition of a mobile camera as claimed in claim 1, wherein in step (8) the webRTC-based mobile terminal specifically comprises RtcMessage, communication, the communication cloud and hardware, wherein RtcMessage serves as the signalling set with which the mobile terminal requests the communication cloud to create or join a room; after the communication cloud creates the room successfully, it establishes a communication connection with the mobile terminal, and the hardware collects audio and video data and transmits them to, or receives data from, the communication cloud.
3. The webRTC-based low-delay solution for face recognition of a mobile camera as claimed in claim 1, wherein in step (10) the video data is received and parsed into raw H.264 data, the H.264 video data is decoded and converted into an RGB24 image, and the image is continuously refreshed and displayed with OpenCV's cv::imshow method, achieving real-time viewing.
4. The webRTC-based low-delay solution for face recognition of a mobile camera as claimed in claim 1, wherein in step (10) the face snapshot comprises four parts: face detection, face tracking, face recognition and living-body verification, wherein face detection means detecting faces in a static picture and returning face-frame coordinates, landmark coordinates and quality-score information; face tracking means millisecond-level face tracking on surveillance or dynamic video in complex scenes, obtaining in real time the face-frame coordinates, landmark coordinates and quality-score information of every face in each frame, unaffected by occlusion, blur or side-face factors; face recognition covers both 1:1 and 1:N comparison, wherein the 1:1 comparison achieves a false-recognition rate below one in ten thousand at a 96% recall rate and the 1:N comparison achieves millisecond-level retrieval on a large-scale portrait database regardless of race or age; and living-body verification checks whether a real person is operating in front of the mobile-terminal camera.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810980968.3A CN109151387B (en) | 2018-08-27 | 2018-08-27 | webRTC-based low-delay solution for face recognition of mobile camera |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109151387A CN109151387A (en) | 2019-01-04 |
CN109151387B true CN109151387B (en) | 2020-10-23 |
Family
ID=64828178
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810980968.3A Active CN109151387B (en) | 2018-08-27 | 2018-08-27 | webRTC-based low-delay solution for face recognition of mobile camera |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109151387B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110868609B (en) * | 2019-12-02 | 2021-08-13 | 杭州当虹科技股份有限公司 | Method for monitoring and standardizing live video |
CN112491924B (en) * | 2020-12-09 | 2022-03-22 | 威创集团股份有限公司 | Cross-platform face recognition login method, system and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017118241A (en) * | 2015-12-22 | 2017-06-29 | 西日本電信電話株式会社 | Audio video communication system, server, virtual client, audio video communication method, and audio video communication program |
CN107027045A (en) * | 2017-04-11 | 2017-08-08 | 广州华多网络科技有限公司 | Pushing video streaming control method, device and video flowing instructor in broadcasting end |
CN107995187A (en) * | 2017-11-30 | 2018-05-04 | 上海哔哩哔哩科技有限公司 | Video main broadcaster, live broadcasting method, terminal and system based on HTML5 browsers |
- 2018-08-27: application CN201810980968.3A filed; patent CN109151387B granted (Active)
Also Published As
Publication number | Publication date |
---|---|
CN109151387A (en) | 2019-01-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12356038B2 (en) | Re-encoding predicted picture frames in live video stream applications | |
US9286940B1 (en) | Video editing with connected high-resolution video camera and video cloud server | |
CN110430441B (en) | Cloud mobile phone video acquisition method, system, device and storage medium | |
TWI501634B (en) | Embedded appliance multimedia capture | |
WO2016138844A1 (en) | Multimedia file live broadcast method, system and server | |
US20160034245A1 (en) | Direct streaming for wireless display | |
US11527266B2 (en) | Artificial intelligence analysis of multimedia content | |
CA2740119C (en) | System and method for storing multi-source multimedia presentations | |
US20070159552A1 (en) | Method and System for Video Conference | |
CN107370714A (en) | The high efficiency communication method that facing cloud renders | |
CN109151387B (en) | webRTC-based low-delay solution for face recognition of mobile camera | |
WO2020215454A1 (en) | Screen recording method, client, and terminal device | |
CN108418832A (en) | A kind of virtual reality shopping guide method, system and storage medium | |
CN114173150A (en) | A method, device, system and terminal device for recording live video | |
JP2020524450A (en) | Transmission system for multi-channel video, control method thereof, multi-channel video reproduction method and device thereof | |
US20250156369A1 (en) | Appliances and methods to provide robust computational services in addition to a/v encoding, for example at edge of mesh networks | |
CN108234940A (en) | A kind of video monitoring server-side, system and method | |
US10721500B2 (en) | Systems and methods for live multimedia information collection, presentation, and standardization | |
CN115567494A (en) | Cloud video interaction system, method and device | |
CN113055636B (en) | Data processing method and conference system | |
EP4539449A1 (en) | Camera time-based motion trails and motion heat-maps for periodic captured images | |
CN113706891A (en) | Traffic data transmission method, traffic data transmission device, electronic equipment and storage medium | |
TW202107247A (en) | Live streaming method for mobile electronic device capable of allowing online game players to share audio-video files recording played online games through a mobile electronic device | |
WO2023051705A1 (en) | Video communication method and apparatus, electronic device, and computer readable medium | |
US9426415B2 (en) | System, method and architecture for in-built media enabled personal collaboration on endpoints capable of IP voice video communication |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||