CN109151387B - webRTC-based low-delay solution for face recognition of mobile camera - Google Patents
- Publication number
- CN109151387B (application CN201810980968.3A)
- Authority
- CN
- China
- Prior art keywords
- face
- transcoder
- mobile terminal
- room
- webrtc
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
- H04L67/025—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/14—Session management
- H04L67/141—Setup of application sessions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/44—Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/268—Signal distribution or switching
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Computer Networks & Wireless Communication (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Studio Devices (AREA)
- Telephonic Communication Services (AREA)
Abstract
The invention discloses a webRTC-based low-delay solution for face recognition with a mobile camera. The method comprises the following steps: the mobile terminal initiates a face detection request; the monitoring server initiates a transcoding task to a transcoder; the transcoder requests the RTC server to establish a chat room; the RTC server returns the room number to the transcoder; the transcoder passes the room number to the monitoring server; the monitoring server passes the room number to the mobile terminal; the mobile terminal connects to the RTC server with the room number and joins the room; the RTC server and the communication cloud establish a data transmission node in the same room and carry out real-time low-delay data transmission based on webRTC; the mobile terminal starts sending data to the transcoder through the communication cloud; the transcoder establishes a one-in, two-out task to perform face snapshot and real-time transparent transmission. The beneficial effects of the invention are that the picture-delay problem is effectively solved: the measured delay is approximately 200 ms to 300 ms, and it can theoretically be reduced to within 100 ms.
Description
Technical Field
The invention relates to the technical field of video coding and decoding, and in particular to a webRTC-based low-delay solution for face recognition with a mobile camera.
Background
During development of a face-monitoring project for the mobile-phone side, it was found that when the phone pushes an RTMP stream to the server for face recognition, the picture delay becomes excessive; the farther the phone is from the server, the higher the public-network delay, which can exceed ten seconds.
Disclosure of Invention
The invention aims to overcome this defect in the prior art by providing a webRTC-based low-delay solution for face recognition with a mobile camera that effectively shortens the delay.
To this end, the invention adopts the following technical solution:
a webRTC-based low-delay solution for face recognition of a mobile camera specifically comprises the following steps:
(1) the mobile terminal initiates a face detection request;
(2) initiating a transcoding task to a transcoder by a monitoring server;
(3) the transcoder initiates a request to the RTC server to establish a chat room;
(4) the RTC server returns the room number to the transcoder;
(5) the transcoder tells the monitoring server the room number;
(6) the monitoring server tells the mobile terminal the room number;
(7) the mobile terminal is connected with the RTC server through the room number and joins the room;
(8) the RTC server and the communication cloud establish a data transmission node in the same room, and real-time low-delay data transmission is carried out based on webRTC;
(9) the mobile terminal starts to send data to the transcoder through the communication cloud;
(10) the transcoder establishes a one-in, two-out task to realize the face snapshot and real-time transparent transmission tasks.
By adopting this webRTC-based low-delay solution for face recognition with a mobile camera, the picture-delay problem is effectively solved: the decoded video is converted to RGB24 and displayed with OpenCV, the measured delay is approximately 200 ms to 300 ms, and the theoretical delay can be reduced to within 100 ms, almost the same delay as a mobile phone experiences on a 4G network.
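The room-setup signalling in steps (1)-(7) can be sketched as a small simulation. All class and method names here are illustrative, not from the patent; the point is the relay of the room number from the RTC server back through the transcoder and monitoring server to the mobile terminal.

```python
# Minimal sketch of the room-setup signalling in steps (1)-(7).
# Class/method names are illustrative assumptions, not from the patent.
import itertools

class RTCServer:
    """Issues a session id (room number) per chat-room request."""
    def __init__(self):
        self._ids = itertools.count(1000)
        self.rooms = {}  # session_id -> list of members
    def create_room(self):
        sid = next(self._ids)
        self.rooms[sid] = []
        return sid
    def join(self, sid, member):
        self.rooms[sid].append(member)

class Transcoder:
    def start_task(self, rtc):
        # (3) ask the RTC server for a chat room; (4) receive the room number
        return rtc.create_room()

class MonitorServer:
    def __init__(self, transcoder):
        self.transcoder = transcoder
    def handle_face_request(self, rtc):
        # (2) start a transcoding task; (5)/(6) relay the room number back
        return self.transcoder.start_task(rtc)

class MobileTerminal:
    def request_detection(self, monitor, rtc):
        # (1) initiate the face-detection request
        sid = monitor.handle_face_request(rtc)
        # (7) connect to the RTC server with the room number and join
        rtc.join(sid, "mobile")
        return sid

rtc = RTCServer()
sid = MobileTerminal().request_detection(MonitorServer(Transcoder()), rtc)
```

Once the mobile terminal has joined, steps (8)-(10) proceed over the established room.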
Preferably, in step (8), the webRTC-based mobile terminal specifically comprises RtcMessage, communication, the communication cloud and hardware, where RtcMessage serves as the signalling set with which the mobile terminal requests the communication cloud to create or join a room; after the communication cloud creates the room successfully, the mobile terminal establishes a communication connection with it, and the hardware collects audio and video data and transmits them to, or receives data from, the communication cloud.
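The RtcMessage signalling set can be sketched as two message types sent to the communication cloud. The field names and JSON encoding are assumptions for illustration; the patent does not specify a wire format.

```python
# Hypothetical sketch of the RtcMessage signalling set in step (8):
# create-room and join-room requests to the communication cloud.
import json

def rtc_message(msg_type, session_id=None):
    """Build a signalling message; field names are assumptions."""
    if msg_type not in ("create_room", "join_room"):
        raise ValueError("unknown RtcMessage type")
    msg = {"type": msg_type}
    if msg_type == "join_room":
        msg["session_id"] = session_id  # the room number returned earlier
    return json.dumps(msg)

create = rtc_message("create_room")
join = rtc_message("join_room", session_id=1000)
```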
Preferably, in step (10), when the transcoder establishes the one-in, two-out task it uses a low-level transcoding technique implemented by inheriting the DirectShow (dshow) framework, as follows: first, a Source module connects to the RTC server to obtain the video data of the mobile terminal; an infTee module then distributes the data to a video decoder and to a frame-wrapper module (framewrapper). On the first branch, the video decoder parses the code-stream data and passes it on to be converted into an RGB24 image, which is handed to the face recognition module for feature comparison, thereby capturing the face. On the second branch, the frame-wrapper module feeds the FLVmux module to generate an RTMP live stream, and an audio mute packet is added for real-time transparent transmission.
Preferably, in step (10), the video data is received and parsed into raw H.264 data, the H.264 video data is decoded and converted into an RGB24 image, and the image is continuously refreshed and displayed with OpenCV's cv::imshow method, achieving real-time viewing.
Preferably, in step (10), the face snapshot comprises four parts: face detection, face tracking, face recognition and living-body verification. Face detection means detecting faces in a static picture and returning face-frame coordinates, landmark coordinates and quality-score information. Face tracking means millisecond-level face tracking on surveillance or dynamic video in complex scenes, obtaining in real time the face-frame coordinates, landmark coordinates and quality-score information of every face in each frame, unaffected by occlusion, blur or side-face factors. Face recognition covers both 1:1 and 1:N comparison: the 1:1 comparison achieves a false-recognition rate below one in ten thousand at a 96% recall rate, and the 1:N comparison achieves millisecond-level retrieval on a large-scale portrait database regardless of race or age. Living-body verification checks whether a real person is operating in front of the mobile-terminal camera.
The beneficial effects of the invention are that the picture-delay problem is effectively solved: the decoded video is converted to RGB24 and displayed with OpenCV, the measured delay is approximately 200 ms to 300 ms, and it can theoretically be reduced to within 100 ms.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a schematic diagram based on a webRTC;
fig. 3 is a schematic diagram of an underlying transcoding technique.
Detailed Description
The invention is further described below with reference to the figures and a detailed embodiment.
In the embodiment shown in fig. 1, a webRTC-based low-latency solution for face recognition of a mobile camera specifically includes the following steps:
(1) a Mobile terminal (Mobile App) initiates a face detection request;
(2) initiating a transcoding task to a transcoder (transcoder) by a monitor server (monitor server);
(3) a transcoder (transcoder) initiates a request to an RTC server to establish a chat room;
(4) the RTC server returns a room number (session id) to a transcoder (transcoder);
(5) the transcoder (transcoder) tells the monitoring server (monitor server) the room number (sessionid);
(6) the monitoring server (monitor server) tells the Mobile terminal (Mobile App) the room number (sessionid);
(7) the Mobile terminal (Mobile App) is connected with an RTC server through a room number (session id) to join a room;
(8) the RTC server and the communication cloud establish a data transmission node in the same room, and real-time low-delay data transmission is carried out based on webRTC;
As shown in fig. 2, the webRTC-based mobile terminal specifically comprises RtcMessage, communication, the communication cloud and hardware. RtcMessage serves as the signalling set with which the mobile terminal requests the communication cloud to create or join a room; after the communication cloud creates the room successfully, it establishes a communication connection with the mobile terminal, and the hardware collects audio and video data and transmits them to, or receives data from, the communication cloud.
(9) The mobile terminal starts to send data to the transcoder through the communication cloud;
(10) the transcoder establishes a one-in, two-out task to realize the face snapshot and real-time transparent transmission tasks;
When the transcoder establishes the one-in, two-out task it uses a low-level transcoding technique implemented by inheriting the DirectShow (dshow) framework, as shown in fig. 3. Specifically: a Source module connects to the RTC server to obtain the video data of the mobile terminal; an infTee module then distributes the data to a video decoder and to a frame-wrapper module. On the first branch, the video decoder parses the code-stream data and passes it on to be converted into an RGB24 image, which is handed to the face recognition module for feature comparison, thereby capturing the face. On the second branch, the frame-wrapper module feeds the FLVmux module to generate an RTMP live stream; an audio mute packet is added (since a pure-video transmission mechanism is used, the time otherwise spent on A/V synchronization is eliminated) so that the stream suits RTMP players that require audio, enabling real-time transparent transmission.
DirectShow is a streaming-media framework on the Windows platform (this method inherits the framework and is implemented under Linux) that provides high-quality multimedia-stream capture and playback. It supports a wide variety of media file formats, including ASF, MPEG, AVI, MP3 and WAV, and supports capturing multimedia streams with WDM drivers or the older VFW drivers. DirectShow integrates with other DirectX technologies, automatically detecting and using available audio-video hardware acceleration while still supporting systems without it. It greatly simplifies media playback, format conversion and capture, and at the same time exposes a low-level stream-control framework for user-defined solutions, allowing users to create their own DirectShow components that support new file formats or other uses. Typical applications written with DirectShow include DVD players, video-editing applications, AVI-to-ASF converters, MP3 players and digital video-capture applications.
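The one-in, two-out filter graph described above can be sketched in Python rather than DirectShow, purely to illustrate the tee topology: one source, with every sample distributed to a face-snapshot branch and a passthrough branch. The branch functions only tag samples; the real decoding, encoding and muxing are omitted.

```python
# Sketch of the one-in, two-out graph: Source -> infTee -> two branches.
# Branch internals are simulated; module names mirror the description.
def source(frames):
    # Source module: yields video samples obtained from the RTC server.
    for f in frames:
        yield f

def inf_tee(stream, *sinks):
    # infTee module: distribute every sample to each downstream branch.
    for sample in stream:
        for sink in sinks:
            sink(sample)

snapshots, rtmp_out = [], []

def face_branch(sample):
    # decoder -> RGB24 -> face feature comparison (simulated as a tag)
    snapshots.append(("rgb24", sample))

def passthrough_branch(sample):
    # frame wrapper -> FLVmux -> RTMP live stream (simulated as a tag)
    rtmp_out.append(("flv", sample))

inf_tee(source(["frame0", "frame1"]), face_branch, passthrough_branch)
```

Both branches see every frame, which is what lets the snapshot and the live passthrough run from a single input stream.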
The video data is received and parsed into raw H.264 data, the H.264 video data is decoded and converted into an RGB24 image, and the image is continuously refreshed and displayed with OpenCV's cv::imshow method, achieving real-time viewing.
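Decoded H.264 frames are in a YUV format, so the RGB24 step is a color-space conversion. A per-pixel sketch with the full-range BT.601 formula illustrates it (the display via cv::imshow is omitted so the sketch stays dependency-free; whether the patent's decoder uses BT.601 is an assumption).

```python
# Full-range BT.601 YUV -> RGB24 conversion for a single pixel.
def yuv_to_rgb24(y, u, v):
    """Convert one full-range BT.601 YUV pixel to an (R, G, B) byte triple."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    clamp = lambda x: max(0, min(255, int(round(x))))  # keep in byte range
    return clamp(r), clamp(g), clamp(b)

white = yuv_to_rgb24(255, 128, 128)  # neutral chroma, full luma
black = yuv_to_rgb24(0, 128, 128)    # neutral chroma, zero luma
```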
The face snapshot comprises four parts: face detection, face tracking, face recognition and living-body verification. Face detection means detecting faces in a static picture and returning face-frame coordinates, landmark coordinates and quality-score information; on the FDDB test set the detection reaches an industry-leading level. Face tracking means millisecond-level face tracking on surveillance or dynamic video in complex scenes, obtaining in real time the face-frame coordinates, landmark coordinates and quality-score information of every face in each frame, unaffected by occlusion, blur or side-face factors. Face recognition covers both 1:1 and 1:N comparison: the 1:1 comparison achieves a false-recognition rate below one in ten thousand at a 96% recall rate, and the 1:N comparison achieves millisecond-level retrieval on a large-scale portrait database regardless of race or age, enabling real-time identification and alarms for multiple video channels and multiple faces in dynamic, complex scenes, with an accuracy of 99.87% on the LFW test set. Living-body verification checks whether a real person is operating in front of the mobile-terminal camera, preventing spoofing with high-definition pictures, three-dimensional models, recorded video or face swapping, and meeting the security requirements that sensitive industries place on face recognition.
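The 1:1 comparison stage can be sketched as a cosine-similarity check between two face-feature vectors against a decision threshold. The threshold value, vector size and feature values are illustrative assumptions; in a real system the threshold is tuned so the false-accept rate stays below 1/10000 at the target recall.

```python
# Sketch of 1:1 face comparison as cosine similarity vs. a threshold.
# Threshold and feature vectors are placeholders, not from the patent.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def same_person(feat_a, feat_b, threshold=0.8):
    # 0.8 is a placeholder; real systems tune this on labelled pairs.
    return cosine_similarity(feat_a, feat_b) >= threshold

probe = [0.1, 0.9, 0.4]      # feature vector from the live frame
enrolled = [0.12, 0.88, 0.41]  # feature vector from the portrait database
match = same_person(probe, enrolled)
```

A 1:N search applies the same similarity against every enrolled vector and takes the best score, typically with an index structure for millisecond-level retrieval.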
By adopting this webRTC-based low-delay solution for face recognition with a mobile camera, the picture-delay problem is effectively solved: the decoded video is converted to RGB24 and displayed with OpenCV, the measured delay is approximately 200 ms to 300 ms, and the theoretical delay can be reduced to within 100 ms, almost the same delay as a mobile phone experiences on a 4G network, whereas the 4G delay of a mobile terminal over a long distance is somewhat higher, around 2 s.
Claims (4)
1. A webRTC-based low-delay solution for face recognition of a mobile camera is characterized by comprising the following steps:
(1) the mobile terminal initiates a face detection request to the mobile terminal camera;
(2) initiating a transcoding task to a transcoder by a monitoring server;
(3) the transcoder initiates a request to the RTC server to establish a chat room;
(4) the RTC server returns the room number to the transcoder;
(5) the transcoder tells the monitoring server the room number;
(6) the monitoring server tells the mobile terminal the room number;
(7) the mobile terminal is connected with the RTC server through the room number and joins the room;
(8) the RTC server and the communication cloud establish a data transmission node in the same room, and real-time low-delay data transmission is carried out based on webRTC;
(9) the mobile terminal starts to send data to the transcoder through the communication cloud;
(10) the transcoder establishes a one-in, two-out task and realizes the face snapshot and real-time transparent transmission tasks, displaying with OpenCV; when the transcoder establishes the one-in, two-out task it uses a low-level transcoding technique implemented by inheriting the DirectShow (dshow) framework, as follows: a Source module connects to the RTC server to obtain the video data of the mobile terminal; an infTee module then distributes the data to a video decoder and to a frame-wrapper module; on the first branch, the video decoder parses the code-stream data and passes it on to be converted into an RGB24 image, which is handed to the face recognition module for feature comparison, thereby capturing the face; and on the second branch, the frame-wrapper module feeds the FLVmux module to generate an RTMP live stream, with an audio mute packet added for real-time transparent transmission.
2. The webRTC-based low-delay solution for face recognition of a mobile camera as claimed in claim 1, wherein in step (8) the webRTC-based mobile terminal specifically comprises RtcMessage, communication, the communication cloud and hardware, wherein RtcMessage serves as the signalling set with which the mobile terminal requests the communication cloud to create or join a room; after the communication cloud creates the room successfully, it establishes a communication connection with the mobile terminal, and the hardware collects audio and video data and transmits them to, or receives data from, the communication cloud.
3. The webRTC-based low-delay solution for face recognition of a mobile camera as claimed in claim 1, wherein in step (10) the video data is received and parsed into raw H.264 data, the H.264 video data is decoded and converted into an RGB24 image, and the image is continuously refreshed and displayed with OpenCV's cv::imshow method, achieving real-time viewing.
4. The webRTC-based low-delay solution for face recognition of a mobile camera as claimed in claim 1, wherein in step (10) the face snapshot comprises four parts: face detection, face tracking, face recognition and living-body verification, wherein face detection means detecting faces in a static picture and returning face-frame coordinates, landmark coordinates and quality-score information; face tracking means millisecond-level face tracking on surveillance or dynamic video in complex scenes, obtaining in real time the face-frame coordinates, landmark coordinates and quality-score information of every face in each frame, unaffected by occlusion, blur or side-face factors; face recognition covers both 1:1 and 1:N comparison, wherein the 1:1 comparison achieves a false-recognition rate below one in ten thousand at a 96% recall rate and the 1:N comparison achieves millisecond-level retrieval on a large-scale portrait database regardless of race or age; and living-body verification checks whether a real person is operating in front of the mobile-terminal camera.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810980968.3A CN109151387B (en) | 2018-08-27 | 2018-08-27 | webRTC-based low-delay solution for face recognition of mobile camera |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109151387A CN109151387A (en) | 2019-01-04 |
CN109151387B true CN109151387B (en) | 2020-10-23 |
Family
ID=64828178
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810980968.3A Active CN109151387B (en) | 2018-08-27 | 2018-08-27 | webRTC-based low-delay solution for face recognition of mobile camera |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109151387B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110868609B (en) * | 2019-12-02 | 2021-08-13 | 杭州当虹科技股份有限公司 | Method for monitoring and standardizing live video |
CN112491924B (en) * | 2020-12-09 | 2022-03-22 | 威创集团股份有限公司 | Cross-platform face recognition login method, system and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017118241A (en) * | 2015-12-22 | 2017-06-29 | 西日本電信電話株式会社 | Audio video communication system, server, virtual client, audio video communication method, and audio video communication program |
CN107027045A (en) * | 2017-04-11 | 2017-08-08 | 广州华多网络科技有限公司 | Pushing video streaming control method, device and video flowing instructor in broadcasting end |
CN107995187A (en) * | 2017-11-30 | 2018-05-04 | 上海哔哩哔哩科技有限公司 | Video main broadcaster, live broadcasting method, terminal and system based on HTML5 browsers |
- 2018-08-27: application CN201810980968.3A filed; patent CN109151387B granted (Active)
Also Published As
Publication number | Publication date |
---|---|
CN109151387A (en) | 2019-01-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12356038B2 (en) | Re-encoding predicted picture frames in live video stream applications | |
US9286940B1 (en) | Video editing with connected high-resolution video camera and video cloud server | |
CN110430441B (en) | Cloud mobile phone video acquisition method, system, device and storage medium | |
TWI501634B (en) | Embedded appliance multimedia capture | |
WO2016138844A1 (en) | Multimedia file live broadcast method, system and server | |
US20160034245A1 (en) | Direct streaming for wireless display | |
US11527266B2 (en) | Artificial intelligence analysis of multimedia content | |
CA2740119C (en) | System and method for storing multi-source multimedia presentations | |
US20070159552A1 (en) | Method and System for Video Conference | |
CN107370714A (en) | The high efficiency communication method that facing cloud renders | |
CN109151387B (en) | webRTC-based low-delay solution for face recognition of mobile camera | |
WO2020215454A1 (en) | Screen recording method, client, and terminal device | |
CN108418832A (en) | A kind of virtual reality shopping guide method, system and storage medium | |
CN114173150A (en) | A method, device, system and terminal device for recording live video | |
JP2020524450A (en) | Transmission system for multi-channel video, control method thereof, multi-channel video reproduction method and device thereof | |
US20250156369A1 (en) | Appliances and methods to provide robust computational services in addition to a/v encoding, for example at edge of mesh networks | |
CN108234940A (en) | A kind of video monitoring server-side, system and method | |
US10721500B2 (en) | Systems and methods for live multimedia information collection, presentation, and standardization | |
CN115567494A (en) | Cloud video interaction system, method and device | |
CN113055636B (en) | Data processing method and conference system | |
EP4539449A1 (en) | Camera time-based motion trails and motion heat-maps for periodic captured images | |
CN113706891A (en) | Traffic data transmission method, traffic data transmission device, electronic equipment and storage medium | |
TW202107247A (en) | Live streaming method for mobile electronic device capable of allowing online game players to share audio-video files recording played online games through a mobile electronic device | |
WO2023051705A1 (en) | Video communication method and apparatus, electronic device, and computer readable medium | |
US9426415B2 (en) | System, method and architecture for in-built media enabled personal collaboration on endpoints capable of IP voice video communication |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||