CN113473066A - Video conference picture adjusting method

Info

Publication number: CN113473066A
Application number: CN202110499454.8A
Authority: CN
Inventors: 孔尧
Original assignee: Shanghai Mingwork Information Technology Co ltd
Current assignee: Shanghai Mingwork Information Technology Co ltd
Priority date: 2021-05-10
Filing date: 2021-05-10
Publication date: 2021-10-01

Abstract

The present invention relates to a video conference picture adjusting method comprising the following steps: S1, initiating a video conference call; S2, acquiring the speakers in the video conference and the complexity and on-site noise of the conference room, and adjusting the pickup parameters of the on-site conference room pickup; S3, acquiring the face information, voice and voiceprint information of the video conference speaker and the on-site seat direction information, so as to adjust the angle of the camera in the video conference room and focus on the speaker; S4, when the speaker's voice, voiceprint information and voice position information change, monitoring whether the speaker's mouth moves; if so, keeping the current picture unchanged; if not, returning to step S3. The invention overcomes the shortcomings of the prior art: it organizes the video conference quickly, quickly locks a clear picture of the speaker, makes the video conference more realistic and improves the user experience.

Description

Video conference picture adjusting method

Technical Field

The invention relates to the technical field of video conferences, in particular to a video conference picture adjusting method.

Background

A video conference (videoconference) is a communication method in which user terminals at two or more locations hold a conference over a transmission channel using video technology and equipment, transmitting sound and images in real time. It can also be used to transmit still images, documents, faxes and the like. Participants can voice their opinions through the television screen while observing the image, actions and expressions of the other party. They can also show live images of physical objects, drawings and documents, or display text and pictures written on a blackboard or whiteboard. Participants at different places thus feel as if they were talking face to face, and the video conference can effectively replace an on-site meeting.

With technical development, one-to-one video calls can no longer meet the needs of multi-person conferences. In a multi-person conference the speaker cannot be captured quickly, and the picture shown while the speaker talks is too small, so participants at the far end cannot accurately perceive the speaker's mood or accurately understand what is being expressed.

Disclosure of Invention

The technical problem to be solved by the invention is to overcome the defects of the prior art and to provide a video conference picture adjusting method that can quickly organize a video conference, quickly lock a clear picture of the speaker, and improve the realism and user experience of the video conference.

In order to solve the technical problems, the technical scheme provided by the invention is as follows: a video conference picture adjusting method comprises the following steps:

S1, initiating a video conference call;

S2, acquiring the speakers in the video conference and the complexity and on-site noise of the conference room, and adjusting the pickup parameters of the on-site conference room pickup;

S3, acquiring the face information, voice and voiceprint information of the video conference speaker and the on-site seat direction information, so as to adjust the angle of the camera in the video conference room and focus on the speaker;

S4, when the speaker's voice, voiceprint information and voice position information change, monitoring whether the speaker's mouth moves; if so, keeping the current picture unchanged; if not, returning to step S3.
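By way of illustration only, the decision made in step S4 can be sketched as a small function. The names `Action` and `s4_decide`, and the reduction of "voice, voiceprint and voice position changed" to a single boolean, are assumptions made for this sketch and do not appear in the application.

```python
from enum import Enum


class Action(Enum):
    HOLD = "hold"            # keep the current camera framing
    REACQUIRE = "reacquire"  # return to step S3 and locate the speaker again


def s4_decide(audio_changed: bool, mouth_moving: bool) -> Action:
    """Decide what step S4 does for one monitoring interval (hypothetical helper)."""
    if not audio_changed:
        # No change in voice, voiceprint or sound position: nothing to verify.
        return Action.HOLD
    # The audio changed, so confirm visually before moving the camera.
    return Action.HOLD if mouth_moving else Action.REACQUIRE
```

For example, `s4_decide(True, False)` yields `Action.REACQUIRE`, which corresponds to returning to step S3.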

Further, in step S1 the conference initiator initiates the video conference call using the participant ID information in the video conference system.

Further, the participant ID information can be obtained through a link in a pre-sent short message, through third-party software, or from face and ID information pre-recorded when the participants enter the conference room.

Further, initiating the video conference call in step S1 also includes: when an all-member call is initiated to the group participants, the participants already present are determined from the face information, seat information and participant ID information acquired on site, and the participants who are not at the conference site are actively screened out and called.
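As a minimal sketch of this screening, assuming the on-site recognition has already been reduced to a collection of participant IDs, the list of people to call is the invited group minus those detected on site. The function name `participants_to_call` and the string IDs are hypothetical.

```python
def participants_to_call(invited_ids, on_site_ids):
    """Return the invited IDs that were not detected on site, in invitation order."""
    on_site = set(on_site_ids)  # set lookup keeps the screening linear in group size
    return [pid for pid in invited_ids if pid not in on_site]
```

With an invited group `["a", "b", "c"]` and `"b"` recognized in the room, only `"a"` and `"c"` would be called.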

Further, step S2 also includes identifying the speaker and the speaker's position from the sound source position and the voiceprint, and adjusting the camera angle to aim at the speaker.

Further, adjusting the angle and focal length of the camera in the video conference room in step S3 comprises: focusing and zooming the camera so that the speaker occupies the largest position in the picture, and, based on the recognized still image of the speaker, adjusting the focal length so that the movements of the speaker's head, upper body, hands and arms can be recognized.

Further, in step S3, when the video conference system detects that the speaker's volume and speaking time are below set parameters, the camera position is not adjusted and the camera stays on the previous speaker.
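The hold rule above can be sketched as follows. The application does not state the threshold values, so the defaults in `HoldThresholds` are placeholders chosen only for illustration. The sketch also assumes that both volume and speaking time must fall below their thresholds for the camera to stay put.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HoldThresholds:
    min_volume_db: float = -40.0  # hypothetical value; the source gives none
    min_duration_s: float = 2.0   # hypothetical value; the source gives none


def should_retarget(volume_db: float, duration_s: float,
                    t: HoldThresholds = HoldThresholds()) -> bool:
    """Return False when the utterance is both too quiet and too short."""
    too_quiet = volume_db < t.min_volume_db
    too_short = duration_s < t.min_duration_s
    # Assumption: the camera holds only when both conditions are met.
    return not (too_quiet and too_short)
```

A brief, quiet remark then leaves the camera on the previous speaker, while a loud or sustained utterance allows the camera to move.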

Further, when the video conference system detects that no one has spoken for a period of time, that the speaking time is too short, or that too many sound positions are acquired during speaking, the camera is switched to panoramic mode so that the conference scene of all participants is captured.

Further, "no one has spoken for a period of time" specifically means a time greater than 60 seconds;

"the speaking time is too short" specifically means a speaking time of less than 5 seconds;

"too many sound positions are acquired during speaking" specifically means that more than 3 speaker voiceprints are acquired.
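The three conditions and their stated limits (60 seconds, 5 seconds, 3 voiceprints) can be combined into a single check. This is a sketch only: the function name `use_panorama`, its parameters, and the use of `None` for "no speech observed" are assumptions.

```python
from typing import Optional

SILENCE_LIMIT_S = 60.0  # no one speaks for more than 60 s
MIN_SPEECH_S = 5.0      # speech shorter than 5 s counts as too short
MAX_VOICEPRINTS = 3     # more than 3 voiceprints counts as too many


def use_panorama(silence_s: float,
                 last_speech_s: Optional[float],
                 active_voiceprints: int) -> bool:
    """Return True when any of the three panoramic-mode conditions holds."""
    too_short = last_speech_s is not None and last_speech_s < MIN_SPEECH_S
    return (silence_s > SILENCE_LIMIT_S
            or too_short
            or active_voiceprints > MAX_VOICEPRINTS)
```

The limits are kept as named constants so that a deployment could tune them without touching the decision logic.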

Further, the invention provides a system using the video conference picture adjusting method described above.

The invention has the following advantages: the participant information stored in the video conference system allows the participants to be gathered quickly. During the conference, the position of the speaker is locked quickly and the picture of the speaker is focused and enlarged, so that the speaker's attitude and expression can be determined quickly. This makes the video conference more realistic, improves the user experience, reduces the operations required of the participants, and keeps the conference experience complete and continuous so that the video conference proceeds smoothly.

Drawings

Fig. 1 is a schematic flowchart of a video conference picture adjustment method according to an embodiment of the present application.

Detailed Description

The present invention will be described in further detail with reference to examples.

As shown in Fig. 1, the video conference picture adjusting method of the present invention is specifically implemented as follows:

S1, initiating a video conference call.

Scenario: when making a reservation or entering the door, the visitor's face and ID information is recorded through the access control system (account information used when reserving the meeting through DingTalk or another app, or employee ID and face information, or ID and face information of the attendees obtained by the host through other channels).

When the meeting starts, the face images captured by the video conference camera are automatically matched with the IDs, so that seat, face portrait and ID account are labeled consistently.

S1: when the meeting starts, the conference initiator first calls the participants in the participant group through DingTalk. Participants already in the meeting room do not need to be called: when an all-member call is initiated, the participants in the meeting room are identified from the face information, seat information and DingTalk ID information acquired on site, the group participants not in the meeting room are actively screened out, and a call is placed to them while the system waits for them to join.

S2: after the conference is connected, the pickup parameters of the on-site conference room pickup are adjusted according to the conference speakers and the complexity, noise and other conditions of the conference room, which suppresses ambient noise and ensures faithful transmission of the conference audio. Each device is connected by a network cable that carries both power and data, ensuring low latency.

S3: if calling the participants was chosen in the first step, the meeting room camera acquires the face information, voice and voiceprint information of the on-site participants and the on-site seat information. Combined with the DingTalk (or other app) ID information and the face information, the system confirms that ID, face, voice direction and voiceprint are consistent and identifies who is speaking. The camera then adjusts its angle, focuses and aims at the speaker (the position is determined by the microphone working together with the camera, and the speaker is determined from the sound source position so that the camera angle can be adjusted toward the speaker).
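One way to picture the angle adjustment in S3 is a lookup from the recognized voiceprint ID to a registered seat, followed by a pan angle computed from the seat position. The application does not describe the geometry, so everything here is an assumption for illustration: the 2D floor coordinates, the seat map, and the helper names `pan_angle_deg` and `locate_speaker`.

```python
import math


def pan_angle_deg(camera_xy, seat_xy):
    """Angle in degrees from the camera's +x axis to the seat, counter-clockwise."""
    dx = seat_xy[0] - camera_xy[0]
    dy = seat_xy[1] - camera_xy[1]
    return math.degrees(math.atan2(dy, dx))


def locate_speaker(voiceprint_id, seat_map, camera_xy=(0.0, 0.0)):
    """Return the pan angle for the seat registered to a voiceprint ID.

    Returns None when the ID has no registered seat (hypothetical behaviour).
    """
    seat = seat_map.get(voiceprint_id)
    return None if seat is None else pan_angle_deg(camera_xy, seat)
```

In a real room the sound source direction reported by the microphone would be cross-checked against this seat-derived angle before the camera moves.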

S4: starting from the adjusted camera position (that is, after the speaker has been determined from the sound and voiceprint and the camera has been aimed at the speaker), the system verifies the speaker again using the speaker's mouth movements captured by the camera over a period of time, and adjusts the camera focal length to focus on and enlarge the speaker (the speaker is placed at the largest position in the picture, and, based on the recognized still image of the speaker, the camera is adjusted to the best position for recognizing the movements of the speaker's head, upper body, hands and arms). If the mouth of the speaker identified by the camera has not moved for a period of time (more than 5 s), or the pickup of the on-site conference equipment detects other sound information, the process re-enters S3.
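The re-entry condition at the end of S4 can be sketched as a timeout check. The 5 s limit comes from the text. The function name `must_reenter_s3`, the timestamps in seconds, and the boolean flag for "other sound information" are assumptions.

```python
MOUTH_IDLE_LIMIT_S = 5.0  # from the text: mouth not moving for more than 5 s


def must_reenter_s3(now_s: float,
                    last_mouth_motion_s: float,
                    other_sound_detected: bool) -> bool:
    """Return True when the process should go back to step S3."""
    mouth_idle_s = now_s - last_mouth_motion_s
    return mouth_idle_s > MOUTH_IDLE_LIMIT_S or other_sound_detected
```

If mouth motion was last seen at t = 10 s and the current time is t = 16 s, the idle time of 6 s exceeds the limit and the speaker is located again.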

Scenario: when a multi-person discussion starts in a video conference, the intelligent conference equipment and the movable camera use the acquired sound source position to identify and acquire the face information and the voiceprint information corresponding to the labeled face ID, and the camera is adjusted according to the speaking time to focus on the speaker's facial features.

Preferably, in step S1 the conference initiator initiates the video conference call using the participant ID information in the video conference system.

Preferably, when the conference initiator initiates the video conference call using the participant ID information in the video conference system, the participant ID information can be obtained through a link in a pre-sent short message, through third-party software, or from face and ID information pre-entered when entering the conference room.

Preferably, when the video conference call is initiated in step S1 and an all-member call is initiated to the group participants, the participants already present are determined from the face information, seat information and participant ID information acquired on site, and the participants who are not at the conference site are actively screened out and called.

The "participants already present" above specifically means all persons whose face information was captured by the camera device before the conference start time set by the main participant.
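The definition of "already present" is a filter on capture time. A minimal sketch follows, assuming each face recognition event has been reduced to a `(participant_id, seen_at)` pair. That pair format and the function name `present_participants` are hypothetical.

```python
from datetime import datetime


def present_participants(face_sightings, start_time: datetime) -> set:
    """Return IDs whose face was captured before the scheduled start time.

    face_sightings: iterable of (participant_id, seen_at) pairs.
    """
    return {pid for pid, seen_at in face_sightings if seen_at < start_time}
```

A participant first captured after the start time is therefore not counted as present under this definition.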

Preferably, step S2 further includes identifying the speaker and the speaker's position from the sound source position and the voiceprint, and adjusting the camera angle to aim at the speaker.

Preferably, in step S3 the angle and focal length of the camera in the video conference room are adjusted: the camera is focused and zoomed so that the speaker occupies the largest position in the picture, and, based on the recognized still image of the speaker, the focal length is adjusted so that the movements of the speaker's head, upper body, hands and arms can be recognized.

Preferably, in step S3, when the video conference system detects that the speaker's volume and speaking time are below set parameters, the camera position is not adjusted and the camera stays on the previous speaker.

Preferably, when the video conference system detects that no one has spoken for a period of time, that the speaking time is too short, or that too many sound positions are acquired during speaking, the camera is switched to panoramic mode so that the conference scene of all participants is captured. After the acquired sound and voiceprint information has persisted for a period of time, specifically more than 120 seconds, the speaker is locked again and the camera is switched back to the speaker-focused mode.
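The return from panoramic mode to speaker focus can be sketched as a mode transition. The 120 s value is from the text. The sketch reads "persisted" as the same voiceprint having been heard continuously for that long, which is an interpretation rather than something the application states. The mode strings and the name `next_mode` are assumptions.

```python
RELOCK_AFTER_S = 120.0  # from the text: more than 120 seconds


def next_mode(current_mode: str, steady_voiceprint_s: float) -> str:
    """Return "speaker" or "panorama" for the next monitoring interval."""
    if current_mode == "panorama" and steady_voiceprint_s > RELOCK_AFTER_S:
        # One voice has persisted long enough: lock onto the speaker again.
        return "speaker"
    return current_mode
```

The delay keeps the camera from zooming back in during a lively exchange where no single speaker dominates.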

Preferably, "no one has spoken for a period of time" specifically means a time greater than 60 seconds;

"too many sound positions are acquired during speaking" specifically means that more than 3 speaker voiceprints are acquired.

Preferably, the invention also comprises a system using the video conference picture adjusting method.

Although the invention has been described in detail hereinabove with respect to a general description and specific embodiments thereof, it will be apparent to those skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims

1. A video conference picture adjusting method, characterized by comprising the following steps:

S1. Initiating a video conference call;

S2. Acquiring the speakers in the video conference and the complexity and on-site noise of the conference room, and adjusting the pickup parameters of the on-site conference room pickup;

S3. Acquiring the face information, voice and voiceprint information of the video conference speaker and the on-site seat direction information, so as to adjust the camera angle in the video conference room and focus on the speaker;

S4. When the speaker's voice, voiceprint information and voice position information change, monitoring whether the speaker's mouth moves; if so, keeping the current picture unchanged; if not, returning to step S3.

2. The video conference picture adjusting method according to claim 1, wherein, in initiating the video conference call in step S1, the conference initiator initiates the video conference call using participant ID information in the video conference system.

3. The video conference picture adjusting method according to claim 1, wherein the conference initiator initiates the video conference call using participant ID information in the video conference system, characterized in that the participant ID information can be obtained through a link in a pre-sent short message, through third-party software, or from face and ID information pre-entered when entering the conference room.

4. The video conference picture adjusting method according to claim 1, wherein initiating the video conference call in step S1 is characterized in that, when an all-member call is initiated to the group participants, the participants already present are determined from the face information, seat information and participant ID information acquired on site, and the participants who are not at the conference site are actively screened out and called.

5. The video conference picture adjusting method according to claim 1, wherein step S2 further comprises identifying the speaker and the speaker's position from the sound source position and the voiceprint, and adjusting the camera angle to aim at the speaker.

6. The video conference picture adjusting method according to claim 1, wherein adjusting the camera angle and focal length in the video conference room in step S3 is characterized in that the camera is focused and zoomed so that the speaker occupies the largest position in the picture, and, based on the recognized still image of the speaker, the camera focal length is adjusted to recognize the movements of the speaker's head, upper body, hands and arms.

7. The video conference picture adjusting method according to claim 1, wherein in step S3, when the video conference system detects that the speaker's volume and speaking time are less than a certain parameter, the camera position is not adjusted, and the camera continues to stay at the position of the previous speaker.

8. The video conference picture adjusting method according to claim 1, characterized in that, when the video conference system detects that no one has spoken for a period of time, that the speaking time is too short, or that too many sound positions are acquired during speaking, the camera is adjusted to panoramic mode so that the conference scene of all participants is captured.

9. The video conference picture adjusting method according to claim 8, characterized in that:

"no one has spoken for a period of time" specifically means a time greater than 60 seconds;

"too many sound positions are acquired during speaking" specifically means that more than 3 speaker voiceprints are acquired.

10. A system for adjusting a video conference picture, characterized in that the system uses the video conference picture adjusting method according to claim 1.