CN102209225B - Method and device for realizing video communication

Info

Publication number: CN102209225B
Application number: CN 201010137021
Authority: CN
Inventors: 岳中辉
Original assignee: Huawei Device Co Ltd
Current assignee: Huawei Device Shenzhen Co Ltd
Priority date: 2010-03-30
Filing date: 2010-03-30
Publication date: 2013-04-17
Anticipated expiration: 2030-03-30
Also published as: CN102209225A; WO2011120407A1

Abstract

The embodiment of the invention provides a method and device for realizing video communication, belonging to the technical field of communication. The method comprises the following steps of: after a local user and a remote user establish connection, acquiring head position information of the remote user; determining a loudspeaker playing method corresponding to the remote user according to the head position information of the remote user; and when the remote user speaks, reproducing according to the loudspeaker playing method corresponding to the speaker user. According to the method and the device, a direction in which the local user hears the voice of the remote user is basically kept consistent with a direction in which the local user sees an image of the remote user, and the sound-surround ambiance of users is improved.

Description

Method and device for realizing video communication

Technical field

The present invention relates to the field of communications, and in particular to a method and device for realizing video communication.

Background

A television conference service can be understood as an ordinary video conferencing service: it uses television equipment and a communication network to hold a meeting by multimedia means, providing interactive exchange of images, voice, and data between two or more sites at the same time. The video communication method provided by the prior art works as follows: the image and audio data of the video sender are received, and the audio data uses a two-channel stereo coding scheme; the audio captured on the left channel is played from the left loudspeaker together with the image data, and the audio captured on the right channel is played from the right loudspeaker together with the image data.

In the course of realizing the present invention, the inventor found the following problems in the prior art:

The prior-art scheme processes the audio data with two-channel stereo encoding and decoding, so the sound picked up by the left channel comes out of the left loudspeaker and the sound picked up by the right channel comes out of the right loudspeaker, forming a two-channel listening area. The central sound image of a two-channel system is unstable: it sometimes drifts to the left and sometimes to the right, and can deviate considerably from the image. Moreover, only three directions (left, center, and right) can be roughly distinguished, so the sound direction is not precise enough.

Summary of the invention

Embodiments of the present invention provide a method and device for realizing video communication. With the method and device, the direction from which the local user hears a remote user's voice is substantially consistent with the direction in which the local user sees that remote user's image, which enhances the user's sense of presence.

An embodiment of the invention provides a method for realizing video communication, the method comprising:

after a local device and a remote device establish a connection, obtaining head position information of a remote user;

determining, according to the head position information of the remote user, the loudspeaker playback mode corresponding to the remote user; and, when a remote user speaks, performing playback according to the loudspeaker playback mode corresponding to the speaking user.

An embodiment of the invention also provides a device for realizing video communication, the device comprising:

an acquiring unit, configured to obtain head position information of a remote user after a local device and a remote device establish a connection;

a playback control unit, configured to determine, according to the head position information of the remote user, the loudspeaker playback mode corresponding to the remote user, and, when a remote user speaks, to perform playback according to the loudspeaker playback mode corresponding to the speaking user.

The present invention further provides a system for realizing video communication, the system comprising a remote device, a local device, and a media server.

The remote device is configured to capture the audio and video data of the remote user and send them to the media server.

The media server is configured to exchange audio and video data between the remote device and the local device.

The local device is configured to determine, after the local user and the remote user establish a connection, the loudspeaker playback mode corresponding to the remote user according to the obtained head position information of the remote user, and, when a remote user speaks, to perform playback according to the loudspeaker playback mode corresponding to the speaking user.

The present invention further provides a video communication system, the system comprising a remote device, a local device, and a media server such as a multipoint control unit.

The media server is configured to exchange audio and video data between the remote device and the local device; to determine, after the local user and the remote user establish a connection, the loudspeaker playback mode corresponding to the remote user according to the obtained head position information of the remote user; and, when a remote user speaks, to send a playback instruction to the local device according to the loudspeaker playback mode corresponding to the speaking user.

The local device is configured to control the local playback equipment to perform playback according to the playback instruction.

As can be seen from the technical scheme above, in the embodiments of the invention the head position information of the remote user is obtained after the local device and the remote device establish a connection, the corresponding loudspeaker playback mode is established from this head position information, and the loudspeakers are controlled to play according to that mode. The direction from which the local user hears a remote user's voice is therefore substantially consistent with the direction in which the local user sees that remote user's image, which enhances the user's sense of presence.

Description of drawings

Fig. 1 is a flow chart of a method for realizing video communication provided by the present invention;

Fig. 2 is a diagram of a flat-panel loudspeaker array provided by an embodiment of the present invention;

Fig. 3 is a diagram of a flat-panel loudspeaker array provided by an embodiment of the present invention;

Fig. 4 is a flow chart of a method for realizing video communication provided by one embodiment of the present invention;

Fig. 5 is a flow chart of a method for realizing video communication provided by another embodiment of the present invention;

Fig. 6 is a flow chart of a method for realizing video communication provided by yet another embodiment of the present invention;

Fig. 7 is a structural diagram of a device for realizing video communication provided by the present invention;

Fig. 8 is a structural diagram of a system for realizing video communication provided by the present invention;

Fig. 9 is a diagram of the technical scenario in which the method of embodiment one is realized;

Fig. 10 is a schematic diagram of loudspeakers arranged above and below, as provided by the present invention;

Fig. 11 is a schematic diagram of loudspeakers arranged on the left and right, as provided by the present invention.

Embodiment

An embodiment of the present invention provides a method for realizing video communication. As shown in Fig. 1, the method comprises the following steps:

S11: after the local user and the remote user establish a connection, obtain the head position information of the remote user.

The head position information of the remote user may be obtained by image processing, for example face recognition, or manually, that is, by assigning each remote participant a fixed seat so that the region occupied by that participant's head is known in advance.

S12: determine the loudspeaker playback mode corresponding to the remote user according to the head position information of the remote user.

S13: when a remote user speaks, perform playback according to the loudspeaker playback mode corresponding to the speaking user.

Optionally, which remote user is speaking may be determined in several ways. For example, face recognition may be applied to the image of the remote users to identify the speaking user, or a media server (a multipoint control unit (Multipoint Control Unit, MCU) is taken as the example) may identify the speaking user from the audio streams transmitted from the remote microphones.

The media server may identify the speaking user from the microphone audio streams as follows. Three remote users are taken as the example here; in practice there may be any number of users. The remote site provides one microphone for each of the three participants: for example, microphone 1 is assigned to user A, microphone 2 to user B, and microphone 3 to user C. When the media server receives an audio stream from microphone 1, it confirms that user A is speaking; likewise, a stream from microphone 2 indicates that user B is speaking, and a stream from microphone 3 indicates that user C is speaking. The speaking user is thus determined from the correspondence between microphones and participants.
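The microphone-to-participant correspondence described above can be sketched as a simple lookup. This is a minimal illustration only: the table MIC_TO_USER and the function identify_speaker are hypothetical names, and the sketch assumes that each incoming audio stream carries the number of the microphone that produced it.

```python
from typing import Optional

# Hypothetical correspondence table (microphone number -> participant),
# mirroring the three-user example in the text.
MIC_TO_USER = {1: "A", 2: "B", 3: "C"}


def identify_speaker(mic_id: int) -> Optional[str]:
    """Return the participant assigned to the microphone whose audio
    stream was received, or None if the microphone is not registered."""
    return MIC_TO_USER.get(mic_id)
```

A stream from an unregistered microphone yields None here; how such a case should be handled is not specified in the description.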

The above ways of confirming that a user is speaking are only examples given to illustrate the invention. In practice, the invention does not limit the specific method, as long as it can confirm that a user is speaking.

Optionally, playback according to the loudspeaker playback mode corresponding to the speaking user may be realized as follows: the local device controls the playback equipment corresponding to the speaking user according to that user's loudspeaker playback mode. Alternatively, the media server sends a playback instruction to the local device according to the loudspeaker playback mode corresponding to the speaking user, and the local device controls the playback equipment corresponding to the speaking user according to that instruction.

Optionally, when the loudspeakers form a flat-panel loudspeaker array, S12 and S13 may be realized as follows:

The loudspeakers in the array that correspond to the remote user are determined from the head position information of that remote user; when the remote user speaks, the loudspeakers corresponding to the speaking user are started for playback.

Optionally, when the loudspeakers are arranged above and below, S12 and S13 may be realized as follows: the images of the remote users are displayed one above another, the vertical distance from the center of the remote user's head position to the center of the displayed image is calculated, and the ratio of this vertical distance to the total height of the displayed image is calculated;

Difference between the upper and lower loudspeaker volumes = 8X × (0.5 - ratio of the vertical distance to the total height of the displayed image) dB (Formula 1);

Playback is then performed after the volumes of the upper and lower loudspeakers are adjusted according to this difference. As an example, suppose the difference between the upper and lower loudspeakers is 3 dB. The upper loudspeaker is then set to 43 dB and the lower loudspeaker to 40 dB, where the upper loudspeaker volume is the reference volume. The user may set the reference volume freely: it may be 43 dB as above, or 53 dB, 60 dB, and so on. The difference may also be -3 dB, in which case the upper loudspeaker may be set to 40 dB and the lower loudspeaker to 43 dB; here too the upper loudspeaker volume is the reference volume, and its value may be set by the user.

X is an acoustic coefficient set by the user.

The center and total height of the displayed image depend on how the image is displayed. When the image is projected, they are the center and total height of the projected image; when the image is shown on a display, they are the center and height of the display panel.

Optionally, when the local loudspeakers are arranged on the left and right, S12 and S13 may be realized as follows:

The images of the remote users are displayed side by side, the horizontal distance from the center of the remote user's head position to the center of the displayed image is calculated, and the ratio of this horizontal distance to the total width of the displayed image is calculated;

Difference between the left and right loudspeaker volumes = 8X × (0.5 - ratio of the horizontal distance to the total width of the displayed image) dB (Formula 2);

Playback is then performed after the volumes of the left and right loudspeakers are adjusted according to this difference. As an example, suppose the difference between the left and right loudspeakers is 4 dB. The left loudspeaker is then set to 44 dB and the right loudspeaker to 40 dB, where the left loudspeaker volume is the reference volume. The user may set the reference volume freely: it may be 44 dB as above, or 54 dB, 60 dB, and so on. The difference may also be -4 dB, in which case the left loudspeaker may be set to 40 dB and the right loudspeaker to 44 dB; here too the left loudspeaker volume is the reference volume, and its value may be set by the user.

X is an acoustic coefficient set by the user.
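Formula 1 and Formula 2 have the same form, so both can be sketched with one helper. The following is a minimal sketch, not the patented implementation: the function names are hypothetical, and it assumes (as in the worked examples of the description) that the upper or left loudspeaker is held at the reference volume and the other loudspeaker is set to the reference volume minus the difference.

```python
from typing import Tuple


def volume_difference_db(ratio: float, x: float) -> float:
    """Formula 1 / Formula 2: difference = 8X * (0.5 - ratio), in dB.

    ratio is the distance-to-size ratio defined in the text (vertical
    distance over image height, or horizontal distance over image width);
    x is the user-set acoustic coefficient X.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    return 8.0 * x * (0.5 - ratio)


def speaker_pair_volumes(reference_db: float, ratio: float,
                         x: float) -> Tuple[float, float]:
    """Return (first, second) volumes in dB.

    Assumption: 'first' is the upper (or left) loudspeaker, kept at the
    reference volume; 'second' is the lower (or right) loudspeaker.
    """
    diff = volume_difference_db(ratio, x)
    return reference_db, reference_db - diff
```

For instance, a ratio of 0 with X = 1 gives a 4 dB difference, so a 44 dB reference yields 44 dB and 40 dB, the same pair of values used in the left and right example.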

The method provided by the invention determines the loudspeaker playback mode corresponding to a remote user from that user's head position information and, when a remote user speaks, performs playback in the loudspeaker playback mode corresponding to the speaking user. The direction from which the local user hears a remote user's voice is thereby kept substantially consistent with the direction in which the local user sees that remote user's image, which enhances the user's sense of presence.

To explain the implementation of the invention more clearly, specific embodiments are described below:

Embodiment one: This embodiment provides a method for realizing video communication. The scenario is a system composed of a local device, a media server, and remote devices (the scenario is shown in Fig. 9, where audio/video capture devices A, B, C, D, and E capture the video and audio of remote users A, B, C, D and local user E respectively). The media server exchanges audio and video data between the remote devices and the local device; the remote devices capture the audio and video data of the remote users and send them to the media server. There may be one remote device or several. The local user and the remote users communicate by video through a display device. A projector is taken as the example in this embodiment, and a flat-panel loudspeaker array is placed on the projection plane (as shown in Fig. 2 or Fig. 3; in Fig. 2, the numbers 1 to 36 denote the regions into which the array is divided and the corresponding loudspeaker numbers; in Fig. 3, the numbers 1 to 9 denote the regions and the corresponding loudspeaker numbers). Suppose there are four remote users, denoted A, B, C, and D, and the local user is denoted E. The method is shown in Fig. 4 and is described here using only the flat-panel loudspeaker array of Fig. 2 as the example. It comprises the following steps:

S41: after the local site and the remote site are connected, face recognition is started via the remote device to determine the head position information of each of A, B, C, and D.

Face recognition is only one example of how the head position information of A, B, C, and D may be determined. In practice other means may be used, such as manually confirming the head position information of A, B, C, and D, or using another technique, for example determining the participants' positions in the conference room from ergonomic considerations. The invention does not limit the specific method of determining the head position information of A, B, C, and D.

Optionally, a preferred way of implementing this step is for the remote end to capture the image of the participants at the remote site directly and to determine each participant's position from that image by face recognition.

S42: determine, according to the head position information of each of A, B, C, and D, the loudspeakers in the flat-panel loudspeaker array that correspond to each head position.

S42 may be realized as follows. As shown in Fig. 2, the flat-panel loudspeaker array is divided into 36 regions according to the number of loudspeakers. If face recognition determines that the head position of A lies in region 11 of Fig. 2, the loudspeaker corresponding to A's head is loudspeaker 11. Likewise, the loudspeakers corresponding to the heads of B, C, and D are loudspeakers 13, 15, and 17 respectively. In practice, face recognition may find that the head position of A spans several regions of Fig. 2, for example regions 10 and 11, or regions 21, 22, and 23. In that case the loudspeakers corresponding to A's head are the loudspeakers of all regions covered by A's head position: loudspeakers 10 and 11 when the head covers regions 10 and 11, and loudspeakers 21, 22, and 23 when the head covers regions 21, 22, and 23.
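The mapping from a detected head position to the regions of Fig. 2 can be sketched as a bounding-box-to-grid lookup. This is an illustration under stated assumptions: Fig. 2 is not reproduced here, so the layout of 4 rows by 9 columns numbered row by row from the top left is a guess chosen only because it places regions 11, 13, 15, and 17 in one row; the function names, the pixel bounding box, and the image size are likewise hypothetical.

```python
import math
from typing import List, Tuple


def _cells(lo: float, hi: float, cell: float, count: int) -> range:
    """Indices of the grid cells overlapped by the interval [lo, hi)."""
    first = max(0, min(count - 1, int(lo // cell)))
    last = max(first, min(count - 1, math.ceil(hi / cell) - 1))
    return range(first, last + 1)


def regions_for_head(box: Tuple[float, float, float, float],
                     image_size: Tuple[float, float],
                     rows: int = 4, cols: int = 9) -> List[int]:
    """Return the 1-based region numbers covered by a head bounding box.

    box is (left, top, right, bottom) in pixels; image_size is
    (width, height). Regions are assumed to be numbered row by row.
    """
    left, top, right, bottom = box
    width, height = image_size
    col_range = _cells(left, right, width / cols, cols)
    row_range = _cells(top, bottom, height / rows, rows)
    return [r * cols + c + 1 for r in row_range for c in col_range]
```

Because region numbers and loudspeaker numbers coincide in Fig. 2, the returned list can be read directly as the loudspeakers to start for that user.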

S43: when a remote user speaks, the loudspeakers corresponding to the speaking user are started for playback. For example, when A speaks, the loudspeaker corresponding to A is started for playback.

There are several ways to learn that a remote user is speaking: face recognition may be used to detect whether a person's mouth is moving, or audio capture may be used to detect that the remote user is speaking.

Optionally, control of the local audio output device may be carried out by the local conference device or by the media server.

When control is carried out by the local device, the remote device sends the image information of the users at the remote site to the local conference device through the media server, and the local device establishes the correspondence between the participants at the remote site and the local audio output devices. When a participant at the remote site speaks, the local end identifies the speaking user by face recognition, and the local device then controls the local audio output device. In this embodiment, the loudspeaker array is controlled by making the loudspeakers of the local array that correspond to the remote speaking user emit the sound, so that the direction from which the local user hears a remote user's voice is substantially consistent with the direction in which the local user sees that remote user's image, which enhances the user's sense of presence.

When control is carried out by the media server, the media server determines information about the audio output device of the local conference terminal, which may include the type, number, and arrangement of the audio output devices. After obtaining the image information of the users at the remote site, the media server obtains the head position information of the remote users from that image information and establishes, for the local site, the correspondence between the head position information of the remote users and the local audio output devices. When a user at the remote site speaks, the media server detects the sound source position sent by the remote site and, according to that correspondence, determines which loudspeakers of the local audio output device output the sound. In this embodiment the processing and control functions are realized on the media server. The direction from which the local user hears a remote user's voice is substantially consistent with the direction in which the local user sees that remote user's image, which enhances the user's sense of presence, and the complexity of implementing the scheme on the local device is reduced.
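The media-server-controlled variant can be sketched as a lookup that turns a detected speaking user into an instruction for the local device. This is a rough sketch only: the description does not define a message format, so the PlaybackCommand record and the MediaServerSketch class are hypothetical, and the correspondence is assumed to have been built beforehand from the head position information.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PlaybackCommand:
    """Hypothetical instruction sent from the media server to the local device."""
    user: str
    speaker_ids: Tuple[int, ...]


class MediaServerSketch:
    """Holds the correspondence between remote users and local loudspeakers."""

    def __init__(self, user_to_speakers: Dict[str, Tuple[int, ...]]) -> None:
        # Assumed to be derived from the remote users' head positions.
        self._user_to_speakers = dict(user_to_speakers)

    def on_user_speaking(self, user: str) -> PlaybackCommand:
        """Build the instruction for the loudspeakers mapped to this user."""
        return PlaybackCommand(user, self._user_to_speakers.get(user, ()))
```

Keeping the correspondence on the server means the local device only needs to act on the loudspeaker numbers it receives, which matches the stated aim of reducing complexity on the local device.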

The method provided by this embodiment determines, from the head position information of each of A, B, C, and D, the loudspeakers corresponding to each head position and, when a remote user speaks, starts the loudspeakers corresponding to the speaking user for playback. The direction from which the local user hears a remote user's voice is thereby kept substantially consistent with the direction in which the local user sees that remote user's image, which enhances the user's sense of presence.

Another embodiment: This embodiment provides a method for realizing video communication. The scenario is a system composed of a local device, a media server, and a remote device, in which the media server exchanges audio and video data between the remote device and the local device, and the remote device captures the audio and video data of the remote users and sends them to the media server. The local user and the remote users communicate by video through a display device, which may be a monitor, an LCD television, a plasma television, or the like. Suppose one loudspeaker is placed at the midline position above the display device and one at the midline position below it (as shown in Fig. 10). The loudspeakers may also be placed off the midline, for example the upper loudspeaker to the left of the display device's midline and the lower loudspeaker to the right of it. For the upper and lower arrangement the invention does not limit the left-right position of the loudspeakers; it is only necessary that one loudspeaker be placed above the display device and one below it. Suppose there are four remote users, denoted A, B, C, and D, and the local user is denoted E. Suppose the head images of A, B, C, and D are arranged from top to bottom in the order A, B, C, D. In this embodiment the head image position always refers to the center of the mouth in the image. The method is shown in Fig. 5 and comprises the following steps:

Step 51: after the local site and the remote site are connected, face recognition is started via the remote device to determine the head position information of each of A, B, C, and D.

Step 52: from the head image positions of A, B, C, and D, calculate the vertical distance from the center of each head position to the center of the displayed image (the center of the display device), and calculate the ratio of this vertical distance to the total height of the displayed image (the total height of the image shown by the display device).

Step 53: when a remote user speaks, adjust the volumes of the upper and lower loudspeakers according to the ratio corresponding to the speaking user, and perform playback at the adjusted volumes.

The adjustment may be made as follows. Suppose the ratios corresponding to A, B, C, and D are 0.125, 0.375, 0.625, and 0.875 respectively. The differences between the upper and lower loudspeaker volumes calculated from Formula 1 (with X = 3) are then 9 dB, 3 dB, -3 dB, and -9 dB respectively. When X in Formula 1 takes other values, the differences take other values; for example, with X = 2 the differences are 6 dB, 2 dB, -2 dB, and -6 dB. X may take other values in practice, and the user may set its specific value.

After the user sets a reference volume, for example by setting the upper loudspeaker volume to 40 dB as the reference, the upper loudspeaker is controlled to 40 dB and the lower loudspeaker to 43 dB (X = 3, ratio 0.625) or 38 dB (X = 2, ratio 0.375). Other volume values may of course be used in practice.
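The figures in this embodiment can be reproduced with a few lines of arithmetic. The sketch below is only a check of the numbers given above: the names RATIOS, difference_db, differences, and lower_volume are hypothetical, and the lower loudspeaker is taken to be the upper (reference) volume minus the Formula 1 difference, as the 43 dB and 38 dB values imply.

```python
# Ratios assumed in this embodiment for users A, B, C, and D.
RATIOS = {"A": 0.125, "B": 0.375, "C": 0.625, "D": 0.875}


def difference_db(ratio: float, x: float) -> float:
    """Formula 1: upper minus lower loudspeaker volume, in dB."""
    return 8 * x * (0.5 - ratio)


def differences(x: float) -> dict:
    """Formula 1 difference for every user at acoustic coefficient X."""
    return {user: difference_db(r, x) for user, r in RATIOS.items()}


def lower_volume(upper_db: float, ratio: float, x: float) -> float:
    """Lower loudspeaker volume when the upper one is the reference."""
    return upper_db - difference_db(ratio, x)


print(differences(3))  # differences for X = 3
print(differences(2))  # differences for X = 2
print(lower_volume(40, 0.625, 3), lower_volume(40, 0.375, 2))
```

With X = 3 the differences come out as 9, 3, -3, and -9 dB, with X = 2 as 6, 2, -2, and -6 dB, and the two lower loudspeaker volumes as 43 dB and 38 dB, matching the values stated in the text.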

The technical effect of this embodiment is explained below in terms of its underlying principle. Experiments show that when the human ear hears two sound sources (for example, an upper and a lower one) emitting sound, the sound is perceived as coming from a single location, generally called the virtual sound source. For example, when the two sources have equal volume, the virtual sound source lies midway between them. With sources arranged above and below, if the upper source is louder the virtual sound source moves toward the upper source; likewise, if the lower source is louder the virtual sound source moves toward the lower source. In this embodiment, therefore, when a user speaks, the position of the virtual sound source can be adjusted by controlling the volumes of the upper and lower sources (loudspeakers in this embodiment). When the virtual sound source is moved to the position of the speaking user's image, the direction from which the local user hears the remote user's voice is substantially consistent with the direction in which the local user sees the remote user's image, which enhances the user's sense of presence.

The method provided by this embodiment calculates, from the head position information of each of A, B, C, and D, the ratio of the vertical distance from the head image to the center of the displayed image to the total height of the displayed image, controls the volumes of the upper and lower loudspeakers according to this ratio, and performs playback accordingly. The direction from which the local user hears a remote user's voice is thereby kept substantially consistent with the direction in which the local user sees that remote user's image, which enhances the user's sense of presence.

In the above embodiment, when the loudspeakers of the display device are arranged horizontally, the head images may be displayed side by side and the ratio changed to the ratio of the horizontal distance from the head image to the center of the displayed image to the total width of the displayed image; the volume difference is then calculated according to Formula 2. The horizontally arranged loudspeakers may be one loudspeaker at the midline position on the left of the display device and one at the midline position on the right (as shown in Fig. 11). The loudspeakers may also be placed off the midline, for example the left loudspeaker above the midline of the display device and the right loudspeaker below it. For the horizontal arrangement the invention does not limit the up-down position of the loudspeakers; it is only necessary that one loudspeaker be placed on the left of the display device and one on the right.

When the human ear hears two sound sources (for example, a left and a right one) emitting sound, the sound is perceived as coming from a single location, generally called the virtual sound source. For example, when the two sources have equal volume, the virtual sound source lies midway between them. With sources arranged on the left and right, if the left source is louder the virtual sound source moves toward the left source; likewise, if the right source is louder the virtual sound source moves toward the right source. In this case, therefore, when a user speaks, the position of the virtual sound source can be adjusted by controlling the volumes of the left and right sources (loudspeakers in this embodiment). When the virtual sound source is moved to the position of the speaking user's image, the direction from which the local user hears the remote user's voice is substantially consistent with the direction in which the local user sees the remote user's image, which enhances the user's sense of presence.

The invention provides yet another embodiment. The scenario is a system composed of a local device, a media server, and a remote device, in which the media server exchanges audio and video data between the remote device and the local device, and the remote device captures the audio and video data of the remote users and sends them to the media server. The local user and the remote users communicate by video through projection, and a flat-panel loudspeaker array is placed on the projection plane (as in Fig. 2). Suppose there are four remote users, denoted A, B, C, and D, to whom the remote device assigns microphones 1, 2, 3, and 4 respectively; the local user is denoted E. The method is shown in Fig. 6 and comprises:

After S61, local meeting-place and remote site connect, start each head position information that face recognition technology is determined A, B, C and D by remote equipment;

S61 may be implemented as follows. Face recognition is used above only as an example of how to determine the head position information of A, B, C and D. In practice other approaches can be used, such as manually confirming the head position information of A, B, C and D, or other recognition techniques, for example determining the participants' positions in the conference room from ergonomic considerations. The present invention does not limit the specific method of determining the head position information of A, B, C and D.

S62. The local device determines, according to the head position information of A, B, C and D, the positions of the loudspeakers in the planar loudspeaker array that correspond to the heads of A, B, C and D respectively;

S62 may be implemented as follows. As shown in Fig. 2, the planar loudspeaker array is divided into 36 zones according to the number of loudspeakers. If face recognition determines that the head position of A lies in zone 11 of Fig. 2, the loudspeaker corresponding to A's head is loudspeaker 11. The loudspeakers corresponding to the heads of B, C and D are determined in the same way, for example zones 10 and 11 of Fig. 2 with loudspeakers 10 and 11, or zones 21, 22 and 23 with loudspeakers 21, 22 and 23.
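The zone lookup in S62 can be sketched as follows. The 6 × 6 layout, the row-by-row numbering starting at 1, and the normalized head coordinates are assumptions made for illustration; the actual numbering is the one shown in Fig. 2.

```python
ROWS, COLS = 6, 6  # assumed layout giving the 36 zones mentioned in the text


def zone_for_head(x: float, y: float) -> int:
    """Map a normalized head position (0 <= x, y <= 1) to a zone number.

    Zones are numbered 1..36 row by row from the top-left corner
    (an assumed numbering); loudspeaker n is taken to sit in zone n.
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("head position must lie inside the projection plane")
    col = min(int(x * COLS), COLS - 1)  # clamp x == 1.0 into the last column
    row = min(int(y * ROWS), ROWS - 1)  # clamp y == 1.0 into the last row
    return row * COLS + col + 1
```

Under these assumptions, a head centered at (0.75, 0.2) falls in zone 11, so loudspeaker 11 would be selected for that user.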

S63. When the media server determines from the audio stream sent by microphone 1 that A is speaking, it sends the audio stream from microphone 1, together with information identifying A as the speaking user, to the local device;

S63 may be implemented as follows. Because the remote users A, B, C and D are assigned microphones 1, 2, 3 and 4 respectively, the media server establishes a correspondence between microphone 1 and user A and, likewise, between microphone 2 and user B, microphone 3 and user C, and microphone 4 and user D. When the media server detects an audio stream sent from microphone 1, it determines from the correspondence between microphone 1 and user A that user A is speaking, and sends the audio stream from microphone 1, together with information identifying A as the speaking user, to the local device.
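The microphone-to-user correspondence in S63 amounts to a simple lookup. The sketch below is illustrative only: the table name `MIC_TO_USER` and the function `identify_speaking_user` are hypothetical, and treating any non-empty stream as speech is a simplifying assumption.

```python
from typing import Dict, Optional, Tuple

# Correspondence between microphone numbers and remote users (from S63).
MIC_TO_USER: Dict[int, str] = {1: "A", 2: "B", 3: "C", 4: "D"}


def identify_speaking_user(mic_id: int, audio: bytes) -> Optional[Tuple[str, bytes]]:
    """Return (user, audio) to forward to the local device, or None.

    None is returned for an unknown microphone or an empty stream.
    A real server would also need voice-activity detection, which is
    outside the scope of this sketch.
    """
    user = MIC_TO_USER.get(mic_id)
    if user is None or not audio:
        return None
    return user, audio
```

The local device can then use the returned user identifier to select the loudspeaker determined in S62.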

S64. The local device activates the loudspeaker corresponding to A to play the audio stream sent by microphone 1.

Optionally, the steps performed by the local device above may all be performed by the local device under the control of the media server.

In the method provided by the present embodiment, the local device determines the loudspeakers corresponding to the head position information of A, B, C and D. When the media server identifies the speaking user, the local device activates the loudspeaker corresponding to that user for playback, so that the direction from which the local user hears the remote user's voice is substantially consistent with the direction in which the local user sees the remote user's image, which increases the user's sense of presence.

The present invention also provides a device for realizing video communication. The device is shown in Fig. 7, where modules drawn with dashed lines are optional. The device specifically comprises:

An acquiring unit 71, configured to obtain the head position information of the remote user after the local user and the remote user establish a connection;

A playback control unit 72, configured to determine the loudspeaker playback mode corresponding to the remote user according to the head position information of the remote user and, when the remote user speaks, to perform playback according to the loudspeaker playback mode corresponding to the speaking user.

Optionally, when the loudspeakers form a planar loudspeaker array, the playback control unit 72 comprises:

An array module 721, configured to determine, according to the head position information of the remote user, the corresponding loudspeaker in the planar loudspeaker array;

A playback module 722, configured to activate the loudspeaker corresponding to the speaking user for playback when the remote user speaks.

Optionally, when the loudspeakers are arranged one above the other, the playback control unit 72 comprises:

A height calculation module 723, configured to display the image of the remote user in a top-bottom arrangement, calculate the vertical distance from the center of the remote user's head position to the center of the displayed image, and calculate the ratio of the vertical distance to the total height of the displayed image;

A vertical playback module 724, configured to adjust the volumes of the upper and lower loudspeakers according to the volume difference between them and then perform playback; for the calculation of the volume difference between the upper and lower loudspeakers, see the description of formula 1.

Optionally, when the loudspeakers are arranged left and right, the playback control unit 72 comprises:

A width calculation module 725, configured to display the image of the remote user in a left-right arrangement, calculate the horizontal distance from the center of the remote user's head position to the center of the displayed image, and calculate the ratio of the horizontal distance to the total width of the displayed image;

A horizontal playback module 726, configured to adjust the volumes of the left and right loudspeakers according to the volume difference between them and then perform playback; for the volume difference between the left and right loudspeakers, see the description of formula 2.

Optionally, the device may be a standalone device; it may also be installed in the local device or, in practice, in the media server.

The device provided by the invention determines the loudspeaker playback mode corresponding to the remote user according to the head position information of the remote user and, when the remote user speaks, performs playback according to the loudspeaker playback mode corresponding to the speaking user, so that the direction from which the local user hears the remote user's voice is substantially consistent with the direction in which the local user sees the remote user's image, which increases the user's sense of presence.

The present invention also provides a system for realizing video communication. As shown in Fig. 8, the system comprises a remote device 81, a local device 82 and a media server 83;

The remote device 81 is configured to collect the audio and video data of the remote user and send them to the media server 83;

The media server 83 is configured to exchange the audio and video data between the remote device 81 and the local device 82;

The local device 82 is configured to determine, after the local user and the remote user establish a connection, the loudspeaker playback mode corresponding to the remote user according to the obtained head position information of the remote user and, when the remote user speaks, to perform playback according to the loudspeaker playback mode corresponding to the speaking user.

The local device 82 in the system provided by the invention can determine the loudspeaker playback mode corresponding to the remote user according to the head position information of the remote user and, when the remote user speaks, perform playback according to the loudspeaker playback mode corresponding to the speaking user, so that the direction from which the local user hears the remote user's voice is substantially consistent with the direction in which the local user sees the remote user's image, which increases the user's sense of presence.

The present invention also provides another video communication system, which comprises a remote device, a local device and a media server;

The media server is configured to exchange the audio and video data between the remote device and the local device;

The media server is further configured to determine, after the local user and the remote user establish a connection, the loudspeaker playback mode corresponding to the remote user according to the obtained head position information of the remote user and, when the remote user speaks, to send a reproduction command to the local device 82 according to the loudspeaker playback mode corresponding to the speaking user;

The local device is configured to control the local sound reproduction device to perform playback according to the reproduction command.
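One way to picture the reproduction command exchanged in this variant is a small message that names the loudspeakers to drive. The source does not define the command format; the `ReproductionCommand` fields and the JSON encoding below are hypothetical and serve only as an illustration.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class ReproductionCommand:
    """Hypothetical message from the media server to the local device."""
    speaking_user: str       # e.g. "A"
    loudspeakers: List[int]  # loudspeaker numbers to activate
    microphone: int          # microphone whose audio stream is played


def encode_command(cmd: ReproductionCommand) -> str:
    """Serialize the command as JSON for transmission."""
    return json.dumps(asdict(cmd), sort_keys=True)


def decode_command(payload: str) -> ReproductionCommand:
    """Rebuild the command on the local device side."""
    return ReproductionCommand(**json.loads(payload))
```

In this sketch the media server would call `encode_command` once it has identified the speaking user, and the local device would call `decode_command` and drive the listed loudspeakers.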

The media server in the system provided by the invention can determine the loudspeaker playback mode corresponding to the remote user according to the head position information of the remote user and, when the remote user speaks, initiate playback according to the loudspeaker playback mode corresponding to the speaking user, so that the direction from which the local user hears the remote user's voice is substantially consistent with the direction in which the local user sees the remote user's image, which increases the user's sense of presence.

Those skilled in the art will appreciate that the accompanying drawings are schematic diagrams of a preferred embodiment, and that the modules or flows in the drawings are not necessarily required for implementing the present invention.

Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments above may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In summary, the technical solutions provided by the specific embodiments of the present invention have the advantage that the direction from which the local user hears the remote user's voice is substantially consistent with the direction in which the local user sees the remote user's image, which increases the user's sense of presence.

The embodiments of the present invention have been described in detail above, and specific examples have been used herein to explain the principles and implementation of the invention. The description of the embodiments above is intended only to help in understanding the method of the invention and its core idea. At the same time, those of ordinary skill in the art may, in accordance with the idea of the invention, make changes to the specific implementation and scope of application. In summary, this description should not be construed as limiting the present invention.

Claims

1. A method for realizing video communication, characterized in that the method comprises:

After a local user and a remote user establish a connection, obtaining head position information of the remote user;

Determining the loudspeaker playback mode corresponding to the remote user according to the head position information of the remote user; and, when the remote user speaks, performing playback according to the loudspeaker playback mode corresponding to the speaking user, which specifically comprises:

When the loudspeakers are arranged one above the other, displaying the image of the remote user in a top-bottom arrangement, calculating the vertical distance from the center of the remote user's head position to the center of the displayed image, and calculating the ratio of the vertical distance to the total height of the displayed image;

Volume difference between the upper loudspeaker and the lower loudspeaker = 8X × (0.5 - the ratio of the vertical distance to the total height of the displayed image) dB; and adjusting the volumes of the upper and lower loudspeakers according to the difference and then performing playback;

where X is an acoustic coefficient set by the user.

2. The method according to claim 1, characterized in that determining the loudspeaker playback mode corresponding to the remote user according to the head position information of the remote user and, when the remote user speaks, performing playback according to the loudspeaker playback mode corresponding to the speaking user specifically comprises:

When the loudspeakers form a planar loudspeaker array, determining, according to the head position information of the remote user, the corresponding loudspeaker in the planar loudspeaker array and, when the remote user speaks, activating the loudspeaker corresponding to the speaking user for playback.

3. The method according to claim 1, characterized in that determining the loudspeaker playback mode corresponding to the remote user according to the head position information of the remote user and, when the remote user speaks, performing playback according to the loudspeaker playback mode corresponding to the speaking user specifically comprises:

When the loudspeakers are arranged left and right, displaying the image of the remote user in a left-right arrangement, calculating the horizontal distance from the center of the remote user's head position to the center of the displayed image, and calculating the ratio of the horizontal distance to the total width of the displayed image;

Volume difference between the left loudspeaker and the right loudspeaker = 8X × (0.5 - the ratio of the horizontal distance to the total width of the displayed image) dB; and adjusting the volumes of the left and right loudspeakers according to the difference and then performing playback;

where X is an acoustic coefficient set by the user.

4. A device for realizing video communication, characterized in that the device comprises:

An acquiring unit, configured to obtain head position information of a remote user after a local user and the remote user establish a connection;

A playback control unit, configured to determine the loudspeaker playback mode corresponding to the remote user according to the head position information of the remote user and, when the remote user speaks, to perform playback according to the loudspeaker playback mode corresponding to the speaking user;

Wherein, when the loudspeakers are arranged one above the other, the playback control unit comprises:

A height calculation module, configured to display the image of the remote user in a top-bottom arrangement, calculate the vertical distance from the center of the remote user's head position to the center of the displayed image, and calculate the ratio of the vertical distance to the total height of the displayed image;

A vertical playback module, configured to adjust the volumes of the upper and lower loudspeakers according to the volume difference between them and then perform playback; volume difference between the upper loudspeaker and the lower loudspeaker = 8X × (0.5 - the ratio of the vertical distance to the total height of the displayed image) dB; where X is an acoustic coefficient set by the user.

5. The device according to claim 4, characterized in that, when the loudspeakers form a planar loudspeaker array, the playback control unit comprises:

A position confirmation module, configured to determine, according to the head position information of the remote user, the corresponding loudspeaker in the planar loudspeaker array;

A playback module, configured to activate the loudspeaker corresponding to the speaking user for playback when the remote user speaks.

6. The device according to claim 4, characterized in that, when the loudspeakers are arranged left and right, the playback control unit comprises:

A width calculation module, configured to display the image of the remote user in a left-right arrangement, calculate the horizontal distance from the center of the remote user's head position to the center of the displayed image, and calculate the ratio of the horizontal distance to the total width of the displayed image;

A horizontal playback module, configured to adjust the volumes of the left and right loudspeakers according to the volume difference between them and then perform playback; volume difference between the left loudspeaker and the right loudspeaker = 8X × (0.5 - the ratio of the horizontal distance to the total width of the displayed image) dB; where X is an acoustic coefficient set by the user.

7. A system for realizing video communication, characterized in that the system comprises: a remote device, a local device and a multipoint control unit media server;

The local device is configured to determine, after a local user and a remote user establish a connection, the loudspeaker playback mode corresponding to the remote user according to the obtained head position information of the remote user and, when the remote user speaks, to perform playback according to the loudspeaker playback mode corresponding to the speaking user, wherein:

When the loudspeakers are arranged one above the other, the local device displays the image of the remote user in a top-bottom arrangement, calculates the vertical distance from the center of the remote user's head position to the center of the displayed image, and calculates the ratio of the vertical distance to the total height of the displayed image;

where X is an acoustic coefficient set by the user.

8. A video communication system, characterized in that the system comprises: a remote device, a local device and a multipoint control unit media server;

The media server is configured to exchange the audio and video data between the remote device and the local device and, after a local user and a remote user establish a connection, to determine the loudspeaker playback mode corresponding to the remote user according to the obtained head position information of the remote user; when the remote user speaks, the media server sends a reproduction command to the local device according to the loudspeaker playback mode corresponding to the speaking user, wherein:

When the loudspeakers are arranged one above the other, the media server displays the image of the remote user in a top-bottom arrangement, calculates the vertical distance from the center of the remote user's head position to the center of the displayed image, and calculates the ratio of the vertical distance to the total height of the displayed image;

where X is an acoustic coefficient set by the user;