CN103390410A

CN103390410A - Telephone conference system and method

Info

Publication number: CN103390410A
Application number: CN2012101442289A
Authority: CN
Inventors: 徐筱琦; 杨朝光
Original assignee: Acer Inc
Current assignee: Acer Inc
Priority date: 2012-05-10
Filing date: 2012-05-10
Publication date: 2013-11-13

Abstract

The invention provides a teleconferencing system and method. The system comprises: a far-end microphone array for receiving far-end sound; a sound recognition module for identifying a plurality of sound sources in the far-end sound; a near-end display interface for displaying the plurality of sound sources; and a sound adjustment module for adjusting a sound characteristic of each sound source individually. The invention visualizes the spatial positions of the far-end participants. Compared with the prior art, this helps near-end participants understand how the far-end participants are seated. It also provides a basis for adjusting sound parameters, thereby improving the quality of teleconferences.

Description

Telephone conference system and method

Technical Field

The invention relates to teleconferencing technology.

Background

Teleconferencing systems are a common means of communication in business. They enable two, three, or more parties to communicate regardless of geographic location.

In a teleconference, there may be more than one participant at either the far end or the near end. Some conferencing systems give each participant a dedicated microphone. Although this ensures that every participant's speech is picked up, it makes the authentication procedure and conference management more complicated. Moreover, as the number of participants grows, more microphones are needed, and sound interference between adjacent microphones becomes more severe. To simplify setup, most teleconferences therefore do not give each participant a dedicated microphone; instead, all participants at a site share the same microphone. However, because of the seating arrangement, participants sit at different distances from the microphone, so the microphone picks up their voices with differing quality. This degrades the quality of the call for both parties.

A more convenient and easier-to-use teleconferencing system and method are therefore needed.

Summary of the Invention

To overcome the shortcomings of the prior art, the present invention provides a teleconferencing system comprising:

- a far-end microphone array, disposed at the far end, for receiving far-end sound;
- a sound recognition module, coupled to the far-end microphone array, for identifying a plurality of sound sources in the far-end sound;
- a near-end display interface, disposed at the near end and coupled to the sound recognition module, for displaying the plurality of sound sources identified by the sound recognition module; and
- a sound adjustment module, coupled to the sound recognition module, for adjusting a sound characteristic of each of the sound sources individually.

The present invention further provides a teleconferencing method comprising: receiving far-end sound with a far-end microphone array; identifying a plurality of sound sources in the far-end sound; displaying the identified sound sources on a near-end display interface; and adjusting at least one sound characteristic of each of the sound sources individually.

The present invention visualizes the spatial positions of the far-end participants. Compared with the prior art, this makes it easier for near-end participants to understand where the far-end participants are seated. It also provides a basis for adjusting sound parameters, thereby improving the quality of teleconferences.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the architecture of a teleconferencing system according to an embodiment of the present invention.

FIG. 2 is a flowchart of a teleconferencing method according to an embodiment of the present invention.

The reference numerals are explained as follows:

100 ~ teleconferencing system;

102 ~ far-end microphone array;

104 ~ sound recognition module;

106 ~ near-end display interface;

108 ~ near-end control interface;

110 ~ sound adjustment module;

112 ~ sound playback module;

S202 to S210 ~ steps.

Detailed Description of the Embodiments

The following describes preferred embodiments of the present invention. The embodiments illustrate the principles of the invention and are not intended to limit it. The scope of the invention is defined by the appended claims.

To make teleconferencing systems easier to use, the present invention provides a new teleconferencing system. Various embodiments of the system are described below with reference to the accompanying drawings.

Teleconferencing System

FIG. 1 is a schematic diagram of the architecture of a teleconferencing system according to an embodiment of the present invention. The teleconferencing system 100 comprises at least: a far-end microphone array 102, a sound recognition module 104, a near-end display interface 106, a near-end control interface 108, a sound adjustment module 110, and a sound playback module 112. For ease of description, the following embodiments all assume one-way communication (the far-end users speak and the near-end users listen). However, the invention is not limited to this, and a person of ordinary skill in the art can readily apply it to two-way communication. Likewise, the invention is not limited to two-party calls; multi-party calls also fall within its scope.

The far-end microphone array 102 is disposed at the far end and receives far-end sound. The microphone array 102 generally includes two or more microphones, which may be dynamic, condenser, or any other type of microphone. A person of ordinary skill in the art can place the microphone array 102 appropriately according to the number of microphones, the directivity of each microphone, and the meeting space. For example, for a round-table meeting, a microphone array with omnidirectional sound-field sensitivity can be used and placed at the center of the table.
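The sketch below makes the placement discussion concrete. Under a far-field (plane-wave) assumption, it computes the relative arrival time of a distant source at each microphone of a uniform circular array centered on the table. The array size, the 343 m/s speed of sound, and the function names are illustrative assumptions, not values taken from the invention.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at about 20 degrees C (assumed)

def circular_array(num_mics, radius):
    """Return (x, y) positions in metres of mics evenly spaced on a circle centred at the origin."""
    return [(radius * math.cos(2 * math.pi * k / num_mics),
             radius * math.sin(2 * math.pi * k / num_mics))
            for k in range(num_mics)]

def far_field_delays(mic_positions, azimuth_deg, c=SPEED_OF_SOUND):
    """Arrival time (s) at each mic relative to the array centre for a distant source.

    A plane wave arriving from unit direction u reaches a mic at position p
    earlier than the centre by (p . u) / c, so its relative delay is -(p . u) / c.
    """
    ux = math.cos(math.radians(azimuth_deg))
    uy = math.sin(math.radians(azimuth_deg))
    return [-(x * ux + y * uy) / c for x, y in mic_positions]
```

Consider a 4-microphone array with a 5 cm radius and a source at 0 degrees. The microphone facing the source hears it about 0.146 ms before the array center, and the opposite microphone hears it about 0.146 ms after. Sound direction recognition relies on time differences of this kind.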

The sound recognition module 104 may be located at either the far end or the near end, as long as it is connected to the far-end microphone array 102 by wired or wireless communication. An important feature of the invention is that the sound recognition module 104 can use various existing acoustic computation techniques to identify and separate a plurality of distinct sound sources from the mixed far-end sound captured by the microphone array 102. These sound sources include, for example, the voices of individual participants and various non-speech noises. Broadly, the acoustic computation techniques fall into two categories: sound direction recognition and sound quality recognition.

Sound direction recognition uses the position and sensitivity of each microphone in the microphone array 102 to calculate the direction and distance of each sound source (that is, its position in space). Sound quality recognition analyzes the sound pressure, frequency spectrum, and waveform of each sound source. From this analysis it obtains sound characteristics such as volume, clarity, frequency, and tone quality (timbre), and it can even determine whether a source is speech or noise and estimate the approximate gender and age of the speaker. Speech is not continuous, and its volume and frequency may change. Therefore, in a preferred embodiment, the sound recognition module 104 continuously cross-checks a source's spatial position against its sound quality in order to track and lock onto that source. In some embodiments, the sound recognition module 104 can also perform conventional noise filtering and echo cancellation. These sound-processing details are not the focus of the invention and can be implemented with various existing techniques, so they are not described further here.
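The patent leaves the direction-recognition technique open. One widely used option, shown here as an assumption rather than as the invention's method, works in two steps. It first estimates the time difference of arrival (TDOA) between a microphone pair with GCC-PHAT, then converts that TDOA to an angle using a far-field model. The function names and parameters are hypothetical.

```python
import numpy as np

def gcc_phat_tdoa(sig, ref, fs, max_tau=None):
    """Estimate how many seconds `sig` lags `ref` using GCC-PHAT."""
    n = len(sig) + len(ref)                 # zero-pad so the correlation is not circular
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:                 # restrict to physically possible lags
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

def tdoa_to_angle(tau, mic_distance, c=343.0):
    """Angle (degrees) between the source direction and the axis pointing from
    the `sig` mic to the `ref` mic, assuming a distant (plane-wave) source."""
    ratio = np.clip(c * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arccos(ratio)))
```

With two microphones 10 cm apart, a TDOA of zero means the source is broadside to the pair (90 degrees). The largest possible TDOA, about 0.29 ms, means the source lies on the pair's axis. A single pair gives only a bearing. Estimating distance as well, as the text describes, would need several pairs whose bearings can be intersected; that extension is not shown.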

The near-end display interface 106 (i.e., a screen) is disposed at the near end and coupled to the sound recognition module 104. It displays each sound source identified by the sound recognition module 104 to the near-end users. In some embodiments, it also displays the sound characteristics of those sources.

In the simplest embodiment, the near-end display interface 106 shows the identified far-end sound sources only as text, giving each existing source a name such as "Participant 1" or "Participant 2". Whenever the sound recognition module 104 detects that a new member has joined at the far end, the near-end display interface 106 can highlight that member with conspicuous text. In a preferred embodiment, the near-end display interface 106 renders a two- or three-dimensional simulation of the far-end meeting space. It then marks each sound source in the virtual view at the position corresponding to the spatial coordinates detected by the sound recognition module 104. Besides a name such as "Participant 1" or "Participant 2", each sound source can be annotated with sound characteristics and related estimates, such as volume, clarity, frequency, tone quality, whether it is speech, and the speaker's estimated gender and age. A person of ordinary skill in the art can design the displayed information items and display style of the near-end display interface 106 in accordance with the spirit of the invention.

The teleconferencing technique of the invention can also be applied to video conferencing. In that case, the near-end display interface 106 can display the live video transmitted from the far end in place of the virtual view. With the near-end display interface 106, near-end users can readily follow what is happening at the far end.
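The following is a minimal sketch of the bookkeeping behind such a display. It assumes the sound recognition module reports each detection as room coordinates plus a pitch estimate. The registry keeps the "Participant N" labels stable by matching each detection against known sources on both position and pitch. This is a simplified stand-in for the cross-checking of position and sound quality described earlier. Newly registered sources are flagged so the display can highlight them, and room coordinates are mapped onto a 2D screen view. The class names, thresholds, and coordinate conventions are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class TrackedSource:
    label: str
    x: float           # metres in the far-end room frame (assumed convention)
    y: float
    pitch_hz: float    # single number standing in for "sound quality"
    is_new: bool = True

class SourceRegistry:
    """Assign stable labels such as "Participant 1" to detected far-end sources."""

    def __init__(self, max_distance_m=0.5, max_pitch_diff_hz=40.0):
        self.sources = []
        self.max_distance_m = max_distance_m        # illustrative threshold
        self.max_pitch_diff_hz = max_pitch_diff_hz  # illustrative threshold

    def update(self, x, y, pitch_hz):
        """Match a detection to a known source by position and pitch, or register a new one."""
        for src in self.sources:
            close = math.hypot(src.x - x, src.y - y) <= self.max_distance_m
            similar = abs(src.pitch_hz - pitch_hz) <= self.max_pitch_diff_hz
            if close and similar:
                src.x, src.y, src.pitch_hz = x, y, pitch_hz  # follow small movements
                src.is_new = False
                return src
        src = TrackedSource(f"Participant {len(self.sources) + 1}", x, y, pitch_hz)
        self.sources.append(src)  # is_new stays True so the display can highlight it
        return src

def to_screen(x, y, room_w, room_h, screen_w, screen_h):
    """Map room coordinates (origin at a room corner) to pixels (origin at top-left)."""
    return round(x / room_w * screen_w), round((1 - y / room_h) * screen_h)
```

Requiring both position and pitch to agree lets two people seated close together keep separate labels. This holds as long as their voices differ enough.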

The near-end control interface 108 is coupled to the sound adjustment module 110 and receives the user's control input for that module. According to this input, the sound adjustment module 110 individually adjusts the sound characteristics of each sound source identified by the sound recognition module 104. These characteristics include volume, clarity, frequency, and/or tone quality. For example, a near-end user can use the sound adjustment module 110 to raise the volume or improve the clarity of certain important far-end participants. Likewise, the user can reduce or even filter out noise or speech from non-participants, thereby improving the call quality of the meeting. In some special embodiments, the sound adjustment module 110 can even apply audio effects to individual sound sources, including changing their frequency or tone quality, in order to conceal the speaker's identity. The sound adjustment module 110 may be located at either the near end or the far end, as long as it is connected to the sound recognition module 104 by wired or wireless means. In a preferred embodiment, the sound adjustment module 110 and the sound recognition module 104 are integrated in a single processor to improve sound-processing performance.
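The per-source volume adjustment can be sketched as follows. The sketch assumes the sources have already been separated into individual signals with samples in [-1, 1]. It also assumes the near-end control interface produces, for each source label, either a gain in decibels or a mute flag. The settings format and function names are hypothetical. Clarity enhancement and voice disguise would need further processing, which is not shown.

```python
import numpy as np

def rms_dbfs(x):
    """Volume estimate: RMS level in dB relative to full scale (samples in [-1, 1])."""
    rms = np.sqrt(np.mean(np.square(x)))
    return 20 * np.log10(max(rms, 1e-10))  # floor avoids log10(0) for silence

def apply_user_settings(sources, settings):
    """Apply per-source gain or mute to separated source signals.

    sources:  dict label -> 1-D float array
    settings: dict label -> {"gain_db": float, "mute": bool} (hypothetical format);
              sources without an entry are passed through unchanged.
    """
    adjusted = {}
    for label, signal in sources.items():
        cfg = settings.get(label, {})
        if cfg.get("mute", False):
            adjusted[label] = np.zeros_like(signal)   # e.g. filter out a noise source
        else:
            gain = 10 ** (cfg.get("gain_db", 0.0) / 20)  # dB -> linear amplitude factor
            adjusted[label] = signal * gain
    return adjusted
```

A gain of +6 dB multiplies the amplitude by 10^(6/20), about 2.0. The `rms_dbfs` value is one possible volume figure to show next to each source on the near-end display.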

Finally, the sound playback module 112 is coupled to a near-end speaker and plays each sound source after its sound characteristics have been adjusted. The sound playback module 112 likewise may be located at either the near end or the far end, as long as it is connected to the sound adjustment module 110 by wired or wireless means. In a preferred embodiment, the sound playback module 112 can also be integrated with the sound adjustment module 110 and the sound recognition module 104 in a single processor. A person of ordinary skill in the art will understand that the three modules (sound recognition module 104, sound adjustment module 110, and sound playback module 112) are distinguished only for convenience of description. Any processor that performs the functions of these modules falls within the scope of the invention.
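Before playback, the adjusted sources must be combined into a single signal for the near-end speaker. The sketch below assumes one mono speaker and float samples in [-1, 1]. It sums the sources and, if the sum would clip, scales the whole buffer down. The headroom value and the names are illustrative assumptions; a production system would more likely use a smoothly varying limiter.

```python
import numpy as np

def mix_for_playback(adjusted_sources, headroom=0.99):
    """Sum adjusted source signals into one mono buffer for the near-end speaker.

    Shorter signals are treated as silent after they end. If the sum's peak
    exceeds `headroom`, the whole buffer is scaled so the peak equals `headroom`.
    """
    signals = list(adjusted_sources.values())
    if not signals:
        return np.zeros(0)
    mix = np.zeros(max(len(s) for s in signals))
    for s in signals:
        mix[:len(s)] += s
    peak = np.max(np.abs(mix))
    if peak > headroom:
        mix *= headroom / peak  # uniform scaling keeps the relative levels chosen by the user
    return mix
```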

Teleconferencing Method

In addition to the teleconferencing system described above, the present invention provides a teleconferencing method. FIG. 2 is a flowchart of a teleconferencing method according to an embodiment of the present invention. The method 200 comprises:

- in step S202, receiving far-end sound with a far-end microphone array;
- in step S204, identifying a plurality of sound sources in the far-end sound;
- in step S206, displaying the identified sound sources and their sound characteristics on a near-end display interface;
- in step S208, adjusting at least one sound characteristic of each sound source individually; and
- in step S210, playing the sound sources after their sound characteristics have been adjusted.

In step S204, the sound sources can be identified from the far-end sound by sound direction recognition and/or sound quality recognition. The sound characteristics are the direction, distance, volume, clarity, frequency, and/or tone quality of each sound source. A person of ordinary skill in the art can understand the teleconferencing method by referring to the embodiments of the teleconferencing system described above, so the details are not repeated here.
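Steps S202 to S210 can be read as one processing cycle per block of audio. The sketch below wires the steps together using caller-supplied functions, so each step can be replaced by a real implementation. The function name, the callable parameters, and the block-by-block structure are assumptions for illustration.

```python
from typing import Callable, Dict, List, Sequence

Signal = List[float]
Sources = Dict[str, Signal]  # source label -> separated signal

def teleconference_cycle(
    capture: Callable[[], Sequence[Signal]],
    separate: Callable[[Sequence[Signal]], Sources],
    display: Callable[[Sources], None],
    adjust: Callable[[Sources], Sources],
    play: Callable[[Sources], None],
) -> Sources:
    """Run steps S202 to S210 once for a single block of far-end audio."""
    channels = capture()          # S202: one signal per microphone of the far-end array
    sources = separate(channels)  # S204: identify the sound sources
    display(sources)              # S206: show the sources on the near-end display
    adjusted = adjust(sources)    # S208: adjust at least one characteristic per source
    play(adjusted)                # S210: play the adjusted sources
    return adjusted
```

In a running system, this function would be called repeatedly, once per incoming block. The `separate`, `adjust`, and `play` arguments could wrap source separation, per-source gain control, and speaker mixing of the kinds sketched for the system embodiments.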

Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit its scope. A person of ordinary skill in the art may make changes and modifications without departing from the spirit and scope of the invention. The scope of protection of the invention is therefore defined by the appended claims.

Claims

1. A teleconferencing system, comprising:

a sound recognition module for receiving far-end sound captured by a far-end microphone array and identifying a plurality of sound sources in the far-end sound;

a near-end display interface, coupled to the sound recognition module, for displaying the plurality of sound sources identified by the sound recognition module; and

a sound adjustment module, coupled to the sound recognition module, for adjusting a sound characteristic of each of the sound sources individually.

2. The teleconferencing system of claim 1, further comprising:

a near-end control interface, coupled to the sound adjustment module, for receiving a user's control of the sound adjustment module.

3. The teleconferencing system of claim 1, further comprising:

a sound playback module, coupled to the sound adjustment module, for playing the plurality of sound sources after their sound characteristics have been adjusted.

4. The teleconferencing system of claim 1, wherein the sound recognition module identifies the plurality of sound sources in the far-end sound by at least one of a sound direction recognition technique and a sound quality recognition technique.

5. The teleconferencing system of claim 1, wherein the near-end display interface further displays the sound characteristics of the plurality of sound sources identified by the sound recognition module.

6. The teleconferencing system of claim 1, wherein the sound characteristics of the plurality of sound sources comprise the direction and/or distance of the plurality of sound sources.

7. The teleconferencing system of claim 1, wherein the sound characteristics of the plurality of sound sources comprise the volume of the plurality of sound sources.

8. The teleconferencing system of claim 1, wherein the sound characteristics of the plurality of sound sources comprise the clarity, frequency, and/or tone quality of the plurality of sound sources.

9. A teleconferencing method, comprising:

receiving far-end sound from a far-end microphone array;

identifying a plurality of sound sources in the far-end sound;

displaying the identified sound sources on a near-end display interface; and

adjusting at least one sound characteristic of each of the sound sources individually.

10. The teleconferencing method of claim 9, further comprising:

identifying the plurality of sound sources in the far-end sound by at least one of a sound direction recognition technique and a sound quality recognition technique.

11. The teleconferencing method of claim 9, wherein the sound characteristics of the plurality of sound sources comprise at least one of the direction, distance, volume, clarity, frequency, and tone quality of the plurality of sound sources.