TWI846005B

TWI846005B - Video conference device and method for adjusting camera directions

Info

Publication number: TWI846005B
Application number: TW111129957A
Authority: TW
Inventors: 潘慶元; 蔡敷恩
Original assignee: 圓展科技股份有限公司
Priority date: 2022-08-10
Filing date: 2022-08-10
Publication date: 2024-06-21
Also published as: TW202407685A

Abstract

A method for adjusting a camera direction, applicable to a video conference device electrically connected to a speaker, is provided. The video conference device includes an audio processor, a microphone array and a camera. The method includes: receiving, by the audio processor, a far-end audio signal from a far end; converting, by the audio processor, the far-end audio signal into a reference audio signal and sending the reference audio signal to the speaker; playing, by the speaker, a first sound according to the reference audio signal; recording, by the microphone array, a near-end audio signal, wherein the near-end audio signal includes the first sound and a second sound emitted from a near end; performing, by the audio processor, a filtering operation to generate a filtered audio signal according to the reference audio signal and the near-end audio signal; computing, by the audio processor, an angular control signal according to the filtered audio signal; and adjusting, by the camera, a camera direction according to the angular control signal to capture the near end.

Description

Video conferencing device and method for adjusting the shooting direction of a camera

The present invention relates to video conferencing, and in particular to a video conference device capable of tracking a talker and a method for adjusting the shooting direction of a camera.

As the demand for video conferencing grows, video conference devices have been equipped with cameras that can track sound sources. When a near-end sound is detected, the camera lens is steered toward the near-end talker, so that conference participants at the far end receive both the voice and the image of the near-end talker.

During a meeting, however, the speaker is often playing the voice of a far-end participant at the same time as a near-end talker is speaking. In this situation the camera may be steered toward the speaker rather than toward the near-end talker. Simply pausing camera tracking while the speaker is playing sound does not guarantee that the camera captures whoever is speaking at that moment either. For example, when there are several talkers at the near end, the camera lens may remain on the previous talker instead of the person currently speaking.

In addition, although an algorithm can be used to determine whether the audio input signal received by the near-end microphone contains echo, and to stop the camera from tracking when echo is detected, the audio input signal used in this approach already contains the sound played by the speaker, so the signal is inherently ill-suited to accurately locating the near-end talker. Therefore, if the echo problem is not solved, the far-end user cannot see the image of the near-end talker in real time.

In view of this, the present invention provides a video conference device and a method for adjusting the shooting direction of a camera, such that the camera can still accurately track the near-end talker even when the voice of a far-end conference participant is being played through a speaker at the near end during a video conference.

A method for adjusting the shooting direction of a camera according to an embodiment of the present invention is applicable to a video conference device. The video conference device includes an audio processor, a microphone array and a camera, and is electrically connected to a speaker. The method includes: the audio processor receiving far-end audio from a far end; the audio processor converting the far-end audio into reference audio and transmitting the reference audio to the speaker; the speaker playing a first sound according to the reference audio; the microphone array recording near-end audio, the near-end audio including the first sound and a second sound from a near end; the audio processor performing a filtering operation according to the reference audio and the near-end audio to generate filtered audio; the audio processor calculating an angle control signal according to the filtered audio; and the camera adjusting its shooting direction according to the angle control signal to shoot the near end.

A video conference device according to an embodiment of the present invention is configured to be electrically connected to a speaker, wherein the speaker is used to play a first sound. The video conference device includes: an audio processor configured to receive far-end audio from a far end, convert the far-end audio into reference audio, and transmit the reference audio to the speaker, wherein the first sound is associated with the reference audio, and the audio processor performs a filtering operation according to the reference audio and near-end audio to generate filtered audio and calculates an angle control signal according to the filtered audio; a microphone array configured to record the near-end audio, the near-end audio including the first sound and a second sound from a near end; and a camera electrically connected to the audio processor, the camera adjusting its shooting direction according to the angle control signal to shoot the near end.

In summary, by removing the speaker signal picked up by the microphone array, the video conference device and the method for adjusting the shooting direction of a camera proposed by the present invention not only prevent the camera from mistakenly tracking the sounding speaker when the far end and the near end produce sound at the same time, but also enable the camera to accurately track the near-end talker.

The above description of the present disclosure and the following description of the embodiments are intended to demonstrate and explain the spirit and principle of the present invention, and to provide further explanation of the scope of the claims of the present invention.

The detailed features and characteristics of the present invention are described in the embodiments below. The content is sufficient for any person skilled in the relevant art to understand the technical content of the present invention and implement it accordingly. Based on the content disclosed in this specification, the claims and the drawings, any person skilled in the relevant art can readily understand the concepts and characteristics of the present invention. The following embodiments further illustrate aspects of the present invention in detail, but do not limit the scope of the present invention in any respect.

FIG. 1 is a schematic diagram of an application of a video conference device 3 according to an embodiment of the present invention. The video conference device 3 is electrically connected to a processor 1, a speaker 5 and a microphone 7. Note that the arrows in FIGS. 1 to 3 indicate the direction of data transmission.

The processor 1 is, for example, a personal computer or a smartphone. The processor 1 can receive far-end audio through the network N and transmit the far-end audio to the video conference device 3. The far-end audio comes from, for example, a video conference device 3 or a sound pickup device at the far end. In one embodiment, the processor 1 is electrically connected to the video conference device 3 via Universal Serial Bus (USB). In another embodiment, the video conference device 3 has a built-in communication module, so it can receive the far-end audio directly from the network N without going through another electronic device.

The video conference device 3 is electrically connected to the speaker 5 and the microphone 7. It plays the far-end audio through the speaker 5 and records near-end (that is, local) sound through the microphone 7. In another embodiment, a speakerphone with a built-in speaker 5 and microphone 7 can replace the separate speaker 5 and microphone 7 shown in FIG. 1, so that playback and recording are provided by a single device.

FIG. 2 is a block diagram based on FIG. 1 that further shows the internal structure of the video conference device 3. The video conference device 3 includes an audio processor 32, a microphone array 34, a transmission interface 36 and a camera 38.

The audio processor 32 is electrically connected to the transmission interface 36, the microphone array 34 and the camera 38. The audio processor 32 receives, from the processor 1, the far-end audio originating at the far end and performs the following operations: converting the far-end audio into reference audio, and transmitting the reference audio to the speaker 5 through the transmission interface 36. The transmission interface 36 may use Channel Link to carry Low Voltage Differential Signaling (LVDS) signals, but the present invention is not limited thereto.

The speaker 5 plays the reference audio to produce the first sound; in other words, the speaker 5 plays the far-end audio from the far end. The microphone array 34 records near-end audio, which includes the first sound played by the speaker 5 (that is, the far-end audio mentioned above) and a second sound from the near end. Here, the "near end" is defined as the surroundings of the location of the video conference device 3, and the second sound is, for example, the voice of a near-end user. In one embodiment, the microphone array 34 has at least two microphones 341 and 343, each of which records a sound component, and the second sound is composed of a plurality of such sound components.

As shown in FIG. 2, in addition to connecting to the speaker 5, the transmission interface 36 is also used to electrically connect to the microphone 7 (which is distinct from the microphones 341 and 343 of the microphone array 34). The microphone 7 records another near-end audio, which likewise includes the first sound played by the speaker 5 and the second sound from the near end. A near-end talker may move from a position close to the microphone array 34 to a position close to the speaker 5, or one of several near-end users may be located close to the speaker 5 from the start. In such cases the microphone array 34 alone may not clearly record the voice of the talker near the speaker 5, and the other near-end audio recorded by the microphone 7 can compensate.

While the speaker 5 plays the first sound, the microphone array 34 records the near-end audio, and the microphone 7 records the other near-end audio, the audio processor 32 can perform the following operations in real time: performing a filtering operation based on the reference audio and the near-end audio to generate filtered audio, calculating an angle control signal based on the filtered audio, and transmitting the angle control signal to the camera 38.

The camera 38 adjusts its shooting direction according to the angle control signal to shoot the near end. In one embodiment, the camera 38 includes a camera lens and a motor module, and the motor module adjusts the shooting angle of the camera lens according to the angle control signal. In another embodiment, the camera 38 is, for example, a pan-tilt-zoom (PTZ) camera whose lens can pan left and right, tilt up and down, and zoom in. The angle control signal described in the present invention may correspond to at least one of these operations.

FIG. 3 is a block diagram based on FIG. 2 that further shows the internal structure of the audio processor 32. The audio processor 32 includes a conversion circuit 321, an adaptive filter 323, an angle calculation circuit 325, a sound enhancement circuit 327 and a mixer 329.

The conversion circuit 321 is electrically connected to the processor 1, the transmission interface 36, the adaptive filter 323 and the mixer 329. The conversion circuit 321 uses, for example, the USB Audio Class (UAC) protocol and converts the far-end audio into the reference audio accordingly. For example, the far-end audio undergoes stereo-to-mono conversion and/or resampling: 48 kHz stereo is converted into 32 kHz mono, and the resulting 32 kHz mono signal is used as the reference audio.
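The conversion from 48 kHz stereo to 32 kHz mono described above can be sketched as follows. This is a minimal illustration, not the conversion circuit 321 itself: the function names are hypothetical, the downmix simply averages the two channels, and the resampler uses plain linear interpolation without the anti-aliasing filter a production converter would normally apply.

```python
def stereo_to_mono(frames):
    """Downmix (left, right) frames to mono by averaging the two channels."""
    return [(left + right) / 2.0 for left, right in frames]


def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation.

    Illustrative only: no anti-aliasing low-pass filter is applied.
    """
    if not samples:
        return []
    n_out = (len(samples) * dst_rate) // src_rate
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # fractional index in the source
        k = int(pos)
        frac = pos - k
        nxt = samples[min(k + 1, len(samples) - 1)]
        out.append(samples[k] * (1.0 - frac) + nxt * frac)
    return out


def to_reference_audio(stereo_frames, src_rate=48000, dst_rate=32000):
    """48 kHz stereo in, 32 kHz mono reference audio out (default rates)."""
    return resample_linear(stereo_to_mono(stereo_frames), src_rate, dst_rate)
```

With the default rates, every three input frames yield two output samples, so one millisecond of input (48 stereo frames) becomes 32 mono samples.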

The adaptive filter 323 is electrically connected to the conversion circuit 321, the microphone array 34, the sound enhancement circuit 327 and the angle calculation circuit 325. The adaptive filter 323 performs a filtering operation based on the reference audio and the near-end audio to generate filtered audio. The filtering operation includes: performing a convolution of the reference audio with the coefficients of the adaptive filter 323 to generate a reverse signal, and integrating (combining) the near-end audio with the reverse signal to generate the filtered audio. The adaptive filter 323 further updates its coefficients based on the filtered audio. In other words, before the speaker 5 plays the reference audio, the adaptive filter 323 first obtains the reference audio to be played and the near-end audio recorded by the microphone array 34, and thereby estimates the characteristics of the new near-end audio that the microphone array 34 is about to record. The adaptive filter 323 uses a plurality of linear functions, each having at least one coefficient. The adaptive filter 323 corrects the coefficients of these linear functions using, for example, the normalized least mean squares (NLMS) algorithm, so that they reflect the linear frequency response when the speaker 5 plays the reference audio. The adaptive filter 323 further calculates the reverse signal for when the speaker 5 plays the reference audio, and combines the near-end audio recorded by the microphone array 34 with the reverse signal, thereby filtering out of the near-end audio the components that belong to the far-end audio.
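The filtering operation above (convolution, combination with the reverse signal, coefficient update) can be illustrated with a textbook NLMS echo canceller for a single microphone channel. This is a sketch under assumptions, not the patented circuit: the function name, the tap count, the step size `mu` and the regularizer `eps` are illustrative choices, and the demo uses a synthetic 3-tap echo path with no near-end talker.

```python
import random


def nlms_echo_cancel(reference, near_end, taps=8, mu=0.5, eps=1e-8):
    """Remove the echo of `reference` from `near_end`, sample by sample.

    Returns (filtered_audio, final_coefficients). `taps`, `mu` and `eps`
    are illustrative values, not taken from the patent.
    """
    coeffs = [0.0] * taps      # adaptive filter coefficients
    history = [0.0] * taps     # latest reference samples, newest first
    filtered = []
    for ref_sample, mic_sample in zip(reference, near_end):
        history = [ref_sample] + history[:-1]
        # Convolve the reference with the coefficients; the negated echo
        # estimate plays the role of the "reverse signal".
        reverse = -sum(c * h for c, h in zip(coeffs, history))
        # Combine near-end audio and reverse signal to get filtered audio.
        out = mic_sample + reverse
        filtered.append(out)
        # NLMS update of the coefficients, driven by the filtered audio.
        power = sum(h * h for h in history) + eps
        coeffs = [c + mu * out * h / power for c, h in zip(coeffs, history)]
    return filtered, coeffs


# Demo with a synthetic 3-tap echo path and no near-end talker.
random.seed(0)
reference = [random.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, 0.3, 0.1]
near_end = [
    sum(g * reference[n - k] for k, g in enumerate(echo_path) if n - k >= 0)
    for n in range(len(reference))
]
filtered, coeffs = nlms_echo_cancel(reference, near_end)
```

In this demo the coefficients approach the synthetic echo path and the filtered signal decays toward zero, because the microphone signal contains only echo. With a microphone array, the same routine would presumably be run once per microphone so that each filtered component keeps its own timing for the later angle calculation.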

The sound enhancement circuit 327 is electrically connected to the adaptive filter 323 and the mixer 329. The sound enhancement circuit 327 performs at least one of the following operations on the filtered audio: beamforming, noise reduction, residual echo suppression, and automatic gain control.

The mixer 329 is electrically connected to the conversion circuit 321, the transmission interface 36 and the sound enhancement circuit 327. After the filtered audio has been enhanced by the sound enhancement circuit 327, the mixer 329 performs a mixing operation based on the enhanced filtered audio and the other near-end audio to generate return audio. The return audio is sent back to the processor 1 by the conversion circuit 321, and then sent back to the far end by the processor 1 through the network N. To perform the mixing operation, the mixer 329 multiplies the output signal of the sound enhancement circuit 327 by one weight, multiplies the other near-end audio generated by the microphone 7 by another weight, and sums the two products. In one embodiment, the two weights can be adjusted according to the respective amplitudes of the input signal and the other near-end audio. In another embodiment, because the human voice is wideband, the input signal and the other near-end audio are multiplied by their respective weights only in their respective designated frequency bands.
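The weighted-sum mixing described above can be sketched as follows. The default weights and the amplitude-based rule are assumptions made for illustration: the text says only that the weights may be adjusted according to the amplitudes of the two inputs, without giving a formula, so `amplitude_weights` shows one hypothetical rule (weights proportional to RMS amplitude).

```python
import math


def mix(enhanced, other_near_end, w_enhanced=0.7, w_other=0.3):
    """Weighted sum of two equally sampled signals (default weights are arbitrary)."""
    n = min(len(enhanced), len(other_near_end))
    return [w_enhanced * enhanced[i] + w_other * other_near_end[i] for i in range(n)]


def rms(samples):
    """Root-mean-square amplitude; 0.0 for an empty signal."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def amplitude_weights(enhanced, other_near_end, eps=1e-12):
    """Hypothetical rule: weight each input in proportion to its RMS amplitude."""
    a, b = rms(enhanced), rms(other_near_end)
    total = a + b + eps        # eps avoids division by zero on silence
    return a / total, b / total
```

A caller might compute `w1, w2 = amplitude_weights(x, y)` for each audio frame and then call `mix(x, y, w1, w2)`; a band-limited variant would apply the same weighting only inside the designated frequency bands.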

The angle calculation circuit 325 is electrically connected to the adaptive filter 323 and the camera 38. The angle calculation circuit 325 applies a time difference of arrival (TDOA) technique to the filtered audio (that is, the signal of the microphone array 34 with the sound of the speaker 5 removed) to generate the angle control signal. Referring to FIG. 3, the microphone array 34 includes a plurality of microphones (for example, microphones 341 and 343), and the filtered audio includes a plurality of filtered audio components that respectively correspond to those microphones. The angle calculation circuit 325 therefore calculates a time difference from two of the filtered audio components to generate the angle control signal. In one embodiment, the microphone array 34 may be configured with at least four microphones so that the angle calculation circuit 325 can generate a three-dimensional angle control signal.
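One common way to turn a time difference between two microphones into an angle is sketched below. It is an illustration under assumptions rather than the circuit's actual method: the delay is estimated by brute-force cross-correlation, the source is assumed to be in the far field, the speed of sound is taken as 343 m/s, and the angle is measured from the broadside of the microphone pair. All function names and parameters are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag in samples (positive if sig_b lags sig_a) that
    maximizes the cross-correlation of the two filtered components."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(delay_samples, sample_rate, mic_spacing_m):
    """Far-field model: delay = spacing * sin(angle) / speed of sound."""
    tau = delay_samples / sample_rate
    ratio = SPEED_OF_SOUND * tau / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))   # clamp against measurement noise
    return math.degrees(math.asin(ratio))
```

For example, with a 32 kHz sample rate and microphones 0.1 m apart, a delay of 4 samples maps to an angle of roughly 25 degrees off broadside. A single microphone pair resolves only one angle, which is consistent with the text's note that at least four microphones allow a three-dimensional angle control signal.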

FIG. 4 is a flow chart of a method for adjusting the tracking direction of the camera 38 according to an embodiment of the present invention. As shown in FIG. 4, in step S1 the audio processor 32 receives, from the processor 1, the far-end audio originating at the far end; in step S2 the conversion circuit 321 in the audio processor 32 converts the far-end audio into reference audio and transmits the reference audio to the speaker 5; in step S3 the speaker 5 plays the first sound according to the reference audio; in step S4 the microphone array 34 records the near-end audio and the microphone 7 records the other near-end audio; in step S5 the adaptive filter 323 in the audio processor 32 performs the filtering operation according to the reference audio and the near-end audio to generate the filtered audio; in step S6 the angle calculation circuit 325 in the audio processor 32 calculates the angle control signal according to the filtered audio; and in step S7 the camera 38 adjusts its shooting direction according to the angle control signal to shoot the near end.

FIG. 5 is a detailed flow chart of step S5 of FIG. 4. In step S51 the adaptive filter 323 performs a convolution based on the reference audio and the coefficients of the adaptive filter 323 to generate the reverse signal; in step S52 the adaptive filter 323 integrates the near-end audio and the reverse signal to generate the filtered audio; and in step S53 the adaptive filter 323 updates its coefficients based on the filtered audio.

FIG. 6 is a flow chart of a method for adjusting the tracking direction of the camera 38 according to another embodiment of the present invention, wherein steps S1~S may be understood with reference to FIG. 4. In step S8 the sound enhancement circuit 327 in the audio processor 32 performs enhancement processing on the filtered audio to generate enhanced audio; in step S9 the mixer 329 in the audio processor 32 performs a mixing operation based on the enhanced audio and the other near-end audio of step S4 to generate return audio; and in step S10 the conversion circuit 321 in the audio processor 32 transmits the return audio to the processor 1.

Note that, provided substantially the same effect is achieved, the method need not be performed strictly in the order of the steps shown in FIGS. 4 to 6, and other auxiliary steps may be inserted among them.

In summary, by removing the speaker signal picked up by the microphone array, the video conference device and the method for adjusting the shooting direction of a camera proposed by the present invention not only prevent the camera from mistakenly tracking the sounding speaker when a far-end user and a near-end user speak at the same time, but also enable the camera to accurately track the near-end talker. In addition, because the adaptive filter of the present invention has already filtered out of the near-end audio the components that belong to the far-end audio, the far-end user receives audio with no echo or with only a small echo, which greatly improves the communication experience.

Although the present invention is disclosed above by way of the foregoing embodiments, they are not intended to limit the present invention. Changes and modifications made without departing from the spirit and scope of the present invention fall within the scope of patent protection of the present invention. For the scope of protection defined by the present invention, please refer to the appended claims.

1: Processor
3: Video conference device
5: Speaker
7: Microphone
32: Audio processor
34: Microphone array
341, 343: Microphone
36: Transmission interface
321: Conversion circuit
323: Adaptive filter
325: Angle calculation circuit
327: Sound enhancement circuit
329: Mixer
N: Network
S1~S10, S51~S53: Steps

FIG. 1 is a schematic diagram of an application of a video conference device according to an embodiment of the present invention; FIG. 2 is a block diagram of a video conference device according to an embodiment of the present invention; FIG. 3 is a block diagram of an audio processor according to an embodiment of the present invention; FIG. 4 is a flow chart of a method for adjusting the tracking direction of a camera according to an embodiment of the present invention; FIG. 5 is a detailed flow chart of step S5 of FIG. 4; and FIG. 6 is a flow chart of a method for adjusting the tracking direction of a camera according to another embodiment of the present invention.

1: Processor

3: Video conference device

5: Speaker

7: Microphone

32: Audio processor

34: Microphone array

36: Transmission interface

341, 343: Microphone

N: Network

Claims

A method for adjusting the shooting direction of a camera, applicable to a video conference device, the video conference device including an audio processor, a microphone array and a camera and being electrically connected to a speaker, the method including: the audio processor receiving far-end audio from a far end; the audio processor converting the far-end audio into reference audio and transmitting the reference audio to the speaker; the speaker playing a first sound according to the reference audio; the microphone array recording near-end audio, the near-end audio including the first sound and a second sound from a near end; the audio processor performing a filtering operation based on the reference audio and the near-end audio to generate filtered audio, wherein the audio processor includes an adaptive filter that performs an operation based on the reference audio and an adaptive filter coefficient to generate a reverse signal, and integrates the near-end audio and the reverse signal to generate the filtered audio; the audio processor further including an angle calculation circuit that calculates an angle control signal based on the filtered audio; and the camera adjusting the shooting direction based on the angle control signal to shoot the near end.

The method of claim 1, wherein the step of the audio processor performing the filtering operation based on the reference audio and the near-end audio to generate the filtered audio includes: the audio processor performs a convolution operation based on the reference audio and the adaptive filter coefficient to generate the reverse signal; the audio processor integrates the near-end audio and the reverse signal to generate the filtered audio; and the audio processor updates the adaptive filter coefficient based on the filtered audio.

The method of claim 1, wherein the video conferencing device is further used to electrically connect to another microphone, and the method further includes: the other microphone records another near-end audio, the other near-end audio includes the first sound and the second sound from the near-end; the audio processor performs a mixing operation based on the filtered audio and the other near-end audio to generate a return audio; and the audio processor transmits the return audio to the far-end.

The method of claim 3, further including, before the audio processor performs the mixing operation based on the filtered audio and the other near-end audio to generate the return audio, the audio processor performing at least one of the following operations on the filtered audio: beamforming, noise reduction, residual echo suppression, and automatic gain.

The method of claim 1, wherein the microphone array includes a plurality of microphones, the filtered audio includes a plurality of filtered audio components respectively corresponding to the microphones, and the step of the audio processor calculating the angle control signal according to the filtered audio includes: the audio processor calculating a time difference according to two of the filtered audio components to generate the angle control signal.

A video conference device configured to be electrically connected to a speaker, wherein the speaker is used to play a first sound, the video conference device including: an audio processor configured to receive far-end audio from a far end, convert the far-end audio into reference audio, and transmit the reference audio to the speaker, wherein the first sound is related to the reference audio, the audio processor including: an adaptive filter that performs a filtering operation based on the reference audio and near-end audio to generate filtered audio, wherein the filtering operation performs an operation based on the reference audio and an adaptive filter coefficient to generate a reverse signal, and integrates the near-end audio and the reverse signal to generate the filtered audio; and an angle calculation circuit that calculates an angle control signal based on the filtered audio; a microphone array configured to record the near-end audio, the near-end audio including the first sound and a second sound from a near end; and a camera electrically connected to the audio processor, the camera adjusting the shooting direction according to the angle control signal to shoot the near end.

The video conferencing device as described in claim 6, wherein the filtering operation is that the audio processor performs a convolution operation based on the reference audio and the adaptive filter coefficient to generate the reverse signal, the audio processor integrates the near-end audio and the reverse signal to generate the filtered audio; and the audio processor updates the adaptive filter coefficient based on the filtered audio.

The video conferencing device as described in claim 6, further used to electrically connect to another microphone, wherein the other microphone is used to record another near-end audio, and the other near-end audio includes the first sound and the second sound from the near-end; the audio processor performs a mixing operation based on the filtered audio and the other near-end audio to generate a return audio, and the audio processor transmits the return audio to the far-end.

The video conferencing device as described in claim 6, wherein the audio processor performs at least one of the following operations on the filtered audio before performing the mixing operation based on the filtered audio and the other near-end audio to generate the return audio: beamforming, noise reduction, residual echo suppression, and automatic gain.

The video conferencing device as described in claim 6, wherein the microphone array includes a plurality of microphones, the filtered audio includes a plurality of filtered audio components respectively corresponding to the microphones, and the audio processor calculates a time difference based on two of the filtered audio components to generate the angle control signal.