JP7316974B2

JP7316974B2 - SOUND COLLECTION DEVICE, SYSTEM, PROGRAM AND METHOD THAT TRANSMITS ENVIRONMENTAL SOUND IN WHICH SPECIAL SOUND SIGNAL IS SUPPRESSED

Info

Publication number: JP7316974B2
Application number: JP2020065561A
Authority: JP
Inventors: 正樹内藤; 俊治堀内
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2020-04-01
Filing date: 2020-04-01
Publication date: 2023-07-28
Anticipated expiration: 2040-04-01
Also published as: JP2021162742A

Description

The present invention relates to telepresence system technology over a network.

With a videoconferencing system, the other party's situation cannot be shared unless a network connection is deliberately established. As a result, even among employees of the same company, those working from home or from a shared office may find themselves in an isolated work environment and feel alienated (see, for example, Non-Patent Document 1).

In recent years, telepresence systems that continuously stream video and audio between multiple sites have come into use, even within a single company. Such a system is a videoconferencing system, but instead of being connected only during meetings, it stays connected throughout working hours. It lets employees at different remote sites share each other's situation and work as if they were in the same place. A telepresence system can easily be connected over a network not only between a company's domestic and overseas sites but also between a company and a home or shared office.
It can also be used outside companies, for example between parents and children who live apart.

Conventionally, the videophone service Skype (registered trademark), for example, can display a member's "present/away" status on the other party's terminal so that members at remote sites can follow each other's activity.
There is also a technique that conveys the situation of a member at a remote site by synthesized sound, so that the situation can be known without looking at the display while working (see, for example, Non-Patent Document 4).
Furthermore, there is a technique for sharing each other's situation by continuously transmitting the environmental sounds and images around the other party via a telepresence system (see, for example, Non-Patent Documents 2 and 3).
There is also an environmental sound recognition device with which various environmental sounds are mutually recognized between remote sites (see, for example, Patent Document 1).

Patent Document 1: Japanese Patent No. 6085538

Non-Patent Document 1: Ministry of Internal Affairs and Communications (ed.), "Survey Research Report on Telework Trends and Productivity," Information and Communications International Strategy Bureau, Ministry of Internal Affairs and Communications (2010), [online], [retrieved March 10, 2020], Internet <URL: https://www.soumu.go.jp/johotsusintokei/linkdata/h22_06_houkoku.pdf>
Non-Patent Document 2: W. Buxton, "Telepresence: Integrating shared task and person spaces," Proceedings of Graphics Interface, 1992, [online], [retrieved March 10, 2020], Internet <URL: https://www.billbuxton.com/TelepShrdSpce.pdf>
Non-Patent Document 3: "Always-connected voice conference system for telework," Proceedings of the Annual Meeting of the Japan Ergonomics Society, pp. 406-407, 2009
Non-Patent Document 4: "Fribo: A Social Networking Robot for Increasing Social Connectedness through Sharing Daily Home Activities from Living Noise Data," HRI 2018, [online], [retrieved March 10, 2020], Internet <URL: https://yonsei.pure.elsevier.com/en/publications/fribo-a-social-networking-robot-for-increasing-social-connectedne>
Non-Patent Document 5: Kazuho Ono, "Multichannel Audio," [online], [retrieved March 10, 2020], Internet <URL: https://www.jstage.jst.go.jp/article/itej/68/8/68_604/_pdf/-char/ja>

However, with existing telepresence and videoconferencing systems, every sound produced at each site is transmitted to the other party. The other party therefore also hears harsh noise and may find the stream noisy. In addition, speech that raises privacy concerns may be transmitted.

Therefore, an object of the present invention is to provide a sound collecting device, a system, a program, and a method that transmit environmental sound in which specific acoustic signals, among the sounds produced at a site, that do not need to be conveyed to the other party are suppressed.

According to the present invention, a sound collecting device that transmits an environmental sound signal collected by a microphone to a reproducing device, which reproduces it through a speaker, comprises:
a first acoustic database storing acoustic objects associated with acoustic tags;
a removal acoustic tag table for registering acoustic tags to be removed;
an acoustic object detection engine that uses the first acoustic database to detect one or more acoustic objects contained in the environmental sound signal and identifies the acoustic tags of those acoustic objects;
acoustic object suppressing means for suppressing, in the environmental sound signal, the acoustic signal portion of the acoustic object associated with each identified acoustic tag that is registered in the removal acoustic tag table;
environmental sound transmitting means for transmitting the environmental sound signal from which the acoustic objects have been removed to the reproducing device; and
acoustic tag transmitting means for transmitting the acoustic tags associated with the removed acoustic objects to the reproducing device.
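The claimed flow (detect, check each tag against the removal table, suppress, then send both the cleaned signal and the removed tags) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: detection is taken as already done, the `Detection` record with sample-index boundaries is hypothetical, and suppression is reduced to muting the object's time span as a stand-in for the frequency-domain suppression performed by the acoustic object suppression unit 14.

```python
from dataclasses import dataclass, field


@dataclass
class Detection:
    """One acoustic object reported by a (hypothetical) detection engine."""
    tag: int    # acoustic tag identifying the object
    start: int  # first sample of the object (assumed field)
    end: int    # one past the last sample (assumed field)


@dataclass
class Transmission:
    """What the sound collecting device sends to the reproducing device."""
    samples: list
    removed_tags: list = field(default_factory=list)


def collect_and_send(samples, detections, removal_tags):
    """Suppress every detected object whose tag is registered for removal."""
    out = list(samples)
    removed = []
    for det in detections:
        if det.tag not in removal_tags:
            continue  # not in the removal table: leave the object audible
        # Placeholder suppression: mute the object's time span.
        for i in range(det.start, min(det.end, len(out))):
            out[i] = 0.0
        if det.tag not in removed:
            removed.append(det.tag)
    return Transmission(samples=out, removed_tags=removed)


# Tags 101 (chime) and 239 (keyboard) are detected; only 101 is registered.
tx = collect_and_send([0.5] * 10,
                      [Detection(101, 2, 4), Detection(239, 5, 7)],
                      removal_tags={101, 167, 52})
print(tx.removed_tags)  # [101]
```

Keeping the removed tags in a separate list, rather than discarding them, is what lets the reproducing side learn which sounds were taken out.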

According to another embodiment of the sound collecting device of the present invention, it is also preferable that:
the sound collecting device is connected to an environmental sensor;
the environmental sensor is associated with an acoustic tag; and
when the acoustic object suppressing means receives a predetermined signal from the environmental sensor, it removes, from the environmental sound signal, the acoustic signal portion of the acoustic object associated with the acoustic tag of that environmental sensor.
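One way to picture the sensor embodiment: each sensor ID maps to an acoustic tag, and a sensor's predetermined signal marks that tag for removal. The sketch below assumes the tag stays active for a fixed hold time after the signal. The class name, the `hold_seconds` parameter, the sensor ID, and tag 300 are hypothetical and are not taken from the source.

```python
class SensorTriggeredSuppression:
    """Hypothetical helper linking environmental sensors to acoustic tags.

    When a sensor reports its predetermined signal, the associated tag is
    marked for removal for `hold_seconds` (an assumed parameter).
    """

    def __init__(self, sensor_to_tag, hold_seconds=2.0):
        self.sensor_to_tag = dict(sensor_to_tag)
        self.hold_seconds = hold_seconds
        self._active_until = {}  # acoustic tag -> expiry time in seconds

    def on_sensor_signal(self, sensor_id, now):
        tag = self.sensor_to_tag.get(sensor_id)
        if tag is not None:  # signals from unknown sensors are ignored
            self._active_until[tag] = now + self.hold_seconds

    def tags_to_suppress(self, now):
        return {tag for tag, until in self._active_until.items() if now < until}


s = SensorTriggeredSuppression({"door-sensor-1": 300}, hold_seconds=2.0)
s.on_sensor_signal("door-sensor-1", now=10.0)
print(s.tags_to_suppress(now=11.0))  # {300}
print(s.tags_to_suppress(now=12.5))  # set()
```

A hold window is one simple way to cover the sound that follows the trigger, such as a door swinging shut after its contact sensor fires.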

According to another embodiment of the sound collecting device of the present invention, it is also preferable that the sound collecting device is connected to a camera and further comprises:
an image database storing image objects associated with acoustic tags; and
an image object detection engine that uses the image database to detect one or more image objects contained in the video captured by the camera and identifies the acoustic tags of those image objects;
and that the acoustic object suppressing means removes, from the environmental sound signal, the acoustic signal portion of the acoustic object associated with an acoustic tag identified by the image object detection engine.
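The camera embodiment needs only a lookup from image objects to acoustic tags. In the sketch below, the image detector itself is out of scope: its output is assumed to be a list of text labels, and the label names are hypothetical. This passage does not say whether visually identified tags are still checked against the removal acoustic tag table. The sketch assumes they are and merges them with the tags found in the audio.

```python
# Hypothetical image database: image-object label -> acoustic tag.
IMAGE_DB = {"printer": 167, "door_chime": 101, "keyboard": 239}


def tags_from_image_labels(labels, image_db=IMAGE_DB):
    """Acoustic tags for the image objects a detector reported."""
    return {image_db[label] for label in labels if label in image_db}


def tags_to_suppress(audio_tags, image_labels, removal_tags):
    """Union of audio- and image-derived tags, limited to the removal table
    (the removal-table check is an assumption of this sketch)."""
    candidates = set(audio_tags) | tags_from_image_labels(image_labels)
    return candidates & set(removal_tags)


# The printer is visible on camera even though the audio detector missed it.
print(sorted(tags_to_suppress({101, 239}, ["printer", "person"], {101, 167, 52})))
# [101, 167]
```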

According to the present invention, in a system comprising the sound collecting device described above and a reproducing device that reproduces the environmental sound signal received from the sound collecting device,
the reproducing device comprises:
a second acoustic database storing acoustic objects associated with acoustic tags; and
acoustic object mixing means for mixing, using the second acoustic database, the acoustic object associated with an acoustic tag into the environmental sound signal;
and the reproducing device reproduces the environmental sound signal mixed with the acoustic object through a speaker.
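Mixing on the reproducing side can be sketched as adding the second database's object for each received tag at its reported position. The following are assumptions: signals are lists of float samples in [-1, 1], positions arrive as sample offsets, the `gain` parameter and hard clipping are illustrative choices, and tag 101 with a two-sample object is toy data.

```python
def mix_objects(signal, events, object_db, gain=1.0):
    """Mix the second database's object for each (tag, start_sample) event.

    Samples are floats in [-1, 1]; each sum is hard-clipped to that range.
    """
    out = list(signal)
    for tag, start in events:
        obj = object_db.get(tag)
        if obj is None:
            continue  # no replacement sound registered for this tag
        for i, s in enumerate(obj):
            j = start + i
            if 0 <= j < len(out):
                out[j] = max(-1.0, min(1.0, out[j] + gain * s))
    return out


second_db = {101: [0.2, 0.2]}  # toy replacement sound for tag 101
mixed = mix_objects([0.0] * 6, [(101, 3)], second_db)
print(mixed)  # [0.0, 0.0, 0.0, 0.2, 0.2, 0.0]
```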

According to another embodiment of the system of the present invention, it is also preferable that:
the acoustic tags and acoustic objects stored in the second acoustic database of the reproducing device are some or all of the acoustic tags and acoustic objects stored in the first acoustic database of the sound collecting device; and
even when an acoustic tag stored in the second acoustic database of the reproducing device is identical to one stored in the first acoustic database of the sound collecting device, the associated acoustic objects are based on different acoustic signals.

According to another embodiment of the system of the present invention, it is also preferable that:
a plurality of sound collecting devices and one reproducing device are connected via a network; and
each sound collecting device is placed at a different site, and the reproducing device simultaneously reproduces the environmental sound signals from the different sites.

According to another embodiment of the system of the present invention, it is also preferable that the reproducing device controls the acoustic signals output from a plurality of speakers so that the environmental sound signal received from each of the plurality of sound collecting devices is reproduced from a different direction of arrival for each sound collecting device.
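This passage does not specify the rendering method; a multichannel audio reference is listed as Non-Patent Document 5. As a simple substitute, the sketch below places each site at its own position on a two-speaker stereo stage using a constant-power pan law. The even left-to-right spread of sites and the stereo-only output are assumptions.

```python
import math


def pan_gains(pan):
    """Constant-power pan law; pan in [-1, 1], -1 = left, +1 = right."""
    theta = (pan + 1.0) * math.pi / 4.0  # 0 .. pi/2
    return math.cos(theta), math.sin(theta)


def assign_pans(num_sites):
    """Spread sites evenly from hard left to hard right (assumed layout)."""
    if num_sites == 1:
        return [0.0]
    return [-1.0 + 2.0 * k / (num_sites - 1) for k in range(num_sites)]


def render_stereo(site_signals):
    """Mix mono signals from several sites into one left/right pair."""
    n = max(len(s) for s in site_signals)
    left, right = [0.0] * n, [0.0] * n
    for sig, pan in zip(site_signals, assign_pans(len(site_signals))):
        gain_l, gain_r = pan_gains(pan)
        for i, x in enumerate(sig):
            left[i] += gain_l * x
            right[i] += gain_r * x
    return left, right


# Site A is heard on the left channel, site B on the right.
left, right = render_stereo([[1.0, 0.0], [0.0, 1.0]])
```

The constant-power law keeps the perceived loudness of a site roughly the same wherever it is panned, because the squared gains always sum to 1.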

According to another embodiment of the system of the present invention, it is also preferable that:
the sound collecting device transmits the video captured by the camera to the reproducing device;
the reproducing device displays the video received from each sound collecting device in a separate region of a display; and
the reproducing device reproduces each environmental sound signal so that the environmental sound of a sound collecting device appears to come from the position on the display where that device's video is shown.
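To make a site's sound come from where its video is shown, the horizontal centre of each display tile can be mapped to a pan position. The row-major grid layout and the [-1, 1] pan range below are assumptions. The resulting values could drive a constant-power pan law. Vertical position is ignored because a single stereo pair cannot reproduce elevation.

```python
def tile_pans(num_sites, columns):
    """Pan value in [-1, 1] for each site, from its tile's horizontal centre.

    Tiles are assumed to be laid out row by row, `columns` per row.
    """
    pans = []
    for k in range(num_sites):
        col = k % columns
        centre_x = (col + 0.5) / columns  # 0 = left edge, 1 = right edge
        pans.append(2.0 * centre_x - 1.0)
    return pans


print(tile_pans(4, 2))  # [-0.5, 0.5, -0.5, 0.5]
```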

According to the present invention, a sound collecting device that transmits an environmental sound signal collected by a microphone to a reproducing device, which reproduces it from a speaker, comprises:
a first acoustic database storing acoustic objects associated with acoustic tags;
a removal acoustic tag table for registering acoustic tags to be removed;
a second acoustic database that stores acoustic objects different from those of the first acoustic database, even for the same acoustic tags;
an acoustic object detection engine that uses the first acoustic database to detect one or more acoustic objects contained in the environmental sound signal and identifies the acoustic tags of those acoustic objects;
acoustic object suppressing means for suppressing, in the environmental sound signal, the acoustic signal portion of the acoustic object associated with each identified acoustic tag that is registered in the removal acoustic tag table;
acoustic object mixing means for mixing, using the second acoustic database, the acoustic object associated with the acoustic tag into the environmental sound signal; and
environmental sound transmitting means for transmitting the environmental sound signal mixed with the acoustic object to the reproducing device.

According to the present invention, a program causes a computer installed in a sound collecting device, which transmits an environmental sound signal collected by a microphone to a reproducing device that reproduces it through a speaker, to function as:
a first acoustic database storing acoustic objects associated with acoustic tags;
a removal acoustic tag table for registering acoustic tags to be removed;
an acoustic object detection engine that uses the first acoustic database to detect one or more acoustic objects contained in the environmental sound signal and identifies the acoustic tags of those acoustic objects;
acoustic object suppressing means for suppressing, in the environmental sound signal, the acoustic signal portion of the acoustic object associated with each identified acoustic tag that is registered in the removal acoustic tag table;
environmental sound transmitting means for transmitting the environmental sound signal from which the acoustic objects have been removed to the reproducing device; and
acoustic tag transmitting means for transmitting the acoustic tags associated with the removed acoustic objects to the reproducing device.

According to the present invention, a program causes a computer installed in a sound collecting device, which transmits an environmental sound signal collected by a microphone to a reproducing device that reproduces it from a speaker, to function as:
a first acoustic database storing acoustic objects associated with acoustic tags;
a removal acoustic tag table for registering acoustic tags to be removed;
a second acoustic database that stores acoustic objects different from those of the first acoustic database, even for the same acoustic tags;
an acoustic object detection engine that uses the first acoustic database to detect one or more acoustic objects contained in the environmental sound signal and identifies the acoustic tags of those acoustic objects;
acoustic object suppressing means for suppressing, in the environmental sound signal, the acoustic signal portion of the acoustic object associated with each identified acoustic tag that is registered in the removal acoustic tag table;
acoustic object mixing means for mixing, using the second acoustic database, the acoustic object associated with the acoustic tag into the environmental sound signal; and
environmental sound transmitting means for transmitting the environmental sound signal mixed with the acoustic object to the reproducing device.

According to the present invention, in a sound collection and reproduction method for a sound collecting device that transmits an environmental sound signal collected by a microphone to a reproducing device that reproduces it through a speaker,
the sound collecting device has:
a first acoustic database storing acoustic objects associated with acoustic tags; and
a removal acoustic tag table for registering acoustic tags to be removed;
and the method executes:
a first step of detecting, using the first acoustic database, one or more acoustic objects contained in the environmental sound signal and identifying the acoustic tags of those acoustic objects;
a second step of removing, from the environmental sound signal, the acoustic signal portion of the acoustic object associated with each identified acoustic tag that is registered in the removal acoustic tag table; and
a third step of transmitting the environmental sound signal from which the acoustic objects have been removed to the reproducing device, together with the acoustic tags associated with the removed acoustic objects.

According to another embodiment of the sound collection and reproduction method of the present invention, it is also preferable that the reproducing device has a second acoustic database storing acoustic objects associated with acoustic tags and executes:
a fourth step of mixing, using the second acoustic database, the acoustic object associated with an acoustic tag into the environmental sound signal; and
a fifth step of reproducing the environmental sound signal mixed with the acoustic object through a speaker.

According to the present invention, in a sound collection and reproduction method for a sound collecting device that transmits an environmental sound signal collected by a microphone to a reproducing device that reproduces it from a speaker,
the sound collecting device has:
a first acoustic database storing acoustic objects associated with acoustic tags;
a removal acoustic tag table for registering acoustic tags to be removed; and
a second acoustic database that stores acoustic objects different from those of the first acoustic database, even for the same acoustic tags;
and the method executes:
a first step of detecting, using the first acoustic database, one or more acoustic objects contained in the environmental sound signal and identifying the acoustic tags of those acoustic objects;
a second step of removing, from the environmental sound signal, the acoustic signal portion of the acoustic object associated with each identified acoustic tag that is registered in the removal acoustic tag table;
a third step of mixing, using the second acoustic database, the acoustic object associated with the acoustic tag into the environmental sound signal; and
a fourth step of transmitting the environmental sound signal mixed with the acoustic object to the reproducing device.

With the sound collecting device, system, program, and method of the present invention, environmental sound can be transmitted with specific acoustic signals suppressed: those sounds produced at a site that do not need to be conveyed to the other party. Harsh noise is suppressed and privacy-sensitive speech is removed, while the other party can still be told what kinds of environmental sounds were removed. As a result, members at different remote sites can share each other's situation within comfortable environmental sound.

FIG. 1 is a functional configuration diagram of a sound collecting device and a reproducing device according to the present invention.
FIG. 2 is an explanatory diagram of detecting acoustic objects.
FIG. 3 is an explanatory diagram of suppressing acoustic objects.
FIG. 4 is an explanatory diagram of the environmental sound transmission unit and the acoustic tag transmission unit.
FIG. 5 is an explanatory diagram of mixing acoustic objects.
FIG. 6 is a functional configuration diagram of a sound collecting device connected to an environmental sensor.
FIG. 7 is a functional configuration diagram of a sound collecting device that suppresses acoustic objects based on images captured by a camera.
FIG. 8 is a functional configuration diagram of a sound collecting device having an acoustic object suppression unit and an acoustic object mixing unit.
FIG. 9 is a functional configuration diagram of a reproducing device that receives environmental sounds from a plurality of sound collecting devices.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a functional configuration diagram of the sound collecting device and the reproducing device according to the present invention.

In the system of the present invention, a sound collecting device 1 and a reproducing device 2 are connected via a network.
According to FIG. 1, the sound collecting device 1 suppresses specific acoustic signals in the environmental sound signal picked up by the microphone 101 and transmits the resulting environmental sound signal to the reproducing device 2. At the same time, it also transmits the video captured by the camera 102 to the reproducing device 2.
The reproducing device 2 reproduces the environmental sound received from the sound collecting device 1 through the speaker 201. The reproducing device 2 can also recognize which acoustic signals were suppressed in the received environmental sound and can indicate those suppressed signals to the user explicitly. Furthermore, it can mix another acoustic signal, different from the suppressed one, into the received environmental sound and reproduce the resulting new environmental sound through the speaker 201.

<Sound collecting device 1>
According to FIG. 1, the sound collecting device 1 includes a first acoustic database 11, a removal acoustic tag table 12, an acoustic object detection engine 13, an acoustic object suppression unit 14, an environmental sound transmission unit 15, an environmental tag transmission unit 16, and a video transmission unit 17. These functional units can be realized by executing a program that causes a computer installed in the device to function as them. The processing flow of these functional units can also be understood as a sound collection and transmission method.

FIG. 2 is an explanatory diagram of detecting acoustic objects.

[First acoustic database 11]
The first acoustic database 11 stores acoustic objects (sound objects) associated with acoustic tags:
Acoustic tag <-> acoustic object
An "acoustic tag" is an identifier that specifies an acoustic object.
An "acoustic object" is not limited to the acoustic signal itself; it may also be a standard pattern of acoustic features, such as a time series of frequency spectra. For the acoustic signals themselves, signals conforming to a standard such as ITU-R Recommendation BS.2051, "Advanced sound system for programme production", may be used.
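A minimal in-memory model of the tag-to-object mapping is sketched below. The class and field names are hypothetical. Each entry holds either raw samples or a feature pattern, mirroring the two forms of acoustic object described above. The numeric values are toy data.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AcousticObject:
    tag: int                                             # acoustic tag (identifier)
    label: str                                           # human-readable description
    samples: Optional[Tuple[float, ...]] = None          # raw acoustic signal
    feature_pattern: Optional[Tuple[float, ...]] = None  # e.g. spectral template


class AcousticDatabase:
    """Maps acoustic tags to acoustic objects."""

    def __init__(self):
        self._by_tag = {}

    def add(self, obj: AcousticObject) -> None:
        if obj.samples is None and obj.feature_pattern is None:
            raise ValueError("an acoustic object needs samples or a feature pattern")
        self._by_tag[obj.tag] = obj

    def get(self, tag: int) -> Optional[AcousticObject]:
        return self._by_tag.get(tag)

    def tags(self):
        return sorted(self._by_tag)


db = AcousticDatabase()
db.add(AcousticObject(101, "chime", samples=(0.0, 0.3, -0.3)))
db.add(AcousticObject(167, "printer", feature_pattern=(0.1, 0.7, 0.2)))
db.add(AcousticObject(239, "keyboard tap", feature_pattern=(0.5, 0.1, 0.0)))
print(db.tags())  # [101, 167, 239]
```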

[Removal acoustic tag table 12]
The removal acoustic tag table 12 registers the acoustic tags to be removed.
For example, to remove noise such as a printer or a door opening and closing, the acoustic tags associated with those acoustic objects are registered. Likewise, to protect privacy by removing people's voices, the acoustic tags associated with those acoustic objects are registered.

[Acoustic object detection engine 13]
The acoustic object detection engine 13 uses the first acoustic database 11 to detect one or more acoustic objects contained in the environmental sound signal and identifies their acoustic tags. The identified acoustic tags are output to the acoustic object suppression unit 14.

The acoustic object detection engine 13 extracts mel-frequency cepstral coefficients (MFCCs) as features and identifies acoustic objects with a deep-learning neural network (see, for example, Non-Patent Documents 3 and 4). Hidden layers pretrained by autoencoders based on restricted Boltzmann machines (RBMs) are stacked to build a multilayer hierarchical network. A discriminative network that uses the output of the final layer is then added, and the network as a whole is trained by supervised learning to detect acoustic tags.
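The MFCC front end mentioned above can be sketched in plain Python. The steps are: window a frame, take its power spectrum, pool it with triangular mel-spaced filters, take logarithms, and apply a DCT-II. The following choices are assumptions made for readability: a 256-sample frame, 20 filters, 13 coefficients, a Hamming window, and a naive O(N^2) DFT. The RBM-pretrained classifier that consumes these features is not sketched.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-windowed power spectrum, bins 0..N/2 (naive DFT)."""
    n = len(frame)
    win = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
           for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(win[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(acc) ** 2 / n)
    return spec


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [lo + (hi - lo) * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):   # rising edge
            filt[k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge
            filt[k] = (right - k) / (right - centre)
        bank.append(filt)
    return bank


def dct_ii(values, num_coeffs):
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(num_coeffs)]


def mfcc(frame, sample_rate, num_filters=20, num_coeffs=13):
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(filt, spec)) for filt in bank]
    log_e = [math.log(e + 1e-10) for e in energies]  # epsilon avoids log(0)
    return dct_ii(log_e, num_coeffs)


sr = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * t / sr) for t in range(256)]
coeffs = mfcc(tone, sr)
print(len(coeffs))  # 13
```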

According to FIG. 2, the environmental sound signal picked up by the microphone 101 is input to the acoustic object detection engine 13. This environmental sound signal contains a mixture of various sounds, for example:
"Whirrr, ka-chunk"
"Mr. Yamamoto, good morning."
"Creak"
"I met Mr. Ito yesterday!"
"Ding-dong"
"Clack-clack-clack"
The acoustic object detection engine 13 then detects acoustic objects and their acoustic tags, for example:
Acoustic tag 101 (chime)
Acoustic tag 167 (printer sound)
Acoustic tag 239 (keyboard typing)
Acoustic tag 143 (personal name "Mr. Yamamoto")
Acoustic tag 52 (personal name "Mr. Ito")

[Acoustic object suppression unit 14]
When an identified acoustic tag is registered in the removal acoustic tag table 12, the acoustic object suppression unit 14 suppresses the acoustic object associated with that tag in the environmental sound signal.

FIG. 3 is an explanatory diagram of suppressing acoustic objects.
In FIG. 3, assume, for example, that the following acoustic tags are registered in the removal acoustic tag table 12:
Acoustic tag 101 (chime)
Acoustic tag 167 (printer sound)
Acoustic tag 52 (personal name "Mr. Ito")
In this case, the acoustic object suppression unit 14 suppresses, in the frequency domain, the acoustic objects associated with these acoustic tags in the environmental sound signal.
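A minimal illustration of frequency-domain suppression is to transform each frame, attenuate the bins that belong to the object, and transform back. For brevity, the object here is reduced to a fixed frequency band, which is an assumption. A practical suppressor would more likely use a windowed, overlapping STFT and a time-frequency mask derived from the detected object. The 1600 Hz tone stands in for a tagged sound such as the chime, and the band edges are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def suppress_band(signal, sample_rate, band_hz, frame_len=64, gain=0.0):
    """Scale DFT bins whose frequency lies in band_hz = (low, high) by `gain`.

    Frames are rectangular and non-overlapping to keep the sketch short.
    """
    low, high = band_hz
    out = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        n = len(frame)
        spec = dft(frame)
        for k in range(n):
            # Negative-frequency bins (k > n/2) mirror the positive ones.
            freq = min(k, n - k) * sample_rate / n
            if low <= freq <= high:
                spec[k] *= gain
        out.extend(idft(spec))
    return out


sr = 6400  # 64-sample frames give 100 Hz bin spacing
mixture = [math.sin(2 * math.pi * 400 * t / sr) + math.sin(2 * math.pi * 1600 * t / sr)
           for t in range(128)]
cleaned = suppress_band(mixture, sr, band_hz=(1500, 1700))  # drop the 1600 Hz "object"
```

Setting `gain` between 0 and 1 attenuates the object instead of removing it completely.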

In another embodiment, instead of always suppressing the acoustic object for a registered acoustic tag, the acoustic object may be suppressed only when its sound level is at or above a predetermined threshold.
As another example, the embodiment above was described as suppressing the acoustic objects for the acoustic tags assigned to the personal names "Mr. Yamamoto" and "Mr. Ito"; alternatively, acoustic objects defined by the frequency range of the human voice may be suppressed.
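The level-gated variant can be sketched as a check performed before suppression. Two things here are assumptions: measuring the level as the RMS of the object's samples in dBFS, and the -30 dBFS default. The source only says "a predetermined threshold".

```python
import math


def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0)."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")


def should_suppress(tag, object_samples, removal_tags, threshold_dbfs=-30.0):
    """Suppress a registered tag only if its object reaches the threshold level."""
    return tag in removal_tags and rms_dbfs(object_samples) >= threshold_dbfs


loud = [0.5, -0.5] * 4       # about -6 dBFS
quiet = [0.001, -0.001] * 4  # about -60 dBFS
print(should_suppress(167, loud, {101, 167, 52}))   # True
print(should_suppress(167, quiet, {101, 167, 52}))  # False
```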

FIG. 4 is an explanatory diagram of the environmental sound transmission unit and the acoustic tag transmission unit.

[Environmental sound transmission unit 15]
The environmental sound transmission unit 15 transmits the environmental sound signal, with the specific acoustic objects suppressed, to the reproducing device 2. The reproducing device 2 can thus reproduce environmental sound from which specific noise and privacy-sensitive speech have been removed.

In FIG. 4, for example, the following environmental sound is transmitted:
"Mr. Yamamoto, good morning."
"Gee"
"I met ... yesterday~"
"Kata kata kata"
In this way, noise such as the printer's "Buun-gashakkii" and the chime's "Ping-pong" is suppressed, along with private speech such as "Mr. Ito".

[Environmental tag transmission unit 16]
The environmental tag transmission unit 16 transmits the acoustic tags associated with the suppressed acoustic objects to the playback device 2. From these tags, the playback device 2 can recognize which acoustic objects have been suppressed in the received environmental sound signal.

In FIG. 4, for example, the following acoustic tags are transmitted:
Acoustic tag 101 (chime sound)
Acoustic tag 167 (printer sound)
Acoustic tag 52 (personal name "Mr. Ito")
The time at which each tagged acoustic object occurred in the signal is also transmitted, synchronized with the environmental sound. This lets the receiver identify where each acoustic object belongs when the environmental sound is reproduced.
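One way to carry the tags and their times alongside the audio is a small metadata message keyed to the audio stream's clock. The patent states only that the times are sent in synchronization with the environmental sound. The JSON layout, the field names (`t`, `tag`, `start_ms`, `duration_ms`), and the shared stream clock below are hypothetical choices, shown as a sketch rather than a defined protocol.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class SuppressedTag:
    tag: int          # acoustic tag ID, e.g. 101 for the chime (hypothetical field)
    start_ms: int     # offset of the suppressed object from the frame timestamp
    duration_ms: int  # how long the suppressed object lasted

def encode_tag_message(stream_time_ms, suppressed):
    """Serialize the tags suppressed in one audio frame.

    stream_time_ms -- timestamp of that audio frame on the clock shared with the
                      audio stream, so the receiver can align the tags with audio
    """
    return json.dumps({"t": stream_time_ms, "tags": [asdict(s) for s in suppressed]})

def decode_tag_message(text):
    """Inverse of encode_tag_message(): returns (frame timestamp, list of tags)."""
    msg = json.loads(text)
    return msg["t"], [SuppressedTag(**d) for d in msg["tags"]]

def absolute_start_ms(frame_time_ms, item):
    """Stream time at which the suppressed object began."""
    return frame_time_ms + item.start_ms

# Usage: the chime and the name "Mr. Ito" were removed from the frame at t = 1000 ms.
msg = encode_tag_message(1000, [SuppressedTag(101, 250, 400), SuppressedTag(52, 600, 300)])
t, tags = decode_tag_message(msg)
```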

[Video transmission unit 17]
The video transmission unit 17 transmits the video captured by the camera 102 to the playback device 2. Video should preferably be shared with the other party's site as well, so that each side can see the other's situation.

<Playback Device 2>
According to FIG. 1, the playback device 2 comprises a second acoustic database 21, an acoustic object mixing unit 22, and a video playback unit 23. These functional units can be realized by executing a program that causes a computer installed in the device to function as them. The processing flow of these functional units can also be understood as a reception and playback method.

[Second acoustic database 21]
The second acoustic database 21 stores acoustic objects linked to acoustic tags.

The acoustic tags and acoustic objects stored in the second acoustic database 21 may be some or all of those stored in the first acoustic database 11 of the sound collection device 1.
For example, suppose an acoustic tag and acoustic object stored in the first acoustic database 11 are not stored in the second acoustic database 21. The environmental sound is then reproduced with that acoustic object still suppressed.
If instead an acoustic tag stored in the first acoustic database 11 is also stored in the second acoustic database 21, the environmental sound is reproduced with the acoustic object from the second acoustic database 21 mixed in. In other words, the same acoustic tag may be linked to acoustic objects based on different acoustic signals in the second acoustic database 21 and in the first acoustic database 11 of the sound collection device 1. In that case, the environmental sound is reproduced with the original object converted into the one from the second acoustic database 21.

[Acoustic object mixing unit 22]
The acoustic object mixing unit 22 uses the second acoustic database 21 to mix the acoustic objects associated with the received acoustic tags into the environmental sound signal. The mixed environmental sound signal is output to the speaker 201.

FIG. 5 is an explanatory diagram of mixing acoustic objects.

According to FIG. 5, the acoustic object mixing unit 22 receives two things from the sound collection device 1: the environmental sound signal and the acoustic tags of the suppressed objects. It then searches the second acoustic database 21 for acoustic objects, using each received acoustic tag as a key. In FIG. 5, the search returns:
Acoustic tag 101 <-> "Riiin"
Acoustic tag 52 <-> "Pipopa"
Acoustic tag 101 was "Ping-pong" in the first acoustic database 11 and is the substitute sound "Riiin" in the second acoustic database 21.
Acoustic tag 52 was "Mr. Ito" in the first acoustic database 11 and is the substitute sound "Pipopa" in the second acoustic database 21.
Acoustic tag 167, by contrast, is not found in the second acoustic database 21. The printer's "Buun-gashakkii" from the first acoustic database therefore remains suppressed. Particularly annoying noise can simply be silenced (removed) from the environmental sound.
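The lookup-and-mix step can be sketched as follows. This is a minimal, hypothetical sketch with three assumptions. First, each received tag comes with a start position already converted to a sample index (for example `round(start_ms * sample_rate / 1000)`). Second, the second acoustic database is a plain dict from tag to replacement samples. Third, mixing is simple sample-wise addition, with anything past the end of the buffer dropped.

```python
def mix_objects(env, received, second_db):
    """Mix replacement objects from the second acoustic database into env.

    env       -- suppressed environmental sound samples (list of floats)
    received  -- list of (acoustic tag, start sample index) pairs from the collector
    second_db -- dict: acoustic tag -> replacement samples (e.g. 101 -> "Riiin")

    Tags absent from second_db (e.g. 167, the printer) are skipped, so those
    objects stay silent in the output.
    """
    out = list(env)
    for tag, start in received:
        obj = second_db.get(tag)
        if obj is None:
            continue  # not in the second database: leave it suppressed
        for i, s in enumerate(obj):
            if 0 <= start + i < len(out):  # drop samples that fall outside the buffer
                out[start + i] += s
    return out

# Usage: toy replacement sounds for the chime (101) and the name "Mr. Ito" (52).
env = [0.0] * 10
second_db = {101: [0.5, 0.5], 52: [0.1, 0.2, 0.3]}
received = [(101, 1), (167, 4), (52, 8)]
mixed = mix_objects(env, received, second_db)
```

The last sample of object 52 falls beyond the 10-sample buffer and is dropped. A streaming implementation would instead carry it over into the next frame.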

In FIG. 5, for example, the following environmental sound is reproduced with the acoustic objects mixed in:
"Mr. Yamamoto, good morning."
"Gee"
"I met Pipopa yesterday~"
"Riiin"
"Kata kata kata"

[Video playback unit 23]
The video playback unit 23 outputs the video received from the sound collection device 1 to the display 202. Video should preferably be shared with the other party's site as well, so that each side can see the other's situation.

FIG. 6 is a functional configuration diagram of a sound collecting device connected to an environmental sensor.

According to FIG. 6, the sound collecting device 1 is connected to an environmental sensor 18 and receives ON/OFF signals from it. The environmental sensor may be, for example, a door open/close sensor. The environmental sensor is associated with one of the acoustic tags. The ON/OFF signal of the environmental sensor is input to the acoustic object suppression unit 14.

When the acoustic object suppression unit 14 receives a predetermined signal from the environmental sensor 18, it suppresses, from the environmental sound signal, the acoustic signal portion of the acoustic object linked to that sensor's acoustic tag. Noise such as a door opening or closing can thus be suppressed from the environmental sound.
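The sensor-triggered variant can be sketched as turning sensor events into time windows during which the sensor's acoustic tag is suppressed. This is a hypothetical sketch with three assumptions: the "predetermined signal" is a configurable set of states (ON by default), the sound lasts a fixed `hold_s` seconds after the trigger, and the sensor IDs and tag 301 are invented for illustration.

```python
def sensor_windows(events, sensor_tags, hold_s, trigger_states=frozenset({"ON"})):
    """Convert sensor trigger events into (tag, start_s, end_s) suppression windows.

    events         -- list of (time_s, sensor_id, state), e.g. (1.0, "door", "ON")
    sensor_tags    -- dict: sensor_id -> acoustic tag of the sound it announces
    hold_s         -- assumed duration of the sound after the trigger
    trigger_states -- states treated as the "predetermined signal"
    """
    return [(sensor_tags[sid], t, t + hold_s)
            for t, sid, state in events
            if state in trigger_states and sid in sensor_tags]

def active_tags(windows, t):
    """Acoustic tags whose objects should be suppressed at time t (seconds)."""
    return {tag for tag, start, end in windows if start <= t < end}

# Usage: only the door sensor is linked to a tag (hypothetical tag 301).
events = [(1.0, "door", "ON"), (1.2, "door", "OFF"), (3.0, "window", "ON")]
windows = sensor_windows(events, {"door": 301}, hold_s=0.5)
```

The tags returned by `active_tags` would then be passed to the suppression step for the frames in that window.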

FIG. 7 is a functional configuration diagram of a sound collecting device that suppresses acoustic objects based on images captured by a camera.

According to FIG. 7, the sound collecting device 1 receives images captured by a camera.
The sound collecting device 1 of FIG. 7 further comprises an image database 190 and an image object detection engine 191.

[Image database 190]
The image database 190 stores image objects linked to acoustic tags.

[Image object detection engine 191]
The image object detection engine 191 uses the image database 190 to detect one or more image objects contained in the video captured by the camera and identifies the acoustic tags of those image objects. The identified acoustic tags are output to the acoustic object suppression unit 14.

Specifically, the image object detection engine 191 encloses each object (image object) in the input image or video in a frame (bounding box) and identifies the object's type (category). The engine may be, for example, an SSD (Single Shot Multibox Detector). SSD divides the image into a grid and assigns each grid cell a fixed set of default bounding boxes. It then detects the bounding box at each position according to how well those default boxes fit the object. Each bounding box contains one image object.
The image object detection engine 191 may also be a neural network such as a CNN (Convolutional Neural Network) based on RGB recognition, for example YOLO (You Only Look Once) (registered trademark).
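The grid of default boxes that SSD relies on can be illustrated with the layout from the original SSD formulation. A box of scale s and aspect ratio a, centred on a grid cell, has width s·√a and height s/√a. How well a default box fits an object is usually measured by intersection over union (IoU). This sketch shows only the box layout and IoU matching on a single grid. A real SSD uses several feature maps of different resolutions and a CNN that predicts class scores and box offsets for every default box. The grid size, scale, and aspect ratios below are arbitrary illustrative values.

```python
import math

def default_boxes(grid, scale, aspect_ratios):
    """Default boxes (x_min, y_min, x_max, y_max) in normalized [0, 1] coordinates."""
    boxes = []
    for i in range(grid):          # grid row -> y
        for j in range(grid):      # grid column -> x
            cx, cy = (j + 0.5) / grid, (i + 0.5) / grid
            for ar in aspect_ratios:
                w, h = scale * math.sqrt(ar), scale / math.sqrt(ar)
                boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

# Usage: a 2x2 grid with two aspect ratios gives 8 default boxes; find the one
# that best fits a hypothetical "door" object in the top-left quarter.
boxes = default_boxes(grid=2, scale=0.5, aspect_ratios=[1.0, 2.0])
door = (0.0, 0.0, 0.5, 0.5)
best = max(boxes, key=lambda b: iou(b, door))
```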

When the acoustic object suppression unit 14 receives an acoustic tag from the image object detection engine 191, it suppresses, from the environmental sound signal, the acoustic signal portion of the acoustic object linked to that acoustic tag. For example, when a door opening or closing appears in the video, the noise of the door can be suppressed from the environmental sound.
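The hand-off from the image object detection engine to the suppression unit can be sketched as a lookup from detected categories to acoustic tags. This is a hypothetical sketch with three assumptions: the detector reports (category label, confidence) pairs, the image database is reduced to a dict from category label to acoustic tag, and detections below a confidence cut-off are ignored. The cut-off, the labels, and tag 301 are illustrative values only.

```python
def tags_from_detections(detections, image_db, min_conf=0.5):
    """Map detected image objects to the acoustic tags to suppress.

    detections -- list of (category label, confidence) pairs from the detector
    image_db   -- dict: category label -> acoustic tag (stand-in for image database 190)
    min_conf   -- hypothetical confidence cut-off; weaker detections are ignored
    """
    return {image_db[label] for label, conf in detections
            if label in image_db and conf >= min_conf}

# Usage: a confidently detected door maps to its tag; a person has no tag,
# and the printer detection is too uncertain to act on.
image_db = {"door": 301, "printer": 167}
detections = [("door", 0.9), ("person", 0.95), ("printer", 0.3)]
to_suppress = tags_from_detections(detections, image_db)
```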

FIG. 8 is a functional configuration diagram of a sound collecting device having both an acoustic object suppression unit and an acoustic object mixing unit.

FIG. 8 combines the sound collecting device 1 and the playback device 2 of FIG. 1 into one device. Within the sound collecting device 1, the acoustic object suppression unit 14 first suppresses specific acoustic objects in the environmental sound. The acoustic object mixing unit 22 then mixes other specific acoustic objects into that sound. The environmental sound that the playback device 2 mixes in the case of FIG. 1 is, in the case of FIG. 8, transmitted already mixed from the sound collecting device 1.

FIG. 9 is a functional configuration diagram of a playback device that receives environmental sounds from a plurality of sound collecting devices.

According to FIG. 9, a plurality of sound collecting devices 1 and one playback device 2 are connected via a network. Each sound collecting device 1 is placed at a different site, and the playback device 2 simultaneously reproduces the environmental sound signals from the different sites.
The playback device 2 may have a display that shows the video received from each sound collecting device 1 in a separate region. The playback device 2 also includes an environmental sound synthesis unit 24. This unit controls the acoustic signals output from a plurality of speakers so that the environmental sound from each sound collecting device is reproduced from a different arrival direction. A user listening at the playback device 2 then hears each device's environmental sound as if it came from the position on the display where that device's video is shown.
This uses multi-channel audio technology (see, for example, Non-Patent Document 5). With this technique, multiple channels pointing in different directions are set up relative to the position of each display, so that each channel corresponds one-to-one to a sound arrival direction. As a result, even if a display is placed where there is no speaker, the user hears the sound as if it came from that direction.
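The patent relies on multi-channel audio (Non-Patent Document 5) without giving details. A much simpler stand-in is constant-power amplitude panning, which assumes only a stereo speaker pair. In the sketch below, each site's mono signal is placed between the two speakers at the horizontal position of that site's video tile. Using panning instead of the cited technique, the normalized position coordinate, and the function names are all assumptions for illustration.

```python
import math

def pan_gains(position):
    """Constant-power (left, right) gains for position in [0, 1], 0 = left, 1 = right."""
    theta = position * math.pi / 2
    return math.cos(theta), math.sin(theta)

def render_sites(site_signals, display_positions):
    """Mix each site's mono signal to stereo so it seems to come from its video tile.

    site_signals      -- dict: site -> list of samples (all the same length)
    display_positions -- dict: site -> horizontal centre of its tile, normalized
                         to [0, 1] across the span between the two speakers
    """
    n = len(next(iter(site_signals.values())))
    left, right = [0.0] * n, [0.0] * n
    for site, sig in site_signals.items():
        gl, gr = pan_gains(display_positions[site])
        for i, s in enumerate(sig):
            left[i] += gl * s
            right[i] += gr * s
    return left, right

# Usage: site A's tile is at the far left, site B's at the far right.
left, right = render_sites({"A": [1.0, 0.0], "B": [0.0, 1.0]}, {"A": 0.0, "B": 1.0})
```

Because cos² + sin² = 1, the total acoustic power of each site stays the same wherever its tile sits, so moving a tile changes only the perceived direction and not the loudness.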

As described in detail above, the sound collecting device, system, program, and method of the present invention can transmit environmental sound in which specific acoustic signals are suppressed. These are sounds generated within a site that need not be conveyed to the other party. Harsh noise is suppressed and privacy-sensitive speech is removed, while the other party is still told what kinds of environmental sounds were removed. Members staying at different remote sites can thus share each other's situation amid comfortable environmental sound.

Those skilled in the art can easily make various changes, modifications, and omissions to the embodiments described above within the spirit and scope of the present invention. The foregoing description is merely an example and is not intended to be limiting. The invention is limited only by the claims and their equivalents.

1 sound collecting device
101 microphone
102 camera
11 first acoustic database
12 removal acoustic tag table
13 acoustic object detection engine
14 acoustic object suppression unit
15 environmental sound transmission unit
16 environmental tag transmission unit
17 video transmission unit
18 environmental sensor
190 image database
191 image object detection engine
2 playback device
201 speaker
202 display
21 second acoustic database
22 acoustic object mixing unit
23 video playback unit
24 environmental sound synthesis unit

Claims

1. A sound collecting device that transmits an environmental sound signal picked up by a microphone to a reproducing device that reproduces it with a speaker, the sound collecting device comprising:
a first acoustic database storing acoustic objects associated with acoustic tags;
a removal acoustic tag table for registering acoustic tags to be removed;
an acoustic object detection engine that uses the first acoustic database to detect one or more acoustic objects contained in the environmental sound signal and identifies the acoustic tags of the acoustic objects;
acoustic object suppressing means for suppressing, from the environmental sound signal, the acoustic signal portion of the acoustic object associated with each identified acoustic tag that is registered in the removal acoustic tag table;
environmental sound transmitting means for transmitting the environmental sound signal from which the acoustic object has been removed to the reproducing device; and
acoustic tag transmitting means for transmitting the acoustic tag associated with the removed acoustic object to the reproducing device.

2. The sound collecting device according to claim 1, wherein
the sound collecting device is connected to an environmental sensor,
the environmental sensor is associated with an acoustic tag, and
upon receiving a predetermined signal from the environmental sensor, the acoustic object suppressing means removes, from the environmental sound signal, the acoustic signal portion of the acoustic object linked to the acoustic tag of the environmental sensor.

3. The sound collecting device according to claim 1, which is connected to a camera and further comprises:
an image database storing image objects associated with acoustic tags; and
an image object detection engine that uses the image database to detect one or more image objects contained in the video captured by the camera and identifies the acoustic tags of the image objects,
wherein the acoustic object suppressing means removes, from the environmental sound signal, the acoustic signal portion of the acoustic object linked to the acoustic tag identified by the image object detection engine.

4. A system comprising the sound collecting device according to any one of claims 1 to 3 and a reproducing device that reproduces the environmental sound signal received from the sound collecting device, wherein
the reproducing device comprises:
a second acoustic database storing acoustic objects associated with acoustic tags; and
acoustic object mixing means for mixing, using the second acoustic database, the acoustic object associated with the acoustic tag into the environmental sound signal,
and the reproducing device reproduces the environmental sound signal mixed with the acoustic object through the speaker.

5. The system according to claim 4, wherein
the acoustic tags and acoustic objects stored in the second acoustic database of the reproducing device are some or all of the acoustic tags and acoustic objects stored in the first acoustic database of the sound collecting device, and
even where an acoustic tag stored in the second acoustic database of the reproducing device is the same as one stored in the first acoustic database of the sound collecting device, the linked acoustic objects are based on different acoustic signals.

6. The system according to claim 4, wherein
a plurality of sound collecting devices and one reproducing device are connected via a network, and
each sound collecting device is placed at a different site, and the reproducing device simultaneously reproduces the environmental sound signals of the different sites.

7. The system according to claim 6, wherein the reproducing device controls the acoustic signals output from a plurality of speakers so that the environmental sound signal received from each of the plurality of sound collecting devices is reproduced from a different arrival direction for each sound collecting device.

8. The system according to claim 7, wherein
each sound collecting device transmits video captured by a camera to the reproducing device,
the reproducing device displays the video received from each sound collecting device in a separate region of a display, and
the reproducing device reproduces the environmental sound signals so that the environmental sound of each sound collecting device arrives from the position on the display where that device's video is shown.

9. A sound collecting device that transmits an environmental sound signal picked up by a microphone to a reproducing device that reproduces it from a speaker, the sound collecting device comprising:
a first acoustic database storing acoustic objects associated with acoustic tags;
a removal acoustic tag table for registering acoustic tags to be removed;
a second acoustic database storing acoustic objects that differ from those in the first acoustic database even for the same acoustic tags;
an acoustic object detection engine that uses the first acoustic database to detect one or more acoustic objects contained in the environmental sound signal and identifies the acoustic tags of the acoustic objects;
acoustic object suppressing means for suppressing, from the environmental sound signal, the acoustic signal portion of the acoustic object associated with each identified acoustic tag that is registered in the removal acoustic tag table;
acoustic object mixing means for mixing, using the second acoustic database, the acoustic object linked to the acoustic tag into the environmental sound signal; and
environmental sound transmitting means for transmitting the environmental sound signal mixed with the acoustic object to the reproducing device.

10. A program that causes a computer installed in a sound collecting device, which transmits an environmental sound signal picked up by a microphone to a reproducing device that reproduces it through a speaker, to function as:
a first acoustic database storing acoustic objects associated with acoustic tags;
a removal acoustic tag table for registering acoustic tags to be removed;
an acoustic object detection engine that uses the first acoustic database to detect one or more acoustic objects contained in the environmental sound signal and identifies the acoustic tags of the acoustic objects;
acoustic object suppressing means for suppressing, from the environmental sound signal, the acoustic signal portion of the acoustic object associated with each identified acoustic tag that is registered in the removal acoustic tag table;
environmental sound transmitting means for transmitting the environmental sound signal from which the acoustic object has been removed to the reproducing device; and
acoustic tag transmitting means for transmitting the acoustic tag associated with the removed acoustic object to the reproducing device.

11. A program that causes a computer installed in a sound collecting device, which transmits an environmental sound signal picked up by a microphone to a reproducing device that reproduces it from a speaker, to function as:
a first acoustic database storing acoustic objects associated with acoustic tags;
a removal acoustic tag table for registering acoustic tags to be removed;
a second acoustic database storing acoustic objects that differ from those in the first acoustic database even for the same acoustic tags;
an acoustic object detection engine that uses the first acoustic database to detect one or more acoustic objects contained in the environmental sound signal and identifies the acoustic tags of the acoustic objects;
acoustic object suppressing means for suppressing, from the environmental sound signal, the acoustic signal portion of the acoustic object associated with each identified acoustic tag that is registered in the removal acoustic tag table;
acoustic object mixing means for mixing, using the second acoustic database, the acoustic object linked to the acoustic tag into the environmental sound signal; and
environmental sound transmitting means for transmitting the environmental sound signal mixed with the acoustic object to the reproducing device.

12. A sound pickup and reproduction method for a sound collecting device that transmits an environmental sound signal picked up by a microphone to a reproducing device that reproduces it through a speaker, wherein
the sound collecting device has:
a first acoustic database storing acoustic objects associated with acoustic tags; and
a removal acoustic tag table for registering acoustic tags to be removed,
the method comprising, by the sound collecting device:
a first step of detecting one or more acoustic objects contained in the environmental sound signal using the first acoustic database and identifying the acoustic tags of the acoustic objects;
a second step of removing, from the environmental sound signal, the acoustic signal portion of the acoustic object associated with each identified acoustic tag that is registered in the removal acoustic tag table; and
a third step of transmitting the environmental sound signal from which the acoustic object has been removed to the reproducing device, and transmitting the acoustic tag associated with the removed acoustic object to the reproducing device.

13. The sound pickup and reproduction method according to claim 12, wherein the reproducing device has a second acoustic database storing acoustic objects associated with acoustic tags, the method further comprising:
a fourth step in which the reproducing device mixes, using the second acoustic database, the acoustic object associated with the acoustic tag into the environmental sound signal; and
a step of reproducing the environmental sound signal mixed with the acoustic object through a speaker.

14. A sound pickup and reproduction method for a sound collecting device that transmits an environmental sound signal picked up by a microphone to a reproducing device that reproduces it from a speaker, wherein
the sound collecting device has:
a first acoustic database storing acoustic objects associated with acoustic tags;
a removal acoustic tag table for registering acoustic tags to be removed; and
a second acoustic database storing acoustic objects that differ from those in the first acoustic database even for the same acoustic tags,
the method comprising, by the sound collecting device:
a first step of detecting one or more acoustic objects contained in the environmental sound signal using the first acoustic database and identifying the acoustic tags of the acoustic objects;
a second step of removing, from the environmental sound signal, the acoustic signal portion of the acoustic object associated with each identified acoustic tag that is registered in the removal acoustic tag table;
a third step of mixing, using the second acoustic database, the acoustic object associated with the acoustic tag into the environmental sound signal; and
a fourth step of transmitting the environmental sound signal mixed with the acoustic object to the reproducing device.