JP4010161B2

JP4010161B2 - Acoustic presentation system, acoustic reproduction apparatus and method, computer-readable recording medium, and acoustic presentation program.

Info

Publication number: JP4010161B2
Application number: JP2002062386A
Authority: JP
Inventors: 近藤 哲二郎; 有光 哲彦; 菊地 大介
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2002-03-07
Filing date: 2002-03-07
Publication date: 2007-11-21
Anticipated expiration: 2022-03-07
Also published as: JP2003264900A

Description

【０００１】
【発明の属する技術分野】
この発明は、音響提示システムと音響再生装置及びその方法並びにコンピュータ読み取り可能な記録媒体と音響提示プログラムに関する。詳しくは、特定音源からの音を集音手段によって取得して音響信号を生成し、特定音源の方向に対する撮影を行い画像信号を生成し、画像信号の画像の動き検出を行い、画像信号の画像より広い画像表示領域に該検出した動きに応じて表示位置を移動させながら画像信号の画像の表示を行う際の該表示位置を示す表示位置信号を生成し、音響出力信号に基づいた音響出力を行う複数の音響出力手段に関する設置情報と表示位置信号に基づいて、複数の音響出力信号を生成して複数の音響出力手段に供給することにより、画像の表示位置の移動に合わせて音像の位置を移動させるものである。
【０００２】
【従来の技術】
従来の音響提示システムでは、例えばステレオマイクロフォンを用いて右側方向の音響と左側方向の音響を集音して、左右チャンネルの音響信号を生成すると共に、再生時には右側に配置されているスピーカを右チャネルの音響信号に基づいて駆動し、左側に配置されているスピーカを左チャネルの音響信号に基づいて駆動することで、モノラルマイクロフォンを用いる場合よりも臨場感を高める方法が行われている。
【０００３】
また、人間の頭蓋骨模型を作り、この模型の左耳と右耳の内部にマイクを設けて、人間の耳から入った音を擬似的に集音すると共に、再生時にはヘッドホンを用いる所謂バイノーラルサウンドも実用化されている。
【０００４】
また、近年では、臨場感をさらに向上させるため、前方のスピーカ以外に補助スピーカを設けて、この補助スピーカから反射音や後方からの音および低周波数領域の音等を再生可能とする方式、所謂５．１ｃｈや６．１ｃｈ方式のサラウンド方式等も実用化されている。
【０００５】
さらに、音響信号に対して信号処理を行うことで再生音場を再現や、３次元音場を再現することが行われている。例えば、無響室で集音した音やコンパクトディスク等に記録された音に、所望の空間でのインパルス応答（音響伝達関数）を畳み込み、所望の空間の特性を再生音場で再現させる。すなわち、所望の空間の音源位置で基準音を出力させると共に視聴位置で集音を行い、音源位置で出力された音と視聴位置で集音された音との関係から所望の空間のインパルス応答を求めることができる。このため、インパルス応答を実現するフィルタを登録しておき、音の再生時には、この登録されたフィルタを用いるものとすれば、所望の空間の特性を再生音場で再現できる。また、種々のフィルタを登録して、選択可能とすれば、種々の再生音場を再現できると共に、三次元音場を再現することもできる。
【０００６】
【発明が解決しようとする課題】
ところで、画像と音響を別々に記録した場合、ビデオカメラの向きに関係なく音響を記録すると、記録された音響信号に基づく再生音は方向性を持たなくなってしまい、臨場感に欠けたものとなってしまう。
【０００７】
また、このように記録された音響を、臨場感を高めて再生するためには、画像を見ながら音像の位置を画像に合わせる操作を人為的に行わなければならず、簡単に臨場感の高い再生音を得ることができない。
【０００８】
また、ビデオカメラにマイクロフォンを設けて画像と音響を記録した場合、図１９Ａに示すように、所望の被写体ＯＢに合わせてビデオカメラ９０の撮像方向を変更すると、図１９Ｂに示すように、再生画像がフレーム「Ｆa」「Ｆb」「Ｆc」の順に変更されても所望の被写体ＯＢの位置を表示画像の中央に保つことができる。しかし、所望の被写体に対応する音像も位置Ｑmに固定されてしまう。このため、再生時には、表示画像の背景の動きによって被写体の動きを表現できるが、音像の位置は固定されてしまい、臨場感の高い再生音場を得ることができない。
【０００９】
そこで、この発明では、表示画像に対応させて容易に臨場感の高い再生音場を得ることができる音響提示システムと音響再生装置及びその方法並びにコンピュータ読み取り可能な記録媒体と音響提示プログラムを提供するものである。
【００１０】
【課題を解決するための手段】
この発明に係る音響提示システムは、特定音源からの音を取得して音響信号を生成する集音手段と、特定音源の方向に対する撮影を行い画像信号を生成する撮像手段と、集音手段で生成された音響信号に基づいて複数の音響出力信号を生成する音響出力信号生成手段と、音響出力信号に基づいた音響出力を行う複数の音響出力手段と、画像信号の画像の動き検出を行い、画像信号の画像より広い画像表示領域に該検出した動きに応じて表示位置を移動させながら画像信号の画像の表示を行う際の該表示位置を示す表示位置信号を生成する画像処理手段と、複数の音響出力手段に関する設置情報と表示位置信号に基づき、音響出力信号生成手段における複数の音響出力信号の生成動作を制御する提示制御手段とを有するものである。また、環境音を取得して環境音信号を生成する環境音集音手段と、設置情報と前記環境音信号に基づき、環境音重畳信号を生成する環境音処理手段と、音響出力手段毎に、対応する環境音信号を音響出力信号に加算する信号加算手段とを有するものである。
【００１２】
さらに音響再生装置は、特定音源からの音を取得して生成された音響信号に基づいて複数の音響出力信号を生成する音響出力信号生成手段と、特定音源の方向に対する撮影を行うことにより生成された画像信号の画像の動き検出を行い、画像信号の画像より広い画像表示領域に該検出した動きに応じて表示位置を移動させながら画像信号の画像の表示を行う際の該表示位置を示す表示位置信号を生成する画像処理手段と、音響出力信号に基づいて音響出力を行う複数の音響出力手段に関する設置情報と表示位置信号に基づき、音響出力信号生成手段における複数の音響出力信号の生成動作を制御する提示制御手段とを有するものである。また、環境音を取得して生成された環境音信号と設置情報に基づいて環境音重畳信号を生成する環境音処理手段と、音響出力手段毎に、対応する音響出力信号を環境音信号に加算する信号加算手段とを有するものである。
【００１３】
次に、この発明に係る音響提示方法は、特定音源からの音を集音手段によって取得して音響信号を生成し、特定音源の方向に対する撮影を行い画像信号を生成し、画像信号の画像の動き検出を行い、画像信号の画像より広い画像表示領域に該検出した動きに応じて表示位置を移動させながら画像信号の画像の表示を行う際の該表示位置を示す表示位置信号を生成し、音響出力信号に基づいた音響出力を行う複数の音響出力手段に関する設置情報と表示位置信号に基づいて、複数の音響出力信号を生成して複数の音響出力手段に供給するものである。
【００１５】
さらに音響再生方法は、特定音源の方向に対する撮影を行うことにより生成された画像信号の画像の動き検出を行い、画像信号の表示画像より広い画像表示領域に該検出した動きに応じて表示位置を移動させながら画像信号の画像の表示を行う際の該表示位置を示す表示位置信号を生成し、音響出力信号に基づいて音響出力を行う複数の音響出力手段に関する設置情報と、表示位置信号と、特定音源からの音を取得して生成した音響信号に基づいて、複数の音響出力信号を生成するものである。
【００１６】
次に、この発明に係るコンピュータ読み取り可能な記録媒体は、コンピュータに、特定音源の方向に対する撮影を行って生成された画像信号の画像の動き検出を行って、画像信号の画像より広い画像表示領域に該検出した動きに応じて表示位置を移動させながら画像信号の画像の表示を行う際の該表示位置を示す表示位置信号を生成する手順と、音響出力信号に基づいて音響出力を行う複数の音響出力手段に関する設置情報と、表示位置信号に基づいて、提示調整信号を生成する手順と、特定音源からの音を取得して生成した音響信号と提示調整信号に基づいて、複数の音響出力手段に供給する音響出力信号を生成する手順とを実行させるプログラムを記録したものである。
【００１７】
また、音響提示プログラムは、コンピュータに、特定音源の方向に対する撮影を行って生成された画像信号の画像の動き検出を行って、画像信号の画像より広い画像表示領域に該検出した動きに応じて表示位置を移動させながら画像信号の画像の表示を行う際の該表示位置を示す表示位置信号を生成する手順と、音響出力信号に基づいて音響出力を行う複数の音響出力手段に関する設置情報と、表示位置信号に基づいて提示調整信号を生成する手順と、特定音源からの音を取得して生成した音響信号と提示調整信号に基づいて、複数の音響出力手段に供給する音響出力信号を生成する手順とを実行させるものである。
【００１８】
この発明においては、撮像手段によって撮影画像の画像信号が生成されると共に、超指向性マイクロフォン等を用いた集音手段によって、特定音源例えば撮影方向の音を取得して音響信号が生成される。また、特定音源と集音手段の位置関係として撮影方向や被写体までの距離を示す位置情報信号や環境音信号が生成される。この音響信号と画像信号および位置検出信号や環境音信号に基づいて情報信号を生成することで音響取得が行われる。
【００１９】
また情報信号を用いて音響再生を行う場合には、情報信号から音響信号と画像信号および位置検出信号や環境音信号が分離される。この分離された画像信号を用いて動き検出が行われて、例えば所望の被写体の動きに応じて撮影画像の表示位置が移動されると共に、この撮影画像の表示位置を示す表示位置信号が生成される。また、表示位置信号と、音響出力信号に基づいて音響出力を行う複数の音響出力手段の数や設置位置等を示す設置情報等に基づき提示調整信号が生成される。この提示調整信号と音響信号に基づき、複数の音響出力信号が生成されると共に音響出力信号の信号レベルや位相が制御されて、複数の音響出力手段に供給される。
【００２０】
【発明の実施の形態】
以下、図を参照しながら、この発明の実施の一形態について説明する。図１は、この発明における音響提示システムの全体構成を示している。音響取得装置１０の撮像部１２では、撮影画像の画像信号Ｓvを生成して情報信号生成部１８に供給する。
【００２１】
図２は撮像部１２の構成を示している。撮像レンズ１２１を通して入射された光は、撮像素子部１２２に入射されて、撮像面上に撮影画像が結像される。撮像素子部１２２は固体撮像素子例えばＣＣＤを用いて構成されており、光電変換によって得られた撮影画像に基づく信号を後述する駆動部１３２からの駆動信号ＲＣに基づいて読み出し、撮影画像の三原色撮像信号Ｓvaを生成して前処理部１２３に供給する。
【００２２】
前処理部１２３では、撮像信号Ｓvaからノイズ成分を除去する処理、例えば相関二重サンプリング処理を行い、ノイズ除去された撮像信号ＳvaをＡ／Ｄ変換部１２４に供給する。Ａ／Ｄ変換部１２４では、撮像信号Ｓvaをディジタルの画像信号Ｓvbに変換してフィードバッククランプ部１２５に供給する。フィードバッククランプ部１２５では、ブランキング期間の黒レベル信号と基準信号との誤差を検出してＡ／Ｄ変換部１２４に供給することで、安定した黒レベルで所要の大きさの画像信号Ｓvbを得ることが出来るように、Ａ／Ｄ変換動作を制御する。補正処理部１２６では、画像信号Ｓvbに対してシェーディング補正や撮像素子の欠陥に対する補正処理等を行う。この補正処理部１２６で補正処理が行われた画像信号Ｓvbは、プロセス処理部１２７に供給される。
【００２３】
プロセス処理部１２７では、補正処理後の画像信号Ｓvbに対してγ処理、輪郭補償処理、ニー補正処理等を行う。この信号処理が行われた画像信号Ｓvbは、画像信号Ｓvとして情報信号生成部１８に供給される。また、画像信号Ｓvは、モニタ部１２８に供給されて、このモニタ部１２８に画像信号Ｓvに基づく画像が表示されて、撮影画像の確認が行われる。
【００２４】
撮像制御部１３０には、操作部１３１が接続されており、操作部１３１をユーザが操作すると、ユーザの操作に応じた操作信号ＰＳが操作部１３１から撮像制御部１３０に供給される。撮像制御部１３０では、この操作信号ＰＳに基づいて各種の制御信号ＣＴを生成して各部の動作を制御することにより、撮像部１２をユーザの操作に応じて動作させる。また、撮像素子部１２２での信号読み出しフレーム周期を設定する制御信号ＴＣを生成して駆動部１３２に供給する。この駆動部１３２では、制御信号ＴＣに基づき駆動信号ＲＣを生成して撮像素子部１２２に供給する。
【００２５】
図１の集音部１４は、マイクロフォンを用いて構成されており、撮像部１２の前面や上部等に固定して設けられている。この集音部１４は、特定音源からの音である撮像部１２の撮像方向からの音を集音して例えばディジタルの音響信号Ｓaを生成して情報信号生成部１８に供給する。この集音部１４で用いるマイクロフォンは、狙った音源の音を拾うことができるように超指向性（鋭指向性）マイクロフォンであるガンマイク等を用いる。このように超指向性マイクロフォンを用いることで、不必要な方向からの雑音や音響を簡単に取り除くことができる。また、相関利用や複数マイクを用いて各マイクの遅延量を補正する等の音源分離手法によって、狙った音源の音を拾うことができる。
【００２６】
位置検出部１６は、特定音源と集音部１４の位置関係、例えば所望の被写体を特定音源として、この所望の被写体を撮影したときの撮像設定情報、すなわち撮像部１２の撮像方向や所望の被写体までの距離を検出して、検出結果を示す位置情報信号Ｓpを生成したのち情報信号生成部１８に供給する。
【００２７】
図３は、撮像部１２の撮像方向と所望の被写体までの距離を検出する位置検出部１６の構成を示している。角度センサ１６１は、回転角を測定できるセンサやジャイロ等を用いて角度を測定するものであり、撮像部１２の撮像方向を検出して角度信号Ｓpaを極座標算出部１６３に供給する。例えば基準位置に対する水平方向の角度（以下「方位角」という）φや、基準位置に対する上下方向の角度（以下「ピッチ角」という）δを示す角度信号Ｓpaを生成して極座標算出部１６３に供給する。測距センサ１６２は、光や超音波等を用いてあるいは撮像部１２における焦点位置に基づいて距離を測定するものであり、所望の被写体までの距離ＬＯを検出して距離信号Ｓpbを極座標算出部１６３に供給する。極座標算出部１６３では、角度信号Ｓpaと距離信号Ｓpbから極座標を算出し、ディジタルの位置情報信号Ｓpとして情報信号生成部１８に供給する。
【００２８】
情報信号生成部１８では、供給された画像信号Ｓvや音響信号Ｓaおよび位置情報信号Ｓpに基づいて情報信号ＷＳを生成して信号記録再生装置２０に供給する。
【００２９】
信号記録再生装置２０は、光や磁気あるいは半導体素子等を利用する記録媒体を用いて構成されており、供給された情報信号ＷＳを記録媒体に記録する。また、記録媒体を再生して得られた情報信号ＲＳは、音響再生装置３０の情報信号分離部３２に供給される。
【００３０】
情報信号分離部３２は、情報信号ＲＳから画像信号Ｓvと音響信号Ｓaと位置情報信号Ｓpを分離する。さらに、分離された画像信号Ｓvを画像信号処理部４０に供給すると共に、音響信号Ｓaを音響出力信号生成部５２に供給する。さらに、位置情報信号Ｓpを提示制御部５０に供給する。
【００３１】
画像信号処理部４０は、画像信号Ｓvに基づく画像から動きを検出して、検出した動きに応じて画像の表示位置を移動させた画像出力信号ＳVoutと、表示位置を示す表示位置信号ＰＤを生成する。図４は、画像信号処理部４０の概略構成を示しており、画像信号Ｓvは、画像信号処理部４０のシーンチェンジ検出部４１と動き検出部４２と画像位置移動部４３に供給される。
【００３２】
シーンチェンジ検出部４１は、画像信号Ｓvに基づいてシーンチェンジ検出、すなわち連続シーンとこの連続シーンとは異なるシーンとの繋ぎ目部分である画像の不連続位置を検出する。図５は、シーンチェンジ検出部４１の概略構成を示しており、例えば２フレーム分の画像信号を用いて連続するシーンであるか否かを検出するものである。
【００３３】
シーンチェンジ検出部４１の遅延回路４１１は、画像信号Ｓvを１フレーム遅延させて遅延画像信号Ｓvjとして差分平均算出回路４１２に供給する。差分平均算出回路４１２は、画像信号Ｓvと遅延画像信号Ｓvjに基づき、２フレーム間の差分平均値Ｄavを算出して正規化回路４１４に供給する。この差分平均値Ｄavの算出は、各画素における２フレーム間の輝度レベルの差分値を算出して、得られた差分値の平均値を差分平均値Ｄavとして正規化回路４１４に供給する。なお、１フレームの画像の画素数が「Ｎ」で、画像信号Ｓvに基づく輝度レベルを「ＹＣ」、遅延画像信号Ｓvjに基づく輝度レベルを「ＹＰ」としたとき、差分平均値Ｄavは式（１）に基づいて算出できる。
【００３４】
【数１】
$$D_{av} = \frac{1}{N}\sum_{i=1}^{N}\left|\mathit{YC}_i - \mathit{YP}_i\right| \qquad (1)$$
【００３５】
ここで、差分平均値Ｄavは、画像の輝度レベルによって大きく変化する。例えば画像が明るい場合、シーンの切り替えが行われなくとも画像の一部が暗い画像に変化するだけで差分平均値Ｄavが大きくなってしまう。一方、画像が暗い場合、シーンの切り替えが行われても輝度レベルの変化が小さいことから差分平均値Ｄavは大きくならない。このため、シーンチェンジ検出部４１に正規化回路４１４を設けるものとして、画像の明るさに応じた差分平均値Ｄavの正規化を行い、画像の明るさの影響を少なくして正しくシーンチェンジ検出を可能とする。
【００３６】
輝度平均算出回路４１３は、画像信号Ｓvに基づき、各画素の輝度レベルに基づき１フレームにおける輝度レベルの平均値を算出して輝度平均値Ｙavとして正規化回路４１４に供給する。なお、上述のように１フレームの画像の画素数が「Ｎ」で画像信号Ｓvに基づく画素の輝度レベルを「ＹＣ」としたとき、輝度平均値Ｙavは式（２）に基づいて算出できる。
【００３７】
【数２】
$$Y_{av} = \frac{1}{N}\sum_{i=1}^{N}\mathit{YC}_i \qquad (2)$$
【００３８】
正規化回路４１４は、画像の明るさに応じた差分平均値Ｄavの正規化を行う。すなわち、式（３）に示すように、画像の明るさを示す輝度平均値Ｙavに応じて差分平均値Ｄavを補正して差分平均正規化値（以下単に「正規化値」という）Ｅを生成する。
Ｅ＝Ｄav／Ｙav ・・・（３）
この正規化回路４１４で生成された正規化値Ｅは、判定回路４１５に供給される。
【００３９】
判定回路４１５は、予め設定された閾値Ｒfを有しており、正規化値Ｅと閾値Ｒfを比較して、正規化値Ｅが閾値Ｒfよりも大きいときにはシーンチェンジと判定する。また、正規化値Ｅが閾値Ｒf以下であるときにはシーンチェンジでない連続シーンと判定する。さらに、判定回路４１５は、この判定結果を示すシーンチェンジ検出信号ＣＨを生成して図４の動き検出部４２と画像位置移動部４３に供給する。
【００４０】
このように、正規化回路４１４は画像の明るさに応じた差分平均値Ｄavの正規化を行い、判定回路４１５は正規化値Ｅを用いてシーンチェンジであるか連続シーンであるかの判別を行うので、画像の明るさの影響を少なくして正しくシーンチェンジを検出できる。
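The scene-change test of paragraphs 【００３３】 to 【００３９】 can be sketched in a few lines of Python. This is a minimal sketch, not the circuit itself: frames are assumed to be flat lists of luminance values, the per-pixel difference is assumed to be an absolute difference, the handling of an all-black frame is an assumption, and the threshold value 0.5 used for Ｒf is hypothetical.

```python
def scene_change(current, previous, threshold):
    """Return (E, is_change) for two equally sized luminance frames.

    current  -> luminance levels YC of the present frame
    previous -> luminance levels YP of the one-frame-delayed image
    """
    n = len(current)
    if n == 0 or n != len(previous):
        raise ValueError("frames must be non-empty and the same size")
    # Difference average Dav (absolute difference is an assumption).
    d_av = sum(abs(yc - yp) for yc, yp in zip(current, previous)) / n
    # Luminance average Yav of the current frame.
    y_av = sum(current) / n
    if y_av == 0:
        # Assumption: an all-black frame cannot be normalised.
        e = float("inf") if d_av > 0 else 0.0
    else:
        e = d_av / y_av  # normalised value E = Dav / Yav, equation (3)
    return e, e > threshold


RF = 0.5  # hypothetical threshold Rf

# Bright scene where one region darkens: large Dav, small E.
e_bright, cut_bright = scene_change([200, 200, 200, 100], [200] * 4, RF)
# Dark scene with a real cut: smaller Dav, larger E.
e_dark, cut_dark = scene_change([30] * 4, [10] * 4, RF)
```

In this example the raw difference average is larger for the bright frames (25 versus 20), yet only the dark pair exceeds the threshold after normalisation, which is the effect the normalisation circuit is described as providing.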
【００４１】
また、上述のシーンチェンジ検出部４１では、正規化値Ｅを用いてシーンチェンジ検出を行うものとしたが、２フレーム間の画像の相関係数ｒを求めて、この相関係数ｒと閾値を比較することで、精度良くシーンチェンジ検出を行うこともできる。
【００４２】
動き検出部４２は、シーンチェンジ検出部４１からのシーンチェンジ検出信号ＣＨによって連続シーンであることが示されたフレームに関して動きベクトルの検出を行い、表示面積の広い部分の動きベクトル例えば背景部分の動きベクトルを検出する。図６は、動き検出部４２の構成を示しており、例えばブロックマッチング方法を用いて動きベクトルの検出を行う場合である。
【００４３】
動き検出部４２の遅延回路４２１は、画像信号Ｓvを１フレーム遅延させて遅延画像信号Ｓvkとして画像位置切替回路４２２に供給する。画像位置切替回路４２２は、遅延画像信号Ｓvkに基づく画像の位置を、予め設定された動き探索範囲内で水平方向や垂直方向に順次変更して新たな画像信号Ｓvlを順次生成する。この生成された画像信号Ｓvlは、差分演算回路４２３に供給される。また、画像位置切替回路４２２は、画像の移動方向と移動量を示す動きベクトルＭＶを最小値判定回路４２４に供給する。
【００４４】
差分演算回路４２３は、画像信号Ｓvlと画像信号Ｓvとの差分値ＤＭを順次算出して、最小値判定回路４２４に供給する。
最小値判定回路４２４は、差分値ＤＭと、この差分値ＤＭの算出に用いた画像信号Ｓvlを生成する際の動きベクトルＭＶとを関係付けて保持する。また、画像位置切替回路４２２で動き探索範囲内での画像の移動を完了したとき、最小値判定回路４２４は、保持している差分値ＤＭから最小値を判別して、この最小値となる差分値ＤＭと関係付けて保持されている動きベクトルＭＶを、動き検出情報ＭＶＤとして図４の画像位置移動部４３に供給する。
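The search described in paragraphs 【００４３】 and 【００４４】 (shift the delayed image over the search range, compute a difference value ＤＭ for each shift, keep the motion vector ＭＶ with the smallest ＤＭ) might look like the following. It is a sketch under stated assumptions: frames are 2-D lists of luminance values, ＤＭ is taken to be the mean absolute difference over the overlapping area, and the function name `detect_motion` and the search range of 2 pixels are hypothetical.

```python
def detect_motion(current, previous, search=2):
    """Return the shift (dx, dy) of `previous` that best matches `current`."""
    h, w = len(current), len(current[0])
    best = None  # (smallest DM so far, motion vector that produced it)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            total, count = 0, 0
            for y in range(h):
                for x in range(w):
                    # Pixel of the previous frame that lands on (x, y)
                    # after shifting the previous frame by (dx, dy).
                    py, px = y - dy, x - dx
                    if 0 <= py < h and 0 <= px < w:
                        total += abs(current[y][x] - previous[py][px])
                        count += 1
            if count == 0:
                continue  # no overlap for this shift
            dm = total / count
            if best is None or dm < best[0]:
                best = (dm, (dx, dy))
    return best[1]


# Test pattern: the current frame is the previous frame moved right by one pixel.
previous_frame = [[x * x + 10 * y for x in range(6)] for y in range(6)]
current_frame = [[row[x - 1] if x >= 1 else 0 for x in range(6)]
                 for row in previous_frame]
motion = detect_motion(current_frame, previous_frame, search=2)
```

Because the whole image is shifted rather than small blocks, the result tends to follow the largest area of the picture, which matches the stated aim of detecting the background motion.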
【００４５】
画像位置移動部４３は、シーンチェンジ検出信号ＣＨと動き検出情報ＭＶＤに基づき表示位置を決定する。さらに、決定した表示位置に画像を表示する画像出力信号ＳVoutを生成する。
【００４６】
図７は、画像位置移動部４３の構成を示している。画像位置移動部４３の画像位置決定回路４３１は、シーンチェンジ検出信号ＣＨと動き検出情報ＭＶＤに基づき表示位置を決定して、この表示位置を示す表示位置信号ＰＤを書込読出制御回路４３７と後述する提示制御部５０に供給する。
【００４７】
図８は、画像位置決定回路４３１の構成を示している。動き累積回路４３２は、シーンチェンジ検出信号ＣＨに基づいて連続シーンの期間を判別すると共に、この連続シーンの期間中における動き検出情報ＭＶＤで示された動きベクトルを累積して、動きベクトルの時間推移情報である動き累積値ＭＶＴを生成して初期位置決定回路４３３と表示範囲判定回路４３４に供給する。
【００４８】
初期位置決定回路４３３は、動き累積値ＭＶＴの振れ幅ＥＷを求めることでシーン毎に表示位置の移動範囲を判別し、この移動範囲が画像表示領域内の移動可能範囲（画像表示領域の右側端部に画像を表示したときと、画像を水平移動させて左側端部に表示したときの表示画像の中心間の距離）の中央となるように、連続シーンの最初の表示画像に対する表示位置を決定して、初期位置ＰＰとして表示範囲判定回路４３４と表示位置決定回路４３６に供給する。
【００４９】
表示範囲判定回路４３４は、初期位置ＰＰに連続シーンの最初の表示画像を表示してから連続シーンの最後の画像を表示するまでの期間中、動き検出情報ＭＶＤに基づいて画像の表示位置を移動させたときに、表示画像が画像表示領域に入りきるか否かを、初期位置ＰＰと動き累積値ＭＶＴに基づいて判別する。この表示範囲判定回路４３４での判別結果を示す判別結果信号ＣＪaは動き補正回路４３５に供給される。
【００５０】
動き補正回路４３５は、判別結果信号ＣＪaで表示画像が画像表示領域に入りきらないことが示されたとき、表示画像が画像表示領域に入りきるように動き検出情報ＭＶＤを補正して、動き検出情報ＭＶＥとして表示位置決定回路４３６に供給する。また、判別結果信号ＣＪaで表示画像が画像表示領域に入りきることが示されたとき、動き検出情報ＭＶＤの補正を行うことなく表示位置決定回路４３６に供給する。
【００５１】
また、動き累積値ＭＶＴの振れ幅ＥＷを初期位置決定回路４３３から動き補正回路４３５に供給すると共に、画像表示領域内の移動可能範囲を動き補正回路４３５に予め記憶させておくものとし、振れ幅と移動可能範囲を用いて動き検出情報ＭＶＤを補正するための補正係数を設定することもできる。この場合には、表示位置を移動しても表示画像が画像表示領域内となるように動き量が補正されるので、表示位置が移動可能範囲を超えて制限されてしまうことを防止できる。
【００５２】
表示位置決定回路４３６は、初期位置決定回路４３３から供給された初期位置ＰＰを連続シーンの最初の表示位置とする。その後、動き検出情報ＭＶＥに基づき、動き検出情報ＭＶＥで示された動きベクトルの方向とは逆方向に画像を移動させた位置を表示位置として、この表示位置を示す表示位置信号ＰＤを順次出力して、図７の書込読出制御回路４３７に供給する。また、表示位置信号ＰＤを図１に示す提示制御部に供給する。
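The interplay of the motion accumulation, initial position, motion correction and display position circuits (paragraphs 【００４７】 to 【００５２】) can be illustrated in one dimension. The sketch below makes several assumptions that the text does not fix: positions are the horizontal centre of the displayed image measured from the left end of the movable range, the whole continuous scene is known in advance, and the correction is a single scale factor (movable range divided by swing width). The function name is hypothetical.

```python
def plan_display_positions(motions, movable_range):
    """Return one display position per frame of a continuous scene.

    motions       -> horizontal motion detected for each frame after the first
    movable_range -> width of the range in which the image centre may move
    """
    # The image is moved opposite to the detected motion vector.
    offsets = [0.0]
    for mv in motions:
        offsets.append(offsets[-1] - mv)

    lo, hi = min(offsets), max(offsets)
    swing = hi - lo  # swing width of the accumulated motion

    # Assumed correction: shrink the motion when the swing does not fit.
    scale = 1.0
    if swing > movable_range:
        scale = movable_range / swing
    lo, hi = lo * scale, hi * scale

    # Initial position: centre the range of movement in the movable range.
    initial = movable_range / 2 - (lo + hi) / 2
    return [initial + o * scale for o in offsets]


# Background drifts left by 10 per frame, so the image walks right.
fits = plan_display_positions([-10, -10, -10], movable_range=100)
# Motion too large for the display area: it is scaled to fit exactly.
scaled = plan_display_positions([-50, -50, -50, -50], movable_range=100)
```

In the first case the positions are spread symmetrically around the middle of the movable range; in the second the corrected motion spans the range from one end to the other without leaving it.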
【００５３】
ここで、表示位置信号ＰＤに基づいて画像の表示位置を切り換える場合、図７に示すように、例えば画像表示領域に対応した記憶領域を有する画像メモリ４３８を設けるものとし、書込読出制御回路４３７は、画像メモリ４３８に画像信号Ｓvを書き込む際の書込位置を、表示位置信号ＰＤに基づいて表示位置に対応させる。このように画像信号Ｓvを書き込むものとすれば、表示領域に対応した画像メモリ４３８から画像信号を読み出すだけで表示位置を容易に移動できる。
【００５４】
書込読出制御回路４３７は、画像メモリ４３８に画像信号を書き込むための書込制御信号ＭＷＣと、画像メモリ４３８に書き込まれている画像信号を読み出すための読出制御信号ＭＲＣを生成して、画像メモリ４３８に供給する。ここで、書込読出制御回路４３７は、上述したように画像位置決定回路４３１で決定された表示位置と対応する画像メモリ４３８の記憶位置に画像信号を記憶させるため、表示位置信号ＰＤに基づいて書込制御信号ＭＷＣを生成する。
【００５５】
画像メモリ４３８は、書込制御信号ＭＷＣに基づき、画像位置決定回路４３１で決定された表示位置に対応する記憶領域に、この表示位置の決定が行われたフレームの画像信号Ｓvを記憶する。なお、画像信号Ｓvが記憶されていない領域には、例えば黒表示となる信号を記憶させる。また、画像メモリ４３８は、書込読出制御回路４３７からの読出制御信号ＭＲＣに基づき、記憶領域に記憶されている画像信号を読み出して画像出力信号ＳVoutとして画像出力部４５に供給する。画像出力部４５では、供給された画像出力信号ＳVoutに基づいて表示位置を撮影画像の動きに応じて移動させながら画像提示を行う。
【００５６】
図１の提示制御部５０は、設置情報供給部５０１と調整信号生成部５０２を有している。設置情報供給部５０１は、後述する複数の音響出力部６０-1〜６０-nがどのような位置に設けられており、どのようなスピーカが用いられているか等を示す設置情報ＣＳaを保持し、あるいはユーザの操作等によって設置情報ＣＳaを生成して、この設置情報ＣＳaを調整信号生成部５０２に供給する。
【００５７】
調整信号生成部５０２は、設置情報供給部５０１から供給された設置情報ＣＳaと画像信号処理部４０から供給された表示位置信号ＰＤに基づき、あるいは、さらに情報信号分離部３２から供給された位置情報信号Ｓpを用いるものとして、これらの信号に基づいて提示調整信号ＣＰを生成して、音響出力信号生成部５２に供給する。この提示調整信号ＣＰは、各音響出力部６０-1〜６０-nに供給する音響出力信号ＳAout-1〜ＳAout-nの信号レベルや出力タイミングを調整して音像の位置や臨場感を制御するものである。
【００５８】
音響出力信号生成部５２は、音響信号Ｓaに対して提示調整信号ＣＰに基づいた信号レベル調整や出力タイミング調整を行い、音響出力信号ＳAout-1を生成して音響出力部６０-1に供給する。同様に、提示調整信号ＣＰに基づいた信号レベル調整や位相調整を行い、音響出力信号ＳAout-2〜ＳAout-nを生成して音響出力部６０-2〜６０-nに供給する。
音響出力部６０-1〜６０-nは、スピーカを用いて構成されており、供給された音響出力信号ＳAout-1〜ＳAout-nに基づいて音響出力を行う。
【００５９】
次に動作について説明する。図９は表示位置信号を用いた場合の動作を説明するための図である。調整信号生成部５０２は、設置情報ＣＳaと表示位置信号ＰＤで示された表示位置に基づき、音響信号Ｓaに対して式（４），（５）に基づいた音圧比となるように信号レベルの調整を行うための提示調整信号ＣＰを生成して音響出力信号生成部５２に供給する。なお、式（４），（５）において、「Ｘ，Ｙ」は画像表示領域における画像の移動範囲を示す座標値であり「ｘ，ｙ」は表示位置信号ＰＤで示された表示位置ＪＤを示す座標値である。
【００６０】

例えば、図９Ａに示すように、設置情報ＣＳaによって音響出力部６０-L，６０-Rが、画像表示領域の中央で左右の端部側に設けられているときには、音響出力部６０-Lと音響出力部６０-Rとの音圧比が「（ｘ／Ｘ）：（Ｘ−ｘ）／Ｘ」となるように、音響出力部６０-L，６０-Rに供給する音声出力信号ＳAout-L，ＳAout-Rの信号レベルを調整するための提示調整信号ＣＰを生成する。
【００６１】
音響出力信号生成部５２は、音響信号Ｓaと提示調整信号ＣＰに基づき、音圧比が「（ｘ／Ｘ）：（Ｘ−ｘ）／Ｘ」となるように音響信号Ｓaの信号レベルの割合を調整して、音響出力信号ＳAout-Lと音響出力信号ＳAout-Rを生成すると共に、この音響出力信号ＳAout-Lを音響出力部６０-L、音響出力信号ＳAout-Rを音響出力部６０-Rにそれぞれ供給する。
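The level adjustment for the two-speaker case of FIG. 9A can be sketched as follows. The ratio (x/X) : (X−x)/X is taken directly from the text; applying it as a pair of linear amplitude gains, the function name `pan_stereo`, and the sample representation as lists of floats are assumptions. The direction in which x is measured is defined by FIG. 9, which is not reproduced here.

```python
def pan_stereo(samples, x, x_range):
    """Split a mono signal Sa into (SAout-L, SAout-R).

    x       -> horizontal coordinate of the display position JD
    x_range -> extent X of the image movement range
    """
    if x_range <= 0 or not 0 <= x <= x_range:
        raise ValueError("x must lie within [0, X] and X must be positive")
    gain_l = x / x_range              # share given to output unit 60-L
    gain_r = (x_range - x) / x_range  # share given to output unit 60-R
    left = [s * gain_l for s in samples]
    right = [s * gain_r for s in samples]
    return left, right


left_out, right_out = pan_stereo([1.0, -1.0], x=25, x_range=100)
```

The two gains always sum to one, so the overall level stays constant while the balance follows the display position.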
【００６２】
また、図９Ｂに示すように、設置情報ＣＳaによって音響出力部６０-LU，６０-LD，６０-RU，６０-RDが、画像表示領域の４角に設けられていることが示された場合には、左右方向だけでなく上下方向の音響出力部の位置も考慮して、音響出力部６０-LU，６０-LD，６０-RU，６０-RDに供給する音声出力信号ＳAout-LU，ＳAout-LD，ＳAout-RU，ＳAout-RDの信号レベルを調整するための提示調整信号ＣＰを生成する。
【００６３】
このように、表示位置信号ＰＤに基づいて、左右方向に設置された音響出力部や上下方向に設置された音響出力部に供給する音響出力信号の信号レベルが調整されるので、集音部１４によって撮影方向の音響のみを集音しても、画像の表示位置に対応させて音像を定位させることができる。
【００６４】
また、調整信号生成部５０２は、位置情報信号Ｓpで示された角度情報を用いて提示調整信号ＣＰを生成すると共に、表示位置信号ＰＤに基づいて、提示調整信号ＣＰを補正するものとしても良い。
【００６５】
図１０は角度情報を用いた場合の動作を示している。図１０Ａは、聴取位置ＨＰにおいて、左側に位置する音響出力部６０-Lと右側に位置する音響出力部６０-Rが、センター位置ＭＰに対して角度αを有するように設置されていると設置情報ＣＳaによって示された場合である。ここで、位置情報信号Ｓpに基づく角度情報、すなわち角度信号Ｓpaによって示されたセンター位置ＭＰに対する右方向の方位角φを正の値、左方向の方位角φを負の値としたとき、音響信号Ｓaに対して式（６）に基づいた音圧比となるように信号レベルの調整を行うための提示調整信号ＣＰを生成して音響出力信号生成部５２に供給する。

【００６６】
ここで、表示画像が画像表示領域から外れて途切れてしまうことが無いよう画像の動き量より表示位置の移動が少なくされると、角度信号Ｓpaに基づいて信号レベルを調整したとき、被写体の位置よりも音像の位置が外側となってしまうおそれがある。このため、調整信号生成部５０２は、提示調整信号ＣＰを生成する際に表示位置信号ＰＤを用いて補正を行う。例えば表示位置が画像表示領域の左側端部に近づいたときには、左側の音圧比が高くなるように音圧比を補正して、音像の位置を画像表示領域の中央に近づける。また、表示位置が左側端部に達したときには、その後角度信号Ｓpaによって方位角φが左方向に増加しても、提示調整信号ＣＰを表示位置が左側端部に達した状態で保持させる。
【００６７】
音響出力信号生成部５２は、音響信号Ｓaと提示調整信号ＣＰに基づき、式（６）に応じた音圧比となるように音響信号Ｓaの信号レベルの割合を調整して、音響出力信号ＳAout-Lと音響出力信号ＳAout-Rを生成すると共に、この音響出力信号ＳAout-Lを音響出力部６０-L、音響出力信号ＳAout-Rを音響出力部６０-Rにそれぞれ供給する。
【００６８】
図１０Ｂは、上側に位置する音響出力部６０-Uと下側に位置する音響出力部６０-Dが、センター位置ＭＰに対して角度βを有するように設置されていると設置情報ＣＳaによって示された場合である。ここで、角度信号Ｓpaによって示されたセンター位置ＭＰに対する下方向のピッチ角δを正の値、上方向のピッチ角δを負の値としたときには、音響信号Ｓaに対して式（７）に基づいた音圧比となるように信号レベルの調整を行うための提示調整信号ＣＰを生成して音響出力信号生成部５２に供給する。

【００６９】
音響出力信号生成部５２は、音響信号Ｓaと提示調整信号ＣＰに基づき、式（７）に応じた音圧比となるように音響信号Ｓaの信号レベルの割合を調整して、音響出力信号ＳAout-Uと音響出力信号ＳAout-Dを生成すると共に、この音響出力信号ＳAout-Uを音響出力部６０-U、音響出力信号ＳAout-Dを音響出力部６０-Dにそれぞれ供給する。
【００７０】
このように、位置情報信号Ｓpで示された方位角φやピッチ角δに基づいて、左右方向に設置された音響出力部や上下方向に設置された音響出力部に供給する音響出力信号の信号レベルが調整されるので、集音部１４によって撮影方向の音響のみを集音しても、音像を正しく定位させることができる。すなわち、撮影方向の音のみを集音しても、音源の移動に合わせて音像を移動させることができる。また、表示画像が画像表示領域から外れてしまうことが無いように表示位置が制御されても、表示画像の位置に合わせて音像を定位させることができる。
【００７１】
次に、距離情報を用いた動作について説明する。図１１は、左側に位置する音響出力部６０-Lと右側に位置する音響出力部６０-Rがセンター位置ＭＰと角度αを有するように設置されており、聴取位置ＨＰと音響出力部６０-L，６０-Rとの間隔が距離ＫＳであることが設置情報ＣＳaによって示された場合である。位置情報信号Ｓpの距離情報すなわち距離信号Ｓpbによって所望の被写体ＯＢまでが距離ＬＯで、角度信号Ｓpaによって示された方位角や表示位置ＪＤに基づいて算出した方位角が方位角φであるとき、所望の被写体ＯＢから音響出力部６０-Lまでの距離ＬSLは式（８）で算出できる。また、所望の被写体ＯＢから音響出力部６０-Rまでの距離ＬSRは式（９）で算出できる。
【００７２】
ＬSL＝√(ＫＳ²＋ＬＯ²−２×ＫＳ×ＬＯ×ｃｏｓ(α＋φ)) ・・・（８）
ＬSR＝√(ＫＳ²＋ＬＯ²−２×ＫＳ×ＬＯ×ｃｏｓ(α−φ)) ・・・（９）
このため、被写体ＯＢから音響出力部６０-Lまでの距離と被写体ＯＢから音響出力部６０-Rまでの距離との距離差ＤLRは、「ＤLR＝ＬSL−ＬSR」となる。すなわち、距離差ＤLRだけ音響出力部６０-Lから出力される音は、音響出力部６０-Rから出力される音に比べて遅れたものとなる。
【００７３】
調整信号生成部５０２は、距離差ＤLRを算出すると共に、この距離差ＤLRを音速Ｖauで除算して遅延時間ＴLRを算出して、この遅延時間ＴLRだけ音響出力信号の出力が時間差を生じるように位相の制御すなわち出力タイミングを制御する提示調整信号ＣＰを生成して、音響出力信号生成部５２に供給する。なお、上述したように信号レベルを調整するための提示調整信号ＣＰを生成して、この提示調整信号ＣＰに対して、遅延時間ＴLRだけ音響出力信号の出力が時間差を生じるように補正を行い、この補正後の信号を提示調整信号ＣＰとして音響出力信号生成部５２に供給しても良い。
【００７４】
音響出力信号生成部５２は、提示調整信号ＣＰに基づいて音響出力信号ＳAout-L，ＳAout-Rの信号レベルを調整するだけでなく、音響出力信号ＳAout-Lを遅延時間ＴLRだけ音響出力信号ＳAout-Rよりも遅延させて音響出力部６０-Lに供給する。
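Equations (8) and (9) and the delay ＴLR＝ＤLR／Ｖau can be checked numerically with a short script. The function name and the default sound speed of 340 m/s are assumptions; angles are given in degrees for readability and distances in metres.

```python
import math


def speaker_delay(ks, lo, alpha_deg, phi_deg, v_au=340.0):
    """Return (LSL, LSR, TLR) for the geometry of FIG. 11.

    ks        -> distance KS from listening position to each speaker
    lo        -> distance LO to the desired subject OB
    alpha_deg -> speaker angle alpha from the centre position MP
    phi_deg   -> azimuth phi of the subject (positive to the right)
    v_au      -> speed of sound Vau (assumed value)
    """
    a = math.radians(alpha_deg)
    p = math.radians(phi_deg)
    # Law of cosines, equations (8) and (9).
    l_sl = math.sqrt(ks**2 + lo**2 - 2 * ks * lo * math.cos(a + p))
    l_sr = math.sqrt(ks**2 + lo**2 - 2 * ks * lo * math.cos(a - p))
    d_lr = l_sl - l_sr        # distance difference DLR
    return l_sl, l_sr, d_lr / v_au


l_sl, l_sr, t_lr = speaker_delay(ks=2.0, lo=5.0, alpha_deg=30.0, phi_deg=20.0)
```

With the subject to the right of centre the left path is longer, so ＴLR is positive and the left output is the one that is delayed; for a subject on the centre line the delay is zero.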
【００７５】
このように、位置情報信号Ｓpで示された距離ＬＯに基づいて、左右方向に設置された音響出力部や上下方向に設置された音響出力部に供給する音響出力信号の位相を調整することで、集音部１４によって撮影方向の音響のみを集音しても、この集音した音に基づいて臨場感の高い音響提示を行うことが可能となり、聴取者は現実感の高い良好な再生音場を得ることができる。
【００７６】
例えば図１２に示すように、撮影画像を表示する際には表示画像Ｚaよりも広い画像表示領域Ｚbを設け、撮影画像の動き検出を行い被写体の動きに合わせて表示画像の表示位置を表示位置Ｐ1から表示位置Ｐ2に移動させる場合、所望の被写体ＯＢの移動に合わせて音像を位置Ｑ1から位置Ｑ2に移動させることができるので、臨場感が高いと共に移動感のある音響提示を行うことができる。
【００７７】
このように上述の実施の形態によれば、集音部１４によって撮影方向の音響のみを集音しても、音源の移動に合わせて音像を移動させることができると共に、表示画像が画像表示領域から外れてしまうことが無いように表示位置が制御されても、表示画像の位置に合わせて音像を定位させることができる。
【００７８】
さらに、図１３Ａに示す場合に比べて図１３Ｂや図１３Ｃに示すように多くの音響出力部６０を設けるものとして、設置情報ＣＳaと表示位置信号ＰＤに基づいて、表示位置に応じた位置の音響出力部６０を選択して、この選択した音響出力部６０に対して音響出力信号ＳAoutを供給するように提示調整信号ＣＰを生成してもよい。この場合には、表示画像に合わせて音像を容易に設定できる。
【００７９】
ところで、上述の実施の形態では、超指向性マイクロフォンを用いることで、不必要な方向からの雑音や音響を取り除いている。しかし、自然環境に近い再生音場を作るためには、超指向性マイクロフォンで集音した音だけでなく反射音等の間接音や雑音等も再生することが好ましい。
【００８０】
そこで、第２の実施の形態として、周囲の環境音も提示できる音響提示システムの構成を図１４に示す。なお、図１４において、図１と対応する部分については同一符号を付し、詳細な説明は省略する。環境音集音部１５は、図１５Ａに示す指向特性の無指向性マイクロフォンや図１５Ｂに示す指向特性の前方指向性マイクロフォン等を用いて構成されており、環境音を集音して環境音信号Ｓsaを生成したのち情報信号生成部１８に供給する。ここで、前方指向性マイクロフォンを用いる場合には、前方指向性マイクロフォンを撮影者の周囲に複数設けるものとすれば、環境音をもれなく集音できる。なお、図１５Ｃの指向特性は超指向性マイクロフォンを示している。
【００８１】
情報信号生成部１８は、供給された画像信号Ｓvや音響信号Ｓaと位置情報信号Ｓpおよび環境音信号Ｓsaに基づいて情報信号ＷＳを生成して信号記録再生装置２０に供給する。
【００８２】
情報信号分離部３２は、情報信号ＲＳから画像信号Ｓvと音響信号Ｓaと位置情報信号Ｓpおよび環境音信号Ｓsaを分離して、画像信号Ｓvを画像出力部４５、音響信号Ｓaを音響出力信号生成部５２、位置情報信号Ｓpを提示制御部５０、環境音信号Ｓsaを環境音処理部５３に供給する。
【００８３】
環境音処理部５３は、設置情報供給部５３１と環境音信号調整部５３２を有している。設置情報供給部５３１は、いずれの音響出力部６０-1〜６０-nから環境音を出力するか、また環境音を出力する音響出力部では、どのようなスピーカが用いられているか等を示す設置情報ＣＳbを保持し、あるいはユーザの操作等によって設置情報ＣＳbを生成して、設置情報ＣＳbを環境音信号調整部５３２に供給する。なお、音響出力部６０-1〜６０-kとは別個に、環境音出力のための音響出力部を有している場合には、この音響出力部に関する情報も設置情報ＣＳbに含ませる。
【００８４】
環境音信号調整部５３２は、設置情報供給部５３１から供給された設置情報ＣＳbに基づき、環境音信号Ｓsaの信号レベルを使用する音響出力部毎に調整して、各音響出力部から環境音が出力されたとき、環境音の音像が実際の音像位置とは異なった方向に定位されてしまうことを防止する。この環境音信号調整部５３２で生成された音響出力部毎の環境音重畳信号Ｓsbは、信号加算部５４に供給される。
【００８５】
信号加算部５４は、音響出力信号生成部５２から供給された音響出力信号ＳAoutと環境音信号調整部５３２から供給された環境音重畳信号Ｓsbを、対応する音響出力部６０毎に加算して、音響出力信号ＳBoutとして音響出力部６０に供給する。例えば、特定音源からの音と環境音を音響出力部６０-kから出力する場合、音響出力信号ＳAout-kと環境音重畳信号Ｓsb-kを加算して、音響出力信号ＳBout-kを音響出力部６０-kに供給する。なお、特定音源から出力された音を前方に位置する音響出力部から出力し、環境音を後方に位置する音響出力部から出力する場合には、音響出力信号ＳAoutを音響出力信号ＳBoutとして前方に位置する音響出力部に供給すると共に、環境音重畳信号Ｓsbを音響出力信号ＳBoutとして後方に位置する音響出力部に供給する。
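The per-speaker addition performed by the signal adder 54 can be sketched as a merge of two dictionaries keyed by output unit. The speaker names and the dictionary representation are hypothetical; an output unit that receives only one of the two signals simply passes that signal through, which corresponds to the front/rear case described above.

```python
def add_ambient(direct, ambient):
    """Return SBout per speaker from SAout (direct) and Ssb (ambient).

    Both arguments map a speaker identifier to a list of samples.
    """
    out = {}
    for spk in sorted(set(direct) | set(ambient)):
        a = direct.get(spk, [])
        b = ambient.get(spk, [])
        n = max(len(a), len(b))
        # Missing samples are treated as silence.
        out[spk] = [
            (a[i] if i < len(a) else 0.0) + (b[i] if i < len(b) else 0.0)
            for i in range(n)
        ]
    return out


sa_out = {"front_l": [0.5, 0.5], "front_r": [0.25, 0.25]}
s_sb = {"front_l": [0.125, 0.125], "rear": [0.25, 0.25]}
sb_out = add_ambient(sa_out, s_sb)
```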
【００８６】
このように、第２の実施の形態によれば、集音部１４によって特定音源からの音のみを集音して、音像を正しく定位させることができるだけでなく、周囲の環境音も正しく再生できるので、自然で臨場感の高い音響提示を行うことが可能となり、聴取者は自然でより現実感の高い再生音場を得ることができる。
【００８７】
なお、上述の実施の形態では、情報信号を生成して記録媒体に記録する構成を示したが、信号記録再生装置に代えて信号伝送装置を設けるものとし、この情報信号を伝送する構成としても良い。
【００８８】
さらに、上述の処理はハードウェアだけでなくソフトウェアで実現するものとしても良い。この場合の構成を図１６に示す。コンピュータは、図１６に示すようにＣＰＵ(Central Processing Unit)７０１を内蔵しており、このＣＰＵ７０１にはバス７２０を介してＲＯＭ７０２，ＲＡＭ７０３，ハード・ディスク・ドライブ７０４，入出力インタフェース７０５が接続されている。さらに、入出力インタフェース７０５には入力部７１１や記録媒体ドライブ７１２，通信部７１３，信号入力部７１４，信号出力部７１５が接続されている。
【００８９】
外部装置から命令が入力されたり、キーボードやマウス等の操作手段あるいはマイク等の音声入力手段等を用いて構成された入力部７１１から命令が入力されると、この命令が入出力インタフェース７０５を介してＣＰＵ７０１に供給される。
【００９０】
ＣＰＵ７０１は、ＲＯＭ７０２やＲＡＭ７０３あるいはハード・ディスク・ドライブ７０４に記憶されているプログラムを実行して、供給された命令に応じた処理を行う。さらに、ＲＯＭ７０２やＲＡＭ７０３あるいはハード・ディスク・ドライブ７０４には、上述の音響提示システムに於ける信号処理をコンピュータで実行させるための音響提示プログラムを予め記憶させて、信号入力部７１４に入力された信号に基づいて音響出力信号を生成して、信号出力部７１５から出力する。また、記録媒体に音響提示プログラムを記録しておくものとし、記録媒体ドライブ７１２によって、音響プログラムを記録媒体に記録しあるいは記録媒体に記録されている音響プログラムを読み出してコンピュータで実行するものとしても良い。さらに、通信部７１３によって、音響プログラムを有線あるいは無線の伝送路を介して送信あるいは受信するものとし、受信した音響プログラムをコンピュータで実行するものとしても良い。
【００９１】
図１７は、音響提示プログラムの全体構成を示すフローチャートである。
ステップＳＴ１では、音響取得を行うか否かを判別して、音響取得を行う場合にはステップＳＴ２に進み、音響取得を行わない場合にはステップＳＴ６に進む。
【００９２】
ステップＳＴ２では、情報信号の生成を行う。すなわち、信号入力部７１４に入力された角度信号Ｓpaや距離信号Ｓpbに基づいて位置情報信号Ｓpを生成する。また、この位置情報信号Ｓpと、信号入力部７１４に入力された画像信号Ｓvや音響信号Ｓaを例えば多重化して１つの情報信号ＷＳを生成する。
【００９３】
ステップＳＴ３では、生成した情報信号を記録媒体に記録するあるいは外部機器に伝送するように設定されているか否かを判別する。ここで、情報信号の記録や伝送を行うように設定されている場合にはステップＳＴ４に進み、情報信号の記録や伝送を行わないように設定されている場合にはステップＳＴ５に進む。
【００９４】
ステップＳＴ４では、情報信号の記録や伝送を行いステップＳＴ５に進む。ここで、情報信号を記録する場合には、情報信号を記録媒体ドライブ７１２に供給して、記録媒体ドライブ７１２に装着されている記録媒体に記録する。また、情報信号を伝送する場合には、通信部７１３を介して情報信号を出力する。
【００９５】
ステップＳＴ５では、音響取得を終了するか否かを判別する。ここで、入力部７１１を用いて終了操作が行われたときにはステップＳＴ６に進み、終了操作が行われていないときにはステップＳＴ２に戻り、情報信号の生成を継続する。
ステップＳＴ６では、音響再生を行うか否かを判別して、音響再生を行う場合にはステップＳＴ７に進み、音響再生を行わない場合にはステップＳＴ１に戻る。
【００９６】
ステップＳＴ７では、音響出力信号に基づいて音響出力を行う音響出力部の設置情報ＣＳaを設定する。例えば、入力部７１１を操作して音響出力部６０の設置位置やどのようなスピーカを用いているか等の設置情報ＣＳaを入力する。あるいはハード・ディスク・ドライブ７０４等に予め記憶されている設置情報ＣＳaを読み出す。
【００９７】
ステップＳＴ８では、情報信号の分離処理を行う。すなわち、記録媒体から読み出した情報信号や、通信部７１３で受信した情報信号あるいは音響取得処理で生成された情報信号から、画像信号Ｓvと音響信号Ｓaと位置情報信号Ｓpを分離してステップＳＴ９に進む。
【００９８】
ステップＳＴ９は、画像信号Ｓvを用いて動き検出を行う。この動き検出によって検出した動きに応じて画像信号Ｓvに基づく画像の表示位置を移動させて、新たな画像出力信号ＳVoutを生成すると共に画像の表示位置を示す表示位置信号ＰＤを生成してステップＳＴ１０に進む。
【００９９】
ステップＳＴ１０では、表示位置信号ＰＤと設置情報ＣＳaおよび位置情報信号Ｓpに基づき、提示調整信号ＣＰを生成する。
ステップＳＴ１１では、音響信号Ｓaと提示調整信号ＣＰに基づき複数の音響出力信号ＳAout-1〜ＳAout-ｎを生成して信号出力部７１５から出力する。
【０１００】
ステップＳＴ１２では、音響再生を終了するか否かを判別する。ここで、入力部７１１を用いて終了操作が行われたときにはステップＳＴ１に戻る。また、終了操作が行われていないときにはステップＳＴ９に戻り、画像の表示位置に応じた音響出力信号ＳAout-1〜ＳAout-ｎの生成および出力を継続する。このような処理を行って得られた音響出力信号ＳAoutを音響出力部６０に供給することで、ソフトウェアによっても臨場感の高い音響提示を行える。
【０１０１】
【発明の効果】
この発明によれば、集音手段を用いて特定音源からの音を取得して音響信号が生成されると共に、特定音源方向の撮影を行って生成した画像信号を用いて動き検出が行われて画像の表示位置の移動や表示位置を示す表示位置信号が生成される。この音響信号と表示位置信号、および音響出力信号に基づいて音響出力を行う複数の音響出力手段に関する設置情報とに基づいて、複数の音響出力信号が生成されて複数の音響出力手段に供給される。このため、画像の表示位置に対応させて音像を定位させることが簡単にできると共に、特定音源の位置が移動したときには音像の位置も移動されて移動感のある音響提示ができる。
【０１０２】
また、環境音を取得して環境音信号が生成されると共に、複数の音響出力手段の設置情報と環境音信号に基づき、環境音重畳信号が生成されて、この環境音重畳信号が、対応する音響出力信号に加算されるので、より自然で臨場感の高い音響提示を行うことができる。
【０１０３】
さらに、特定音源と集音手段の位置関係から位置情報信号が生成されて、この位置情報信号も用いて複数の音響出力信号が生成されるので、臨場感の高い音響提示を行うことができる。
【図面の簡単な説明】
【図１】音響提示システムの構成を示す図である。
【図２】撮像部の構成を示す図である。
【図３】位置検出部の構成を示す図である。
【図４】画像信号処理部の構成を示す図である。
【図５】シーンチェンジ検出部の構成を示す図である。
【図６】動き検出部の構成を示す図である。
【図７】画像位置移動部の構成を示す図である。
【図８】画像位置決定回路の構成を示す図である。
【図９】表示位置信号を用いた場合の動作を説明するための図である。
【図１０】角度情報を用いた場合の動作を説明するための図である。
【図１１】距離情報を用いた場合の動作を説明するための図である。
【図１２】音像の位置を示す図である。
【図１３】音響出力部の配置を示す図である。
【図１４】第２の実施の形態の構成を示す図である。
【図１５】マイクロフォンの指向特性を示す図である。
【図１６】コンピュータを用いた構成を示す図である。
【図１７】音響提示プログラムを示すフローチャートである。
【図１８】従来の動作を示す図である。
【符号の説明】
１０・・・音響取得装置、１２・・・撮像部、１４・・・集音部、１５・・・環境音集音部、１６・・・位置検出部、１８・・・情報信号生成部、２０・・・信号記録再生装置、３０・・・音響再生装置、３２・・・情報信号分離部、４０・・・画像信号処理部、４１・・・シーンチェンジ検出部、４２・・・動き検出部、４３・・・画像位置移動部、４５・・・画像出力部、５０・・・提示制御部、５２・・・音響出力信号生成部、５３・・・環境音処理部、５４・・・信号加算部、６０・・・音響出力部、９０・・・ビデオカメラ、１６１・・・角度センサ、１６２・・・測距センサ、１６３・・・極座標算出部、５０１・・・設置情報供給部、５０２・・・調整信号生成部、５３１・・・設置情報供給部、５３２・・・環境音信号調整部
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an acoustic presentation system, an acoustic reproduction apparatus and method, a computer-readable recording medium, and an acoustic presentation program. Specifically, sound from a specific sound source is acquired by sound collecting means to generate an acoustic signal; an image signal is generated by shooting in the direction of the specific sound source; motion is detected in the image of the image signal; and a display position signal is generated that indicates the display position at which the image of the image signal is displayed while that position is moved, according to the detected motion, within an image display area wider than the image. A plurality of sound output signals are then generated, based on the display position signal and on installation information about a plurality of sound output means that output sound based on the sound output signals, and supplied to those sound output means, so that the position of the sound image moves in step with the movement of the display position of the image.
[0002]
[Prior art]
In a conventional acoustic presentation system, for example, a stereo microphone collects sound from the right and from the left to generate left- and right-channel acoustic signals. During playback, the speaker placed on the right is driven by the right-channel acoustic signal and the speaker placed on the left is driven by the left-channel acoustic signal, which gives a greater sense of presence than using a monaural microphone.
[0003]
In addition, so-called binaural sound has been put into practical use: a model of a human skull is made, microphones are placed inside its left and right ears to collect sound approximately as it would enter human ears, and headphones are used for playback.
[0004]
In recent years, to further improve the sense of presence, systems that add auxiliary speakers to the front speakers, so that reflected sound, sound from the rear, low-frequency sound, and the like can be reproduced from the auxiliary speakers, have also been put into practical use; the so-called 5.1ch and 6.1ch surround systems are examples.
[0005]
Furthermore, signal processing is applied to acoustic signals to recreate a reproduction sound field or a three-dimensional sound field. For example, the impulse response (acoustic transfer function) of a desired space is convolved with sound collected in an anechoic room or recorded on a compact disc or the like, so that the characteristics of the desired space are reproduced in the reproduction sound field. Specifically, a reference sound is output at the sound source position in the desired space and collected at the viewing position, and the impulse response of the desired space is obtained from the relationship between the sound output at the sound source position and the sound collected at the viewing position. If a filter that realizes this impulse response is registered and used during playback, the characteristics of the desired space can be reproduced in the reproduction sound field. If various filters are registered and made selectable, various reproduction sound fields, and also a three-dimensional sound field, can be reproduced.
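As an illustration of the convolution step in this prior-art technique, the following Python sketch applies a short impulse response to a dry signal. The sample values and the three-tap impulse response (direct sound plus one weaker reflection) are invented for the example, and a practical system would normally use a much longer measured response.

```python
def convolve(dry, impulse_response):
    """Full linear convolution of two sample sequences."""
    if not dry or not impulse_response:
        return []
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, s in enumerate(dry):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


dry = [1.0, 0.5, 0.0, 0.0]   # sound recorded without room effects
room_ir = [1.0, 0.0, 0.3]    # direct path plus a reflection two samples later
wet = convolve(dry, room_ir)
```

Selecting a different registered filter then amounts to passing a different `impulse_response` to the same routine.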
[0006]
[Problems to be solved by the invention]
Incidentally, when image and sound are recorded separately and the sound is recorded without regard to the orientation of the video camera, the sound reproduced from the recorded acoustic signal has no directionality and lacks a sense of presence.
[0007]
Moreover, to reproduce sound recorded in this way with an enhanced sense of presence, the position of the sound image must be matched to the image manually while watching the image, so reproduced sound with a high sense of presence cannot be obtained easily.
[0008]
Further, when an image and sound are recorded with a microphone mounted on the video camera, and the imaging direction of the video camera 90 is changed to follow a desired subject OB as shown in FIG. 19A, the desired subject OB can be kept at the center of the display image even as the reproduced image changes in the order of frames "Fa", "Fb", and "Fc", as shown in FIG. 19B. However, the sound image corresponding to the desired subject is also fixed at the position Qm. For this reason, during reproduction, the movement of the subject can be expressed by the movement of the background of the display image, but the position of the sound image is fixed, and a reproduced sound field with a high sense of presence cannot be obtained.
[0009]
Therefore, the present invention provides an acoustic presentation system, an acoustic reproduction apparatus and method, a computer-readable recording medium, and an acoustic presentation program that make it easy to obtain a reproduced sound field with a high sense of presence matched to the display image.
[0010]
[Means for Solving the Problems]
The acoustic presentation system according to the present invention comprises: sound collecting means that acquires sound from a specific sound source and generates an acoustic signal; imaging means that shoots in the direction of the specific sound source and generates an image signal; sound output signal generating means that generates a plurality of sound output signals based on the acoustic signal generated by the sound collecting means; a plurality of sound output means that output sound based on the sound output signals; image processing means that detects motion in the image of the image signal and generates a display position signal indicating the display position at which the image of the image signal is displayed while that position is moved, according to the detected motion, within an image display area wider than the image; and presentation control means that controls the generation of the plurality of sound output signals in the sound output signal generating means based on the display position signal and on installation information about the plurality of sound output means. The system further comprises: environmental sound collecting means that acquires environmental sound and generates an environmental sound signal; environmental sound processing means that generates an environmental sound superimposed signal based on the installation information and the environmental sound signal; and signal adding means that, for each sound output means, adds the corresponding environmental sound signal to the sound output signal.
[0012]
Furthermore, the acoustic reproduction apparatus comprises: sound output signal generating means that generates a plurality of sound output signals based on an acoustic signal generated by acquiring sound from a specific sound source; image processing means that detects motion in the image of an image signal generated by shooting in the direction of the specific sound source, and generates a display position signal indicating the display position at which the image of the image signal is displayed while that position is moved, according to the detected motion, within an image display area wider than the image; and presentation control means that controls the generation of the plurality of sound output signals in the sound output signal generating means based on the display position signal and on installation information about a plurality of sound output means that output sound based on the sound output signals. The apparatus further comprises: environmental sound processing means that generates an environmental sound superimposed signal based on the installation information and an environmental sound signal generated by acquiring environmental sound; and signal adding means that, for each sound output means, adds the corresponding sound output signal to the environmental sound signal.
[0013]
Next, in the acoustic presentation method according to the present invention, sound from a specific sound source is acquired by sound collecting means to generate an acoustic signal; an image signal is generated by shooting in the direction of the specific sound source; motion is detected in the image of the image signal; a display position signal is generated that indicates the display position at which the image of the image signal is displayed while that position is moved, according to the detected motion, within an image display area wider than the image; and a plurality of sound output signals are generated, based on the display position signal and on installation information about a plurality of sound output means that output sound based on the sound output signals, and supplied to the plurality of sound output means.
[0015]
Furthermore, in the acoustic reproduction method, motion is detected in the image of an image signal generated by shooting in the direction of a specific sound source; a display position signal is generated that indicates the display position at which the image of the image signal is displayed while that position is moved, according to the detected motion, within an image display area wider than the display image of the image signal; and a plurality of sound output signals are generated based on installation information about a plurality of sound output means that output sound based on the sound output signals, the display position signal, and an acoustic signal generated by acquiring sound from the specific sound source.
[0016]
Next, the computer-readable recording medium according to the present invention records a program that causes a computer to execute: a procedure of detecting motion in the image of an image signal generated by shooting in the direction of a specific sound source, and generating a display position signal indicating the display position at which the image of the image signal is displayed while that position is moved, according to the detected motion, within an image display area wider than the image; a procedure of generating a presentation adjustment signal based on the display position signal and on installation information about a plurality of sound output means that output sound based on sound output signals; and a procedure of generating the sound output signals to be supplied to the plurality of sound output means based on the presentation adjustment signal and an acoustic signal generated by acquiring sound from the specific sound source.
[0017]
In addition, the sound presentation program causes a computer to execute: a procedure for performing motion detection on the image of an image signal generated by photographing a specific sound source and generating a display position signal indicating the display position used when the image of the image signal is displayed, in an image display area wider than the image, while the display position is moved according to the detected motion; a procedure for generating a presentation adjustment signal based on the display position signal and installation information regarding a plurality of sound output means that perform sound output based on sound output signals; and a procedure for generating the sound output signals to be supplied to the plurality of sound output means based on the presentation adjustment signal and an acoustic signal generated by acquiring sound from the specific sound source.
[0018]
In the present invention, an image signal of a photographed image is generated by the image pickup means, and an acoustic signal is generated by acquiring sound from a specific sound source, for example, sound from the shooting direction, with sound collection means using a super-directional microphone or the like. A position information signal indicating the shooting direction and the distance to the subject is also generated as the positional relationship between the specific sound source and the sound collecting means, together with an environmental sound signal. Sound acquisition is performed by generating an information signal based on the acoustic signal, the image signal, the position information signal, and the environmental sound signal.
[0019]
When sound reproduction is performed using the information signal, the acoustic signal, the image signal, the position information signal, and the environmental sound signal are separated from the information signal. Motion detection is performed using the separated image signal; for example, the display position of the photographed image is moved according to the movement of the desired subject, and a display position signal indicating the display position of the photographed image is generated. In addition, a presentation adjustment signal is generated based on the display position signal and installation information indicating the number, installation positions, and the like of a plurality of sound output units that perform sound output based on sound output signals. A plurality of sound output signals are generated based on the presentation adjustment signal and the acoustic signal, and the sound output signals, with their signal levels and phases controlled, are supplied to the plurality of sound output means.
[0020]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, an embodiment of the present invention will be described with reference to the drawings. FIG. 1 shows the overall configuration of an audio presentation system according to the present invention. The imaging unit 12 of the sound acquisition device 10 generates an image signal Sv of the captured image and supplies it to the information signal generation unit 18.
[0021]
FIG. 2 shows the configuration of the imaging unit 12. Light incident through the imaging lens 121 enters the image sensor unit 122, and a captured image is formed on its imaging surface. The image sensor unit 122 is configured using a solid-state image sensor, for example, a CCD. Based on a drive signal RC from a drive unit 132 described later, it reads out the signal obtained by photoelectric conversion of the captured image, generates an imaging signal Sva of the three primary colors, and supplies it to the preprocessing unit 123.
[0022]
The preprocessing unit 123 performs processing for removing noise components from the imaging signal Sva, for example, correlated double sampling, and supplies the noise-reduced imaging signal Sva to the A/D conversion unit 124. The A/D conversion unit 124 converts the imaging signal Sva into a digital image signal Svb and supplies it to the feedback clamp unit 125. The feedback clamp unit 125 detects the error between the black level signal during the blanking period and a reference signal and supplies it to the A/D conversion unit 124, so that the A/D conversion operation is controlled to yield an image signal Svb of the required magnitude at a stable black level. The correction processing unit 126 performs shading correction, correction of defects in the image sensor, and the like on the image signal Svb. The image signal Svb corrected by the correction processing unit 126 is supplied to the process processing unit 127.
[0023]
The process processing unit 127 performs γ processing, contour compensation processing, knee correction processing, and the like on the image signal Svb after the correction processing. The image signal Svb subjected to this signal processing is supplied to the information signal generator 18 as an image signal Sv. Further, the image signal Sv is supplied to the monitor unit 128, and an image based on the image signal Sv is displayed on the monitor unit 128, and the captured image is confirmed.
[0024]
An operation unit 131 is connected to the imaging control unit 130, and when the user operates the operation unit 131, an operation signal PS corresponding to the user's operation is supplied from the operation unit 131 to the imaging control unit 130. The imaging control unit 130 operates the imaging unit 12 according to a user operation by generating various control signals CT based on the operation signal PS and controlling the operation of each unit. Further, a control signal TC for setting a signal reading frame period in the image sensor unit 122 is generated and supplied to the driving unit 132. The drive unit 132 generates a drive signal RC based on the control signal TC and supplies it to the image sensor unit 122.
[0025]
The sound collection unit 14 in FIG. 1 is configured using a microphone and is fixed to the front surface or upper part of the imaging unit 12. The sound collecting unit 14 collects sound from the imaging direction of the imaging unit 12, which is the sound from the specific sound source, generates, for example, a digital acoustic signal Sa, and supplies it to the information signal generating unit 18. The microphone used in the sound collection unit 14 is a super-directional (sharply directional) microphone such as a gun microphone, so that the sound of the targeted sound source can be picked up. Using a super-directional microphone in this way makes it easy to exclude noise and sound from unwanted directions. Alternatively, the sound of the target sound source can be picked up with a plurality of microphones by a sound source separation method, such as one that uses correlation or one that corrects the delay amount of each microphone.
[0026]
The position detection unit 16 detects the positional relationship between the specific sound source and the sound collection unit 14, for example, the imaging setting information when a desired subject is captured as the specific sound source, that is, the imaging direction of the imaging unit 12 and the distance to the desired subject. It generates a position information signal Sp indicating the detection result and supplies it to the information signal generation unit 18.
[0027]
FIG. 3 shows the configuration of the position detection unit 16 that detects the imaging direction of the imaging unit 12 and the distance to the desired subject. The angle sensor 161 measures angles using a gyro or another sensor capable of measuring a rotation angle, detects the imaging direction of the imaging unit 12, and supplies an angle signal Spa to the polar coordinate calculation unit 163. For example, it generates an angle signal Spa indicating the horizontal angle (hereinafter referred to as "azimuth angle") φ with respect to the reference position and the vertical angle (hereinafter referred to as "pitch angle") δ with respect to the reference position, and supplies it to the polar coordinate calculation unit 163. The distance measuring sensor 162 measures distance using light, ultrasonic waves, or the like, or based on the focal position in the imaging unit 12, detects the distance LO to the desired subject, and supplies a distance signal Spb to the polar coordinate calculation unit 163. The polar coordinate calculation unit 163 calculates polar coordinates from the angle signal Spa and the distance signal Spb, and supplies them to the information signal generation unit 18 as a digital position information signal Sp.
[0028]
The information signal generator 18 generates an information signal WS based on the supplied image signal Sv, acoustic signal Sa, and position information signal Sp and supplies the information signal WS to the signal recording / reproducing apparatus 20.
[0029]
The signal recording / reproducing apparatus 20 is configured using a recording medium that utilizes light, magnetism, or a semiconductor element, and records the supplied information signal WS on the recording medium. Further, the information signal RS obtained by reproducing the recording medium is supplied to the information signal separating unit 32 of the sound reproducing device 30.
[0030]
The information signal separation unit 32 separates the image signal Sv, the acoustic signal Sa, and the position information signal Sp from the information signal RS. Further, the separated image signal Sv is supplied to the image signal processing unit 40, and the acoustic signal Sa is supplied to the acoustic output signal generation unit 52. Further, the position information signal Sp is supplied to the presentation control unit 50.
[0031]
The image signal processing unit 40 detects motion from the image based on the image signal Sv, and generates an image output signal SVout in which the display position of the image is moved according to the detected motion, together with a display position signal PD indicating the display position. FIG. 4 shows a schematic configuration of the image signal processing unit 40. The image signal Sv is supplied to the scene change detection unit 41, the motion detection unit 42, and the image position movement unit 43 of the image signal processing unit 40.
[0032]
The scene change detection unit 41 detects a scene change based on the image signal Sv, that is, detects a discontinuous position of an image that is a joint portion between a continuous scene and a scene different from the continuous scene. FIG. 5 shows a schematic configuration of the scene change detection unit 41, which detects, for example, whether or not the scenes are continuous using image signals for two frames.
[0033]
The delay circuit 411 of the scene change detection unit 41 delays the image signal Sv by one frame and supplies it to the difference average calculation circuit 412 as a delayed image signal Svj. The difference average calculation circuit 412 calculates a difference average value Dav between two frames based on the image signal Sv and the delayed image signal Svj, and supplies it to the normalization circuit 414. To calculate the difference average value Dav, the difference in luminance level between the two frames is calculated for each pixel, and the average of the obtained difference values is supplied to the normalization circuit 414 as the difference average value Dav. When the number of pixels of an image in one frame is "N", the luminance level based on the image signal Sv is "YC", and the luminance level based on the delayed image signal Svj is "YP", the difference average value Dav can be calculated based on equation (1).
[0034]
[Expression 1]
$$D_{av} = \frac{1}{N}\sum_{i=1}^{N}\left|YC_i - YP_i\right| \qquad (1)$$
[0035]
Here, the difference average value Dav varies greatly depending on the luminance level of the image. For example, when the image is bright, the difference average value Dav becomes large merely because part of the image changes to a dark image, even if the scene has not switched. Conversely, when the image is dark, the difference average value Dav does not become large even when the scene is switched, because the change in luminance level is small. For this reason, the normalization circuit 414 is provided in the scene change detection unit 41 to normalize the difference average value Dav according to the brightness of the image, which reduces the influence of image brightness and allows scene changes to be detected correctly.
[0036]
The luminance average calculation circuit 413 calculates an average value of luminance levels in one frame based on the luminance level of each pixel based on the image signal Sv, and supplies the average value to the normalization circuit 414 as the luminance average value Yav. As described above, when the number of pixels of one frame image is “N” and the luminance level of the pixel based on the image signal Sv is “YC”, the average luminance value Yav can be calculated based on the equation (2).
[0037]
[Expression 2]
$$Y_{av} = \frac{1}{N}\sum_{i=1}^{N} YC_i \qquad (2)$$
[0038]
The normalization circuit 414 normalizes the difference average value Dav according to the brightness of the image. That is, as shown in Equation (3), the difference average value Dav is corrected according to the luminance average value Yav indicating the brightness of the image to generate a difference average normalized value (hereinafter simply referred to as "normalized value") E.
E = Dav / Yav (3)
The normalized value E generated by the normalization circuit 414 is supplied to the determination circuit 415.
[0039]
The determination circuit 415 has a preset threshold value Rf, compares the normalized value E with the threshold value Rf, and determines that a scene change has occurred when the normalized value E is greater than the threshold value Rf. When the normalized value E is equal to or less than the threshold value Rf, it determines that no scene change has occurred and the scene is continuous. Further, the determination circuit 415 generates a scene change detection signal CH indicating the determination result and supplies it to the motion detection unit 42 and the image position movement unit 43 in FIG. 4.
[0040]
In this manner, the normalization circuit 414 normalizes the difference average value Dav according to the brightness of the image, and the determination circuit 415 uses the normalized value E to determine whether a scene change has occurred or the scene is continuous. Therefore, scene changes can be detected correctly with less influence from the brightness of the image.
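The determination described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the circuit itself: frames are assumed to be flat lists of luminance levels, the per-pixel difference is assumed to be taken as an absolute value, and the handling of an all-black frame (Yav = 0) is an assumption that the description does not address. The function names are hypothetical.

```python
def difference_average(curr, prev):
    """Dav: mean absolute luminance difference between two frames."""
    if not curr or len(curr) != len(prev):
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(c - p) for c, p in zip(curr, prev)) / len(curr)


def luminance_average(curr):
    """Yav: mean luminance level of the current frame."""
    return sum(curr) / len(curr)


def is_scene_change(curr, prev, threshold_rf):
    """Return True when the normalized value E = Dav / Yav exceeds Rf."""
    dav = difference_average(curr, prev)
    yav = luminance_average(curr)
    if yav == 0:
        # Assumption: for an all-black frame, treat any difference as a change.
        return dav > 0
    return dav / yav > threshold_rf
```

With a threshold of 0.5, a bright frame in which one pixel changes slightly gives a small E and is treated as a continuous scene, whereas a dark frame that follows a bright frame gives a large E and is treated as a scene change.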
[0041]
In the scene change detection unit 41 described above, scene change detection is performed using the normalized value E. Alternatively, the correlation coefficient r of the images between two frames may be obtained and compared with a threshold value, which also allows scene changes to be detected with high accuracy.
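As a rough illustration of the correlation-based alternative, the following Python sketch computes a Pearson correlation coefficient between the luminance levels of two frames. The description does not state which correlation measure or comparison direction is used, so the Pearson form, the rule that a low r indicates a scene change, and the value returned for a completely flat frame are all assumptions; the function names are hypothetical.

```python
import math


def correlation_coefficient(curr, prev):
    """Pearson correlation r between two frames given as flat luminance lists."""
    n = len(curr)
    mean_c = sum(curr) / n
    mean_p = sum(prev) / n
    cov = sum((c - mean_c) * (p - mean_p) for c, p in zip(curr, prev))
    var_c = sum((c - mean_c) ** 2 for c in curr)
    var_p = sum((p - mean_p) ** 2 for p in prev)
    denom = math.sqrt(var_c * var_p)
    if denom == 0:
        # Assumption: a frame with no luminance variation carries no correlation.
        return 0.0
    return cov / denom


def is_scene_change_by_correlation(curr, prev, threshold):
    """Assumed rule: frames that correlate poorly belong to different scenes."""
    return correlation_coefficient(curr, prev) < threshold
```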
[0042]
The motion detection unit 42 detects a motion vector for each frame that the scene change detection signal CH from the scene change detection unit 41 indicates to be part of a continuous scene. It detects the motion vector of a portion having a large display area, for example, the motion vector of the background portion. FIG. 6 shows a configuration of the motion detection unit 42 for the case where the motion vector is detected using, for example, a block matching method.
[0043]
The delay circuit 421 of the motion detector 42 delays the image signal Sv by one frame and supplies the delayed image signal Svk to the image position switching circuit 422. The image position switching circuit 422 sequentially changes the position of the image based on the delayed image signal Svk in the horizontal direction and the vertical direction within a preset motion search range, and sequentially generates new image signals Svl. The generated image signal Svl is supplied to the difference calculation circuit 423. Further, the image position switching circuit 422 supplies a motion vector MV indicating the moving direction and moving amount of the image to the minimum value determining circuit 424.
[0044]
The difference calculation circuit 423 sequentially calculates a difference value DM between the image signal Svl and the image signal Sv and supplies the difference value DM to the minimum value determination circuit 424.
The minimum value determination circuit 424 holds each difference value DM in association with the motion vector MV that was used to generate the image signal Svl from which that difference value DM was calculated. When the image position switching circuit 422 has completed moving the image over the motion search range, the minimum value determination circuit 424 determines the minimum of the held difference values DM, and supplies the motion vector MV held in association with that minimum difference value DM to the image position moving unit 43 in FIG. 4 as motion detection information MVD.
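The search performed by circuits 421 to 424 can be sketched as follows. This is a simplified illustration under stated assumptions: frames are small 2D lists of luminance levels, the difference value DM is taken as the mean absolute difference over the overlapping area (the description does not specify the exact measure), and the search range is a hypothetical parameter `search`.

```python
def shift_difference(curr, prev, dx, dy):
    """DM: mean absolute difference between curr and prev shifted by (dx, dy)."""
    height, width = len(curr), len(curr[0])
    total, count = 0, 0
    for y in range(height):
        for x in range(width):
            sx, sy = x - dx, y - dy  # pixel of prev that lands on (x, y)
            if 0 <= sx < width and 0 <= sy < height:
                total += abs(curr[y][x] - prev[sy][sx])
                count += 1
    return total / count if count else float("inf")


def detect_motion(curr, prev, search=2):
    """Try every shift in the search range and keep the one with minimum DM."""
    best_dm, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            dm = shift_difference(curr, prev, dx, dy)
            if best_dm is None or dm < best_dm:
                best_dm, best_mv = dm, (dx, dy)
    return best_mv
```

Because the whole frame is shifted and compared, the shift that wins is the one followed by the largest part of the picture, which matches the stated aim of detecting the motion of the background portion.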
[0045]
The image position moving unit 43 determines the display position based on the scene change detection signal CH and the motion detection information MVD. Further, an image output signal SVout for displaying an image at the determined display position is generated.
[0046]
FIG. 7 shows the configuration of the image position moving unit 43. The image position determination circuit 431 of the image position moving unit 43 determines a display position based on the scene change detection signal CH and the motion detection information MVD, and supplies a display position signal PD indicating the display position to the write/read control circuit 437 and to the presentation control unit 50 described later.
[0047]
FIG. 8 shows the configuration of the image position determination circuit 431. The motion accumulation circuit 432 identifies the period of a continuous scene based on the scene change detection signal CH and accumulates the motion vectors indicated by the motion detection information MVD during that period, thereby generating a motion accumulated value MVT, which is information on the temporal transition of the motion vector, and supplies it to the initial position determination circuit 433 and the display range determination circuit 434.
[0048]
The initial position determination circuit 433 obtains the fluctuation width EW of the motion accumulated value MVT to determine the moving range of the display position for each scene. It then determines, as the initial position PP, the display position of the first display image of the continuous scene such that this moving range is centered in the movable range of the image display area (the distance between the center of the display image when it is displayed at the right end of the image display area and the center when it is moved horizontally and displayed at the left end). The initial position PP is supplied to the display range determination circuit 434 and the display position determination circuit 436.
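One way to read the initial position determination is sketched below in Python, for horizontal motion only. The coordinate convention is an assumption: positions run from 0 (display image at the left end) to `movable_range` (display image at the right end). The sketch also relies on the rule, stated later for the display position determination circuit 436, that the image is moved in the direction opposite to the motion vector. The function names are hypothetical.

```python
from itertools import accumulate


def display_offsets(motion_vectors):
    """Offsets of the display position relative to the first frame.

    The accumulated motion value MVT is the running sum of the motion
    vectors; the display moves opposite to it, hence the sign flip.
    """
    mvt = [0] + list(accumulate(motion_vectors))
    return [-v for v in mvt]


def initial_position(motion_vectors, movable_range):
    """Return (PP, fluctuation width) so the moving range is centered."""
    offsets = display_offsets(motion_vectors)
    low, high = min(offsets), max(offsets)
    width = high - low
    pp = movable_range / 2 - (high + low) / 2
    return pp, width


def display_positions(motion_vectors, pp):
    """Display position of every frame in the continuous scene."""
    return [pp + off for off in display_offsets(motion_vectors)]
```

For example, a background that moves 2 units to the left in each of three frames gives offsets 0, 2, 4, 6; with a movable range of 10 the initial position becomes 2, so the display positions 2, 4, 6, 8 are centered on 5.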
[0049]
The display range determination circuit 434 determines, based on the initial position PP and the motion accumulated value MVT, whether the display image stays completely within the image display area when the display position of the image is moved based on the motion detection information MVD during the period from when the first display image of the continuous scene is displayed at the initial position PP until the last image of the continuous scene is displayed. A determination result signal CJa indicating the determination result of the display range determination circuit 434 is supplied to the motion correction circuit 435.
[0050]
When the determination result signal CJa indicates that the display image does not fit within the image display area, the motion correction circuit 435 corrects the motion detection information MVD so that the display image fits within the image display area, and supplies the corrected information to the display position determination circuit 436 as motion detection information MVE. When the determination result signal CJa indicates that the display image fits completely within the image display area, the motion detection information MVD is supplied to the display position determination circuit 436 as the motion detection information MVE without correction.
[0051]
Alternatively, the fluctuation width WT of the motion accumulated value MVT may be supplied from the initial position determination circuit 433 to the motion correction circuit 435, with the movable range in the image display area stored in the motion correction circuit 435 in advance, so that a correction coefficient for correcting the motion detection information MVD is set from the supplied fluctuation width WT and the stored movable range. In this case, the amount of movement is corrected so that the display image stays within the image display area even when the display position is moved, which prevents the display position from exceeding the movable range and being restricted.
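The description does not give the form of the correction coefficient. A minimal sketch, assuming a simple ratio of the movable range to the fluctuation width that is applied only when the fluctuation is too wide, might look like this; the function names and the ratio itself are assumptions for illustration.

```python
def correction_coefficient(fluctuation_width, movable_range):
    """Assumed coefficient: shrink motion only when it would not fit."""
    if fluctuation_width <= movable_range:
        return 1.0
    return movable_range / fluctuation_width


def correct_motion(motion_vectors, fluctuation_width, movable_range):
    """Scale each motion vector (MVD) to obtain the corrected motion (MVE)."""
    k = correction_coefficient(fluctuation_width, movable_range)
    return [k * v for v in motion_vectors]
```

Scaling every vector by the same coefficient keeps the direction and relative size of the motion, so the display still follows the subject, only over a shorter distance.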
[0052]
The display position determination circuit 436 sets the initial position PP supplied from the initial position determination circuit 433 as the first display position of the continuous scene. Thereafter, based on the motion detection information MVE, it takes as the display position the position to which the image is moved in the direction opposite to the motion vector indicated by the motion detection information MVE, and sequentially generates the display position signal PD indicating this display position and supplies it to the write/read control circuit 437 of FIG. 7. The display position signal PD is also supplied to the presentation control unit 50 shown in FIG. 1.
[0053]
Here, when the image display position is switched based on the display position signal PD, an image memory 438 having a storage area corresponding to the image display area is provided, for example, as shown in FIG. 7, and the write/read control circuit 437 makes the writing position used when writing the image signal Sv to the image memory 438 correspond to the display position based on the display position signal PD. If the image signal Sv is written in this way, the display position can be moved easily, simply by reading out the image signal from the image memory 438 in accordance with the display area.
[0054]
The write/read control circuit 437 generates a write control signal MWC for writing an image signal into the image memory 438 and a read control signal MRC for reading out the image signal written in the image memory 438, and supplies them to the image memory 438. Because the write/read control circuit 437 stores the image signal at the storage position of the image memory 438 corresponding to the display position determined by the image position determination circuit 431, as described above, it generates the write control signal MWC based on the display position signal PD.
[0055]
Based on the write control signal MWC, the image memory 438 stores the image signal Sv of the frame in which the display position is determined in a storage area corresponding to the display position determined by the image position determination circuit 431. Note that, for example, a signal for displaying black is stored in an area where the image signal Sv is not stored. Further, the image memory 438 reads the image signal stored in the storage area based on the read control signal MRC from the write / read control circuit 437 and supplies it to the image output unit 45 as the image output signal SVout. The image output unit 45 presents an image while moving the display position according to the movement of the captured image based on the supplied image output signal SVout.
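The write operation into the image memory can be pictured with the short Python sketch below. It is a simplification under stated assumptions: the memory and the frame are 2D lists, the display position is taken as the top-left corner of the frame inside the memory, and black is represented by the value 0. The function name is hypothetical.

```python
def write_to_image_memory(frame, mem_width, mem_height, left, top, black=0):
    """Place frame at (left, top) in a memory that is otherwise black."""
    memory = [[black] * mem_width for _ in range(mem_height)]
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            my, mx = top + y, left + x
            # Pixels that would fall outside the display area are dropped.
            if 0 <= my < mem_height and 0 <= mx < mem_width:
                memory[my][mx] = value
    return memory
```

Reading the memory back row by row then yields an output image in which the captured frame appears at the chosen position and the rest of the display area is black.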
[0056]
The presentation control unit 50 in FIG. 1 includes an installation information supply unit 501 and an adjustment signal generation unit 502. The installation information supply unit 501 holds installation information CSa that indicates in what position a plurality of later-described sound output units 60-1 to 60-n are provided and what kind of speaker is used. Alternatively, the installation information CSa is generated by a user operation or the like, and the installation information CSa is supplied to the adjustment signal generation unit 502.
[0057]
The adjustment signal generation unit 502 generates the presentation adjustment signal CP based on the installation information CSa supplied from the installation information supply unit 501 and the display position signal PD supplied from the image signal processing unit 40, or additionally based on the position information signal Sp supplied from the information signal separation unit 32, and supplies it to the sound output signal generation unit 52. The presentation adjustment signal CP is a signal for adjusting the signal level and output timing of the sound output signals SAout-1 to SAout-n supplied to the sound output units 60-1 to 60-n in order to control the position and presence of the sound image.
[0058]
The sound output signal generation unit 52 performs signal level adjustment and output timing adjustment based on the presentation adjustment signal CP with respect to the sound signal Sa, generates a sound output signal SAout-1, and supplies the sound output signal SAout-1 to the sound output unit 60-1. Similarly, signal level adjustment and phase adjustment based on the presentation adjustment signal CP are performed, and acoustic output signals SAout-2 to SAout-n are generated and supplied to the acoustic output units 60-2 to 60-n.
The sound output units 60-1 to 60-n are configured using speakers, and perform sound output based on the supplied sound output signals SAout-1 to SAout-n.
[0059]
Next, the operation will be described. FIG. 9 is a diagram for explaining the operation when the display position signal is used. Based on the installation information CSa and the display position indicated by the display position signal PD, the adjustment signal generation unit 502 generates a presentation adjustment signal CP for adjusting the signal level of the acoustic signal Sa so that the sound pressure ratio based on equations (4) and (5) is obtained, and supplies it to the sound output signal generation unit 52. In equations (4) and (5), "X, Y" are coordinate values indicating the moving range of the image in the image display area, and "x, y" are coordinate values indicating the display position JD indicated by the display position signal PD.
[0060]

For example, as shown in FIG. 9A, when the installation information CSa indicates that the sound output units 60-L and 60-R are provided at the left and right end portions at the center of the image display area, a presentation adjustment signal CP is generated for adjusting the signal levels of the sound output signals SAout-L and SAout-R supplied to the sound output units 60-L and 60-R so that the sound pressure ratio between the sound output unit 60-L and the sound output unit 60-R becomes "(x / X) : (X − x) / X".
[0061]
Based on the acoustic signal Sa and the presentation adjustment signal CP, the acoustic output signal generation unit 52 adjusts the ratio of the signal levels of the acoustic signal Sa so that the sound pressure ratio becomes "(x / X) : (X − x) / X", thereby generating the acoustic output signal SAout-L and the acoustic output signal SAout-R. It supplies the acoustic output signal SAout-L to the acoustic output unit 60-L and the acoustic output signal SAout-R to the acoustic output unit 60-R.
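The level adjustment for the two-speaker case can be sketched as follows. The sketch applies the ratio exactly as stated in the text, with the first term assigned to 60-L and the second to 60-R; from which end of the moving range x is measured depends on FIG. 9 and is not assumed here. Treating the sound pressure ratio as a pair of linear gains on the samples, and clamping x to the moving range, are assumptions for illustration, as are the function names.

```python
def stereo_levels(x, X):
    """Return (level for 60-L, level for 60-R) in the ratio x/X : (X - x)/X."""
    if X <= 0:
        raise ValueError("moving range X must be positive")
    x = min(max(x, 0.0), X)  # assumption: keep the position inside the range
    return x / X, (X - x) / X


def apply_levels(samples, x, X):
    """Generate SAout-L and SAout-R from the samples of the acoustic signal Sa."""
    level_l, level_r = stereo_levels(x, X)
    out_l = [s * level_l for s in samples]
    out_r = [s * level_r for s in samples]
    return out_l, out_r
```

The two levels always add up to 1, so the overall level stays constant while the balance between the two sound output units follows the display position.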
[0062]
Further, as shown in FIG. 9B, when the installation information CSa indicates that the sound output units 60-LU, 60-LD, 60-RU, and 60-RD are provided at the four corners of the image display area, a presentation adjustment signal CP is generated for adjusting the signal levels of the sound output signals SAout-LU, SAout-LD, SAout-RU, and SAout-RD supplied to the sound output units 60-LU, 60-LD, 60-RU, and 60-RD, taking into account the positions of the sound output units in the vertical direction as well as in the left-right direction.
[0063]
Thus, since the signal levels of the sound output signals supplied to the sound output units installed in the left-right direction and in the up-down direction are adjusted based on the display position signal PD, the sound image can be localized at a position corresponding to the display position of the image even if the sound collecting unit 14 collects only sound from the shooting direction.
[0064]
The adjustment signal generation unit 502 may generate the presentation adjustment signal CP using the angle information indicated by the position information signal Sp and correct the presentation adjustment signal CP based on the display position signal PD.
[0065]
FIG. 10 shows the operation when angle information is used. FIG. 10A shows the case where the installation information CSa indicates that the acoustic output unit 60-L located on the left side and the acoustic output unit 60-R located on the right side are each installed at an angle α with respect to the center position MP as seen from the listening position HP. Here, when the angle information based on the position information signal Sp, that is, the azimuth angle φ indicated by the angle signal Spa, takes a positive value to the right of the center position MP and a negative value to the left, a presentation adjustment signal CP for adjusting the signal level of the acoustic signal Sa so as to obtain a sound pressure ratio based on Expression (6) is generated and supplied to the sound output signal generation unit 52.

[0066]
Here, when the movement of the display position is made smaller than the amount of movement of the image so that the display image does not leave the image display area, adjusting the signal level based on the angle signal Spa may place the sound image further outward than the position of the subject. For this reason, the adjustment signal generation unit 502 performs correction using the display position signal PD when generating the presentation adjustment signal CP. For example, when the display position approaches the left end of the image display area, the sound pressure ratio is corrected so as to increase the left sound pressure ratio, and the position of the sound image is brought closer to the center of the image display area. When the display position reaches the left end, the presentation adjustment signal CP is held in the state it had when the display position reached the left end, even if the angle signal Spa subsequently indicates a larger leftward azimuth angle φ.
[0067]
Based on the acoustic signal Sa and the presentation adjustment signal CP, the acoustic output signal generation unit 52 adjusts the ratio of the signal level of the acoustic signal Sa so that the sound pressure ratio according to the equation (6) is obtained, and the acoustic output signal SAout− L and the sound output signal SAout-R are generated, and the sound output signal SAout-L is supplied to the sound output unit 60-L, and the sound output signal SAout-R is supplied to the sound output unit 60-R.
[0068]
FIG. 10B shows the case where the installation information CSa indicates that the acoustic output unit 60-U located on the upper side and the acoustic output unit 60-D located on the lower side are each installed at an angle β with respect to the center position MP. Here, when the pitch angle δ indicated by the angle signal Spa takes a positive value below the center position MP and a negative value above it, a presentation adjustment signal CP for adjusting the signal level of the acoustic signal Sa so as to obtain a sound pressure ratio based on Expression (7) is generated and supplied to the sound output signal generation unit 52.

[0069]
Based on the acoustic signal Sa and the presentation adjustment signal CP, the acoustic output signal generation unit 52 adjusts the ratio of the signal level of the acoustic signal Sa so as to obtain a sound pressure ratio according to the equation (7), and the acoustic output signal SAout− U and the sound output signal SAout-D are generated, and the sound output signal SAout-U is supplied to the sound output unit 60-U, and the sound output signal SAout-D is supplied to the sound output unit 60-D.
[0070]
Thus, the signal levels of the acoustic output signals supplied to the acoustic output units installed in the left-right direction and to the acoustic output units installed in the vertical direction are adjusted based on the azimuth angle φ and the pitch angle δ indicated by the position information signal Sp, so the sound image can be correctly localized even if the sound collecting unit 14 collects only the sound in the shooting direction. That is, even if only the sound in the shooting direction is collected, the sound image can be moved in accordance with the movement of the sound source. Even if the display position is controlled so that the display image does not leave the image display area, the sound image can be localized according to the position of the display image.
[0071]
Next, an operation using distance information will be described. FIG. 11 shows the case where the installation information CSa indicates that the sound output unit 60-L located on the left side and the sound output unit 60-R located on the right side are installed so as to form an angle α with the center position MP, and that the distance from the listening position HP to each of the sound output units 60-L and 60-R is the distance KS. When the distance information of the position information signal Sp, that is, the distance signal Spb, indicates the distance LO to the desired subject OB, and the azimuth angle calculated from the azimuth angle indicated by the angle signal Spa and the display position JD is the azimuth angle φ, the distance LSL from the desired subject OB to the sound output unit 60-L can be calculated by Expression (8). Similarly, the distance LSR from the desired subject OB to the sound output unit 60-R can be calculated by Expression (9).
[0072]
LSL = √(KS² + LO² − 2 × KS × LO × cos(α + φ))   (8)
LSR = √(KS² + LO² − 2 × KS × LO × cos(α − φ))   (9)
Therefore, the distance difference DLR between the distance from the subject OB to the sound output unit 60-L and the distance from the subject OB to the sound output unit 60-R is "DLR = LSL − LSR". That is, the sound output from the sound output unit 60-L is delayed relative to the sound output from the sound output unit 60-R by an amount corresponding to the distance difference DLR.
[0073]
The adjustment signal generation unit 502 calculates the distance difference DLR and divides it by the sound velocity Vau to obtain the delay time TLR. It then generates a presentation adjustment signal CP that controls the phase, that is, the output timing, of the sound output signals so that their outputs differ in time by the delay time TLR, and supplies it to the sound output signal generation unit 52. Alternatively, the presentation adjustment signal CP for adjusting the signal level may be generated as described above and then corrected so that the outputs of the sound output signals differ in time by the delay time TLR, and the corrected signal may be supplied to the sound output signal generation unit 52 as the presentation adjustment signal CP.
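The calculation of Expressions (8) and (9), the distance difference DLR, and the delay time TLR can be illustrated with the following minimal sketch. The function names, the use of radians and metres, and the value assumed for the sound velocity Vau are illustrative assumptions and are not specified in this description.

```python
import math

# Assumed value for the sound velocity Vau (air at roughly 20 degrees C).
SPEED_OF_SOUND_M_S = 343.0


def speaker_distances(ks, lo, alpha, phi):
    """Return (LSL, LSR) following Expressions (8) and (9).

    ks: distance KS from the listening position to each output unit
    lo: distance LO to the subject OB
    alpha, phi: angles in radians (the unit is an assumption of this sketch)
    """
    lsl = math.sqrt(ks**2 + lo**2 - 2 * ks * lo * math.cos(alpha + phi))
    lsr = math.sqrt(ks**2 + lo**2 - 2 * ks * lo * math.cos(alpha - phi))
    return lsl, lsr


def delay_time(ks, lo, alpha, phi, v_au=SPEED_OF_SOUND_M_S):
    """Return TLR = DLR / Vau, where DLR = LSL - LSR."""
    lsl, lsr = speaker_distances(ks, lo, alpha, phi)
    dlr = lsl - lsr
    return dlr / v_au
```

With φ = 0 the two distances are equal, so the delay time TLR is zero; the sign of TLR for other angles follows directly from the expressions as written.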
[0074]
Based on the presentation adjustment signal CP, the acoustic output signal generation unit 52 not only adjusts the signal levels of the acoustic output signals SAout-L and SAout-R but also delays the acoustic output signal SAout-L by the delay time TLR relative to the acoustic output signal SAout-R before supplying it to the sound output unit 60-L.
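One way to picture the timing adjustment is the sketch below, which delays one channel of a sampled signal by a whole number of samples. The sampled-list representation, the rounding to the nearest sample, the handling of a negative delay by delaying the other channel, and the function name are all assumptions made for illustration; level adjustment is omitted.

```python
def apply_time_difference(sa, tlr, sample_rate):
    """Return (left, right) sample lists derived from the acoustic signal sa.

    If tlr >= 0 the left channel lags the right by about tlr seconds;
    if tlr < 0 the right channel lags instead (an assumption of this sketch).
    """
    n = round(abs(tlr) * sample_rate)  # delay expressed in whole samples
    pad = [0.0] * n
    delayed = pad + list(sa)  # silence first, then the signal
    direct = list(sa) + pad   # tail padding keeps both channels equal in length
    if tlr >= 0:
        return delayed, direct
    return direct, delayed
```

For example, at a hypothetical sample rate of 48000 Hz, a delay of two sample periods shifts the left channel by two samples while the right channel starts immediately.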
[0075]
In this way, the phases of the sound output signals supplied to the sound output units installed in the left-right direction and to the sound output units installed in the up-down direction are adjusted based on the distance LO indicated by the position information signal Sp. Therefore, even if only the sound in the shooting direction is collected by the sound collecting unit 14, a highly realistic sound can be presented based on the collected sound, and the listener can obtain a good reproduction sound field with a high sense of presence.
[0076]
For example, as shown in FIG. 12, suppose that an image display area Zb wider than the display image Za is provided for displaying a captured image, the motion of the captured image is detected, and the display position of the display image is moved from the display position P1 to the display position P2 in accordance with the movement of the subject. In this case, the sound image can be moved from the position Q1 to the position Q2 in accordance with the movement of the desired subject OB, so that sound presentation with a high sense of presence and a sense of movement can be performed.
[0077]
As described above, according to the above-described embodiment, even if only the sound in the shooting direction is collected by the sound collecting unit 14, the sound image can be moved in accordance with the movement of the sound source. In addition, even if the display position is controlled so that the display image does not leave the image display area, the sound image can be localized according to the position of the display image.
[0078]
Furthermore, more sound output units 60 may be provided, as shown in FIGS. 13B and 13C, than in the case shown in FIG. 13A. In that case, the presentation adjustment signal CP may be generated so that, based on the installation information CSa and the display position signal PD, the sound output unit 60 at the position corresponding to the display position is selected and the acoustic output signal SAout is supplied to the selected sound output unit 60. In this case, a sound image can be easily set according to the display image.
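The selection of a sound output unit from the installation information and the display position can be sketched as follows. This description does not state the selection rule, so choosing the unit nearest to the display position, the two-dimensional coordinates, the unit names, and the function names are all assumptions made for illustration.

```python
def select_output_unit(unit_positions, display_position):
    """Return the name of the unit closest to the display position.

    unit_positions: dict mapping a unit name to an (x, y) position
                    (a stand-in for the installation information CSa)
    display_position: (x, y) taken from the display position signal PD
    """
    def squared_distance(name):
        ux, uy = unit_positions[name]
        dx = ux - display_position[0]
        dy = uy - display_position[1]
        return dx * dx + dy * dy

    return min(unit_positions, key=squared_distance)


def route_signal(sa, unit_positions, display_position):
    """Supply the acoustic signal only to the selected unit; others get silence."""
    selected = select_output_unit(unit_positions, display_position)
    return {
        name: (list(sa) if name == selected else [0.0] * len(sa))
        for name in unit_positions
    }
```

A nearest-unit rule keeps the example short; a real arrangement might instead distribute the signal between adjacent units.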
[0079]
By the way, in the above-mentioned embodiment, the noise and the sound from an unnecessary direction are removed by using a super-directional microphone. However, in order to create a reproduction sound field that is close to the natural environment, it is preferable to reproduce not only the sound collected by the super-directional microphone but also indirect sound such as reflected sound, noise, and the like.
[0080]
Therefore, as a second embodiment, FIG. 14 shows the configuration of an acoustic presentation system that can also present ambient environmental sounds. In FIG. 14, parts corresponding to those in FIG. 1 are denoted by the same reference numerals, and detailed description thereof is omitted. The environmental sound collecting unit 15 is configured using, for example, an omnidirectional microphone having the directivity shown in FIG. 15A or a forward-directional microphone having the directivity shown in FIG. 15B. It generates an environmental sound signal Ssa and supplies it to the information signal generator 18. Here, when forward-directional microphones are used, providing a plurality of them around the photographer allows environmental sounds to be collected without omission. Note that FIG. 15C shows the directivity of a super-directional microphone.
[0081]
The information signal generator 18 generates an information signal WS based on the supplied image signal Sv, acoustic signal Sa, position information signal Sp, and environmental sound signal Ssa, and supplies the information signal WS to the signal recording / reproducing apparatus 20.
[0082]
The information signal separation unit 32 separates the image signal Sv, the sound signal Sa, the position information signal Sp, and the environmental sound signal Ssa from the information signal RS. It supplies the image signal Sv to the image output unit 45, the sound signal Sa to the sound output signal generation unit 52, the position information signal Sp to the presentation control unit 50, and the environmental sound signal Ssa to the environmental sound processing unit 53.
[0083]
The environmental sound processing unit 53 includes an installation information supply unit 531 and an environmental sound signal adjustment unit 532. The installation information supply unit 531 holds installation information CSb indicating which of the sound output units 60-1 to 60-n output the environmental sound and what kind of speaker is used in each sound output unit that outputs the environmental sound, or generates the installation information CSb in response to a user operation or the like, and supplies the installation information CSb to the environmental sound signal adjustment unit 532. When a sound output unit for environmental sound output is provided separately from the sound output units 60-1 to 60-k, information on that sound output unit is also included in the installation information CSb.
[0084]
Based on the installation information CSb supplied from the installation information supply unit 531, the environmental sound signal adjustment unit 532 adjusts the signal level of the environmental sound signal Ssa for each acoustic output unit used, so that when the environmental sound is output, its sound image is prevented from being localized in a direction different from the actual sound image position. The environmental sound superimposed signal Ssb generated for each sound output unit by the environmental sound signal adjustment unit 532 is supplied to the signal addition unit 54.
[0085]
The signal adding unit 54 adds, for each corresponding acoustic output unit 60, the acoustic output signal SAout supplied from the acoustic output signal generating unit 52 and the environmental sound superimposed signal Ssb supplied from the environmental sound signal adjusting unit 532, and supplies the result to the sound output unit 60 as the sound output signal SBout. For example, when the sound from the specific sound source and the environmental sound are both output from the sound output unit 60-k, the sound output signal SAout-k and the environmental sound superimposed signal Ssb-k are added to generate the sound output signal SBout-k, which is supplied to the sound output unit 60-k. When the sound from the specific sound source is output from the acoustic output units located in front and the environmental sound is output from the acoustic output units located in the rear, the acoustic output signal SAout is supplied as the acoustic output signal SBout to the acoustic output units located in front, while the environmental sound superimposed signal Ssb is supplied as the acoustic output signal SBout to the acoustic output units located in the rear.
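The per-unit addition performed by the signal adding unit 54 can be sketched as below. Representing each signal as a list of samples keyed by a unit name, and padding the shorter signal with silence, are assumptions made for illustration.

```python
def add_signals(sa_out, ssb):
    """Return SBout per output unit as the sample-wise sum of SAout and Ssb.

    sa_out, ssb: dicts mapping a unit name to a list of samples.
    A unit present in only one dict receives that signal unchanged,
    which covers the front/rear arrangement described above.
    """
    sb_out = {}
    for unit in set(sa_out) | set(ssb):
        a = list(sa_out.get(unit, []))
        b = list(ssb.get(unit, []))
        n = max(len(a), len(b))
        a += [0.0] * (n - len(a))  # pad the shorter signal with silence
        b += [0.0] * (n - len(b))
        sb_out[unit] = [x + y for x, y in zip(a, b)]
    return sb_out
```

When both signals are routed to the same unit they are summed; when they are routed to different units, each unit simply receives its own signal as SBout.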
[0086]
As described above, according to the second embodiment, not only can the sound from the specific sound source be collected by the sound collection unit 14 and its sound image be correctly localized, but the ambient environmental sound can also be reproduced correctly. Therefore, natural and highly realistic sound presentation can be performed, and the listener can obtain a natural and more realistic reproduction sound field.
[0087]
In the above-described embodiments, an information signal is generated and recorded on a recording medium. However, a signal transmission device may be provided instead of the signal recording / reproducing device so that the information signal is transmitted.
[0088]
Furthermore, the above-described processing may be realized not only by hardware but also by software. The configuration in this case is shown in FIG. 16. As shown in FIG. 16, the computer includes a CPU (Central Processing Unit) 701, and a ROM 702, a RAM 703, a hard disk drive 704, and an input / output interface 705 are connected to the CPU 701 via a bus 720. Furthermore, an input unit 711, a recording medium drive 712, a communication unit 713, a signal input unit 714, and a signal output unit 715 are connected to the input / output interface 705.
[0089]
When a command is input from an external device, or from the input unit 711, which is configured using an operation unit such as a keyboard or a mouse or a voice input unit such as a microphone, the command is supplied to the CPU 701 via the input / output interface 705.
[0090]
The CPU 701 executes a program stored in the ROM 702, the RAM 703, or the hard disk drive 704, and performs processing according to the supplied command. Further, a sound presentation program for causing the computer to execute the signal processing of the sound presentation system described above is stored in advance in the ROM 702, the RAM 703, or the hard disk drive 704; the sound output signals are generated based on the signals input to the signal input unit 714 and are output from the signal output unit 715. Alternatively, the sound presentation program may be recorded on a recording medium by the recording medium drive 712, or the sound presentation program recorded on the recording medium may be read by the recording medium drive 712 and executed by the computer. Furthermore, the sound presentation program may be transmitted or received by the communication unit 713 via a wired or wireless transmission path, and the received sound presentation program may be executed by the computer.
[0091]
FIG. 17 is a flowchart showing the overall configuration of the sound presentation program.
In step ST1, it is determined whether or not sound acquisition is performed. If sound acquisition is performed, the process proceeds to step ST2, and if sound acquisition is not performed, the process proceeds to step ST6.
[0092]
In step ST2, an information signal is generated. That is, the position information signal Sp is generated based on the angle signal Spa and the distance signal Spb input to the signal input unit 714. Further, the position information signal Sp is multiplexed, for example, with the image signal Sv and the acoustic signal Sa input to the signal input unit 714 to generate one information signal WS.
[0093]
In step ST3, it is determined whether or not the generated information signal is set to be recorded on a recording medium or transmitted to an external device. If it is set to record or transmit information signals, the process proceeds to step ST4. If it is set not to record or transmit information signals, the process proceeds to step ST5.
[0094]
In step ST4, the information signal is recorded or transmitted, and the process proceeds to step ST5. Here, when the information signal is recorded, the information signal is supplied to the recording medium drive 712 and recorded on the recording medium attached to the recording medium drive 712. When the information signal is transmitted, the information signal is output via the communication unit 713.
[0095]
In step ST5, it is determined whether or not to end the sound acquisition. Here, when the end operation is performed using the input unit 711, the process proceeds to step ST6. When the end operation is not performed, the process returns to step ST2, and the generation of the information signal is continued.
In step ST6, it is determined whether or not sound reproduction is performed. If sound reproduction is performed, the process proceeds to step ST7, and if sound reproduction is not performed, the process returns to step ST1.
[0096]
In step ST7, installation information CSa of a sound output unit that performs sound output based on the sound output signal is set. For example, the input unit 711 is operated to input installation information CSa such as the installation position of the sound output unit 60 and what kind of speaker is used. Alternatively, the installation information CSa stored in advance in the hard disk drive 704 or the like is read out.
[0097]
In step ST8, information signal separation processing is performed. That is, the image signal Sv, the acoustic signal Sa, and the position information signal Sp are separated from the information signal read from the recording medium, the information signal received by the communication unit 713, or the information signal generated by the sound acquisition process, and the process proceeds to step ST9.
[0098]
In step ST9, motion detection is performed using the image signal Sv. The display position of the image based on the image signal Sv is moved according to the detected motion to generate a new image output signal SVout, a display position signal PD indicating the display position of the image is generated, and the process proceeds to step ST10.
[0099]
In step ST10, the presentation adjustment signal CP is generated based on the display position signal PD, the installation information CSa, and the position information signal Sp.
In step ST11, a plurality of sound output signals SAout-1 to SAout-n are generated based on the sound signal Sa and the presentation adjustment signal CP and output from the signal output unit 715.
[0100]
In step ST12, it is determined whether or not to end the sound reproduction. Here, when the end operation is performed using the input unit 711, the process returns to step ST1. When the end operation is not performed, the process returns to step ST9, and the generation and output of the acoustic output signals SAout-1 to SAout-n according to the image display position are continued. By supplying the acoustic output signal SAout obtained by performing such processing to the acoustic output unit 60, it is possible to perform acoustic presentation with a high sense of presence even by software.
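The branching of the flowchart in FIG. 17, as described in steps ST1 to ST12 above, can be summarized by the following transition sketch. The function and flag names are hypothetical, and the transitions ST2 to ST3, ST7 to ST8, ST10 to ST11, and ST11 to ST12 are assumed to be simple sequential steps because the text does not state them explicitly.

```python
def next_step(step, *, acquire=False, record_or_transmit=False,
              end_acquire=False, reproduce=False, end_reproduce=False):
    """Return the step that follows `step` for the given decision flags."""
    if step == "ST1":   # is sound acquisition performed?
        return "ST2" if acquire else "ST6"
    if step == "ST3":   # is recording or transmission set?
        return "ST4" if record_or_transmit else "ST5"
    if step == "ST5":   # end of sound acquisition?
        return "ST6" if end_acquire else "ST2"
    if step == "ST6":   # is sound reproduction performed?
        return "ST7" if reproduce else "ST1"
    if step == "ST12":  # end of sound reproduction?
        return "ST1" if end_reproduce else "ST9"
    # Steps without a decision simply fall through to the next step.
    sequential = {
        "ST2": "ST3", "ST4": "ST5", "ST7": "ST8", "ST8": "ST9",
        "ST9": "ST10", "ST10": "ST11", "ST11": "ST12",
    }
    if step in sequential:
        return sequential[step]
    raise ValueError(f"unknown step: {step}")
```

During reproduction the loop ST9, ST10, ST11, ST12 repeats until the end operation is performed, which matches the continued generation of the acoustic output signals according to the image display position.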
[0101]
[Effects of the invention]
According to the present invention, sound from a specific sound source is acquired by the sound collecting means to generate an acoustic signal, and motion detection is performed using an image signal generated by shooting in the direction of the specific sound source. The display position of the image is moved in accordance with the detected motion, and a display position signal indicating the display position is generated. A plurality of sound output signals are then generated based on the acoustic signal, the display position signal, and installation information regarding the plurality of sound output means that perform sound output based on the sound output signals, and are supplied to the plurality of sound output means. For this reason, the sound image can easily be localized at a position corresponding to the display position of the image, and when the position of the specific sound source moves, the position of the sound image also moves, so that sound presentation with a sense of movement can be performed.
[0102]
In addition, an environmental sound signal is generated by acquiring the environmental sound, an environmental sound superimposed signal is generated based on the installation information of the plurality of sound output means and the environmental sound signal, and the environmental sound superimposed signal is added to the sound output signal, so that more natural and highly realistic sound presentation can be performed.
[0103]
Furthermore, since a positional information signal is generated from the positional relationship between the specific sound source and the sound collecting means, and a plurality of acoustic output signals are generated using this positional information signal, it is possible to perform acoustic presentation with a high sense of presence.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a configuration of an audio presentation system.
FIG. 2 is a diagram illustrating a configuration of an imaging unit.
FIG. 3 is a diagram illustrating a configuration of a position detection unit.
FIG. 4 is a diagram illustrating a configuration of an image signal processing unit.
FIG. 5 is a diagram illustrating a configuration of a scene change detection unit.
FIG. 6 is a diagram illustrating a configuration of a motion detection unit.
FIG. 7 is a diagram illustrating a configuration of an image position moving unit.
FIG. 8 is a diagram illustrating a configuration of an image position determination circuit.
FIG. 9 is a diagram for explaining an operation when a display position signal is used.
FIG. 10 is a diagram for explaining an operation when angle information is used.
FIG. 11 is a diagram for explaining an operation when distance information is used.
FIG. 12 is a diagram illustrating the position of a sound image.
FIG. 13 is a diagram illustrating an arrangement of sound output units.
FIG. 14 is a diagram showing a configuration of a second exemplary embodiment.
FIG. 15 is a diagram illustrating a directivity characteristic of a microphone.
FIG. 16 is a diagram illustrating a configuration using a computer.
FIG. 17 is a flowchart showing an audio presentation program.
FIG. 18 is a diagram illustrating a conventional operation.
[Explanation of symbols]
10 ... Sound acquisition apparatus, 12 ... Imaging unit, 14 ... Sound collection unit, 15 ... Environmental sound collection unit, 16 ... Position detection unit, 18 ... Information signal generation unit, 20 ... Signal recording / reproducing apparatus, 30 ... Sound reproducing apparatus, 32 ... Information signal separation unit, 40 ... Image signal processing unit, 41 ... Scene change detection unit, 42 ... Motion detection unit, 43 ... Image position moving unit, 45 ... Image output unit, 50 ... Presentation control unit, 52 ... Sound output signal generation unit, 53 ... Environmental sound processing unit, 54 ... Signal addition unit, 60 ... Acoustic output unit, 90 ... Video camera, 161 ... Angle sensor, 162 ... Ranging sensor, 163 ... Polar coordinate calculation unit, 501 ... Installation information supply unit, 502 ... Adjustment signal generation unit, 531 ... Installation information supply unit, 532 ... Environmental sound signal adjustment unit

Claims

A sound presentation system comprising:
sound collecting means for acquiring sound from a specific sound source and generating an acoustic signal;
imaging means for capturing an image in the direction of the specific sound source and generating an image signal;
sound output signal generating means for generating a plurality of sound output signals based on the acoustic signal generated by the sound collecting means;
a plurality of sound output means for performing sound output based on the sound output signals;
image processing means for performing motion detection on the image of the image signal and generating a display position signal indicating the display position at which the image of the image signal is displayed while the display position is moved, in accordance with the detected motion, within an image display area wider than the image of the image signal; and
presentation control means for controlling the operation of generating the plurality of sound output signals in the sound output signal generating means based on installation information regarding the plurality of sound output means and the display position signal.

The sound presentation system according to claim 1, wherein the presentation control means controls the ratio of signal levels among the plurality of sound output signals.

The sound presentation system according to claim 1, further comprising:
environmental sound collection means for acquiring environmental sound and generating an environmental sound signal;
environmental sound processing means for generating an environmental sound superimposed signal based on the installation information and the environmental sound signal; and
signal adding means for adding the corresponding environmental sound signal to the sound output signal for each sound output means.

The sound presentation system according to claim 1, further comprising position detecting means for detecting a positional relationship between the specific sound source and the sound collecting means and generating a position information signal indicating the detection result,
wherein the presentation control means controls the operation of generating the plurality of sound output signals based on the installation information, the display position signal, and the position information signal.

The sound presentation system according to claim 4, wherein the position information signal includes distance information to the specific sound source.

The sound presentation system according to claim 4, wherein the position information signal includes angle information indicating a sound collection direction.

The sound presentation system according to claim 5, wherein the presentation control means controls the phases of the plurality of sound output signals.

A sound reproducing apparatus comprising:
sound output signal generating means for generating a plurality of sound output signals based on an acoustic signal generated by acquiring sound from a specific sound source;
image processing means for performing motion detection on the image of an image signal generated by shooting in the direction of the specific sound source, and generating a display position signal indicating the display position at which the image of the image signal is displayed while the display position is moved, in accordance with the detected motion, within an image display area wider than the image of the image signal; and
presentation control means for controlling the operation of generating the plurality of sound output signals in the sound output signal generating means based on installation information regarding a plurality of sound output means that perform sound output based on the sound output signals and the display position signal.

The sound reproducing apparatus according to claim 8, further comprising:
environmental sound processing means for generating an environmental sound superimposed signal based on an environmental sound signal generated by acquiring environmental sound and the installation information; and
signal adding means for adding the corresponding sound output signal to the environmental sound signal for each sound output means.

A sound presentation method comprising:
acquiring sound from a specific sound source by sound collecting means to generate an acoustic signal;
shooting in the direction of the specific sound source to generate an image signal;
performing motion detection on the image of the image signal and generating a display position signal indicating the display position at which the image of the image signal is displayed while the display position is moved, in accordance with the detected motion, within an image display area wider than the image of the image signal; and
generating a plurality of sound output signals based on installation information regarding a plurality of sound output means that perform sound output based on the sound output signals and the display position signal, and supplying the sound output signals to the plurality of sound output means.

A sound reproduction method comprising:
performing motion detection on the image of an image signal generated by shooting in the direction of a specific sound source, and generating a display position signal indicating the display position at which the image of the image signal is displayed while the display position is moved, in accordance with the detected motion, within an image display area wider than the image of the image signal; and
generating a plurality of sound output signals based on installation information regarding a plurality of sound output means that perform sound output based on sound output signals, the display position signal, and an acoustic signal generated by acquiring sound from the specific sound source.

A computer-readable recording medium on which is recorded a program for causing a computer to execute:
a procedure for performing motion detection on the image of an image signal generated by shooting in the direction of a specific sound source, and generating a display position signal indicating the display position at which the image of the image signal is displayed while the display position is moved, in accordance with the detected motion, within an image display area wider than the image of the image signal;
a procedure for generating a presentation adjustment signal based on installation information regarding a plurality of sound output means that perform sound output based on sound output signals and the display position signal; and
a procedure for generating the sound output signals to be supplied to the plurality of sound output means based on an acoustic signal generated by acquiring sound from the specific sound source and the presentation adjustment signal.

A sound presentation program for causing a computer to execute:
a procedure for performing motion detection on the image of an image signal generated by shooting in the direction of a specific sound source, and generating a display position signal indicating the display position at which the image of the image signal is displayed while the display position is moved, in accordance with the detected motion, within an image display area wider than the image of the image signal;
a procedure for generating a presentation adjustment signal based on installation information regarding a plurality of sound output means that perform sound output based on sound output signals and the display position signal; and
a procedure for generating the sound output signals to be supplied to the plurality of sound output means based on an acoustic signal generated by acquiring sound from the specific sound source and the presentation adjustment signal.