JP2003032776A - Reproduction system

Info

Publication number: JP2003032776A
Application number: JP2001216695A
Authority: JP
Inventors: Kazutada Abe; 一任阿部; Kenichi Terai; 賢一寺井
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2001-07-17
Filing date: 2001-07-17
Publication date: 2003-01-31

Abstract

(57) [Summary]
[Problem] To provide a reproduction system that can always apply optimal processing to an audio signal output from a reproduction device, regardless of the position of the listener.
[Solution] A reproduction system 100 includes: video capturing means 102 for capturing video that includes at least part of a listener 101; a reproduction device 107 that outputs an audio signal by reproducing audio; detection means 106 for detecting the position of the listener 101 based on the video signal output from the video capturing means 102; and an audio control device 108 that applies processing corresponding to the position of the listener 101 detected by the detection means 106 to the audio signal output from the reproduction device 107.

Description

Detailed Description of the Invention

[0001] [Technical Field of the Invention] The present invention relates to an audio reproduction device, for use in AV (audio-visual) equipment and the like, that is intended to provide acoustic reproduction preferable for the listener.

[0002] [Prior Art] The propagation of sound differs depending on the positional relationship between the sound source and the listener and on the environment between them. By sensing these differences in propagation, the listener can perceive the position of the sound source and an impression of the environment. For example, when the sound source is fixed in front of the listener, the sound reaching the left ear becomes relatively louder when the listener turns the face to the right, and the sound reaching the right ear becomes relatively louder when the listener turns the face to the left. Through this relationship between changes in the listener's position and changes in the sound heard at both ears, the listener perceives the presence of the sound source more clearly.

[0003] By exploiting this relationship, that is, by detecting a change in the listener's position, performing predetermined audio processing, and presenting the result to the listener, audio can be reproduced with a greater sense of presence. One example is the reproduction device for video and audio signals described in JP-A-7-822235. That device includes detection means for detecting relative movement between a virtual sound source on the video signal reproduction means and the listener's head, and can realize sound image control that follows the listener's movement in headphone reproduction.

[0004] As with sound image control using headphones, sound image control using speakers has also been proposed. The method of performing sound image control with speakers is called the transaural method. In the transaural method, speakers in the reproduction sound field generate, at both ears of the listener, signals equivalent to the acoustic signals that reach the listener's ears in the original sound field. Ideally, the right-ear (right-channel input) signal of the original sound field reaches only the right ear in the reproduction sound field, and the left-channel signal reaches only the left ear. With speakers, however, the signal intended for one side also reaches the opposite ear, a phenomenon called crosstalk, and this prevents sound image control. The transaural method therefore suppresses crosstalk using digital filters implemented on a DSP (Digital Signal Processor).

[0005] [Problems to Be Solved by the Invention] However, crosstalk suppression is not performed correctly when the listener moves, so the effect of sound image control deteriorates. If the listener's position can be detected and predetermined processing applied in response to its changes, crosstalk is suppressed and the intended sound image control is achieved. Detecting the listener's movement at the same time and, for example, moving the virtual sound source to correspond to that movement is also effective for reproducing audio with a sense of presence.

[0006] In speaker reproduction, the listener is often not restricted; for example, anyone within range of the sound can be a listener. Detecting the listener's position without attaching a detection device to the listener is therefore simpler and preferable for the listener.

[0007] The present invention has been made in view of the above problems. Its object is to provide a reproduction system that can always apply optimal processing to the audio signal output from a reproduction device, regardless of the position of the listener.

[0008] [Means for Solving the Problems] The reproduction system of the present invention includes: video capturing means for capturing video that includes at least part of a listener; a reproduction device that outputs an audio signal by reproducing audio; detection means for detecting the position of the listener based on the video signal output from the video capturing means; and an audio control device that applies processing corresponding to the position of the listener detected by the detection means to the audio signal output from the reproduction device. The above object is thereby achieved.

[0009] The reproduction device may output audio signals of two or more channels.

[0010] The detection means may detect the position of the listener's head.

[0011] The audio control device may apply sound image control processing corresponding to the position of the listener detected by the detection means to the audio signal output from the reproduction device.

[0012] The audio control device may apply directivity control processing corresponding to the position of the listener detected by the detection means to the audio signal output from the reproduction device.

[0013] The detection means may further detect the number of listeners.

[0014] The reproduction system may be used in a portable terminal.

[0015] [Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[0016] (Embodiment 1) FIG. 1 shows an example of the configuration of a reproduction system 100 according to Embodiment 1 of the present invention.

[0017] The reproduction system 100 includes video capturing means 102 for capturing video, a reproduction device 107 that outputs an audio signal by reproducing audio, and an audio control device 108 that processes the audio signal output from the reproduction device 107. The audio signal output from the audio control device 108 is output to speakers 103 and 104.

[0018] In addition to outputting an audio signal, the reproduction device 107 may output a video signal by reproducing video. The video signal output from the reproduction device 107 is output to a display 105.

[0019] Suitable examples of the reproduction device 107 include audio equipment such as CD, MD, and semiconductor players; AV equipment such as DVD players and BS digital television receivers; and multimedia equipment such as personal computers.

[0020] The video capturing means 102 is oriented so that the video it captures includes at least part of the listener 101 (for example, the listener's head). The video capturing means 102 outputs a video signal corresponding to the captured video.

[0021] A camera, for example, is suitable as the video capturing means 102. The camera is installed, for example, on top of the display 105. To improve the accuracy of detecting the listener's position, a plurality of cameras may be used as the video capturing means 102.

[0022] The reproduction system 100 further includes detection means 106 for detecting the position of the listener based on the video signal output from the video capturing means 102. The audio control device 108 applies processing corresponding to the position of the listener detected by the detection means 106 (for example, sound image control processing or directivity control processing) to the audio signal output from the reproduction device 107. The detection means 106 may detect, for example, the position of the listener's head as the position of the listener.

[0023] FIG. 2A shows an example of the configuration of the detection means 106 and the audio control device 108 of the reproduction system 100.

[0024] The detection means 106 includes a video signal processing unit 202, which applies predetermined signal processing (for example, contour enhancement or color enhancement) to the video signal output from the video capturing means 102, and a position information extraction unit 203, which extracts information indicating the position of the listener (for example, information indicating the position of the listener's head) based on the output of the video signal processing unit 202. The information indicating the position of the listener is input to the audio control device 108.

[0025] The audio control device 108 includes a coefficient selection unit 205, which selects, from a plurality of filter coefficients stored in a memory (for example, a ROM), the filter coefficients corresponding to the input information indicating the position of the listener, and a sound image control unit 206, which applies sound image control processing to the audio signal output from the reproduction device 107 using the filter coefficients selected by the coefficient selection unit 205.

[0026] Examples of sound image control processing include delay control, sound pressure level control, crosstalk cancellation, and convolution with the head-related transfer function from a virtual sound source. The signal output from the sound image control unit 206 is output to the speakers 103 and 104 via an amplifier 208.

[0027] In the example shown in FIG. 2A, the audio signal output from the reproduction device 107 is a two-channel signal. However, the number of audio channels is not limited to two; the audio signal output from the reproduction device 107 may have two or more channels. For example, the reproduction device 107 may output to the sound image control unit 206 a 5.1-channel multichannel audio signal of the kind handled by DVD and BS digital television broadcasting.

[0028] Likewise, two-channel speakers 103 and 104 are used in the example shown in FIG. 2A, but the number of speaker channels is not limited to two. Speakers of two or more channels may be used to improve the accuracy of sound image control, or to control the sound for two or more listeners or at multiple points near a listener.

[0029] FIG. 2B shows an example of the configuration of a mobile phone 120 in which the reproduction system 100 is implemented. In FIG. 2B, components identical to those shown in FIGS. 1 and 2A are given the same reference numerals, and their description is omitted.

[0030] In the example shown in FIG. 2B, the mobile phone 120 includes an audio generation unit 110 that generates an audio signal, in place of the reproduction device 107 shown in FIGS. 1 and 2A. Possible audio signals include ringtones and music recorded in formats such as midi or mld, and the voice of the other party on a call. The audio generation unit 110 inputs the left-channel signal to the input terminal Lin of the sound image control unit 206 and the right-channel signal to the input terminal Rin of the sound image control unit 206.

[0031] The reproduction system 100 can be used in any type of portable terminal.

[0032] FIG. 3 shows a specific example of how the detection means 106 detects the position of a listener. In the example shown in FIG. 3, the video capturing means 102 is assumed to be located at the origin of the XY plane.

[0033] Assume that listeners 303 and 304 are present within the field of view of the video capturing means 102 (the hatched region in FIG. 3). The positions of the listeners 303 and 304 are expressed by a viewing angle and a distance. Here, the viewing angle is the angle measured counterclockwise from the reference line 301 of the field of view of the video capturing means, and the distance is the distance between the video capturing means 102 and the listener 303 (or the listener 304).

[0034] In FIG. 3, reference numeral 302 schematically shows the image captured by the video capturing means 102. The captured image 302 contains an image 305 of the listener 303 and an image 306 of the listener 304. The captured image 302 has width a, and the image 305 is located at distance b from the right edge of the captured image 302 (a and b are in pixels, for example).

[0035] Let A be the viewing angle of the video capturing means 102 and B be the viewing angle of the listener 303. Then (Equation 1) holds.

[0036] [Equation 1] $b / a = B / A$

The viewing angle B of the listener 303 is therefore given by (Equation 2).

[0037] [Equation 2] $B = (b / a) \cdot A$

The distance between the video capturing means 102 and the listener 303 can be obtained, for example, by detecting the size of the face of the listener 303 in the captured image 302. The relationship between the distance from the video capturing means 102 to a listener and the size of that listener's face in the captured image 302 is measured in advance and stored in a memory (not shown). In the captured image 302, the face of the listener 303 is larger than the face of the listener 304, because the listener 303 is closer to the video capturing means 102 than the listener 304 is.
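The position estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the calibration values, and the use of linear interpolation between pre-measured points are hypothetical and are not specified in the publication, which says only that the size-to-distance relationship is measured in advance and stored in memory.

```python
from bisect import bisect_left

def viewing_angle(b_px: float, a_px: float, fov_deg: float) -> float:
    """Equation 2: B = (b / a) * A, with b and a in pixels and A in degrees."""
    if a_px <= 0:
        raise ValueError("image width a must be positive")
    return (b_px / a_px) * fov_deg

# Hypothetical pre-measured calibration (assumption, not from the source):
# (face width in pixels, distance in metres), sorted by increasing face width.
CALIBRATION = [(40.0, 2.0), (80.0, 1.0), (160.0, 0.5)]

def distance_from_face(face_px: float) -> float:
    """Estimate distance by linear interpolation in the calibration table."""
    widths = [w for w, _ in CALIBRATION]
    if face_px <= widths[0]:
        return CALIBRATION[0][1]      # farther than the farthest calibrated point
    if face_px >= widths[-1]:
        return CALIBRATION[-1][1]     # nearer than the nearest calibrated point
    i = bisect_left(widths, face_px)
    (w0, d0), (w1, d1) = CALIBRATION[i - 1], CALIBRATION[i]
    t = (face_px - w0) / (w1 - w0)
    return d0 + t * (d1 - d0)
```

For example, with a 640-pixel-wide image, a 60° field of view, and a face centered 160 pixels from the right edge, `viewing_angle(160, 640, 60)` gives a viewing angle of 15°.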

[0038] Although the size of the human face differs from person to person, it may be assumed to be the same for everyone when detecting this distance. When the size of the listener's face is known in advance, the distance between the video capturing means and the listener can be determined more accurately.

[0039] Furthermore, when two video capturing means can be installed at different positions, the distance between a reference point and the listener can also be determined from the difference between the video signals output by the two video capturing means.

[0040] In this way, the viewing angle of the listener 303 and the distance between the video capturing means 102 and the listener 303 can be detected as information indicating the position of the listener 303. Similarly, the viewing angle of the listener 304 and the distance between the video capturing means 102 and the listener 304 can be detected as information indicating the position of the listener 304.

[0041] Various methods can be used to detect the size of the listener's face. For example, contour detection can be used to find the rough position and shape of the listener, and skin color detection can then be used to find the size of the listener's face. Alternatively, instead of or in addition to skin color detection, facial features such as the eyes, hair, and mouth may be used to detect the size of the listener's face.

[0042] An example of contour detection using digital filtering is described with reference to FIG. 4. In FIG. 4, each square cell represents one pixel; black pixels (the hatched cells in FIG. 4) have the value 1 and white pixels have the value 0.

[0043] FIG. 4(a) shows an example of an image captured by the video capturing means 102. The contour of such an image can be detected, for example, by filtering the image in the vertical and horizontal directions with an FIR digital filter having the coefficients [1, -1]. This works because the FIR digital filter detects changes in the data being filtered and outputs a nonzero value where a change occurs.

[0044] FIG. 4(b) shows the result of filtering the image shown in FIG. 4(a) with that FIR digital filter. In this way, contour detection can be used to find the rough position and shape of the listener.
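The [1, -1] filtering described above can be illustrated with a small pure-Python sketch. The function names are hypothetical, and two details are assumptions not stated in the publication: the first sample of each row or column is given output 0 (so the image border is not reported as an edge), and the horizontal and vertical results are combined by marking a pixel as contour if either output is nonzero.

```python
def fir_1_minus_1(samples):
    """FIR filter with coefficients [1, -1]: y[n] = x[n] - x[n-1].

    Assumption: y[0] is set to 0 so the border itself is not flagged.
    """
    if not samples:
        return []
    out = [0]
    for n in range(1, len(samples)):
        out.append(samples[n] - samples[n - 1])
    return out

def detect_contour(image):
    """Return a binary map that is 1 wherever the row-wise or
    column-wise [1, -1] filter output is nonzero."""
    rows, cols = len(image), len(image[0])
    horiz = [fir_1_minus_1(row) for row in image]
    vert = [fir_1_minus_1([image[r][c] for r in range(rows)])
            for c in range(cols)]
    return [[1 if horiz[r][c] != 0 or vert[c][r] != 0 else 0
             for c in range(cols)]
            for r in range(rows)]

# A 3x3 block of value-1 pixels on a value-0 background
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
contour = detect_contour(image)
```

Because the filter responds at the sample where the value changes, the detected contour is shifted by one pixel on the trailing (right and bottom) sides of the block, while the interior of the block produces zeros.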

[0045] When the listener's eyes can be detected, the orientation of the listener's face can also be detected.

[0046] FIG. 5 shows an example of the configuration of the sound image control unit 206 shown in FIG. 2A. The sound image control unit 206 has two input terminals, Lin and Rin, and two output terminals, Lout and Rout.

[0047] The left-channel audio signal from the reproduction device 107 (FIG. 2A) is input to the input terminal Lin, and the right-channel audio signal from the reproduction device 107 (FIG. 2A) is input to the input terminal Rin. The signal output from the output terminal Lout is supplied to the speaker 104 via the amplifier 208. The signal output from the output terminal Rout is supplied to the speaker 103 via the amplifier 208.

[0048] The sound image control unit 206 includes filters 501 to 504 and adders 505 and 506. The inputs of the filters 501 and 502 are connected to the input terminal Lin. The inputs of the filters 503 and 504 are connected to the input terminal Rin. The adder 505 adds the output of the filter 501 and the output of the filter 503 and outputs the sum to the output terminal Lout. The adder 506 adds the output of the filter 502 and the output of the filter 504 and outputs the sum to the output terminal Rout.

[0049] Here, the filter coefficients of the filters 501, 502, 503, and 504 are denoted Xll, Xlr, Xrl, and Xrr, respectively.
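The signal flow through filters 501 to 504 and adders 505 and 506 can be sketched as follows. This is a minimal illustration, assuming each filter is a direct-form FIR filter whose coefficient array is one of Xll, Xlr, Xrl, Xrr; the function names are hypothetical, and the publication does not specify the filter structure beyond "digital filter".

```python
def fir(signal, coeffs):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * x[n - k].
    Output has the same length as the input (samples before n = 0 are zero)."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * signal[n - k]
        out.append(acc)
    return out

def sound_image_control(l_in, r_in, x_ll, x_lr, x_rl, x_rr):
    """2-in / 2-out structure of FIG. 5 as described in the text.

    Filters 501 (Xll) and 502 (Xlr) are fed from Lin;
    filters 503 (Xrl) and 504 (Xrr) are fed from Rin.
    """
    f501 = fir(l_in, x_ll)
    f502 = fir(l_in, x_lr)
    f503 = fir(r_in, x_rl)
    f504 = fir(r_in, x_rr)
    l_out = [a + b for a, b in zip(f501, f503)]  # adder 505 -> Lout
    r_out = [a + b for a, b in zip(f502, f504)]  # adder 506 -> Rout
    return l_out, r_out

# A unit impulse on the left channel only
l_out, r_out = sound_image_control(
    [1.0, 0.0, 0.0], [0.0, 0.0, 0.0],
    x_ll=[1.0], x_lr=[0.0, 0.5], x_rl=[0.0], x_rr=[1.0],
)
```

With these illustrative coefficients, the left input passes unchanged to Lout, and a copy delayed by one sample and scaled by 0.5 appears at Rout through filter 502, which is the cross path a crosstalk canceller would shape.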

[0050] FIG. 6 shows an example of a table 600 stored in a memory (for example, a ROM) in the coefficient selection unit 205 shown in FIG. 2A. The table 600 is used when one listener is present within the field of view of the video capturing means 102. In FIG. 6, each of C₁₁, C₁₂, ..., C_{(n+1)4} denotes an array of numerical values (for example, [0.1, 0.2, 0.3, -0.3, 0.01]). The number of values in each array depends on the required control performance.

[0051] The table 600 contains a set of filter coefficients (Xll, Xlr, Xrl, Xrr) defined for each item of information indicating the position of the listener (viewing angle, distance).

[0052] The coefficient selection unit 205 selects the set of filter coefficients (Xll, Xlr, Xrl, Xrr) corresponding to the information indicating the position of the listener (viewing angle, distance) output from the position information extraction unit 203, and outputs the selected set to the sound image control unit 206. As a result, the selected filter coefficients (Xll, Xlr, Xrl, Xrr) are set in the filters 501, 502, 503, and 504 of the sound image control unit 206, respectively.

[0053] For example, when the listener is at the position (viewing angle, distance) = (0°, 0.5 m), the coefficient selection unit 205 selects the set of filter coefficients corresponding to (0°, 0.5 m), namely (Xll, Xlr, Xrl, Xrr) = (C₁₁, C₁₂, C₁₃, C₁₄). The selected coefficients are then set in the filters 501, 502, 503, and 504 of the sound image control unit 206, respectively.
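The table lookup performed by the coefficient selection unit 205 can be sketched as a dictionary keyed by (viewing angle, distance). Everything concrete here is an assumption for illustration: the grid spacing, the placeholder coefficient arrays, and especially the nearest-entry rule for positions that fall between table entries, which the publication does not describe.

```python
# Hypothetical grid spacing used to weigh angle and distance errors
ANGLE_STEP_DEG = 10.0
DIST_STEP_M = 0.5

# Hypothetical excerpt of table 600: (angle in degrees, distance in metres)
# -> (Xll, Xlr, Xrl, Xrr). The arrays are placeholders, not real designs.
TABLE_600 = {
    (0.0, 0.5): ([1.0, 0.0], [0.0, -0.3], [0.0, -0.3], [1.0, 0.0]),
    (0.0, 1.0): ([1.0, 0.0], [0.0, -0.2], [0.0, -0.2], [1.0, 0.0]),
    (10.0, 0.5): ([0.9, 0.1], [0.0, -0.4], [0.0, -0.2], [1.0, 0.0]),
}

def select_coefficients(angle_deg, distance_m, table=TABLE_600):
    """Return the coefficient set of the table entry nearest to the
    detected position (assumed selection rule)."""
    def cost(key):
        a, d = key
        return (((angle_deg - a) / ANGLE_STEP_DEG) ** 2
                + ((distance_m - d) / DIST_STEP_M) ** 2)
    nearest = min(table, key=cost)
    return table[nearest]

# Listener exactly on a table entry, and slightly off it
x_ll, x_lr, x_rl, x_rr = select_coefficients(0.0, 0.5)
near = select_coefficients(2.0, 0.6)
```

Normalizing each error by its grid step keeps a 1° error from being treated like a 1 m error; other weightings, or interpolation between neighboring coefficient sets, would be equally plausible design choices.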

[0054] By updating the filter coefficients of the sound image control unit 206 according to the listener's position relative to the reproduction system 100 in this way, optimal sound image control (for example, sound image localization control) can always be performed regardless of the listener's position. The filter coefficients of the sound image control unit 206 are preferably updated in real time, following changes in the listener's position.

[0055] Ordinarily, the filter coefficients of the sound image control unit 206 are designed in a single way, on the assumption that the listener is located in front of the reproduction system 100. In the present invention, by contrast, the filter coefficients of the sound image control unit 206 are designed in multiple ways, on the assumption that the listener may be at any of a plurality of positions relative to the reproduction system 100. Thus, even when the listener's position relative to the reproduction system 100 changes, optimal sound image control can always be performed by updating the filter coefficients of the sound image control unit 206 according to the listener's position.

[0056] FIG. 7 shows an example of a table 700 stored in a memory (for example, a ROM) in the coefficient selection unit 205 shown in FIG. 2A. The table 700 is used when two listeners are present within the field of view of the video capturing means 102. In FIG. 7, each of C₁₁', C₁₂', ..., C₃₄' denotes an array of numerical values (for example, [0.1, 0.2, 0.3, -0.3, 0.01]). The number of values in each array depends on the required control performance.

[0057] The table 700 contains a set of filter coefficients (Xll, Xlr, Xrl, Xrr) defined for each pair consisting of information indicating the position of the first listener (viewing angle, distance) and information indicating the position of the second listener (viewing angle, distance).

[0058] When audio is reproduced using two speakers, the sound can be controlled at two control points. When only one listener is present within the field of view of the video capturing means 102, the two control points are preferably placed near the listener's two ears. When two listeners are present within the field of view of the video capturing means 102, the two control points are preferably placed at the center of each listener's head. With two listeners, the performance of sound image control is lower than with one listener, but a certain degree of sound image control can still be achieved. By changing the placement of the control points according to the number of listeners present within the field of view of the video capturing means 102 in this way, sound image control suited to the number of listeners can be achieved.
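The control-point placement rule described above can be sketched as follows. The coordinate convention (head centers as (x, y) in metres, ears offset along the x axis) and the 0.08 m ear offset are assumptions for illustration only; the publication states only where the two control points should be placed for one and for two listeners.

```python
# Hypothetical half-distance between the ears, in metres
EAR_OFFSET_M = 0.08

def control_points(head_centres):
    """Return the two control points for a two-speaker system.

    One listener: one point near each ear (assumed to lie along the x axis).
    Two listeners: one point at the centre of each listener's head.
    """
    if len(head_centres) == 1:
        x, y = head_centres[0]
        return [(x - EAR_OFFSET_M, y), (x + EAR_OFFSET_M, y)]
    if len(head_centres) == 2:
        return list(head_centres)
    raise ValueError("two speakers give two control points; "
                     "expected one or two listeners")

single = control_points([(0.0, 1.0)])
pair = control_points([(-0.4, 1.0), (0.4, 1.2)])
```

In a fuller system, the number of detected listeners would also decide which coefficient table is consulted, for example a single-listener table such as table 600 or a two-listener table such as table 700.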

[0059] Information other than (viewing angle, distance) or (x, y) coordinates may be used as the information indicating the position of the listener.

[0060] When the orientation of the listener's face can be detected, it is preferable to add a parameter indicating the face orientation to the tables shown in FIGS. 6 and 7, and to define a set of filter coefficients (Xll, Xlr, Xrl, Xrr) for each combination of information indicating the position of the listener and information indicating the orientation of the listener's face. This makes it possible to always perform optimal sound image control (for example, sound image localization control) regardless of the listener's position and face orientation.

[0061] (Embodiment 2) FIG. 8 shows an example of the configuration of a reproduction system 200 according to Embodiment 2 of the present invention. In FIG. 8, components identical to those shown in FIG. 2A are given the same reference numerals, and their description is omitted.

[0062] The reproduction system 200 is a car stereo system mounted in a car. The reproduction system 200 includes speakers 801 to 804. The speaker 801 is installed in the left front door. The speaker 802 is installed in the right front door. The speaker 803 is installed on the left side of the rear dashboard. The speaker 804 is installed on the right side of the rear dashboard.

[0063] The video capturing means 102 (for example, a camera) is installed on, for example, the rearview mirror or the front dashboard.

[0064] The audio control device 808 includes a coefficient selection unit 805 and a sound image control unit 806. The coefficient selection unit 805 selects, from among a plurality of filter coefficients stored in a memory (for example, ROM), the filter coefficients corresponding to the information indicating the listener's position. The sound image control unit 806 performs sound image control processing on the audio signal output from the reproduction device 107, using the filter coefficients selected by the coefficient selection unit 805.

[0065] FIG. 9 shows an example of the configuration of the sound image control unit 806 shown in FIG. 8. The sound image control unit 806 has two input terminals, Lin and Rin, and four output terminals, FLout, FRout, RLout, and RRout.

[0066] The left-channel audio signal from the reproduction device 107 (FIG. 8) is input to the input terminal Lin. The right-channel audio signal from the reproduction device 107 (FIG. 8) is input to the input terminal Rin. The signal output from the output terminal FLout is supplied to the speaker 801 via the amplification device 208. The signal output from the output terminal FRout is supplied to the speaker 802 via the amplification device 208. The signal output from the output terminal RLout is supplied to the speaker 803 via the amplification device 208. The signal output from the output terminal RRout is supplied to the speaker 804 via the amplification device 208.

[0067] The sound image control unit 806 includes filters 911 to 914, 921 to 924, 931 to 934, and 941 to 944, and adders 951 to 954, 961 to 964, and 971 to 974.

[0068] Here, assume that the filter coefficients of the filters 911 to 914 are X11 to X14, respectively, the filter coefficients of the filters 921 to 924 are X21 to X24, respectively, the filter coefficients of the filters 931 to 934 are X31 to X34, respectively, and the filter coefficients of the filters 941 to 944 are X41 to X44, respectively.

[0069] FIG. 10 shows an example of a table 1000 stored in a memory (for example, ROM) in the coefficient selection unit 805 shown in FIG. 8. In FIG. 10, each of C₁₁₁, C₁₁₂, ..., C₄₄₄ denotes an array of numerical values (for example, [0.1, 0.2, 0.3, -0.3, 0.01]). The number of values in each array depends on the control performance.
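
One plausible reading, which the source does not state explicitly, is that each array holds the tap weights of an FIR filter, so that a longer array allows finer control at higher cost. Under that assumption, applying one such array to a signal could look like the sketch below; `fir_filter` and `C_EXAMPLE` are hypothetical names.

```python
from typing import List, Sequence


def fir_filter(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Direct-form FIR filter: y[n] = sum over k of taps[k] * signal[n - k]."""
    out: List[float] = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += h * signal[n - k]
        out.append(acc)
    return out


# Example array quoted in the text for one table entry.
C_EXAMPLE = [0.1, 0.2, 0.3, -0.3, 0.01]
```

Feeding a unit impulse such as `[1.0, 0.0, 0.0, 0.0, 0.0]` through `fir_filter` returns the taps themselves, which is a quick way to check the filter.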

[0070] The table 1000 contains a set of filter coefficients (X11 to X14, X21 to X24, X31 to X34, X41 to X44) defined for each boarding pattern.

[0071] The coefficient selection unit 805 selects the set of filter coefficients (X11 to X14, X21 to X24, X31 to X34, X41 to X44) corresponding to the information indicating the listener's position (the boarding pattern) output from the position information extraction unit 203, and outputs the selected set to the sound image control unit 806. As a result, the selected filter coefficients are set in the corresponding filters of the sound image control unit 806 (filters 911 to 914, 921 to 924, 931 to 934, and 941 to 944, respectively).
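
The selection step can be sketched as a table lookup followed by loading the sixteen arrays into the filters. This is a minimal sketch under stated assumptions: `TABLE_1000`, `make_set`, `SoundImageController`, and `select_and_load` are hypothetical names, and the coefficient values are placeholders rather than the contents of FIG. 10.

```python
from typing import Dict, List

Taps = List[float]

# Coefficient names X11 ... X44, matching filters 911-914, 921-924, 931-934, 941-944.
FILTER_IDS = [f"X{i}{j}" for i in range(1, 5) for j in range(1, 5)]


def make_set(scale: float) -> Dict[str, Taps]:
    """Placeholder coefficient set; real values would come from acoustic design."""
    return {name: [scale, 0.0] for name in FILTER_IDS}


# Stand-in for table 1000: boarding pattern -> sixteen coefficient arrays.
TABLE_1000: Dict[str, Dict[str, Taps]] = {
    "a": make_set(1.0),   # driver only
    "b": make_set(0.5),   # driver and front passenger
    "c": make_set(0.25),  # four occupants
}


class SoundImageController:
    """Holds the coefficients currently loaded into the sixteen filters."""

    def __init__(self) -> None:
        self.filters: Dict[str, Taps] = {}

    def load(self, coeff_set: Dict[str, Taps]) -> None:
        # Copy the arrays so later edits to the table do not affect the filters.
        self.filters = {name: list(taps) for name, taps in coeff_set.items()}


def select_and_load(pattern: str, controller: SoundImageController) -> None:
    """Look up the set for a boarding pattern and hand it to the controller."""
    controller.load(TABLE_1000[pattern])
```

Keeping the table read-only and copying on load mirrors the ROM-based arrangement described in the text.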

[0072] In FIG. 10, for example, boarding pattern (a) indicates one occupant, in the driver's seat; boarding pattern (b) indicates two occupants, in the driver's seat and the front passenger seat; and boarding pattern (c) indicates four occupants, in the driver's seat, the front passenger seat, and the rear seat.

[0073] Such a boarding pattern can be determined by detecting the number and positions of listeners, as described in Embodiment 1. However, because the seat positions in a car are fixed, the possible listener positions are also limited. The detection of the listener-position information may therefore be simplified compared with Embodiment 1.

[0074] Furthermore, instead of the video capturing means 102 and the detection means 106, a sensor that detects that a person is sitting on a seat (for example, a pressure sensor) may be attached to each seat to determine the boarding pattern.
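
A seat-sensor variant might classify the boarding pattern as in the following sketch. The seat names, the `PRESSURE_THRESHOLD` value, and the function name are assumptions for illustration; only patterns (a) to (c) from FIG. 10 are covered.

```python
from typing import Dict, Optional

SEATS = ("driver", "passenger", "rear_left", "rear_right")  # assumed seat names
PRESSURE_THRESHOLD = 20.0  # assumed sensor reading that counts as "occupied"


def boarding_pattern(readings: Dict[str, float]) -> Optional[str]:
    """Map per-seat pressure readings to boarding pattern 'a', 'b', or 'c'."""
    occupied = {s for s in SEATS if readings.get(s, 0.0) >= PRESSURE_THRESHOLD}
    if occupied == {"driver"}:
        return "a"  # one occupant in the driver's seat
    if occupied == {"driver", "passenger"}:
        return "b"  # driver and front passenger
    if occupied == set(SEATS):
        return "c"  # four occupants
    return None  # a pattern not covered by the three examples in the text
```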

[0075] When sound is reproduced using four speakers, the sound can be controlled at four control points.

[0076] FIG. 11 shows the relationship between the boarding pattern and the arrangement of the four control points. In FIG. 11, each x marks a control point.

[0077] As shown in FIG. 11(a), when the boarding pattern indicates one occupant in the driver's seat, the four control points are preferably placed near both ears of the listener in the driver's seat. As shown in FIG. 11(b), when the boarding pattern indicates two occupants in the driver's seat and the front passenger seat, two of the four control points are preferably placed near both ears of the listener in the driver's seat, and the remaining two near both ears of the listener in the front passenger seat. As shown in FIG. 11(c), when the boarding pattern indicates four occupants in the driver's seat, the front passenger seat, and the rear seat, each of the four control points is preferably placed at the center of one listener's head.
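
The placement rules above can be sketched as a function from boarding pattern to four control points. The coordinates, the ear offset `EAR_OFFSET_M`, the fore-and-aft spacing `SPLIT_M` used to split four points between two ears in pattern (a), and the assumption that every listener faces the +y direction are all illustrative; FIG. 11 is not reproduced here and may place the points differently.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) in metres; the coordinate frame is an assumption

EAR_OFFSET_M = 0.09  # assumed distance from head centre to each ear
SPLIT_M = 0.02       # assumed fore/aft spacing between two points at one ear


def ears(head: Point) -> List[Point]:
    """Left and right ear positions for a listener facing the +y direction."""
    x, y = head
    return [(x - EAR_OFFSET_M, y), (x + EAR_OFFSET_M, y)]


def four_control_points(pattern: str, heads: Dict[str, Point]) -> List[Point]:
    """Return four control points for boarding pattern 'a', 'b', or 'c'."""
    if pattern == "a":
        # One occupant: two points near each ear of the driver.
        return [(x, y + dy) for (x, y) in ears(heads["driver"])
                for dy in (-SPLIT_M, SPLIT_M)]
    if pattern == "b":
        # Two occupants: both ears of the driver, both ears of the passenger.
        return ears(heads["driver"]) + ears(heads["passenger"])
    if pattern == "c":
        # Four occupants: one point at the centre of each head.
        return [heads[s] for s in ("driver", "passenger", "rear_left", "rear_right")]
    raise ValueError(f"unknown boarding pattern: {pattern!r}")
```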

[0078] When the boarding pattern is four occupants, sound image control performs worse than with one or two occupants, but a certain degree of sound image control is still achievable. By changing the arrangement of the control points in this way according to the number of listeners within the field of view of the video capturing means 102, sound image control appropriate to the number of listeners can be realized.

[0079] If the age of the listener can be detected, sound image control may be performed according to the listener's age. For example, when the listener is elderly, it is preferable to process the audio signal so that the sound pressure is higher than usual.
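
As a simple illustration of raising the sound pressure for an elderly listener, the sketch below applies a fixed gain. The age threshold `ELDERLY_AGE` and the boost `ELDERLY_BOOST_DB` are assumed values; the source gives no figures.

```python
from typing import List, Sequence

ELDERLY_AGE = 65        # assumed threshold; not specified in the source
ELDERLY_BOOST_DB = 6.0  # assumed boost; not specified in the source


def gain_for_age(age: int) -> float:
    """Linear gain factor for a listener of the given estimated age."""
    boost_db = ELDERLY_BOOST_DB if age >= ELDERLY_AGE else 0.0
    return 10.0 ** (boost_db / 20.0)  # convert decibels to an amplitude ratio


def apply_gain(samples: Sequence[float], gain: float) -> List[float]:
    """Scale every sample of the audio signal by the gain factor."""
    return [s * gain for s in samples]
```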

[0080] If individual listeners can be identified, sound image control may be performed according to each individual's preferences.

[0081] In the example shown in FIG. 8, the audio signal output from the reproduction device 107 is a two-channel signal. However, the number of audio channels is not limited to two. The audio signal output from the reproduction device 107 may have two or more channels. For example, the reproduction device 107 may output to the sound image control unit 806 a 5.1-channel multichannel audio signal such as those used in DVDs and BS digital television broadcasting.

[0082] In the example shown in FIG. 8, the four-channel speakers 801 to 804 are used. However, the number of speaker channels is not limited to four. Four or more speaker channels may be used to improve the accuracy of sound image control, or to control the sound for two or more listeners or at multiple points near a listener.

[0083] As described above, Embodiments 1 and 2 have mainly described examples in which the audio control device performs sound image control processing. Depending on the application, the audio control device may also perform processing other than sound image control processing. It is also possible to switch between sound image control processing and other processing according to the video captured by the video capturing means. One example of processing other than sound image control processing is directivity control processing. Directivity control processing is, for example, processing that detects a target listener based on the captured video and controls the sound so that it is transmitted only in the direction of that target listener.

[0084] A sharp beam of sound can be realized by using superdirective speaker technology based on multichannel speakers. By directing this beam toward the listener, sound can be propagated to the listener alone. The system may also switch between sound image control processing and directivity control processing.
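
The source does not say how the beam is formed. As a stand-in, the sketch below uses basic delay-and-sum steering for a uniform line array of speakers, which is a simpler technique than a true superdirective design. The array geometry, the function name, and the angle convention (0 degrees is broadside, positive angles point toward the higher-numbered speakers) are assumptions.

```python
import math
from typing import List

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def steering_delays(num_speakers: int, spacing_m: float, angle_deg: float) -> List[float]:
    """Per-speaker delays in seconds that steer a uniform line array.

    Speaker i sits at position i * spacing_m along the array. A speaker that
    is closer to the target direction is delayed more, so that all wavefronts
    arrive together in that direction.
    """
    theta = math.radians(angle_deg)
    raw = [i * spacing_m * math.sin(theta) / SPEED_OF_SOUND
           for i in range(num_speakers)]
    offset = min(raw)  # shift so that every delay is non-negative
    return [d - offset for d in raw]
```

In use, each speaker would play the same signal delayed by its entry in the returned list, with the angle taken from the detected listener position.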

[0085] Each means and each device included in the reproduction system 100 or the reproduction system 200 may be implemented in hardware, in software, or in a combination of hardware and software.

[0086]

[Effects of the Invention] According to the present invention, it is possible to provide a reproduction system that can always perform optimal processing on the audio signal output from the reproduction device, regardless of the position of the listener. This makes it possible to realize audio reproduction that is more pleasing to the listener and more realistic.

[Brief description of drawings]

FIG. 1 is a diagram showing an example of the configuration of the reproduction system 100 according to Embodiment 1 of the present invention.

FIG. 2A is a diagram showing an example of the configuration of the detection means 106 and the audio control device 108 of the reproduction system 100.

FIG. 2B is a diagram showing an example of the configuration of a mobile phone 120 in which the reproduction system 100 is implemented.

FIG. 3 is a diagram showing a specific example of detection of the listener's position by the detection means 106.

FIG. 4 is a diagram for explaining an example of contour detection using digital filter processing.

FIG. 5 is a diagram showing an example of the configuration of the sound image control unit 206 shown in FIG. 2A.

FIG. 6 is a diagram showing an example of a table 600 stored in the memory in the coefficient selection unit 205 shown in FIG. 2A.

FIG. 7 is a diagram showing an example of a table 700 stored in the memory in the coefficient selection unit 205 shown in FIG. 2A.

FIG. 8 is a diagram showing an example of the configuration of the reproduction system 200 according to Embodiment 2 of the present invention.

FIG. 9 is a diagram showing an example of the configuration of the sound image control unit 806 shown in FIG. 8.

FIG. 10 is a diagram showing an example of the table 1000 stored in the memory (for example, ROM) in the coefficient selection unit 805 shown in FIG. 8.

FIG. 11 is a diagram showing the relationship between the boarding pattern and the arrangement of the four control points.

[Explanation of symbols]

100 reproduction system
101 listener
102 video capturing means
103, 104 speaker
105 display
106 detection means
107 reproduction device
108 audio control device

Continuation of front page. F-terms (reference): 5C054 AA01 AA05 CA04 CC02 CD03 CE01 CH01 DA08 EA01 FA00 FC03 FC12 FC13 FF03 HA17 5D020 BB03

Claims

[Claims]

1. A reproduction system comprising: video capturing means for capturing video that includes at least a part of a listener; a reproduction device that outputs an audio signal by reproducing audio; detection means for detecting the position of the listener based on a video signal output from the video capturing means; and an audio control device that performs, on the audio signal output from the reproduction device, processing according to the position of the listener detected by the detection means.

2. The reproduction system according to claim 1, wherein the reproduction device outputs audio signals of two or more channels.

3. The reproduction system according to claim 1, wherein the detection means detects the position of the head of the listener.

4. The reproduction system according to claim 1, wherein the audio control device performs, on the audio signal output from the reproduction device, sound image control processing according to the position of the listener detected by the detection means.

5. The reproduction system according to claim 1, wherein the audio control device performs, on the audio signal output from the reproduction device, directivity control processing according to the position of the listener detected by the detection means.

6. The reproduction system according to claim 1, wherein the detection means further detects the number of listeners.

7. The reproduction system according to claim 1, wherein the reproduction system is used in a mobile terminal.