JPS59147398A - Voice section detector - Google Patents
Voice section detector
- Publication number
- JPS59147398A
- Authority
- JP
- Japan
- Prior art keywords
- speaker
- image
- voice section
- audio
- image sensor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Processing (AREA)
Abstract
(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.
Description
DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to a voice section detector suitable for use in speech signal processing devices such as speech recognizers.
Prior Art
Speech signal processing devices intended for speech recognition and similar purposes are already well known. As preprocessing, such a device must detect the time interval in which speech is present, and for this it is important to identify clearly when the speech starts and when it ends.
For this preprocessing, the conventional approach has been to use, for example, an amplitude level comparator: the point at which the input signal from the microphone exceeds a certain threshold is recognized as the start point, and the point at which it falls below a certain threshold is recognized as the end point.
However, the speech signal is generally superimposed on ambient noise, so the detected start and end times vary with the noise level, and it has been difficult to detect the correct voice section.
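The conventional amplitude-threshold method and its sensitivity to noise can be illustrated with a minimal Python sketch. The function name `detect_by_threshold`, the amplitude-envelope values, and the additive noise level are all hypothetical and chosen only for illustration; the patent does not specify any of them.

```python
def detect_by_threshold(levels, threshold):
    """Return (start, end) indices for an amplitude envelope.

    start: first index whose level exceeds the threshold.
    end:   first later index whose level falls below the threshold
           (or len(levels) if the level never falls below it).
    Returns None if the level never exceeds the threshold.
    """
    start = None
    for i, level in enumerate(levels):
        if start is None:
            if level > threshold:
                start = i
        elif level < threshold:
            return start, i
    if start is None:
        return None
    return start, len(levels)


# Hypothetical amplitude envelope of one utterance (arbitrary units).
clean = [0, 0, 1, 3, 6, 8, 6, 3, 1, 0, 0]
# The same utterance with an assumed constant noise level of 1.5 added.
noisy = [level + 1.5 for level in clean]

clean_section = detect_by_threshold(clean, threshold=2)
noisy_section = detect_by_threshold(noisy, threshold=2)
print(clean_section, noisy_section)
```

With the same threshold, the noisy envelope crosses the threshold earlier and falls below it later than the clean one, so the detected start and end points shift with the noise level, which is the drawback described above.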
Object
The present invention has been made to solve the above drawbacks of the prior art. In particular, it combines image processing of the lips with processing of the time waveform of the audio input so that the voice section in the input signal can be detected correctly.
Configuration
The configuration of the present invention is described below on the basis of embodiments.
FIG. 1 is a configuration diagram for explaining one embodiment of the present invention. In the figure, 1 is a speaker, 2 is a microphone, and 3 is an image sensor. As shown, an image sensor 3 of some kind is placed alongside the microphone 2, which picks up the sound wave A uttered by the speaker 1, so that lip image information of the speaker is collected at the same time as the audio signal. In principle, any type of image sensor 3 may be used here. Considering real-time processing, however, a sensor whose output is converted into an electric signal is desirable, and photographic plates using emulsions or the like are not. An imaging plate using a small vidicon or a CCD sensor is therefore conceivable, and since the present invention does not require high resolution, a solid-state imaging device such as a CCD plate with a relatively small number of elements is suitable.
The present invention uses such an image sensor to obtain an image of the speaker's lips and detects the voice section from it. During an utterance, the speaker's lips move constantly and the lip image information changes constantly; while the speaker is not uttering, the lips are at rest and the lip image stays constant. In general, therefore, the lip image shows a high frame correlation value in a silent section and a relatively low frame correlation value in an utterance section. Focusing on this, the point at which the frame correlation value, after staying high for a relatively long time, drops sharply from high to low can be judged to be the voice start point. Conversely, if the frame correlation value rises sharply from low to high and then stays high for a long time, the point of that change can be judged to be the voice end point.
To detect the voice section accurately in this way, a frame period much shorter than that of ordinary television images (about 1 msec to several msec) is required. In the present invention, however, the number of pixels may be far smaller than in ordinary television, so it is clear that this can be realized easily. Note that frame correlation is only one possible means and is not necessarily the best one in practice; for example, it is also possible to focus on a certain part of the image and perform detection by observing how that part changes.
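The frame-correlation idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the patent does not define how the correlation is computed or how long "a long time" is, so the normalized (Pearson) correlation, the parameters `corr_threshold` and `hold`, and the tiny six-pixel frames are all hypothetical choices.

```python
import math


def frame_correlation(a, b):
    """Normalized (Pearson) correlation between two equal-length pixel lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Flat frame: treat identical frames as fully correlated.
        return 1.0 if a == b else 0.0
    return cov / math.sqrt(var_a * var_b)


def detect_voice_sections(frames, corr_threshold=0.9, hold=3):
    """Return a list of (start_frame, end_frame) pairs.

    start_frame: first frame after the correlation drops from high to low,
                 provided it had been high for at least `hold` frame pairs.
    end_frame:   first frame of a run in which the correlation is high
                 again for at least `hold` consecutive frame pairs.
    """
    # corr[k] compares frames[k] with frames[k + 1].
    corr = [frame_correlation(frames[k], frames[k + 1])
            for k in range(len(frames) - 1)]
    high = [c >= corr_threshold for c in corr]
    sections = []
    start = None
    for i in range(len(high)):
        if start is None:
            # High for `hold` pairs, then a sharp drop: voice start.
            if i >= hold and not high[i] and all(high[i - hold:i]):
                start = i + 1
        elif high[i] and i + hold <= len(high) and all(high[i:i + hold]):
            # Back to high and staying high for `hold` pairs: voice end.
            sections.append((start, i))
            start = None
    return sections


# Hypothetical six-pixel lip images: one resting shape, two moving shapes.
closed = [0, 0, 9, 9, 0, 0]
open_a = [0, 9, 0, 0, 9, 0]
open_b = [9, 0, 0, 0, 0, 9]
frames = [closed] * 4 + [open_a, open_b, open_a, open_b] + [closed] * 4

sections = detect_voice_sections(frames)
print(sections)
```

In this toy sequence the lips rest for frames 0 to 3, move during frames 4 to 7, and rest again from frame 8, so the sketch reports one section starting at frame 4 and ending at frame 8. The `hold` requirement plays the role of the "stays high for a long time" condition, so that a brief pause in lip movement during an utterance is not mistaken for its end.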
FIG. 2 is a configuration diagram of the main part of another embodiment of the present invention. In the figure, 4 is an objective lens, 5 is an optical fiber, 6 is an image sensor, and 7 is a scanner. In this embodiment, in order to make the part of the lip image detection section that is integrated with the sound pickup microphone small and lightweight, the objective lens 4 is separated from the image sensor section 6. Only the objective lens 4 is integrated with the microphone, the image sensor section 6 is incorporated into the main processing unit, and the two are connected by the optical fiber 5.
Embodiments of the present invention have been described above, but the present invention is not limited to these embodiments. For example, when the time lag caused by the difference between the speed of sound and the speed of light cannot be ignored, a means for correcting it may also be provided.
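As a rough worked example of this time lag, the Python sketch below estimates how many image frames the sound arrives late and drops that many leading audio entries. The patent does not describe the correction means; the helper names, the speed of sound of 343 m/s (air at about 20 degrees C), and the assumption that both streams are sampled at the same frame period are illustrative only.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: air at about 20 degrees C


def acoustic_delay_frames(distance_m, frame_period_s):
    """Number of image frames by which the sound lags the image.

    The travel time of light over the same distance is treated as zero.
    """
    delay_s = distance_m / SPEED_OF_SOUND_M_PER_S
    return round(delay_s / frame_period_s)


def align_audio_to_image(audio_levels, delay_frames):
    """Drop the first `delay_frames` audio entries so that index k in both
    streams refers to the same instant at the speaker."""
    return audio_levels[delay_frames:]


# Hypothetical setup: speaker 1 m from the sensors, 1 msec frame period.
delay = acoustic_delay_frames(distance_m=1.0, frame_period_s=0.001)
aligned = align_audio_to_image([0, 0, 0, 5, 6], delay)
print(delay, aligned)
```

At a distance of 1 m the sound arrives about 2.9 msec after the image, roughly three frames at a 1 msec frame period, so the lag is comparable to the frame periods mentioned in the description. At a distance of a few centimetres the lag is on the order of 0.1 msec and can usually be ignored.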
Effect
As is clear from the above description, according to the present invention, both the audio signal and the lip image signal are used as information sources for speech processing, so the voice section can be detected with high accuracy.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 and FIG. 2 are configuration diagrams for explaining embodiments of the present invention.
1: speaker, 2: microphone, 3: image sensor, 4: objective lens, 5: optical fiber, 6: image sensor, 7: scanner.
FIG. 1
Claims (1)
A voice section detector comprising an image sensor that is integrated with a microphone for picking up sound waves and that obtains an image of the speaker's lips, wherein the speaker's sound waves and lip image information are used as information sources for speech processing such as speech recognition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2104483A JPS59147398A (en) | 1983-02-10 | 1983-02-10 | Voice section detector |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2104483A JPS59147398A (en) | 1983-02-10 | 1983-02-10 | Voice section detector |
Publications (1)
Publication Number | Publication Date |
---|---|
JPS59147398A (en) | 1984-08-23 |
Family
ID=12043928
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2104483A Pending JPS59147398A (en) | 1983-02-10 | 1983-02-10 | Voice section detector |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPS59147398A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011097268A (en) * | 2009-10-28 | 2011-05-12 | Sony Corp | Playback device, headphone, and playback method |
JP2014109770A (en) * | 2012-12-04 | 2014-06-12 | Samsung R&D Institute Japan Co Ltd | Speech processing unit, speech recognition system, speech processing method, and speech processing program |
-
1983
- 1983-02-10 JP JP2104483A patent/JPS59147398A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP4896838B2 (en) | Imaging apparatus, image detection apparatus, and program | |
JP2008263498A (en) | Wind noise reducing device, sound signal recorder and imaging apparatus | |
CN105578097A (en) | Video recording method and terminal | |
JP2009141555A (en) | Imaging apparatus with voice input function and its voice recording method | |
US20110102619A1 (en) | Imaging apparatus | |
JP6610725B2 (en) | Sound processing apparatus and sound processing program | |
CN102572263B (en) | Imaging apparatus and audio processing apparatus | |
US9282229B2 (en) | Audio processing apparatus, audio processing method and imaging apparatus | |
JPS59147398A (en) | Voice section detector | |
US20240404547A1 (en) | Sound source determining method and system, electronic device and readable storage medium | |
JPS6195203A (en) | Optical cutting line detection device | |
CN109637555B (en) | Japanese speech recognition translation system for business meeting | |
CN113762110A (en) | Law enforcement instant evidence fixing method and law enforcement instrument | |
JPH10285483A (en) | Method and apparatus for measuring time difference between television video signal and audio signal | |
Yoshida et al. | Sound quality improvement of extracted sound from video with rolling-shuttered camera | |
Kiritani et al. | Simultaneous high-speed digital recording of vocal fold vibration and speech signal | |
Yoshizawa et al. | Speech extraction with RGB-intensity gradient on rolling-shutter video | |
JP2003298916A (en) | Imaging apparatus, data processing apparatus and method, and program | |
Shindo et al. | Noise-reducing sound capture based on exposure-time of still camera | |
JPS5949742A (en) | Apparatus for detecting exhalation force | |
JP2002259990A (en) | Character input method and device as well as character input program and storage medium storing the program | |
JPH04207668A (en) | Image processor | |
JPS6079272A (en) | Method and apparatus for measuring speed of flying object | |
KR960028203A (en) | Video camera device | |
JPS58178761U (en) | Focus adjustment device for imaging device |