JP2002108390A - Speech recognition device and computer-readable recording medium

Info

Publication number: JP2002108390A
Application number: JP2000294434A
Authority: JP
Inventors: Akira Tsuruta (鶴田 彰)
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2000-09-27
Filing date: 2000-09-27
Publication date: 2002-04-10

Abstract

(57) [Abstract]

[Problem] To reduce loss of the beginning and end of the speech section to be extracted, and to relieve the user of the burden of having to be conscious of the timing of pressing the switch relative to the timing of the utterance.

[Solution] A voice input unit 1, a switch state detection unit 4 that detects the operating state of a switch 3, and an input control unit 2 are provided. Under the control of the input control unit 2, the voice input unit 1 is notified of its use when capture of a voice signal becomes necessary, and is notified of the end of its use when capture is no longer necessary. In addition, when the switch 3 changes from OFF to ON, the voice input unit 1 is immediately notified to start capturing the voice signal, and when the switch 3 changes from ON to OFF, it is notified to end capture after a delay of a predetermined time.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech recognition device that captures the required voice signal in conjunction with a switch operation and recognizes the captured voice signal, and to a computer-readable recording medium.

[0002]

[Description of the Related Art] At present, keyboards, mice and the like are the mainstream means of making a machine such as a personal computer (hereinafter, PC) perform a given operation. On a PC, however, displaying a frequently viewed Web page, for example, requires starting a Web browser and then either selecting a page registered under "Favorites" or the like, or typing in a complicated URL, which is tedious. Speech recognition, in which a command is given by direct voice input to make the machine perform a given operation, has therefore begun to attract attention.

[0003] Conventionally, as shown in FIG. 8, a speech recognition device of this type takes in a voice signal 11 obtained by digitizing, with an A/D converter, an analog voice signal picked up by a microphone. The voice power calculated from the voice signal 11 is supplied to a speech section extraction unit 12, which cuts out a speech section. An acoustic analysis unit 13 extracts feature vectors from the extracted section and passes them to a matching unit 14. The matching unit 14 matches them against standard patterns registered in advance in a dictionary unit 15 and outputs the most similar pattern as a recognition result 16.
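The "voice power" that the speech section extraction unit works on can be illustrated with a short sketch. The patent does not define how the power is computed, so the definition used here (mean squared amplitude over non-overlapping frames) and the names `frame_power` and `frame_len` are assumptions chosen for illustration.

```python
def frame_power(samples, frame_len):
    """Return the mean squared amplitude of each non-overlapping frame.

    Assumed definition of "voice power"; trailing samples that do not
    fill a whole frame are ignored.
    """
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers


# Three frames of two samples each: silence, loud, quiet.
print(frame_power([0, 0, 3, 3, 1, 1], 2))
```

A sequence of per-frame values like this is the kind of input that a threshold-based section detector would then scan.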

[0004] One method of detecting the speech section in this case works on the voice power calculated from the voice signal. It detects a start point at which the voice power rises to or above a predetermined value, detects a provisional end point at which the voice power falls to or below a first threshold, and then moves the provisional end point successively to each position where the power exceeds a predetermined second threshold and then falls below it. Once the last provisional end point has been fixed as the end point, the voice signal from the start point to the end point is cut out as the speech section.

[0005] However, if speech sections are detected while the voice signal is being captured continuously, filler words such as "um" or sudden ambient noise in the captured signal are mistakenly detected as speech sections and passed on to recognition.

[0006] In practice, therefore, false detection of speech sections caused by filler words, ambient noise and the like is prevented by combining the speech section detection described above with a method in which the user directly designates the required speech section, such as accepting voice input only while a switch is held down, or accepting voice input for a fixed time after the switch is pressed once.

[0007]

[Problems to Be Solved by the Invention] With the above methods, however, if the user presses the switch even slightly after starting to speak, or releases the switch even slightly before finishing, the beginning or end of the speech section to be extracted is lost.

[0008] As one way of solving this, Japanese Unexamined Patent Publication No. H8-185196 constantly captures and stores the input voice signal, and extracts and outputs exactly one speech section from the stored signal, searching a range wider than the detection interval designated by the switch operation.
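For comparison, the always-on buffering idea attributed to the cited publication can be sketched as follows. This is only an illustration of the general approach described in the paragraph above, not the cited publication's actual design; the class `AlwaysOnBuffer`, its methods, and the `margin` parameter are hypothetical names.

```python
from collections import deque


class AlwaysOnBuffer:
    """Keeps the most recent frames so a window wider than the
    switch-designated interval can be searched afterwards."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest frame when full.
        self._frames = deque(maxlen=capacity)

    def push(self, timestamp, frame):
        self._frames.append((timestamp, frame))

    def window(self, press_time, release_time, margin):
        """Return stored frames from `margin` before the press
        to `margin` after the release."""
        lo, hi = press_time - margin, release_time + margin
        return [frame for t, frame in self._frames if lo <= t <= hi]


buf = AlwaysOnBuffer(capacity=5)
for t in range(8):
    buf.push(t, f"f{t}")          # only frames 3..7 remain stored
print(buf.window(press_time=5, release_time=6, margin=1))
```

The sketch makes the cost visible: frames must be pushed continuously whether or not the user intends to speak, which is the source of the device-occupation and power-consumption problems discussed next.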

[0009] However, when this method is used on a PC or the like, a single application program monopolizes the voice input device, so that other application programs requiring voice input cannot use the device even when they are running. Moreover, when this method is used on a portable information terminal or the like, for which long operation at low power consumption is one of the key requirements, the constant capture of the voice signal increases power consumption and shortens the usable time.

[0010] The present invention has been made in view of these circumstances. For methods in which the user directly designates the required speech section with a switch, its object is to provide a speech recognition device that reduces loss of the beginning and end of the speech section to be extracted; that relieves the user, as far as possible, of having to be conscious of the timing of pressing the switch relative to the timing of the utterance; that reduces power consumption by operating the voice input unit and capturing the voice signal only when needed; and that performs input control such that no single application program constantly occupies the voice input unit, leaving it available to other application programs. A further object is to provide a computer-readable recording medium on which such a processing program is recorded.

[0011]

[Means for Solving the Problems] The speech recognition device of the present invention captures a voice signal in response to a switch operation that designates capture of the voice signal, and recognizes the captured voice signal. It comprises a voice input unit, a switch state detection unit that detects the operating state of the switch, and an input control unit. The input control unit is characterized in that it notifies the voice input unit of its use when capture of a voice signal becomes necessary and of the end of its use when capture is no longer necessary, and in that it notifies the voice input unit to start capturing the voice signal immediately when the switch changes from OFF to ON and to end capture after a delay of a predetermined time when the switch changes from ON to OFF.

[0012] According to the speech recognition device of the present invention, loss of the beginning and end of the speech section to be extracted can be reduced, and the burden on the user of having to be conscious of the timing of pressing the switch relative to the utterance can be relieved as far as possible. In addition, because the voice input unit is operated to capture the voice signal only when needed for recognition, power consumption can be reduced.

[0013] In the speech recognition device of the present invention, the input control unit may notify the voice input unit of both its use and the start of voice signal capture immediately when the switch changes from OFF to ON, and may notify the voice input unit of both the end of voice signal capture and the end of its use after a delay of a predetermined time when the switch changes from ON to OFF.

[0014] According to this invention, the voice input unit is operated to capture the voice signal in step with the ON/OFF state of the switch, so power consumption can be reduced further.

[0015] In the speech recognition device of the present invention, the input control unit may notify the voice input unit to start capturing the voice signal immediately when the switch changes from OFF to ON, and may notify the user that voice input is possible once a preset time has elapsed since the switch was turned ON.

[0016] According to this invention, capture of the voice signal begins slightly before the notification indicating that voice input is possible (for example, display of an icon). Even if the user starts speaking slightly before that notification, loss of the beginning of the utterance is therefore reduced.

[0017] In the speech recognition device of the present invention, the input control unit may notify the voice input unit to end voice signal capture once a preset time has elapsed since the switch changed from ON to OFF.

[0018] According to this invention, loss of the end of the utterance can be reduced even if the user pays little attention to the timing of releasing the switch.

[0019] In the speech recognition device of the present invention, the input control unit may have a speech section detection function and may notify the voice input unit to end voice signal capture after the switch has changed from ON to OFF and detection of the speech section has finished.

[0020] According to this invention, the voice input unit never stops capturing the voice signal in the middle of a speech section, so loss of speech at the end of the section can be reduced.

[0021] The recording medium of the present invention is a computer-readable recording medium on which is recorded a processing program that captures a voice signal in response to a switch operation designating capture of the voice signal and recognizes the captured voice signal. It is characterized in that the recorded processing program notifies the voice input unit of its use when capture of a voice signal becomes necessary and of the end of its use when capture is no longer necessary, notifies the voice input unit to start capturing the voice signal immediately when the switch changes from OFF to ON, and notifies it to end capture after a delay of a predetermined time when the switch changes from ON to OFF.

[0022]

[Embodiments of the Invention] Embodiments of the present invention will be described below with reference to the drawings.

[0023] FIG. 1 is a block diagram showing the configuration of an embodiment of the speech recognition device of the present invention.

[0024] The speech recognition device of FIG. 1 consists mainly of a voice input unit 1, an input control unit 2, a switch 3, a switch state detection unit 4, a speech recognition unit 5, and a display unit 6.

[0025] The voice input unit 1 digitizes, with an A/D converter 1B, the analog voice signal picked up by a microphone 1A. The digitized voice signal is input to the input control unit 2 and the speech recognition unit 5.

[0026] The input control unit 2 controls capture of the voice signal by controlling the voice input unit 1 according to the state of the switch 3 and the status of speech section detection.

[0027] The switch 3 generates ON and OFF signals in response to the user's switch operation. The ON and OFF signals are input to the switch state detection unit 4.

[0028] The switch state detection unit 4 sends an ON signal to the input control unit 2 when the switch 3 changes from OFF to ON, and sends an OFF signal to the input control unit 2 when the switch 3 changes from ON to OFF.

[0029] When a speech recognition method based on hidden Markov models (hereinafter, HMMs) is used, for example, the speech recognition unit 5 computes the occurrence probability for every HMM corresponding to a recognition target word, and takes the word corresponding to the HMM with the highest occurrence probability as the recognition result.
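The selection rule in this paragraph (score every word model, keep the best) can be sketched with the standard forward algorithm. This is a minimal illustration under simplifying assumptions: observations are discrete symbols rather than acoustic feature vectors, and the two toy models and all their parameters are invented for the example, not taken from the patent.

```python
def forward_probability(hmm, observations):
    """P(observations | hmm) for a discrete-output HMM (forward algorithm)."""
    initial, transition, emission = hmm
    n = len(initial)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[prev] * transition[prev][s] for prev in range(n))
            * emission[s][obs]
            for s in range(n)
        ]
    return sum(alpha)


def recognize(models, observations):
    """Return the word whose HMM gives the highest occurrence probability."""
    return max(models, key=lambda w: forward_probability(models[w], observations))


# Hypothetical two-state left-to-right models over the symbols 0 and 1.
# Each model is (initial probabilities, transition matrix, emission matrix).
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "yes": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    "no": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}
print(recognize(models, [0, 0, 1, 1]))
```

A practical recognizer would normally work with log probabilities to avoid underflow on long observation sequences; plain probabilities are used here only to keep the sketch short.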

[0030] An HMM models the statistical characteristics of speech obtained from a large amount of speech data. Details of speech recognition methods using HMMs are given in "Speech Recognition by Stochastic Models" by Seiichi Nakagawa.

[0031] The display unit 6 displays icons, recognition results, and the like.

[0032] The input control operation of the speech recognition device configured as above will now be described in detail.

[0033] First, an example in which the user speaks while holding down the switch 3 to perform voice input will be described with reference to the flowchart of FIG. 2 and to FIG. 1.

[0034] Step S11: The switch state detection unit 4 determines whether the user has pressed the switch 3. If the switch 3 is judged to have been pressed, the input control unit 2 notifies the voice input unit 1 that it wishes to use the voice input unit 1 (step S12).

[0035] Step S13: The input control unit 2 determines, based on information from the voice input unit 1, whether the voice input unit 1 is available. If the voice input unit 1 is in use by another application program, it cannot be used, so the process moves to step S22, outputs an error message telling the user that another application program is currently using the voice input unit 1, and ends.

[0036] If the voice input unit 1 is available, other application programs can no longer use it from this point on, and the input control unit 2 notifies the voice input unit 1 to start capturing the voice signal (step S14).

[0037] Step S15: The voice input unit 1 captures the voice signal.

[0038] Step S16: The input control unit 2 determines whether the time elapsed since the switch 3 was pressed (the elapsed time since ON) is longer than a preset set time t0.

[0039] One way to determine the set time t0 is to run a voice input experiment in which users are instructed to "speak immediately after confirming that voice input is possible" and, as shown in FIG. 6(a), to examine the timing of the utterance relative to the voice-input-possible state and set t0 accordingly. If the set time t0 from turning the switch 3 ON until voice input becomes possible is too long, usability suffers, so in practice a value of about several hundred milliseconds is desirable.

[0040] If the determination in step S16 finds that the elapsed time since ON is shorter than the set time t0, the process returns to step S15 and continues capturing the voice signal. If the elapsed time since ON is longer than the set time t0, the process moves to step S17, and the input control unit 2 notifies the user that voice input is possible (see the timing chart of FIG. 5).

[0041] As a method of notification, for example as shown in FIG. 4, an icon such as (a) is normally displayed on the display unit 6, and an icon such as (b) is displayed when voice input becomes possible. The user speaks after confirming that the speech recognition device is ready for input.

[0042] Step S18: The switch state detection unit 4 determines whether the user has released the switch 3. If the switch 3 is judged to be still held down, the process returns to step S15 and continues capturing the voice signal. If the switch 3 is judged to have been released, the process moves to step S19, and the input control unit 2 determines whether the time elapsed since the switch 3 was released (the elapsed time since OFF) is longer than a preset set time t1.

[0043] One way to determine the set time t1 is to run a voice input experiment in which users are instructed to "turn the switch OFF as soon as you finish speaking" and, as shown in FIG. 6(b), to examine the timing of the end of the utterance relative to the switch OFF and set t1 accordingly.
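The patent leaves open how the experimental timings are turned into a value for t1. One plausible reading, offered here purely as an assumption, is to pick the smallest delay that covers most of the cases in which users released the switch before they had finished speaking. The function name `choose_delay`, the `coverage` parameter, and the sample measurements below are all hypothetical.

```python
import math


def choose_delay(early_release_ms, coverage=0.95):
    """Smallest delay covering `coverage` of the observed early releases.

    early_release_ms: per-trial time (ms) by which the switch was released
    before the utterance ended; negative values mean the switch was released
    after the utterance ended and need no delay.
    """
    if not early_release_ms:
        raise ValueError("at least one measurement is required")
    ordered = sorted(max(0, x) for x in early_release_ms)
    rank = math.ceil(coverage * len(ordered))  # nearest-rank percentile
    return ordered[rank - 1]


# Invented measurements from five trials.
trials = [120, -50, 300, 80, 200]
print(choose_delay(trials, coverage=1.0))
```

The same kind of calculation could in principle be applied to t0, using how far ahead of the "ready" notification users started speaking, subject to the usability limit of a few hundred milliseconds mentioned for t0.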

[0044] If the determination in step S19 finds that the elapsed time since OFF is shorter than the set time t1, the process returns to step S15 and continues capturing the voice signal. If the elapsed time since OFF is longer than the set time t1, the process moves to step S20, and the input control unit 2 notifies the voice input unit 1 to end voice signal capture.

[0045] Step S21: The input control unit 2 notifies the voice input unit 1 that its use has ended. From this point on, other application programs are free to use the voice input unit 1.
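The flow of steps S11 to S21 can be condensed into a small polling state machine. The sketch below is an interpretation under stated assumptions, not the patented implementation: the class `InputController`, its state names, and the notification labels are hypothetical; the availability check of step S13 is omitted; and time is passed in explicitly (in any consistent unit) so that the behavior can be examined without a real clock.

```python
class InputController:
    """Polling sketch of the FIG. 2 flow (S13 availability check omitted)."""

    def __init__(self, t0, t1):
        self.t0 = t0          # delay from switch ON to the "ready" notice
        self.t1 = t1          # delay from switch OFF to the end of capture
        self.state = "idle"
        self.on_time = None
        self.off_time = None
        self.events = []      # (time, notification) pairs, for inspection

    def step(self, now, pressed):
        if self.state == "idle":
            if pressed:                                    # S11
                self.events.append((now, "use"))           # S12
                self.events.append((now, "start"))         # S14
                self.on_time, self.off_time = now, None
                self.state = "capturing"
            return
        # Remember when the switch was released; forget it on a re-press.
        if pressed:
            self.off_time = None
        elif self.off_time is None:
            self.off_time = now
        if self.state == "capturing":                      # S15, S16
            if now - self.on_time > self.t0:
                self.events.append((now, "ready"))         # S17
                self.state = "ready"
        if self.state == "ready" and self.off_time is not None:  # S18, S19
            if now - self.off_time > self.t1:
                self.events.append((now, "stop"))          # S20
                self.events.append((now, "end_use"))       # S21
                self.state = "idle"


# Poll every 100 ms; the switch is held from 0 ms until just before 1000 ms.
ctrl = InputController(t0=300, t1=500)
for now in range(0, 2001, 100):
    ctrl.step(now, pressed=(now < 1000))
print(ctrl.events)
```

Capture starts at the moment of the press, the "ready" notice follows once t0 has passed, and capture ends only after t1 has passed since the release, which is how the flow protects both the beginning and the end of the utterance.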

[0046] Next, an example in which the end of voice signal capture is decided according to whether a speech section is still in progress will be described with reference to the flowchart of FIG. 3 and to FIG. 1.

[0047] Step S31: The switch state detection unit 4 determines whether the user has pressed the switch 3. If the switch 3 is judged to have been pressed, the input control unit 2 notifies the voice input unit 1 that it wishes to use the voice input unit 1 (step S32).

[0048] Step S33: The input control unit 2 determines, based on information from the voice input unit 1, whether the voice input unit 1 is available. If the voice input unit 1 is in use by another application program, it cannot be used, so the process moves to step S43, outputs an error message telling the user that another application program is currently using the voice input unit 1, and ends.

[0049] If the voice input unit 1 is available, other application programs can no longer use it from this point on, and the input control unit 2 notifies the voice input unit 1 to start capturing the voice signal (step S34).

[0050] Step S35: The voice input unit 1 captures the voice signal.

[0051] Step S36: The input control unit 2 determines whether the time elapsed since the switch 3 was pressed (the elapsed time since ON) is longer than the preset set time t0.

[0052] If the elapsed time since ON is shorter than the set time t0, the process returns to step S35 and continues capturing the voice signal. If the elapsed time since ON is longer than the set time t0, the process moves to step S37, and the input control unit 2 notifies the user that voice input is possible. The set time t0 is determined by the same method as described above.

[0053] Step S38: The input control unit 2 performs speech section extraction.

[0054] The speech section is extracted, for example, as follows. Working on the voice power calculated from the voice signal, the process detects a start point at which the voice power rises to or above a predetermined value, detects a provisional end point at which the voice power falls to or below a first threshold, and then moves the provisional end point successively to each position where the power exceeds a predetermined second threshold and then falls below it. Once the last provisional end point has been fixed as the end point, the voice signal from the start point to the end point is cut out.
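The extraction procedure described above can be sketched as a single pass over per-frame power values. The patent does not say how the last provisional end point becomes final, so the sketch assumes a simple rule: the end point is fixed once `hangover` consecutive frames pass without the power rising above the second threshold again. The function name, parameter names, and sample values are illustrative.

```python
def detect_section(powers, start_thr, first_thr, second_thr, hangover):
    """Return (start, end) frame indices of one speech section, or None
    if the end point has not been determined yet.

    The `hangover` finalization rule is an assumption, not from the patent.
    """
    start = None
    end = None        # provisional end point
    risen = False     # power has gone above second_thr since `end` was set
    quiet = 0         # frames since `end` was last set, moved, or exceeded
    for i, p in enumerate(powers):
        if start is None:
            if p >= start_thr:        # start point: power reaches start_thr
                start = i
            continue
        if end is None:
            if p <= first_thr:        # first provisional end point
                end, quiet = i, 0
            continue
        if p > second_thr:            # speech resumed after the provisional end
            risen, quiet = True, 0
        elif risen and p < second_thr:
            end, risen, quiet = i, False, 0   # move the provisional end point
        elif not risen:
            quiet += 1
            if quiet >= hangover:     # assumed rule for fixing the end point
                return (start, end)
    return None


powers = [0, 5, 9, 8, 1, 0, 6, 7, 1, 0, 0, 0]
print(detect_section(powers, start_thr=4, first_thr=2, second_thr=3, hangover=3))
```

In this example the dip at frame 4 produces a first provisional end point, the burst at frames 6 and 7 moves it to frame 8, and three quiet frames then fix it, so a short pause inside an utterance does not split the section.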

[0055] Step S39: The switch state detection unit 4 determines whether the user has released the switch 3. If the switch 3 is judged to be still held down, the process returns to step S35 and continues capturing the voice signal.

[0056] If the switch 3 is judged to have been released, the process moves to step S40, and the input control unit 2 determines whether, after detection of the start point, the end point of the speech section has been determined. If the end point has not been determined, the process returns to step S35 and continues capturing the voice signal. If the end point of the speech section has been determined, the process moves to step S41, and the input control unit 2 notifies the voice input unit 1 to end voice signal capture.

[0057] Step S42: The input control unit 2 notifies the voice input unit 1 that its use has ended.
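The stop rule of the FIG. 3 flow differs from the timed variant in one respect: capture ends only when the switch has been released and the section detector has fixed an end point. A minimal sketch of that rule follows; the function name and the per-frame boolean inputs are hypothetical simplifications of steps S39 and S40.

```python
def frames_to_capture(pressed, end_determined):
    """Return how many frames are captured under the FIG. 3 stop rule.

    pressed[i]        -- True while the switch is held during frame i
    end_determined[i] -- True once the end point of the speech section
                         has been determined by frame i
    Capture stops at the first frame where the switch is released AND the
    end point has been determined; otherwise every frame is captured.
    """
    for i, (is_pressed, ended) in enumerate(zip(pressed, end_determined)):
        if not is_pressed and ended:
            return i + 1
    return len(pressed)


# The switch is released at frame 2, but the utterance only ends at frame 3,
# so capture continues through frame 3.
print(frames_to_capture([True, True, False, False, False],
                        [False, False, False, True, True]))
```

Because both conditions must hold, releasing the switch early cannot cut off the tail of an utterance that is still in progress.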

[0058] By adding the processing described above ahead of the speech recognition processing, the voice signal can be captured without losing the end of the utterance, as shown in FIG. 5, even in cases where conventional processing would have lost it.

[0059] In other words, the user only has to remember to press the switch 3, confirm the icon indicating that voice input is possible, and then speak once; there is little need to be conscious of the timing of the utterance relative to the release of the switch. Furthermore, because capture of the voice signal starts slightly before the icon indicating that voice input is possible is displayed, loss of the beginning of the utterance is reduced even if the user happens to speak slightly before the icon appears.

[0060] In addition, in this embodiment the input control unit 2 notifies the voice input unit 1 of its use and of the end of its use, so the voice input unit 1 is operated to capture the voice signal only when needed. At all other times its circuit operation can be stopped, which reduces power consumption.

[0061] Furthermore, when the voice input unit 1 is used by multiple application programs, each of the application programs S1, S2 and S3 occupies the voice input unit 1 only when it needs it, as illustrated in FIGS. 7(a) and 7(b), so the application programs S1, S2 and S3 can share the voice input unit 1 in a time-division manner.
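The use and end-of-use notifications behave like acquiring and releasing an exclusive resource. The sketch below models that with a non-blocking lock; it is an analogy under assumptions, not the device's actual arbitration mechanism, and the class `VoiceInputUnit` and its method names are hypothetical.

```python
import threading


class VoiceInputUnit:
    """Exclusive-use sketch: one application at a time, others are refused."""

    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None

    def request_use(self, app):
        """Return True if `app` now holds the unit, False if it is busy."""
        if self._lock.acquire(blocking=False):
            self.owner = app
            return True
        return False

    def end_use(self, app):
        """Release the unit; only the current owner may do so."""
        if self.owner != app:
            raise RuntimeError(f"{app} does not hold the voice input unit")
        self.owner = None
        self._lock.release()


unit = VoiceInputUnit()
print(unit.request_use("S1"))   # S1 takes the unit
print(unit.request_use("S2"))   # refused while S1 holds it (the error case)
unit.end_use("S1")
print(unit.request_use("S2"))   # S2 can use it once S1 has finished
```

Refusing immediately rather than waiting mirrors the flowcharts, where a busy voice input unit leads to an error message instead of a queue.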

[0062] The processing described above is executed by a program. All or part of this program may be stored in advance, either directly or via a communication line, on a computer-readable recording medium such as a floppy (registered trademark) disk or a hard disk, and installed for use as needed.

[0063]

[Effects of the Invention] As described above, according to the present invention, in a method in which the user directly designates the required speech section with a switch, loss of the beginning and end of the speech section to be extracted can be reduced, and the burden on the user of having to be conscious of the timing of pressing the switch relative to the utterance can be relieved as far as possible. In addition, power consumption can be reduced by operating the voice input device and capturing the voice signal only when needed. Furthermore, no single application program constantly occupies the voice input unit, so other application programs can also use it.

[Brief description of the drawings]

【図１】本発明の音声認識装置の実施形態の構成を示す
ブロック図である。FIG. 1 is a block diagram illustrating a configuration of an embodiment of a speech recognition device of the present invention.

【図２】経過時間により音声信号の取り込みを制御する
動作を示すフローチャートである。FIG. 2 is a flowchart illustrating an operation of controlling capture of an audio signal based on elapsed time.

【図３】音声区間検出により音声信号の取り込みを制御
する動作を示すフローチャートである。FIG. 3 is a flowchart showing an operation of controlling the capture of a voice signal by voice section detection.

【図４】（ａ）音声入力が不可の状態を表すアイコンの
例を示す図、（ｂ）音声入力が可能な状態を表すアイコ
ンの例を示す図である FIG. 4A is a diagram illustrating an example of an icon indicating a state where voice input is not possible, and FIG. 4B is a diagram illustrating an example of an icon indicating a state where voice input is possible.

【図５】スイッチ操作、音声信号の取り込み区間及び取
り込んだ音声信号等のタイミングを示す図である。FIG. 5 is a diagram showing the timing of the switch operation, the voice signal capture section, the captured voice signal, and the like.

【図６】（ａ）スイッチＯＮからの経過時間に対する設
定時間の決め方を説明する図、（ｂ）スイッチＯＦＦか
らの経過時間に対する設定時間の決め方を説明する図で
ある。FIG. 6A is a diagram illustrating how to determine the set time for the elapsed time from switch ON, and FIG. 6B is a diagram illustrating how to determine the set time for the elapsed time from switch OFF.

【図７】（ａ）複数のアプリケーションソフトから音声
入力部を使用する場合を説明する図、（ｂ）複数のアプ
リケーションソフトから音声入力部に対して使用要求が
出された場合の処理を説明する図である。FIG. 7A is a diagram illustrating a case where the voice input unit is used by a plurality of application software programs, and FIG. 7B is a diagram illustrating the processing performed when use requests are issued to the voice input unit from a plurality of application software programs.

【図８】従来の音声認識装置の概略構成を示す図であ
る。FIG. 8 is a diagram showing a schematic configuration of a conventional voice recognition device.

[Explanation of symbols]

１ 音声入力部、１Ａ マイク、１Ｂ Ａ／Ｄ変換器、２ 入力制御部、３ スイッチ、４ スイッチ状態検出部、５ 音声認識部、６ 表示部
Reference Signs List: 1 voice input unit; 1A microphone; 1B A/D converter; 2 input control unit; 3 switch; 4 switch state detection unit; 5 voice recognition unit; 6 display unit

Claims

[Claims]

1. A voice recognition device which captures a voice signal in response to a switch operation designating capture of the voice signal and recognizes the captured voice signal, comprising: a voice input unit; a switch state detection unit that detects an operation state of the switch; and an input control unit, wherein the input control unit has a function of notifying the voice input unit of the use of the voice input unit when capture of the voice signal becomes necessary, notifying the voice input unit of the end of use of the voice input unit when capture of the voice signal is no longer necessary, immediately notifying the voice input unit of the start of capture of the voice signal when the switch changes from the OFF state to the ON state, and notifying the voice input unit of the end of capture of the voice signal with a delay of a predetermined time when the switch changes from the ON state to the OFF state.

2. The voice recognition device according to claim 1, wherein the input control unit has a function of immediately notifying the voice input unit of the use of the voice input unit and of the start of capture of the voice signal when the switch changes from the OFF state to the ON state, and of notifying the voice input unit of the end of capture of the voice signal and of the end of use of the voice input unit with a delay of a predetermined time when the switch changes from the ON state to the OFF state.

3. The voice recognition device according to claim 1, wherein the input control unit has a function of immediately notifying the voice input unit of the start of capture of the voice signal when the switch changes from the OFF state to the ON state, and of notifying the user that voice input is possible after a preset set time has elapsed from when the switch was turned ON.

4. The voice recognition device according to claim 1, wherein the input control unit has a function of notifying the voice input unit of the end of capture of the voice signal after a preset set time has elapsed from when the switch changed from the ON state to the OFF state.

5. The voice recognition device according to claim 1, wherein the input control unit has a voice section detection function, and has a function of notifying the voice input unit of the end of capture of the voice signal after the switch has changed from the ON state to the OFF state and detection of the voice section has been completed.

6. A computer-readable recording medium storing a processing program for capturing a voice signal in response to a switch operation designating capture of the voice signal and recognizing the captured voice signal, wherein the processing program notifies a voice input unit of the use of the voice input unit when capture of the voice signal becomes necessary, notifies the voice input unit of the end of use of the voice input unit when capture of the voice signal is no longer necessary, immediately notifies the voice input unit of the start of capture of the voice signal when the switch changes from the OFF state to the ON state, and notifies the voice input unit of the end of capture of the voice signal with a delay of a predetermined time when the switch changes from the ON state to the OFF state.