JP3762191B2

JP3762191B2 - Information input method, information input device, and storage medium

Info

Publication number: JP3762191B2
Application number: JP2000119505A
Authority: JP
Inventors: 哲夫小坂; 寛樹山本
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2000-04-20
Filing date: 2000-04-20
Publication date: 2006-04-05
Anticipated expiration: 2020-04-20
Also published as: JP2001306293A

Description

[0001]
[Technical field of the invention]
The present invention relates to an information input method, an information input device, and a storage medium, and more particularly to a technology for inputting character information using a speech recognition technology.
[0002]
[Prior art]
When character information such as characters and symbols is input into an input item (input form) of a graphical user interface (GUI), a keyboard or a pointing device such as a mouse is generally used. With a keyboard, the characters and symbols are typed; with a pointing device, the desired character or symbol is selected from a set of candidates.
[0003]
[Problems to be solved by the invention]
However, when character information is input to a desired input item using a keyboard or a pointing device such as a mouse, everything from selecting the input item to inputting the character information must be done by hand. This makes the approach very awkward in environments where manual operation is difficult.
[0004]
In view of the above-described problems, an object of the present invention is to easily input character information to a plurality of input items displayed on a display unit without performing manual input.
[0005]
[Means for Solving the Problems]
An information input method of the present invention is an information input method for inputting character information to a plurality of input items displayed on a display, and includes: a reception step of receiving speech; a speech recognition step of recognizing character information from the speech using one or more of a plurality of grammar rules; and a selection step of selecting the input item into which the character information is to be input, based on additional information of the input items and the grammar rule used in the speech recognition step. The additional information is information indicating a correspondence between the input items and the grammar rules, and the speech recognition step recognizes character information from the speech using the grammar rules associated with input items into which character information has not been input.
[0006]
An information input device according to the present invention is an information input device for inputting character information to a plurality of input items displayed on a display, and includes: receiving means for receiving speech; speech recognition means for recognizing character information from the speech using one or more of a plurality of grammar rules; and selection means for selecting the input item into which the character information is to be input, based on additional information of the input items and the grammar rule used by the speech recognition means. The additional information is information indicating a correspondence between the input items and the grammar rules, and the speech recognition means recognizes character information from the speech using the grammar rules associated with input items into which character information has not been input.
[0007]
DETAILED DESCRIPTION OF THE INVENTION
(First embodiment)
FIG. 1 is a block diagram showing the configuration of the information input device in this embodiment.
In FIG. 1, reference numeral 1 denotes a display device for displaying a graphical user interface (GUI) of this embodiment. The display device 1 includes a display such as a CRT, a liquid crystal panel, or a plasma display panel. A central processing unit 2 performs processing such as numerical calculation and control.
[0008]
A storage device 3 consists of a RAM, a ROM, a magnetic disk, an optical disk, a semiconductor memory, a hard disk device, or a combination of these. The storage device 3 holds the GUI of the present embodiment, a control program necessary for the processing procedure of the present embodiment, and an operating system (OS) that manages this control program.
[0009]
Reference numeral 4 denotes a voice recognition unit. The voice recognition unit 4 performs acoustic processing on the voice input from the microphone 5 and performs language processing on the result of the acoustic processing. The storage device 3 holds an acoustic model 10 used in acoustic processing, N types of grammatical rules 111 to 11n and word dictionaries 121 to 12n used in language processing.
[0010]
Here, each grammar rule 111 to 11n is a grammar rule that is most suitable for recognition of a predetermined type of character information, and each word dictionary 121 to 12n is a word dictionary corresponding to each grammar rule 111 to 11n. The voice recognition processing in the voice recognition unit 4 can also be realized by software.
[0011]
The display device 1, the central processing unit 2, the storage device 3, and the voice recognition unit 4 are connected by a bus 16. The central processing unit 2 reads out the control program necessary for the processing procedure of the present embodiment from the storage device 3 and controls, in an integrated manner, the speech recognition processing of the voice recognition unit 4, the display processing of the display device 1, and the read/write processing of the storage device 3.
[0012]
FIG. 2 is a flowchart showing a processing procedure of the information input apparatus of this embodiment. The processing performed at each step of the flowchart shown in FIG. 2 is realized by the central processing unit 2 controlling each processing unit based on a control program stored in the storage device 3.
[0013]
First, in step S1, the GUI of this embodiment is read from the storage device 3 and displayed on the display device 1. This GUI includes one or more input forms (also referred to as input items), and additional information is set for each input form. This additional information indicates the correspondence between the input form and the grammar rule best suited to recognizing a predetermined type of character information; it is, for example, an index (grammar ID) that identifies the type of grammar rule. The GUI of the present embodiment is written in a description language such as HTML (Hyper Text Markup Language) or XML (Extensible Markup Language).
[0014]
An example of the GUI of this embodiment will be described with reference to FIG. 3. FIG. 3 is a diagram showing a GUI including three input forms. This GUI belongs to an application program that searches for a route between stations. In FIG. 3, 6, 7, and 8 are input forms for inputting the departure station name, the arrival station name, and the departure time, respectively.
[0015]
Both the input form 6 and the input form 7 are input forms for inputting station name information, and these are associated with grammar rules that are optimal for recognizing station name information. In the present embodiment, the grammar rule 111 (“grammar 1” in FIG. 1) is described as a grammar rule corresponding to the input forms 6 and 7.
[0016]
On the other hand, since the input form 8 is an input form for inputting time information, it is associated with a grammar rule that is optimal for recognition of time information. In the present embodiment, the grammar rule 112 (“grammar 2” in FIG. 1) will be described as a grammar rule corresponding to the input form 8.
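The association between input forms and grammar rules described above can be pictured as a small data model. The following is only a sketch under assumptions: the class name `InputForm`, the field names, and the ID strings `"grammar1"` and `"grammar2"` are hypothetical stand-ins for the grammar IDs, and the patent does not prescribe any particular representation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InputForm:
    name: str                    # form name shown on the GUI
    grammar_id: str              # additional information: ID of the associated grammar rule
    value: Optional[str] = None  # None while no character information has been input


# Hypothetical model of the route-search GUI: both station forms share one grammar.
forms = [
    InputForm("departure station", "grammar1"),
    InputForm("arrival station", "grammar1"),
    InputForm("departure time", "grammar2"),
]


def forms_for_grammar(all_forms: List[InputForm], grammar_id: str) -> List[InputForm]:
    """Return the forms whose additional information points at grammar_id."""
    return [form for form in all_forms if form.grammar_id == grammar_id]
```

Looking up `"grammar1"` in this model yields two forms, which is exactly the situation that later requires the user to choose between them.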
[0017]
Next, in step S2, the grammar rules associated with input forms into which character information has not yet been input are identified, and the identified grammar rules are set. When there is one unfilled input form, the one grammar rule corresponding to that input form is set; when there are two or more unfilled input forms, one or more grammar rules are set. For example, in the case of the GUI shown in FIG. 3, the grammar rule 111 and the grammar rule 112 are set.
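Step S2 amounts to collecting the grammar IDs of the forms that are still empty. A minimal sketch follows, assuming forms are plain dictionaries with hypothetical keys `grammar_id` and `value`; these names are illustrative and not taken from the patent.

```python
def active_grammars(forms):
    """Step S2 sketch: grammar IDs of unfilled forms, without duplicates, in form order."""
    selected = []
    for form in forms:
        if form["value"] is None and form["grammar_id"] not in selected:
            selected.append(form["grammar_id"])
    return selected


# Hypothetical state: the departure time is already filled, the two station forms are not.
forms = [
    {"name": "departure station", "grammar_id": "grammar1", "value": None},
    {"name": "arrival station", "grammar_id": "grammar1", "value": None},
    {"name": "departure time", "grammar_id": "grammar2", "value": "9:00"},
]
grammars_to_set = active_grammars(forms)
```

Because only the station forms remain empty here, only their shared grammar is set, which narrows what the recognizer has to consider.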
[0018]
In step S3, a voice input is accepted. The voice uttered by the user is converted into an electrical signal by the microphone 5 and then supplied to the voice recognition unit 4.
[0019]
In step S4, the speech input in step S3 is recognized using the one or more grammar rules set in step S2. For example, in the case of the GUI shown in FIG. 3, station name information and time information are recognized from the speech input in step S3, using the grammar rules 111 and 112 and the word dictionaries 121 and 122.
[0020]
In step S4, when there is one kind of grammar rule set in step S2, the result of the acoustic processing is subjected to language processing using the grammar rule and a word dictionary corresponding to the grammar rule. The character information obtained from the grammatical rules is used as the recognition result in step S4.
[0021]
On the other hand, when two or more grammar rules were set in step S2, the result of the acoustic processing is subjected to language processing using each grammar rule and the word dictionary corresponding to each grammar rule. Then, among the character information obtained from the grammar rules, the character information whose likelihood with respect to the input speech is equal to or greater than a predetermined value is taken as the recognition result of step S4.
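The likelihood test for the multi-grammar case can be sketched as a simple filter. The tuple layout `(grammar_id, text, likelihood)`, the numeric scores, and the threshold value are all assumptions made for illustration; the patent only states that results at or above a predetermined value are kept.

```python
def select_results(hypotheses, threshold):
    """Keep every (grammar_id, text) whose likelihood is at or above the threshold."""
    return [
        (grammar_id, text)
        for grammar_id, text, likelihood in hypotheses
        if likelihood >= threshold
    ]


# Hypothetical output of language processing, one hypothesis per active grammar.
hypotheses = [
    ("grammar1", "Sendai", 0.82),
    ("grammar2", "ten o'clock", 0.31),
]
results = select_results(hypotheses, threshold=0.5)
```

Note that more than one hypothesis can pass the threshold, so the result of step S4 may contain character information from several grammar rules.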
[0022]
In step S5, it is determined whether or not the input form into which the character information obtained in step S4 is to be input can be uniquely determined from the correspondence between each input form and the grammar rules. In the present embodiment, this correspondence is used to automatically determine, from the grammar rule that recognized the character information obtained in step S4, the input form into which that character information is to be input.
[0023]
Therefore, for example, when character information is recognized by one grammar rule and that grammar rule corresponds to only one input form, exactly one input form for the character information obtained in step S4 can be determined automatically. With this configuration, the input form desired by the user can be selected and determined automatically, without requiring the user to select an input form in advance.
[0024]
However, in the following cases, the input form for the character information obtained in step S4 cannot be narrowed to one, so the process of step S6 is executed.
[0025]
1) Multiple grammar rules recognize vocabulary with the same pronunciation. For example, suppose “Sendai” (仙台, /seNdai/) is registered in the word dictionary A used by the grammar rule A, “predecessor” (先代, /seNdai/) is registered in the word dictionary B used by the grammar rule B, and the user pronounces /seNdai/. In this case, the grammar rule A recognizes “Sendai” and the grammar rule B recognizes “predecessor”; both become recognition results of step S4, and the input form cannot be narrowed to one.
[0026]
2) A single grammar rule corresponds to multiple input forms. This is the case, for example, when the grammar rule 111 corresponds to the two input forms 6 and 7, as in the GUI of FIG. 3. In this case, the input form into which the character information recognized by the grammar rule 111 is to be input cannot be narrowed to one.
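The decision in step S5, including the two ambiguous cases listed above, can be sketched as follows. The function name `decide_form`, the status strings, and the dictionary keys are hypothetical; the `"no_match"` branch is an added assumption for the case where nothing was recognized, which the text does not discuss.

```python
def decide_form(results, forms):
    """Step S5 sketch: map recognition results to candidate forms via grammar IDs."""
    grammars = {grammar_id for grammar_id, _text in results}
    candidates = [form["name"] for form in forms if form["grammar_id"] in grammars]
    if not candidates:
        return "no_match", candidates   # assumption: not covered by the description
    if len(candidates) == 1:
        return "auto", candidates       # a single form: decide automatically
    return "ask_user", candidates       # ambiguous: go to step S6


forms = [
    {"name": "departure station", "grammar_id": "grammar1"},
    {"name": "arrival station", "grammar_id": "grammar1"},
    {"name": "departure time", "grammar_id": "grammar2"},
]

time_case = decide_form([("grammar2", "ten o'clock")], forms)
station_case = decide_form([("grammar1", "Sendai")], forms)
```

In this sketch a time expression resolves to a single form, while a station name leaves two candidates (case 2 above). Homophones recognized by two different grammars (case 1) also produce more than one candidate and take the same `"ask_user"` path.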
[0027]
In step S6, the user is notified of input forms that are selection candidates, and the user is allowed to select one input form. The user selects one input form by inputting the selection candidate form name into the microphone 5. Thereby, it is possible to easily select an input form without using a keyboard or a pointing device.
[0028]
Here, there are various methods for notifying the user of the selection candidates. For example, when all input forms are displayed on one screen as in the GUI of FIG. 3, the user is notified of the selection candidates by changing the color or design of the graphics around or behind each selection candidate, or by changing the color or font of each selection candidate's form name.
[0029]
On the other hand, when not all input forms are displayed on one screen, as in the GUI of FIG. 4, the user is notified of the selection candidates by displaying their form names together in a separate window, or by presenting their form names together by voice. With this configuration, the user can be notified of the candidate input forms in an easy-to-understand manner.
[0030]
In step S7, the character information obtained from the grammar rules corresponding to the input form is displayed on the input form determined in step S5 or the input form determined in step S6.
[0031]
Here, when not all input forms are displayed on one screen, as in the GUI of FIG. 4, and character information is to be displayed in an input form that is only partially displayed or not displayed at all, the GUI is automatically scrolled so that the input form is placed at the center of the screen. A specific example will be described with reference to FIGS. 4 and 5. When character information is displayed in the input form 11 (form 4 in FIG. 4), which is not completely displayed, the GUI of FIG. 4 scrolls as shown in FIG. 5 and places the input form 11 at the center of the screen. With this configuration, the user can be clearly shown where the selected input form is on the GUI.
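The automatic scrolling can be sketched as a one-dimensional offset calculation. All parameter names and pixel values here are hypothetical, and clamping the offset to the scrollable range is an added assumption; the description only says that the form is placed at the center of the screen.

```python
def scroll_offset_to_center(form_top, form_height, viewport_height, content_height):
    """Return the vertical scroll offset that centers the form in the viewport."""
    desired = form_top + form_height / 2 - viewport_height / 2
    # Assumption: the GUI cannot scroll above the top or past the bottom of its content.
    max_offset = max(0, content_height - viewport_height)
    return min(max(0, desired), max_offset)


# Hypothetical layout: a 40-pixel-high form starting at y=500 in a 1000-pixel-high GUI,
# viewed through a 300-pixel-high window.
offset = scroll_offset_to_center(form_top=500, form_height=40,
                                 viewport_height=300, content_height=1000)
```

A form near the top or bottom of the GUI cannot be centered exactly under this assumption, so the clamped offset brings it as close to the center as the content allows.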
[0032]
In step S8, it is determined whether or not the character information displayed in the input form is correct. If the displayed character information is not correct, the user says “No” into the microphone 5. In this case, the character information displayed in the input form is cleared and the process of step S2 is executed.
[0033]
On the other hand, if the displayed character information is correct, the user says “Yes” into the microphone 5. In this case, the character information displayed in the input form is fixed as the input for that input form (step S9).
[0034]
In step S10, it is determined whether or not any input form remains unfilled. If an unfilled input form remains, the process of step S2 is executed; if none remains, the process ends.
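The confirmation in steps S8 and S9 and the termination check in step S10 can be sketched as two small helpers. The function names, the dictionary key `value`, and the lowercase answer strings are hypothetical; in the described device the answer would come from recognizing the user's spoken “Yes” or “No”.

```python
def confirm(form, shown_text, answer):
    """Steps S8-S9 sketch: commit the displayed text on "yes", clear it otherwise."""
    if answer == "yes":
        form["value"] = shown_text   # fixed as the input for this form
        return True
    form["value"] = None             # cleared; recognition starts again from step S2
    return False


def has_unfilled(forms):
    """Step S10 sketch: True while at least one form still has no input."""
    return any(form["value"] is None for form in forms)


forms = [
    {"name": "departure station", "value": None},
    {"name": "departure time", "value": None},
]
accepted = confirm(forms[0], "Sendai", "yes")
rejected = confirm(forms[1], "ten o'clock", "no")
```

After these two calls one form is filled and one is still empty, so `has_unfilled(forms)` remains true and the procedure would return to step S2.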
[0035]
As described above, according to the present embodiment, the input form into which the character information recognized from the user's voice is to be input is determined according to the correspondence between the input forms and the grammar rules, so character information can be easily input into the input form desired by the user without using a keyboard or a pointing device.
[0036]
(Second Embodiment)
FIG. 6 is a flowchart illustrating a processing procedure according to the second embodiment. Since the configuration of the information input device of the second embodiment is the same as that of the first embodiment, description thereof is omitted. The processing performed in each step of the flowchart shown in FIG. 6 is realized by the central processing unit 2 controlling each processing unit based on a control program stored in the storage device 3.
[0037]
First, in step S11, the GUI of the second embodiment is read from the storage device 3 and displayed on the display device 1. This GUI includes one or more input forms (also referred to as input items), and additional information is set for each input form. The additional information consists of information indicating the correspondence with the grammar rule best suited to recognizing a predetermined type of character information, and information indicating the correspondence with a predetermined character string (also referred to as a keyword). This character string is the form name of the input form, or a sentence, phrase, word, or the like that helps the user input character information into the input form, and it is displayed on the GUI. In the second embodiment, the grammar rule corresponding to each input form is configured to recognize character information that includes the keyword.
[0038]
An example of the GUI of the second embodiment will be described with reference to FIG. 7. FIG. 7 is a diagram showing a GUI including three input forms. This GUI belongs to an application program that searches for a route between stations. In FIG. 7, 71, 72, and 73 are input forms for inputting the departure station name, the arrival station name, and the departure time, respectively.
[0039]
Both the input form 71 and the input form 72 are input forms for inputting station name information. The keyword of the input form 71 is “from the station” (駅から), and the keyword of the input form 72 is “to the station” (駅まで). As in the first embodiment, the input forms 71 and 72 are associated with the grammar rule 111 (“grammar 1” in FIG. 1), which is the grammar rule best suited to recognizing station name information.
[0040]
On the other hand, the input form 73 is an input form for inputting time information, and its keyword is “the time is” (時刻は). As in the first embodiment, the input form 73 is associated with the grammar rule 112 (“grammar 2” in FIG. 1), which is the grammar rule best suited to recognizing time information.
[0041]
Next, in step S12, as in the first embodiment, the grammar rules associated with input forms into which character information has not yet been input are identified, and the identified grammar rules are set. For example, in the case of the GUI shown in FIG. 7, the grammar rule 111 and the grammar rule 112 are set.
[0042]
In step S13, a voice input is accepted. At this time, the user utters the keyword of the input form together with the character information to be input to the desired input form. The voice uttered by the user is converted into an electrical signal by the microphone 5 and then supplied to the voice recognition unit 4.
[0043]
In step S14, as in the first embodiment, the speech input in step S13 is recognized using the one or more grammar rules set in step S12. For example, in the case of the GUI shown in FIG. 7, station name information and time information are recognized from the speech input in step S13, using the grammar rules 111 and 112 and the word dictionaries 121 and 122.
[0044]
In step S15, candidate input forms for the character information obtained in step S14 are selected based on the correspondence between each input form and the grammar rules, and one input form is determined by comparing the keywords of the selected input forms with the keyword included in the character information.
[0045]
In this embodiment, not only the correspondence between each input form and the grammar rules but also the correspondence between each input form and its keyword is used to automatically determine, from the grammar rule that recognized the character information obtained in step S14, the input form into which that character information is to be input. With this configuration, the input form desired by the user can be selected and determined automatically, without requiring the user to select an input form in advance.
[0046]
In step S16, the character information obtained in step S14, excluding the keyword, is displayed in the input form determined in step S15. For example, when the character information obtained in step S14 is “from XX station” (ＸＸ駅から), the character information “XX” is displayed in the input form 71, which corresponds to the keyword “from the station” (駅から).
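Steps S15 and S16 can be sketched as a keyword comparison followed by stripping the keyword. The sketch uses the Japanese keyword strings 駅から, 駅まで, and 時刻は from the original-language text, because they attach directly to the spoken value. The function name, the dictionary keys, and the substring test are assumptions, not the patent's stated matching procedure.

```python
def pick_form_by_keyword(text, grammar_id, forms):
    """Steps S15-S16 sketch: choose the form whose keyword appears in the text,
    then return the form name and the text with that keyword removed."""
    matches = [
        form for form in forms
        if form["grammar_id"] == grammar_id and form["keyword"] in text
    ]
    if len(matches) != 1:
        return None, text            # no unique form: leave the text untouched
    form = matches[0]
    return form["name"], text.replace(form["keyword"], "", 1).strip()


forms = [
    {"name": "departure station", "grammar_id": "grammar1", "keyword": "駅から"},
    {"name": "arrival station", "grammar_id": "grammar1", "keyword": "駅まで"},
    {"name": "departure time", "grammar_id": "grammar2", "keyword": "時刻は"},
]

chosen_form, shown_text = pick_form_by_keyword("XX駅から", "grammar1", forms)
```

Here both station forms share a grammar, yet the keyword alone is enough to separate them, so no question to the user is needed.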
[0047]
Here, when not all input forms are displayed on one screen, as in the GUI of FIG. 4, and character information is to be displayed in an input form that is only partially displayed or not displayed at all, the GUI is automatically scrolled so that the input form is placed at the center of the screen, as in the first embodiment.
[0048]
In step S17, it is determined whether or not the character information displayed in the input form is correct. If the displayed character information is not correct, the user says “No” into the microphone 5. In this case, the character information displayed in the input form is cleared and the process of step S12 is executed.
[0049]
On the other hand, if the displayed character information is correct, the user says “Yes” into the microphone 5. In this case, the character information displayed in the input form is fixed as the input for that input form (step S18).
[0050]
In step S19, it is determined whether or not any input form remains unfilled. If an unfilled input form remains, the process of step S12 is executed; if none remains, the process ends.
[0051]
As described above, according to the second embodiment, the input form into which the character information recognized from the user's voice is to be input is determined according to the correspondence between the input forms and the keywords, so character information can be easily input into the input form desired by the user without using a keyboard or a pointing device.
[0052]
(Third embodiment)
In the second embodiment, the example in which the input form for inputting the character information recognized from the user's voice is determined according to the correspondence between the input form and the keyword has been described.
[0053]
In contrast, in the third embodiment, an example will be described in which an input form for inputting character information recognized from the user's voice is determined according to the input form candidate list. Here, the candidate list indicates character information that can be entered in the input form.
[0054]
In this case, the additional information of each input form is information indicating a correspondence relationship with a grammar rule optimum for recognition of a predetermined type of character information, and a candidate list indicating character information that can be input to the input form.
[0055]
The input form for inputting character information recognized from the user's voice is automatically determined based on the correspondence between each input form and the grammar rule and the candidate list of each input form. With this configuration, it is possible to automatically select and determine an input form desired by the user without causing the user to select an input form in advance.
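The third embodiment gives no procedure beyond this, so the following is only one possible reading: among the forms associated with the recognizing grammar rule, choose the form whose candidate list contains the recognized text. The function name, the dictionary keys, and the station names in the candidate lists are hypothetical.

```python
def pick_form_by_candidates(text, grammar_id, forms):
    """Sketch: return the single form whose candidate list contains the text, else None."""
    matches = [
        form["name"] for form in forms
        if form["grammar_id"] == grammar_id and text in form["candidates"]
    ]
    return matches[0] if len(matches) == 1 else None


# Hypothetical candidate lists: each station form accepts a different set of stations.
forms = [
    {"name": "departure station", "grammar_id": "grammar1",
     "candidates": {"Tokyo", "Ueno"}},
    {"name": "arrival station", "grammar_id": "grammar1",
     "candidates": {"Sendai", "Morioka"}},
]

chosen = pick_form_by_candidates("Sendai", "grammar1", forms)
```

If the same text appears in the candidate lists of several forms, this sketch returns `None`, and some other means such as asking the user would still be needed.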
[0056]
As described above, according to the third embodiment, the input form into which the character information recognized from the user's voice is to be input is determined according to the candidate list of each input form, so character information can be easily input into the input form desired by the user without using a keyboard or a pointing device.
[0057]
(Another embodiment of the present invention)
Each of the above-described embodiments may be applied to a system composed of a plurality of devices or an apparatus composed of a single device.
[0058]
In addition, as a recording medium for storing the program code of a control program that realizes the functions of the above-described embodiments, for example, a floppy disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a magnetic tape, a nonvolatile memory card, a ROM, or the like can be used.
[0059]
In addition, it goes without saying that the program code of the above-described control program is included in the embodiments of the present invention when it realizes the functions described in the above embodiments in cooperation with an OS (operating system) or other application software running on the central processing unit 2.
[0060]
Further, the present invention also includes a case where, after the program code of the control program is stored in a memory provided in a function expansion board or function expansion unit, a CPU or the like provided in that function expansion board or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiments are realized by that processing.
[0061]
[Effect of the invention]
According to the present invention, character information is recognized from speech using the grammar rules associated with input items for which character information has not yet been input. The grammar rules can therefore be appropriately limited, and both processing accuracy and processing speed can be improved.
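The limitation of grammar rules to those of still-empty input items can be sketched as follows. This is a hedged illustration only: the function `active_grammar_rules`, the dictionary layout, and the rule identifiers are assumptions made for this example and do not appear in the specification.

```python
from typing import Dict, Set


def active_grammar_rules(form_rules: Dict[str, str], form_values: Dict[str, str]) -> Set[str]:
    """Return the grammar rules associated with input items that are still empty.

    form_rules  maps an input item name to its grammar rule (hypothetical ids).
    form_values maps an input item name to its current text ("" if empty).
    """
    return {
        rule
        for name, rule in form_rules.items()
        if not form_values.get(name)
    }


# Assumed example: both station items are filled in, the time item is not.
form_rules = {
    "departure_station": "station",
    "arrival_station": "station",
    "departure_time": "time",
}
form_values = {
    "departure_station": "Tokyo",
    "arrival_station": "Osaka",
    "departure_time": "",
}

rules = active_grammar_rules(form_rules, form_values)
```

Here only the rule for the time item remains active, so the recognizer would search a smaller grammar for the next utterance. That smaller search space is the source of the accuracy and speed benefit stated above.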
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of an information input device according to an embodiment.
FIG. 2 is a flowchart illustrating a processing procedure according to the first embodiment.
FIG. 3 is a diagram illustrating an example of a GUI according to the first embodiment.
FIG. 4 is a diagram showing another example of the GUI (before scrolling) in the present embodiment.
FIG. 5 is a diagram showing another example (after scrolling) of the GUI according to the present embodiment.
FIG. 6 is a flowchart illustrating a processing procedure according to the second embodiment.
FIG. 7 is a diagram illustrating an example of a GUI according to the second embodiment.
[Explanation of symbols]
1 Display apparatus
2 Central processing unit
3 Storage device
4 A/D converter
5 Microphone
6 Departure station name input form
7 Arrival station name input form
8 Departure time input form
11 Input object form

Claims

1. An information input method for inputting character information into a plurality of input items displayed on a display, comprising:
A receiving step of receiving speech;
A speech recognition step of recognizing character information from the speech using one or more of a plurality of grammar rules; and
A selection step of selecting an input item for inputting the character information based on additional information of the input items and the grammar rule used in the speech recognition step,
Wherein the additional information is information indicating the correspondence between the input items and the grammar rules, and
The speech recognition step recognizes character information from the speech using a grammar rule associated with an input item for which character information has not been input.

2. The information input method according to claim 1, wherein when there are a plurality of input item candidates for inputting the character information, the user is notified of the candidates.

3. The information input method according to claim 2, wherein an input item for inputting the character information is selected from the candidates based on a user's voice.

4. The information input method according to claim 1, wherein when the input item selected in the selection step is outside the display screen, control is performed so that the input item is displayed within the display screen.

5. The information input method according to claim 1, wherein each of the plurality of grammar rules is a grammar rule for recognizing a predetermined type of character information.

6. An information input device for inputting character information into a plurality of input items displayed on a display, comprising:
Receiving means for receiving speech;
Speech recognition means for recognizing character information from the speech using one or more of a plurality of grammar rules; and
Selection means for selecting an input item for inputting the character information based on additional information of the input items and the grammar rule used in the speech recognition means,
Wherein the additional information is information indicating the correspondence between the input items and the grammar rules, and
The speech recognition means recognizes character information from the speech using a grammar rule associated with an input item for which character information has not been input.

7. The information input device according to claim 6, wherein when there are a plurality of input item candidates for inputting the character information, the candidates are notified to the user.

8. The information input device according to claim 7, wherein an input item for inputting the character information is selected from the candidates based on a user's voice.

9. The information input device according to any one of claims 6 to 8, wherein when the input item selected by the selection means is outside the display screen, control is performed so that the input item is displayed within the display screen.

10. The information input device according to any one of claims 6 to 9, wherein each of the plurality of grammar rules is a grammar rule for recognizing a predetermined type of character information.

11. A computer-readable storage medium storing a program for causing a computer to execute the information input method according to any one of claims 1 to 5.