JP3718088B2

JP3718088B2 - Speech recognition correction method

Info

Publication number: JP3718088B2
Application number: JP27036999A
Authority: JP
Inventors: 哲也藤田
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 1999-09-24
Filing date: 1999-09-24
Publication date: 2005-11-16
Anticipated expiration: 2019-09-24
Also published as: JP2001092493A

Description

【０００１】
【発明の属する技術分野】
本発明は、音声認識処理の結果得られた文字列の一部に誤りがある場合にこの誤り部分の修正を行う音声認識修正方式に関する。
【０００２】
【従来の技術】
最近の車載用機器、例えばナビゲーション装置やオーディオ機器に各種の操作指示を与える方法としては、利用者が操作パネルやリモートコントロール（リモコン）ユニットに備わった各種のキーを押下する方法の他に、利用者によって発せられた操作音声の内容を音声認識することによって行う方法がある。この操作音声の内容を音声認識する方法によれば、利用者は、各種の操作キーの配置等を覚える必要がなく、しかも走行中に車両が振動した状態でキーの操作を行わないですむため、操作の簡略化が可能である。また、操作音声の内容を音声認識する方法は、特に最近のプロセッサの高速化等に伴って比較的精度の高い音声認識処理が可能になりつつあるため、音声認識を用いた操作指示方法は、車載用機器についても汎用されている。
【０００３】
ところで、音声認識の対象となる操作音声をマイクロホンによって集音する場合に、同時にロードノイズやエンジンノイズ等が操作音声とともに集音されるため、静かな環境下で音声を集音する場合に比べて音声認識の認識率が低下する。したがって、通常は誤認識した音声の内容を修正する必要がある。認識結果として得られた文字列の一部に誤りがあった場合には、利用者は、再度同じ内容の音声を少し声の調子を変えて、例えば明瞭に発音するようにして発声し、２回目の音声認識処理が行われる。このようにして、同じ音声に対して何度か音声認識処理を繰り返すことにより、最終的に正しい認識結果としての文字列が得られるようになる。
【０００４】
【発明が解決しようとする課題】
ところで、上述した従来の音声認識結果の修正方法は、利用者が同じ音声を発声してその内容に対して音声認識処理が繰り返されるため、認識率が悪い単語を発声した場合に、何度も同じ音声の発声を繰り返すことになり、修正に手間がかかるという問題があった。例えば、認識結果としての文字列を表示させ、その中の修正箇所にカーソルを移動して、直接キーボード等から修正データを入力することができればこのような不都合は生じないが、車載用機器の操作を音声認識処理を用いて行う場合には、利用者による５０音等のキー入力が容易ではないため、音声を入力することによって効率よく認識結果の修正を行う方法が望まれている。
【０００５】
本発明は、このような点に鑑みて創作されたものであり、その目的は、誤認識された文字を効率良く修正することができる音声認識修正方式を提供することにある。
【０００６】
【課題を解決するための手段】
上述した課題を解決するために、本発明の音声認識修正方式では、第１の音声認識手段によって、入力音声に対して音声認識処理を行って、複数の文字からなる認識結果としての文字列データを得るとともに、第２の音声認識手段によって、この得られた文字列データの中の誤認識箇所を修正するために入力された修正用音声に対して音声認識処理を行って、１文字以上のｎ文字からなる修正用データを取得している。そして、上述した第１の音声認識手段によって得られた文字列データの一部を、上述した第２の音声認識手段によって得られた修正用データに置き換えている。誤認識箇所を修正する際に、上述した第２の音声認識装置によって音声認識処理を行うことにより、１文字以上のｎ文字からなる修正用データを取得しているので、従来のように、修正用データを入力する際にキーボード等の操作部を操作する必要がなく誤認識箇所を修正する際の操作の簡略化が可能となる。また、誤認識された箇所に対応して修正に必要な音声のみを入力して認識結果の修正を行っているので、従来のように認識させたい音声を声の調子等を変えて何度も入力する手間が省け、効率よく認識結果の修正を行うことができる。
【０００７】
また、修正候補通知手段によって、文字列データの中で修正用データに置き換えが可能な修正箇所を通知し、修正箇所選択手段によって、この通知された１あるいは複数の修正箇所の中から、利用者の操作に応じていずれかを選択することが望ましい。修正用データによって置き換えが可能な修正箇所であるか否かの判断については、例えば、置き換えが行われた文字列データが、一般的に用いられる言葉（単語）として存在するか否かを調べることにより判断すればよい。また、特定の操作指示等に対応する音声のみを音声認識の対象としている場合であれば、特定の操作指示に対応した言葉の中に、置き換え後の文字列が含まれるか否かを判断してもよい。このように、文字列データの一部を修正用データに置き換えることが可能な修正箇所だけを通知し、置き換えが不可能な修正箇所については通知しないようにすることで、通知する必要のない情報を排除することができ、修正箇所に関する通知が煩雑になることを防ぐことができる。また、通知された１あるいは複数の修正箇所の中から修正箇所として適しているものを利用者の操作に応じて選択することにより、利用者の意図に沿った修正内容を確実に反映させることができる。
【０００８】
また、上述した第１の音声認識手段による音声認識処理に用いられる音声認識辞書を備え、文字列置き換え手段によって修正用データの置き換えが行われた修正後の文字列データに基づいて、音声認識辞書の内容を更新することが望ましい。文字列データの修正が行われたということは、もとの文字列データに対応した音声に対する認識率が低いということなので、この音声に対応する音声認識辞書の内容を修正後の文字列データに基づいて更新することにより、認識率を向上させることができる。
【０００９】
【発明の実施の形態】
以下、本発明の音声認識修正方式を適用した一実施形態の音声認識装置およびこの音声認識装置を含んで構成された車載用システムについて図面を参照しながら説明する。
【００１０】
〔第１の実施形態〕
図１は、第１の実施形態の車載用システムの構成を示す図である。図１に示す車載用システムは、利用者から音声により与えられる各種の入力指示に対応してナビゲーション処理等の各種動作を行うものであり、利用者が発した音声に対して音声認識処理を行って利用者の発声した音声に対応する文字列を特定する音声認識装置１と、自車位置を検出して自車位置周辺の地図を表示したり、利用者によって選択された目的地までの経路探索および経路誘導等を行うナビゲーション装置２と、音声認識装置１から出力される音声認識結果やナビゲーション装置２から出力される自車位置周辺の地図画像等を表示するディスプレイ装置３と、音声認識装置１から出力される音声認識結果やナビゲーション装置２から出力される各種の案内音声等を出力するオーディオ部４とを備えている。
【００１１】
上述した音声認識装置１は、マイクロホン１０、音声認識部１２、音声認識辞書１４、修正用音声認識部１６、修正用辞書１８、文字列置換部２０、修正候補検索部２２、修正候補格納部２４、認識結果判定部２６、画像生成部２８、辞書更新部３０を含んで構成されている。
【００１２】
マイクロホン１０は、利用者から発声された音声を集音して電気信号に変換する。音声認識部１２は、音声認識辞書１４を検索することにより、マイクロホン１０を介して入力された音声信号に対して音声認識処理を行い、利用者が発声した音声に対応する文字列データを特定する。
【００１３】
修正用音声認識部１６は、音声認識結果を修正する際に必要な修正用音声が入力されたときに、修正用辞書１８を検索することにより、この修正用音声に対して音声認識処理を行い、修正用音声に対応する文字データ（これを、以後「修正用データ」と称する）を特定する。本実施形態では、修正用音声として１音の文字を考えるものとする。
【００１４】
文字列置換部２０は、音声認識部１２によって特定された文字列データと修正用音声認識部１６によって特定された修正用データとを取得し、文字列データに含まれる各文字データを修正用データに置換した文字列データを生成する。具体的には、例えば、音声認識部１２によって特定された文字列データが「たきざわ」であり、修正用音声認識部１６によって特定された修正用データが「か」であるとすると、文字列置換部２０は、文字列データ「たきざわ」に含まれる文字データ「た」と修正用データ「か」を置換した文字列データ「かきざわ」と、文字列データ「たきざわ」に含まれる文字データ「き」と修正用データ「か」を置換した文字列データ「たかざわ」と、文字列データ「たきざわ」に含まれる文字データ「ざ」と修正用データ「か」を置換した文字列データ「たきかわ」と、文字列データ「たきざわ」に含まれる文字データ「わ」と修正用データ「か」を置換した文字列データ「たきざか」とを生成する。
【００１５】
修正候補検索部２２は、音声認識辞書１４を検索し、文字列置換部２０によって生成された各文字列データ（置換処理後の各文字列データ）が、音声認識の対象となる文字列として音声認識辞書１４に登録されているか否かを調べる。各文字列データが音声認識辞書１４に登録されていた場合には、修正候補検索部２２は、その文字列データを「修正候補データ」として修正候補格納部２４に格納する。
【００１６】
認識結果判定部２６は、音声認識部１２によって得られた音声認識結果を利用者に対して通知し、利用者から与えられる指示入力に基づいて音声認識結果の適否を判定して出力する。また、音声認識結果が誤っていた場合には、認識結果判定部２６は、修正候補検索部２２によって抽出された音声認識結果の修正候補を利用者に対して通知し、利用者から与えられる指示入力に基づいて最適な修正結果を判定して出力する。この認識結果判定部２６によって判定された音声認識結果または最適な修正結果が、音声認識装置１からの出力としてナビゲーション装置２に向けて出力される。
【００１７】
画像生成部２８は、認識結果判定部２６から与えられる指示に基づいて、音声認識結果や音声認識結果の修正候補等の各種画像を表示するための画像データを生成する。画像生成部２８によって生成され表示される画像の具体的な表示例については後述する。
【００１８】
辞書更新部３０は、認識結果判定部２６から出力される情報に基づいて音声認識結果に対して修正が行われたか否かを調べ、修正が行われた場合には、修正結果に基づいて音声認識辞書１４に格納されたデータの内容を更新する。
【００１９】
上述した音声認識部１２、音声認識辞書１４が第１の音声認識手段に、修正用音声認識部１６、修正用辞書１８が第２の音声認識手段に、文字列置換部２０が文字列置き換え手段に、修正候補検索部２２、修正候補格納部２４、認識結果判定部２６、画像生成部２８、ディスプレイ装置３が修正候補通知手段に、認識結果判定部２６が修正箇所選択手段に、辞書更新部３０が辞書更新手段にそれぞれ対応している。
【００２０】
本実施形態の車載用システムは上述した構成を有しており、次に、音声認識装置１において行われる音声認識処理について詳細に説明する。図２および図３は、音声認識装置１において行われる音声認識処理の動作を示す流れ図である。例えば、ナビゲーション装置２に対して、経路探索を行う際の出発地名や目的地名を音声で入力する場合を考え、利用者により音声入力された文字列「かきざわ」が文字列「たきざわ」に誤認識され、これを文字列「かきざわ」に修正する際の動作について説明する。
【００２１】
音声認識部１２は、マイクロホン１０を介して利用者から音声入力が行われたか否かを判定する（ステップ１００）。音声入力が行われるまで、ステップ１００において否定判断がなされて待機状態となる。音声入力が行われると、音声認識部１２は、音声認識辞書１４を用いて音声認識処理を行い、利用者が発声した音声に対応する文字列データを特定する（ステップ１０１）。上述した例では、文字列データとして「たきざわ」が特定される。音声認識部１２によって得られた音声認識結果は認識結果判定部２６に出力される。
【００２２】
認識結果判定部２６は、音声認識結果を表示するための画像データを生成するよう画像生成部２８に指示を行うとともに、音声認識結果を音声で出力するための音声データを生成してオーディオ部４に出力する。この結果、ディスプレイ装置３の画面上に音声認識結果が表示されるとともに、オーディオ部４から音声認識結果に対応する音声が出力され、音声認識結果が利用者に対して通知される（ステップ１０２）。
【００２３】
図４は、音声認識結果の表示例を示す図である。図４に示すように、ディスプレイ装置３の画面上に、音声認識結果が「たきざわ」である旨の表示と、この音声認識結果が誤っている場合には修正用音声を入力するように促す表示とが行われる。また、図４に示したような表示と並行して、オーディオ部４から、例えば、「音声認識結果は「たきざわ」でよろしいですか？」等のアナウンスが出力される。
【００２４】
上述したようにして利用者に対して音声認識結果が通知されると、認識結果判定部２６は、一定時間（例えば、３０秒間）が経過したか否かを判定し（ステップ１０３）、一定時間が経過していない場合には、音声入力が行われたか否かを判定する（ステップ１０４）。音声入力が行われずに一定時間が経過すると、ステップ１０３において肯定判断がなされ、認識結果判定部２６は、音声認識結果をナビゲーション装置２に向けて出力する（ステップ１０５）。また、一定時間が経過する前に利用者によって音声入力が行われた場合には、ステップ１０４において肯定判断がなされ、認識結果判定部２６は、入力された音声が一文字であるか否かを判定する（ステップ１０６）。例えば、入力された音声が一文字であるか否かについては修正用音声認識部１６が常に監視しており、認識結果判定部２６は、修正用音声認識部１６から得られる情報に基づいて、入力音声が一文字であるか否かを判定する。
【００２５】
入力音声が一文字でない場合には、ステップ１０６において否定判断がなされ、認識結果判定部２６は、入力された音声は認識結果修正用の音声ではなく、次の操作指示等に関する音声であると判断し、上述したステップ１０１において得られた音声認識結果をナビゲーション装置２に向けて出力する（ステップ１０７）。その後、ステップ１０１に戻り、次の操作指示等に関する入力音声（ステップ１０４での判定処理の対象となった入力音声）に対して音声認識処理以降の動作を行う。
【００２６】
また、入力された音声が一文字であった場合には、修正用音声認識部１６は、入力された音声が認識結果修正用の音声であると判断し、この音声に対して修正用辞書１８を用いて音声認識処理を行い、この音声に対応する修正用データを特定する（ステップ１０８）。上述した例では、誤認識されている「たきざわ」を「かきざわ」に修正するために利用者によって「か」が音声入力されるので、この音声「か」に対して音声認識処理が行われ、対応する修正用データが特定される。
【００２７】
次に、文字列置換部２０は、修正対象となる文字列データを音声認識部１２から取得するとともに、修正用データを修正用音声認識部１６から取得する（ステップ１０９）。その後、文字列置換部２０は、修正対象の文字列データに含まれる最初の文字データを修正用データと置換し、修正候補検索部２２に出力する（ステップ１１０）。上述した例では、修正対象の文字列データ「たきざわ」の最初の文字データ「た」が修正用データ「か」に置換されて生成された文字列データ「かきざわ」が出力される。
【００２８】
修正候補検索部２２は、音声認識辞書１４を検索し、文字列置換部２０から出力された文字列データ（置換後の文字列データ）が音声認識の対象となる文字列として音声認識辞書１４に登録されているか否かを調べる（ステップ１１１）。置換後の文字列データが音声認識辞書１４に登録されている場合には、修正候補検索部２２は、この文字列データを修正候補データとして修正候補格納部２４に格納する（ステップ１１２）。また、置換後の文字列データが音声認識辞書１４に登録されていない場合には、ステップ１１１において否定判断がなされ、この場合には、修正候補検索部２２は、ステップ１１２に示した修正候補データの格納動作を行わない。
【００２９】
次に、文字列置換部２０は、修正対象の文字列データに含まれる最後の文字データが修正用データに置き換えられたか否かを調べることにより、文字列の置換処理が完了したか否かを判定する（ステップ１１３）。置換処理が完了していない場合には、ステップ１１３において否定判断がなされ、文字列置換部２０は、次に置換処理の対象となる文字データを修正用データと置換する（ステップ１１４）。上述した例では、文字列データ「たきざわ」の２文字目の文字データ「き」が修正用データ「か」と置換され、文字列データ「たかざわ」が出力される。また、２回目以降の処理では、文字列データ「たきざわ」の３文字目の文字データ「ざ」が修正用データ「か」と置換された文字列データ「たきかわ」、および文字列データ「たきざわ」の最後の文字データ「わ」が修正用データ「か」と置換された文字列データ「たきざか」がそれぞれ出力される。置換処理が行われると、ステップ１１１に戻って、置換後の文字列データが音声認識辞書１４に格納されているか否かの判定以降の動作が繰り返される。
【００３０】
置換処理が完了するとステップ１１３において肯定判断がなされ、次に、認識結果判定部２６は、修正候補格納部２４に修正候補データが格納されているか否かを判定する（ステップ１１５）。修正候補データが格納されていた場合には、認識結果判定部２６は、この修正候補データを読み出し、画像生成部２８に指示を送り、修正候補の表示を行う（ステップ１１６）。図５は、修正候補の表示例を示す図であり、修正候補データとして、「かきざわ」、「たかざわ」、「たきかわ」、「たきざか」の各々に対応する文字列データが格納されていた場合の表示例を示している。図５に示すように、各修正候補に対して、「１：かきざわ」、「２：たかざわ」、「３：たきかわ」、「４：たきざか」というように番号が付加されて表示が行われるとともに、最適な修正結果に対応する番号を選択するよう利用者に対して促す表示が行われる。
【００３１】
表示された修正候補の中から修正結果として適するものが利用者により選択されると、認識結果判定部２６は、選択された修正候補に対応する修正候補データを音声認識結果としてナビゲーション装置２に向けて出力する（ステップ１１７）。上述した例では、１番の「かきざわ」が利用者によって選択されるものとする。なお、利用者による修正候補の選択方法としては、各修正候補に付加しておいた番号を所定の操作部（図示せず）を介して利用者に選択させるようにしてもよく、また、利用者に番号を音声入力してもらい、これに対して音声認識処理を行って修正候補を選択するようにしてもよい。
【００３２】
次に、辞書更新部３０は、修正後の音声認識結果に関する情報を認識結果判定部２６から取得し、この音声認識結果に対応して音声認識辞書１４の内容を更新する（ステップ１１８）。音声認識辞書１４の内容の更新が行われた後は、ステップ１００に戻り、音声入力が行われたか否かの判定以降の動作が繰り返される。
【００３３】
また、上述したステップ１１５において、修正候補が格納されていなかった場合には、修正対象の文字列データに２箇所以上の誤認識箇所が含まれている等の理由により修正候補が抽出不可能であったと考えられるので、認識結果判定部２６は、画像生成部２８に指示を送り、利用者に対して、音声認識結果の修正ができなかったことを知らせ、音声入力を再度行うよう促すエラー通知を表示する（ステップ１１９）。エラー通知が行われると、ステップ１００に戻り、音声入力が行われたか否かの判定以降の動作が繰り返される。
【００３４】
このように、本実施形態の音声認識装置１では、音声認識部１２によって得られた文字列データに含まれる誤認識箇所に対する修正を行う場合に、修正用音声認識部１６によって音声認識処理を行って修正用データを取得し、この修正用データを音声認識部１２によって得られた文字列データの一部と置き換えることにより誤認識箇所の修正を行っている。したがって、従来のように認識させたい文字列データに対応する音声を声の調子等を変えて何度も入力する等の手間がなく、効率よく認識結果の修正を行うことができる。また、修正用データの入力を音声入力により行っているので、キーボード等の操作部を用いる必要がなく、操作を簡略化することができる。また、音声認識部１２によって得られた文字列データの一部を修正用データと置き換える際に、置き換えが行われた文字列データが音声認識辞書１４に登録されているか否かを調べることにより、登録されている文字列データだけを抽出して利用者に通知している。すなわち、文字列データの一部を修正用データに置き換えることが可能な修正箇所だけを利用者に通知しているということであり、修正箇所に関する通知内容が煩雑になるのを防ぐことができる。また、修正箇所として適しているものを利用者の操作に対応して選択しているので、利用者の意図に沿った修正を確実に行うことができる。しかも、認識結果に対して修正を行った場合には、この修正結果に対応して音声認識辞書１４の内容を更新しているので、音声認識処理を繰り返し行うことにより認識率を向上させることができるという利点も有する。
【００３５】
〔第２の実施形態〕
ところで、上述した第１の実施形態では、音声認識結果に含まれる誤認識箇所が一文字のみの場合について説明したが、同様な処理手順により複数の文字が誤認識されている場合についても音声により修正を行うことができる。
【００３６】
図６および図７は、図１に示した音声認識装置１において行われる音声認識処理の動作の変形例を示す流れ図であり、音声認識処理によって得られた文字列データの中の一文字以上のｎ文字について修正を行う場合の動作手順が示されている。図６および図７に示した動作手順は、図２および図３に示した動作手順に対して、入力音声が一文字であるか否かを判定するステップ１０６の動作を、利用者による修正指示があったか否かを判定するステップ１０６Ａの動作に置き換えるとともに、修正対象の文字列データに含まれる最初の文字データを修正用データに置き換えるステップ１１０の動作を、文字列データの最初のｎ文字を修正用データに置き換えるステップ１１０Ａの動作に置き換えた点が異なっている。
【００３７】
すなわち、上述した第１の実施形態では、入力音声が一文字であるか否かを判定することによって、この入力音声が修正用の音声なのか、それとも次の通常の操作用の音声なのかを区別していたため、修正用の音声として複数文字が許容される場合には、このような区別を行うことができなくなる。このため、利用者によって何らかの修正指示がなされた後に入力された音声を修正用の音声として取り扱うことにしている。
【００３８】
修正用音声の入力に先立って、利用者による修正指示がなされない場合には、ステップ１０６Ａの判定動作において否定判断が行われ、次に、認識結果判定部２６は、入力された音声が認識結果修正用の音声ではなく、次の操作指示等に関する音声であると判断し、認識結果をナビゲーション装置２に向けて出力するステップ１０７以降の動作が行われる。
【００３９】
一方、修正用音声の入力に先立って、利用者による修正指示がなされた場合には、ステップ１０６Ａの判定動作において肯定判断が行われ、次に、修正用音声認識部１６は、入力された音声が認識結果修正用の音声であると判断し、この音声に対する音声認識処理を行うステップ１０８以降の動作が行われる。
【００４０】
また、利用者によって修正指示を行う具体的な方法としてはいくつかの方法が考えられる。例えば、操作部（図示せず）の特定キーが押下されたときに修正指示がなされたものと判定したり、利用者が特定の言葉をマイクロホン１０に向かって発声したときに修正指示がなされたものと判定する場合などが考えられる。
【００４１】
このように、認識結果としての文字列データの修正を行う際に、利用者に何らかの意思表示をさせることにより、複数文字（一文字であってもよい）を対象にした修正が可能になる。
【００４２】
また、利用者によって修正指示がなされた後に入力される修正用の音声の文字数ｎは、あらかじめ設定された固定値を用いることもできるが、その都度自由に設定するようにしてもよい。例えば、特に修正用文字の文字数ｎが設定されておらず、修正用音声に対する音声認識処理によって得られた修正用データの文字数をこのｎの値として採用するようにしてもよい。この場合には、認識結果としての文字列の誤り箇所を利用者が判断し、その都度最適な文字数の修正用音声をマイクロホン１０に向かって発声すればよいため、さらに効率よく認識結果の修正を行うことができる。
【００４３】
なお、本発明は上記実施形態に限定されるものではなく、本発明の要旨の範囲内において種々の変形実施が可能である。例えば、上述した実施形態では、車載用システムにおいて、本発明を適用した音声認識装置１を用いてナビゲーション装置２に対して所定の指示入力を行う場合について説明したが、これ以外にも、例えば、オーディオ装置等の他の車載用機器に対して所定の指示入力を行うようにしてもよい。また、車載用システム以外の各種システム、例えば、パーソナルコンピュータやワークステーション等の各種コンピュータに対して各種の指示入力を行うような場合においても、本発明を適用することができる。
【００４４】
【発明の効果】
上述したように、本発明によれば、音声認識処理の結果得られた文字列の一部に誤認識箇所がある場合に、所定文字数の修正用音声を入力し、この修正用音声に対して音声認識処理を行って修正用データを取得し、文字列データに含まれる誤認識箇所をこの修正用データに置き換えることにより認識結果の修正を行っているため、従来のように認識させたい文字列データに対応する音声を声の調子等を変えて何度も入力する等の手間がなく、効率よく認識結果の修正を行うことができる。また、修正用データの入力を音声入力により行っているので、キーボード等の操作部を用いる必要がなく、操作を簡略化することができる。
【図面の簡単な説明】
【図１】第１の実施形態の車載用システムの構成を示す図である。
【図２】音声認識装置において行われる音声認識処理の動作を示す流れ図である。
【図３】音声認識装置において行われる音声認識処理の動作を示す流れ図である。
【図４】音声認識結果の表示例を示す図である。
【図５】修正候補の表示例を示す図である。
【図６】音声認識装置において行われる音声認識処理の変形例の動作を示す流れ図である。
【図７】音声認識装置において行われる音声認識処理の変形例の動作を示す流れ図である。
【符号の説明】
１音声認識装置
２ナビゲーション装置
３ディスプレイ装置
４オーディオ部
１０マイクロホン
１２音声認識部
１４音声認識辞書
１６修正用音声認識部
１８修正用辞書
２０文字列置換部
２２修正候補検索部
２４修正候補格納部
２６認識結果判定部
２８画像生成部
３０辞書更新部
[0001]
[Technical Field of the Invention]
The present invention relates to a speech recognition correction method for correcting an error portion when a part of a character string obtained as a result of speech recognition processing has an error.
[0002]
[Prior art]
As methods of giving various operation instructions to recent in-vehicle devices, such as navigation devices and audio devices, there is, in addition to the method in which the user presses keys provided on an operation panel or remote control unit, a method in which the content of operation speech uttered by the user is recognized by speech recognition. With this method, the user does not need to remember the layout of the various operation keys and does not have to operate keys while the vehicle is vibrating during travel, so operation can be simplified. Moreover, because relatively accurate speech recognition processing is becoming possible, particularly with recent increases in processor speed, operation instruction methods using speech recognition are widely used for in-vehicle devices as well.
[0003]
Incidentally, when the operation speech to be recognized is picked up by a microphone, road noise, engine noise, and the like are picked up along with it, so the recognition rate is lower than when speech is picked up in a quiet environment. It is therefore usually necessary to correct misrecognized speech content. When part of the character string obtained as the recognition result is wrong, the user utters the same content again in a slightly different tone of voice, for example pronouncing it more clearly, and a second speech recognition process is performed. By repeating the speech recognition process on the same speech several times in this way, the correct character string is eventually obtained as the recognition result.
[0004]
[Problems to be solved by the invention]
In the conventional method of correcting a speech recognition result described above, the user utters the same speech and speech recognition processing is repeated on it. When a word with a poor recognition rate is uttered, the user therefore ends up repeating the same utterance many times, and correction is laborious. If the character string obtained as the recognition result could be displayed, the cursor moved to the location to be corrected, and correction data entered directly from a keyboard or the like, this inconvenience would not arise. However, when an in-vehicle device is operated by speech recognition, key input of the Japanese syllabary (the “50 sounds”) and the like is not easy for the user, so a method of efficiently correcting the recognition result by speech input is desired.
[0005]
The present invention has been made in view of such a point, and an object of the present invention is to provide a speech recognition correction method that can efficiently correct misrecognized characters.
[0006]
[Means for Solving the Problems]
To solve the problem described above, in the speech recognition correction method of the present invention, the first speech recognition means performs speech recognition processing on input speech to obtain character string data, consisting of a plurality of characters, as the recognition result, and the second speech recognition means performs speech recognition processing on correction speech, input in order to correct a misrecognized portion of the obtained character string data, to acquire correction data consisting of n characters (n being one or more). Part of the character string data obtained by the first speech recognition means is then replaced with the correction data obtained by the second speech recognition means. Because the correction data of n characters is acquired through speech recognition processing by the second speech recognition device when a misrecognized portion is corrected, there is no need, as there was conventionally, to operate an operation unit such as a keyboard to input the correction data, and the operation of correcting a misrecognized portion can be simplified. In addition, because only the speech needed for the correction is input for the misrecognized portion, the effort of inputting the intended speech over and over in a different tone of voice, as in the conventional method, is eliminated, and the recognition result can be corrected efficiently.
[0007]
It is also desirable that the correction candidate notification means notify the correction locations in the character string data that can be replaced with the correction data, and that the correction location selection means select one of the notified correction locations (one or more) in accordance with the user's operation. Whether a location can be replaced with the correction data may be judged, for example, by checking whether the character string data after replacement exists as a commonly used word. Alternatively, when only speech corresponding to specific operation instructions is subject to speech recognition, it may be judged whether the character string after replacement is included among the words corresponding to those operation instructions. By notifying only the correction locations where part of the character string data can be replaced with the correction data, and not notifying locations where replacement is impossible, information that does not need to be notified is excluded and the notification about correction locations is kept from becoming cluttered. Furthermore, by selecting the suitable correction location from among the notified ones in accordance with the user's operation, the correction the user intends can be reliably reflected.
[0008]
It is also desirable to provide a speech recognition dictionary used in the speech recognition processing by the first speech recognition means, and to update the contents of this dictionary based on the corrected character string data in which the character string replacement means has substituted the correction data. That the character string data was corrected means that the recognition rate for the speech corresponding to the original character string data is low, so updating the dictionary contents corresponding to this speech based on the corrected character string data can improve the recognition rate.
[0009]
[Embodiments of the Invention]
Hereinafter, a voice recognition device according to an embodiment to which a voice recognition correction method of the present invention is applied and a vehicle-mounted system including the voice recognition device will be described with reference to the drawings.
[0010]
[First Embodiment]
FIG. 1 is a diagram showing the configuration of the in-vehicle system of the first embodiment. The in-vehicle system shown in FIG. 1 performs various operations, such as navigation processing, in response to input instructions given by the user's voice. It comprises: a speech recognition device 1 that performs speech recognition processing on the speech uttered by the user and specifies the corresponding character string; a navigation device 2 that detects the vehicle position and displays a map of its surroundings, and performs route search and route guidance to a destination selected by the user; a display device 3 that displays the speech recognition result output from the speech recognition device 1, the map image around the vehicle position output from the navigation device 2, and the like; and an audio unit 4 that outputs the speech recognition result output from the speech recognition device 1, the various guidance voices output from the navigation device 2, and the like.
[0011]
The speech recognition device 1 described above includes a microphone 10, a speech recognition unit 12, a speech recognition dictionary 14, a correction speech recognition unit 16, a correction dictionary 18, a character string replacement unit 20, a correction candidate search unit 22, a correction candidate storage unit 24, a recognition result determination unit 26, an image generation unit 28, and a dictionary update unit 30.
[0012]
The microphone 10 picks up the speech uttered by the user and converts it into an electrical signal. The speech recognition unit 12 performs speech recognition processing on the speech signal input via the microphone 10 by searching the speech recognition dictionary 14, and specifies the character string data corresponding to the speech uttered by the user.
[0013]
When correction speech needed to correct the speech recognition result is input, the correction speech recognition unit 16 performs speech recognition processing on it by searching the correction dictionary 18, and specifies the character data corresponding to the correction speech (hereinafter referred to as “correction data”). In this embodiment, the correction speech is assumed to be a single character (one sound).
[0014]
The character string replacement unit 20 acquires the character string data specified by the speech recognition unit 12 and the correction data specified by the correction speech recognition unit 16, and generates character string data in which each character of the character string data is replaced, one at a time, with the correction data. For example, suppose the character string data specified by the speech recognition unit 12 is “Takizawa” (four kana characters: “ta”, “ki”, “za”, “wa”) and the correction data specified by the correction speech recognition unit 16 is “ka”. The character string replacement unit 20 then generates “Kakizawa”, in which the character “ta” is replaced with the correction data “ka”; “Takazawa”, in which “ki” is replaced with “ka”; “Takikawa”, in which “za” is replaced with “ka”; and “Takizaka”, in which “wa” is replaced with “ka”.
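The replacement described in this paragraph can be illustrated with a minimal Python sketch. The function name `single_char_substitutions` is hypothetical and does not come from the patent. The example operates on the original hiragana string たきざわ (“Takizawa”) rather than the romanized form, on the assumption that each kana is exactly one Python character; contracted sounds written with two kana would need extra handling that this sketch does not attempt.

```python
from typing import List


def single_char_substitutions(recognized: str, correction: str) -> List[str]:
    """Return every string obtained by replacing exactly one character of
    `recognized` with the one-character `correction`, from first to last."""
    if len(correction) != 1:
        # Assumption of this sketch: the first embodiment's one-character case only.
        raise ValueError("correction data must be a single character")
    return [
        recognized[:i] + correction + recognized[i + 1:]
        for i in range(len(recognized))
    ]


print(single_char_substitutions("たきざわ", "か"))
# ['かきざわ', 'たかざわ', 'たきかわ', 'たきざか']
```

The four outputs correspond to “Kakizawa”, “Takazawa”, “Takikawa”, and “Takizaka” in the example above.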
[0015]
The correction candidate search unit 22 searches the speech recognition dictionary 14 and checks whether each character string generated by the character string replacement unit 20 (each character string after replacement) is registered in the speech recognition dictionary 14 as a character string subject to speech recognition. When a character string is registered in the speech recognition dictionary 14, the correction candidate search unit 22 stores it in the correction candidate storage unit 24 as “correction candidate data”.
[0016]
The recognition result determination unit 26 notifies the user of the speech recognition result obtained by the speech recognition unit 12, determines whether the result is correct based on the instruction input given by the user, and outputs it. When the speech recognition result is wrong, the recognition result determination unit 26 notifies the user of the correction candidates extracted by the correction candidate search unit 22, determines the optimum correction result based on the instruction input given by the user, and outputs it. The speech recognition result or optimum correction result determined by the recognition result determination unit 26 is output to the navigation device 2 as the output of the speech recognition device 1.
[0017]
The image generation unit 28 generates image data for displaying various images such as a speech recognition result and a correction candidate of the speech recognition result based on an instruction given from the recognition result determination unit 26. A specific display example of the image generated and displayed by the image generation unit 28 will be described later.
[0018]
The dictionary update unit 30 checks, based on the information output from the recognition result determination unit 26, whether the speech recognition result was corrected and, if it was, updates the data stored in the speech recognition dictionary 14 based on the correction result.
[0019]
The speech recognition unit 12 and speech recognition dictionary 14 described above correspond to the first speech recognition means; the correction speech recognition unit 16 and correction dictionary 18 to the second speech recognition means; the character string replacement unit 20 to the character string replacement means; the correction candidate search unit 22, correction candidate storage unit 24, recognition result determination unit 26, image generation unit 28, and display device 3 to the correction candidate notification means; the recognition result determination unit 26 to the correction location selection means; and the dictionary update unit 30 to the dictionary update means.
[0020]
The in-vehicle system of this embodiment has the configuration described above. Next, the speech recognition processing performed in the speech recognition device 1 is described in detail. FIGS. 2 and 3 are flowcharts showing the operation of the speech recognition processing performed in the speech recognition device 1. As an example, consider the case where the name of a departure point or destination for a route search is input by voice to the navigation device 2. The following describes the operation when the character string “Kakizawa” spoken by the user is misrecognized as “Takizawa” and is then corrected to “Kakizawa”.
[0021]
The voice recognition unit 12 determines whether or not voice input has been performed by the user via the microphone 10 (step 100). Until a voice input is performed, a negative determination is made in step 100 and a standby state is entered. When voice input is performed, the voice recognition unit 12 performs voice recognition processing using the voice recognition dictionary 14, and specifies character string data corresponding to the voice uttered by the user (step 101). In the example described above, “Takizawa” is specified as the character string data. The voice recognition result obtained by the voice recognition unit 12 is output to the recognition result determination unit 26.
[0022]
The recognition result determination unit 26 instructs the image generation unit 28 to generate image data for displaying the speech recognition result, and also generates audio data for outputting the speech recognition result as sound and outputs it to the audio unit 4. As a result, the speech recognition result is displayed on the screen of the display device 3 and the corresponding audio is output from the audio unit 4, so that the user is notified of the speech recognition result (step 102).
[0023]
FIG. 4 is a diagram showing a display example of the speech recognition result. As shown in FIG. 4, the screen of the display device 3 shows that the speech recognition result is “Takizawa”, together with a prompt to input correction speech if this result is wrong. In parallel with the display shown in FIG. 4, the audio unit 4 outputs an announcement such as “The speech recognition result is ‘Takizawa’. Is this correct?”
[0024]
Once the user has been notified of the speech recognition result as described above, the recognition result determination unit 26 determines whether a fixed time (for example, 30 seconds) has elapsed (step 103) and, if it has not, determines whether speech input has been made (step 104). If the fixed time elapses without speech input, an affirmative determination is made in step 103 and the recognition result determination unit 26 outputs the speech recognition result to the navigation device 2 (step 105). If the user makes a speech input before the fixed time elapses, an affirmative determination is made in step 104 and the recognition result determination unit 26 determines whether the input speech is a single character (step 106). For example, the correction speech recognition unit 16 constantly monitors whether input speech is a single character, and the recognition result determination unit 26 makes this determination based on information obtained from the correction speech recognition unit 16.
[0025]
If the input voice is not a single character, a negative determination is made in step 106, and the recognition result determination unit 26 determines that the input voice is not a voice for correcting the recognition result but a voice related to the next operation instruction or the like. The voice recognition result obtained in step 101 is output to the navigation device 2 (step 107). Thereafter, the process returns to step 101, and the operation after the voice recognition process is performed on the input voice related to the next operation instruction or the like (the input voice subjected to the determination process in step 104).
[0026]
If the input speech is a single character, the correction speech recognition unit 16 determines that it is speech for correcting the recognition result, performs speech recognition processing on it using the correction dictionary 18, and specifies the corresponding correction data (step 108). In the example above, the user speaks “ka” in order to correct the misrecognized “Takizawa” to “Kakizawa”, so speech recognition processing is performed on this speech “ka” and the corresponding correction data is specified.
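The branching in steps 103 to 108 can be summarized in a short Python sketch. It is only an illustration under simplifying assumptions: the names `NextAction` and `classify_follow_up` are hypothetical, and the follow-up speech is assumed to be already available as kana text, whereas in the described device the correction speech recognition unit 16 makes the single-character judgment from the audio itself.

```python
from enum import Enum
from typing import Optional


class NextAction(Enum):
    OUTPUT_RESULT = "step 105"              # timeout: output the result as is
    OUTPUT_AND_RECOGNIZE_NEXT = "step 107"  # output the result, then treat the input as the next instruction
    RECOGNIZE_CORRECTION = "step 108"       # treat the input as correction speech


def classify_follow_up(follow_up: Optional[str]) -> NextAction:
    """Decide what happens after the recognition result has been announced.

    `follow_up` is None when the fixed wait (30 seconds in the example) expired
    without speech; otherwise it is the text of the speech that arrived in time.
    """
    if follow_up is None:        # step 103 affirmative
        return NextAction.OUTPUT_RESULT
    if len(follow_up) == 1:      # step 106 affirmative
        return NextAction.RECOGNIZE_CORRECTION
    return NextAction.OUTPUT_AND_RECOGNIZE_NEXT  # step 106 negative


print(classify_follow_up("か"))
```

Using the length of the utterance as the only signal is what lets the first embodiment do without an explicit correction command.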
[0027]
Next, the character string replacement unit 20 acquires character string data to be corrected from the voice recognition unit 12 and also acquires correction data from the correction voice recognition unit 16 (step 109). After that, the character string replacement unit 20 replaces the first character data included in the character string data to be corrected with the correction data, and outputs it to the correction candidate search unit 22 (step 110). In the above-described example, the character string data “Kakizawa” generated by replacing the first character data “ta” of the character string data “Takizawa” to be corrected with the correction data “ka” is output.
[0028]
The correction candidate search unit 22 searches the speech recognition dictionary 14 and checks whether the character string data output from the character string replacement unit 20 (the character string data after replacement) is registered in the speech recognition dictionary 14 as a character string subject to speech recognition (step 111). If it is registered, the correction candidate search unit 22 stores this character string data in the correction candidate storage unit 24 as correction candidate data (step 112). If it is not registered, a negative determination is made in step 111 and the correction candidate search unit 22 skips the storing operation of step 112.
[0029]
Next, the character string replacement unit 20 determines whether the replacement processing is complete by checking whether the last character of the character string data to be corrected has been replaced with the correction data (step 113). If the replacement processing is not complete, a negative determination is made in step 113 and the character string replacement unit 20 replaces the next character to be processed with the correction data (step 114). In the example above, the second character “ki” of “Takizawa” is replaced with the correction data “ka”, and the character string data “Takazawa” is output. In the subsequent iterations, the character string data “Takikawa”, in which the third character “za” is replaced with “ka”, and “Takizaka”, in which the last character “wa” is replaced with “ka”, are output in turn. Each time a replacement is made, the process returns to step 111, and the operations from the check of whether the replaced character string data is registered in the speech recognition dictionary 14 onward are repeated.
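The loop formed by steps 110 to 114 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the speech recognition dictionary 14 is modeled as a plain set of hiragana strings, the dictionary contents shown are invented for the example, and the function name `collect_correction_candidates` is hypothetical.

```python
from typing import List, Set


def collect_correction_candidates(
    recognized: str, correction: str, dictionary: Set[str]
) -> List[str]:
    """Replace each character in turn and keep only registered results."""
    stored: List[str] = []  # stands in for the correction candidate storage unit 24
    for position in range(len(recognized)):   # steps 110 and 114: first, then next character
        replaced = recognized[:position] + correction + recognized[position + 1:]
        if replaced in dictionary:             # step 111: registered in the dictionary?
            stored.append(replaced)            # step 112: store as correction candidate data
    return stored                              # reached after the last character (step 113)


# Hypothetical dictionary contents, chosen only for this example.
dictionary = {"たきざわ", "かきざわ", "たかざわ"}
print(collect_correction_candidates("たきざわ", "か", dictionary))
# ['かきざわ', 'たかざわ']
```

An empty result corresponds to the case in which no correction candidate is stored when step 115 is reached.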
[0030]
When the replacement processing is completed, an affirmative determination is made in step 113, and the recognition result determination unit 26 then determines whether correction candidate data is stored in the correction candidate storage unit 24 (step 115). If correction candidate data is stored, the recognition result determination unit 26 reads the correction candidate data and sends an instruction to the image generation unit 28 to display the correction candidates (step 116). FIG. 5 is a diagram illustrating a display example of correction candidates, showing the display when the character string data corresponding to each of “Kakizawa”, “Takazawa”, “Takikawa”, and “Takizaka” is stored as correction candidate data. As shown in FIG. 5, each correction candidate is displayed with a number, such as “1: Kakizawa”, “2: Takazawa”, “3: Takikawa”, “4: Takizaka”, together with a message prompting the user to select the number corresponding to the correct result.
[0031]
When the user selects a suitable correction result from the displayed correction candidates, the recognition result determination unit 26 outputs the correction candidate data corresponding to the selected correction candidate to the navigation device 2 as the speech recognition result (step 117). In the example described above, it is assumed that the user selects the first candidate, “Kakizawa”. To select a correction candidate, the user may choose the number attached to each correction candidate via a predetermined operation unit (not shown), or the user may speak the number, which is then subjected to speech recognition processing to select the correction candidate.
[0032]
Next, the dictionary update unit 30 acquires information related to the corrected speech recognition result from the recognition result determination unit 26, and updates the contents of the speech recognition dictionary 14 in accordance with the speech recognition result (step 118). After the content of the speech recognition dictionary 14 is updated, the process returns to step 100 and the operations after the determination as to whether or not speech input has been performed are repeated.
[0033]
If no correction candidate data is stored in step 115 described above, no correction candidate could be extracted because the character string data to be corrected contains two or more misrecognized locations. In this case, the recognition result determination unit 26 sends an instruction to the image generation unit 28 to display an error notification that informs the user that the speech recognition result could not be corrected and prompts the user to perform voice input again (step 119). After the error notification, the process returns to step 100, and the operations from the determination of whether voice input has been performed onward are repeated.
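The branch at step 115 and the numbered selection of steps 116, 117, and 119 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the message texts are hypothetical, candidates are plain strings, and the dictionary update of step 118 is not modeled.

```python
def present_candidates(candidates):
    """Return the lines to display: numbered candidates plus a prompt
    (step 116), or an error notification when nothing was stored (step 119)."""
    if not candidates:
        # Hypothetical wording of the error notification.
        return ["The recognition result could not be corrected. Please speak again."]
    lines = [f"{number}: {text}" for number, text in enumerate(candidates, start=1)]
    lines.append("Select the number of the correct result.")  # hypothetical prompt
    return lines


def select_candidate(candidates, number):
    """Return the candidate with the chosen number (step 117),
    or None if the number does not match any displayed candidate."""
    if 1 <= number <= len(candidates):
        return candidates[number - 1]
    return None


stored = ["Kakizawa", "Takazawa", "Takikawa"]
for line in present_candidates(stored):
    print(line)
result = select_candidate(stored, 1)  # the user picks the first candidate
```

The number passed to `select_candidate` may come from a key operation or from speech recognition of a spoken number; the sketch does not distinguish between the two.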
[0034]
As described above, in the speech recognition apparatus 1 according to the present embodiment, when a misrecognized portion included in the character string data obtained by the speech recognition unit 12 is to be corrected, the correction voice recognition unit 16 performs speech recognition processing to acquire correction data, and a part of the character string data obtained by the speech recognition unit 12 is replaced with this correction data to correct the misrecognized portion. Accordingly, unlike the conventional approach, there is no need to repeat the voice corresponding to the character string to be recognized many times while changing the tone of voice, and the recognition result can be corrected efficiently. In addition, because the correction data is entered by voice, no operation unit such as a keyboard is needed, and operation is simplified. Further, when a part of the character string data obtained by the speech recognition unit 12 is replaced with the correction data, the replaced character string data is checked against the speech recognition dictionary 14, and only registered character string data is extracted and presented to the user. In other words, the user is notified only of correction locations where a part of the character string data can validly be replaced with the correction data, which prevents the notification about correction locations from becoming complicated. In addition, because a suitable correction location is selected in response to the user's operation, the correction intended by the user can be made reliably. Moreover, when the recognition result is corrected, the content of the speech recognition dictionary 14 is updated in accordance with the correction result, which has the advantage that the recognition rate can be improved as speech recognition processing is repeated.
[0035]
[Second Embodiment]
In the first embodiment described above, the case where only one character in the speech recognition result is misrecognized has been described. However, even when a plurality of characters are misrecognized, correction by voice can be performed by a similar processing procedure.
[0036]
FIGS. 6 and 7 are flowcharts showing a modification of the speech recognition processing performed in the speech recognition apparatus 1 shown in FIG. 1, and show an operation procedure for correcting n characters (n being one or more) in the character string data obtained by the speech recognition processing. The operation procedure shown in FIGS. 6 and 7 differs from that shown in FIGS. 2 and 3 in two respects: the operation of step 106, which determines whether the input voice is one character, is replaced with the operation of step 106A, which determines whether a correction instruction has been given by the user; and the operation of step 110, which replaces the first character data of the character string data to be corrected with the correction data, is replaced with the operation of step 110A, which replaces the first n characters of the character string data with the correction data.
[0037]
That is, in the first embodiment described above, whether the input voice is a correction voice or the next normal operation voice is distinguished by determining whether the input voice is a single character. Such a distinction cannot be made when a plurality of characters are allowed as a correction voice. For this reason, a voice input after the user has given some correction instruction is handled as a correction voice.
[0038]
If the user has not given a correction instruction prior to the voice input, a negative determination is made in step 106A. The recognition result determination unit 26 then determines that the input voice is not a voice for correcting the recognition result but a voice relating to the next operation instruction or the like, and the operations from step 107 onward, in which the recognition result is output to the navigation device 2, are performed.
[0039]
On the other hand, if the user has given a correction instruction prior to the input of the correction voice, an affirmative determination is made in step 106A. The correction voice recognition unit 16 then determines that the input voice is a voice for correcting the recognition result, and the operations from step 108 onward, in which speech recognition processing is performed on this voice, are performed.
[0040]
Several methods are conceivable for the user to give a correction instruction. For example, it may be determined that a correction instruction has been given when a specific key of an operation unit (not shown) is pressed, or when the user speaks a specific word toward the microphone 10.
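The determination of step 106A can be sketched as a small state holder that remembers whether a correction instruction has been given. This is a simplified sketch under stated assumptions: the class name `InputRouter`, the keyword "correction", and the returned labels are all hypothetical, and routing is done on already recognized text, whereas in the embodiment the correction voice is recognized by the correction voice recognition unit 16.

```python
CORRECTION_KEYWORD = "correction"  # hypothetical specific word


class InputRouter:
    """Decides whether an input is a correction voice or an operation voice."""

    def __init__(self):
        self.correction_pending = False

    def press_correction_key(self):
        # Correction instruction given with a specific key.
        self.correction_pending = True

    def route(self, recognized_text):
        if recognized_text == CORRECTION_KEYWORD:
            # Correction instruction given by speaking a specific word.
            self.correction_pending = True
            return "instruction"
        if self.correction_pending:
            # Affirmative determination in step 106A: go on to step 108.
            self.correction_pending = False
            return "correction"
        # Negative determination in step 106A: go on to step 107.
        return "operation"


router = InputRouter()
print(router.route("Takizawa"))    # operation
print(router.route("correction"))  # instruction
print(router.route("kaki"))        # correction
```

Clearing the flag after one correction voice is a design choice of this sketch; the text does not state how long a correction instruction remains in effect.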
[0041]
As described above, by having the user indicate in some way the intention to make a correction, a plurality of characters (or a single character) can be corrected when correcting the character string data obtained as the recognition result.
[0042]
Further, the number of characters n of the correction voice input after the user gives a correction instruction may be a preset fixed value, or may be set freely each time. For example, the number of correction characters n need not be set in advance; instead, the number of characters of the correction data obtained by speech recognition processing on the correction voice may be adopted as the value of n. In this case, the user identifies the erroneous portion of the character string obtained as the recognition result and speaks a correction voice with the optimum number of characters into the microphone 10 each time, so the recognition result can be corrected more efficiently.
[0043]
The present invention is not limited to the above embodiments, and various modifications are possible within the scope of the gist of the present invention. For example, the above embodiments describe the case where, in an in-vehicle system, predetermined instructions are input to the navigation device 2 using the speech recognition apparatus 1 to which the present invention is applied; however, predetermined instructions may instead be input to other in-vehicle equipment such as an audio apparatus. The present invention can also be applied to systems other than in-vehicle systems, for example, to input various instructions to computers such as personal computers and workstations.
[0044]
[Effect of the invention]
As described above, according to the present invention, when a part of the character string obtained as a result of speech recognition processing is misrecognized, a correction voice of a predetermined number of characters is input, speech recognition processing is performed on this correction voice to acquire correction data, and the misrecognized location in the character string data is replaced with this correction data to obtain the corrected result. Therefore, unlike the conventional approach, the user is spared the trouble of repeating the voice corresponding to the character string to be recognized many times while changing the tone of voice, and the recognition result can be corrected efficiently. In addition, because the correction data is entered by voice, no operation unit such as a keyboard is needed, and operation is simplified.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a configuration of an in-vehicle system according to a first embodiment.
FIG. 2 is a flowchart showing the operation of speech recognition processing performed in the speech recognition apparatus.
FIG. 3 is a flowchart showing an operation of speech recognition processing performed in the speech recognition apparatus.
FIG. 4 is a diagram illustrating a display example of a voice recognition result.
FIG. 5 is a diagram illustrating a display example of correction candidates.
FIG. 6 is a flowchart showing the operation of a modified example of the speech recognition process performed in the speech recognition apparatus.
FIG. 7 is a flowchart showing the operation of a modified example of the speech recognition process performed in the speech recognition apparatus.
[Explanation of symbols]
1 Voice recognition device
2 Navigation device
3 Display device
4 Audio section
10 Microphone
12 Voice recognition unit
14 Speech recognition dictionary
16 Voice recognition unit for correction
18 Correction Dictionary
20 Character string replacement unit
22 Correction candidate search unit
24 Correction candidate storage unit
26 Recognition result determination unit
28 Image generation unit
30 Dictionary update unit

Claims

First speech recognition means for performing speech recognition processing on input speech to obtain character string data as a recognition result composed of a plurality of characters;
Second speech recognition means for performing speech recognition processing on an input correction voice to obtain correction data consisting of n characters, n being one or more;
Character string replacement means for generating a plurality of correction candidate data by replacing mutually different portions of the character string data obtained by the first speech recognition means with the correction data obtained by the second speech recognition means;
Correction candidate notification means for notifying the plurality of correction candidate data generated by the character string replacement means;
Correction location selection means for selecting one of the plurality of correction candidate data notified by the correction candidate notification means according to a user operation;
A speech recognition correction method characterized in that the recognition result of the first speech recognition means is corrected using the correction candidate data selected by the correction location selection means.

In claim 1,
A speech recognition dictionary used for speech recognition processing by the first speech recognition means;
Dictionary updating means for updating the contents of the speech recognition dictionary based on the correction candidate data selected by the correction location selection means;
A speech recognition correction method characterized by comprising:

In claim 2,
A speech recognition correction method characterized in that the correction candidate notification means determines whether each of the plurality of correction candidate data generated by the character string replacement means is registered in advance in the speech recognition dictionary as a character string subject to speech recognition, and notifies only the correction candidate data that is registered.