JP2013068665A - Speech recognition device

Info

Publication number: JP2013068665A
Application number: JP2011205165A
Authority: JP
Inventors: Kenichi Moriguchi; 健一森口
Original assignee: Panasonic Corp
Current assignee: Panasonic Corp
Priority date: 2011-09-20
Filing date: 2011-09-20
Publication date: 2013-04-18

Abstract

[Problem] To provide a speech recognition device that can choose which speech recognition unit to use even within the same scenario.
[Solution] A speech recognition device according to the present invention performs speech recognition while connected to one or more connected devices, and comprises: a microphone that captures speech; a connection unit for connecting to the one or more connected devices; a storage unit that stores character data for speech recognition acquired from the one or more connected devices; a conversion unit that converts the character data for speech recognition in the storage unit into phonemes for speech recognition processing; a speech recognition dictionary that contains the phonemes converted by the conversion unit and is used for speech recognition; a speech recognition unit that performs speech recognition using the speech recognition dictionary and the speech captured by the microphone; and a control unit that determines whether the character data for speech recognition has been acquired from the one or more connected devices and, based on the determination result, controls whether the speech captured by the microphone is recognized by the speech recognition unit.
[Selected drawing] FIG. 1

Description

The present invention relates to a speech recognition device that uses speech recognition technology to control a device by voice.

As a method of operating a device, devices that recognize speech uttered by a person and convert it into commands or the like are in widespread use. This is particularly effective in an in-vehicle environment as a way for a driver to operate an in-vehicle terminal while driving.

On the other hand, in an embedded software environment such as an in-vehicle terminal, resources such as memory and CPU power tend to be constrained. For this reason, the "commands" to be recognized by voice and the recognition candidates are limited in advance to a specific pattern or set rather than arbitrary character strings. Doing so keeps down the required memory, CPU power, and other resources, and allows the device to be realized at low cost.

In an interactive speech recognition system, it is also important to present the recognition result quickly, that is, to respond quickly. Limiting the recognition candidates is very effective for improving response time with limited resources.

As a conventional speech recognition device, a spoken dialogue system has been disclosed that includes a spoken dialogue control unit which, according to a selector written in the spoken dialogue scenario, selects either the "speech recognition unit of the terminal (the device itself)" or the "speech recognition unit of the center (a remote device connected via a communication line)", so that the speech recognition unit can be chosen according to the difficulty of the speech recognition processing (see, for example, Patent Document 1).

Patent Document 1: JP 2005-37662 A

However, in the conventional spoken dialogue system, there are cases where the speech recognition unit cannot be chosen within the same scenario, for example "selecting a song to play". In a scheme where each scenario states which speech recognition unit it uses, one speech recognition unit is fixed per scenario, so multiple speech recognition units cannot be selected flexibly for a single scenario. For example, if the scenario states that song titles use the recognition unit of the device itself and place names use the recognition unit of the center, song titles will always be recognized by the device's own recognition unit.

An object of the present invention is to provide a speech recognition device that, even for the same use (e.g., selecting a song to play), can choose which speech recognition unit to use according to where the candidate character strings for speech recognition are located.

As one aspect of the present invention, a speech recognition device performs speech recognition while connected to one or more connected devices, and comprises: a microphone that captures speech; a connection unit for connecting to the one or more connected devices; a storage unit that stores character data for speech recognition acquired from the one or more connected devices; a conversion unit that converts the character data for speech recognition in the storage unit into phonemes for speech recognition processing; a speech recognition dictionary that contains the phonemes converted by the conversion unit and is used for speech recognition; a speech recognition unit that performs speech recognition using the speech recognition dictionary and the speech captured by the microphone; and a control unit that determines whether the character data for speech recognition has been acquired from the one or more connected devices and, based on the determination result, controls whether the speech captured by the microphone is recognized by the speech recognition unit.

According to the present invention, even for the same use (e.g., selecting a song to play), the speech recognition unit to be used can be chosen according to where the candidate character strings for speech recognition are located.

FIG. 1: Block diagram of the speech recognition device 10 and the connected device 20 in an embodiment of the present invention
FIG. 2: Diagram showing an example of the data stored in the storage unit 204 of the connected device 20
FIG. 3: Example (1) of the data stored in the storage unit 104 of the speech recognition device 10
FIG. 4: Example (1) of the speech recognition dictionary 106 of the speech recognition device 10
FIG. 5: Example (2) of the data stored in the storage unit 104 of the speech recognition device 10
FIG. 6: Example (2) of the speech recognition dictionary 106 of the speech recognition device 10
FIG. 7: Example (3) of the speech recognition dictionary 106 of the speech recognition device 10
FIG. 8: Block diagram showing a modification of the present embodiment
FIG. 9: Diagram for explaining the configuration of the storage units of the connected devices 20A, 20B, and 20C

The speech recognition device according to the embodiment of the present invention determines on which device the candidate character strings for speech recognition, that is, the "database" (song titles, personal names, and so on), are located, and can select which device's speech recognition means to use according to that determination.

Hereinafter, a speech recognition device according to an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram of the speech recognition device 10 according to the embodiment. The speech recognition device 10 shown in FIG. 1 includes a speech recognition unit 101, a connection unit 102, a control unit 103, a storage unit 104, a G2P conversion unit 105, a speech recognition dictionary 106, and a microphone 107. The connected device 20 shown in FIG. 1 includes a speech recognition unit 201, a connection unit 202, and a storage unit 204.

Specific examples of the speech recognition device 10 include an in-vehicle terminal, a navigation system, and an in-vehicle audio system. Specific examples of the connected device 20 include a smartphone, a portable audio device, or a telephone brought into the vehicle. The speech recognition device 10 is connected to the connected device 20 through a wired or wireless interface. An example of a wired interface is USB (Universal Serial Bus). Examples of wireless interfaces are WiFi (Wireless Fidelity) (registered trademark) and Bluetooth (registered trademark).

In the following description of the present embodiment, the speech recognition device 10 is an in-vehicle audio system and the connected device 20 is a portable audio device, as a specific example.

The connected device 20 includes a speech recognition unit 201, a connection unit 202, and a storage unit 204. As shown in FIG. 2, the storage unit 204 stores "music data (music files)" and the "music information data" accompanying that music data.

The music information data shown in FIG. 2 is metadata accompanying the music data. In addition to the song title of the music data, the music information data may include, for example, the name of the album containing the music data, the name of the artist to whom the music data belongs, and the genre name of the music data.

The connection unit 202 is connected to the connection unit 102 of the speech recognition device 10, transmits data held in the storage unit 204 to the speech recognition device 10, and receives instructions from the control unit 103 of the speech recognition device 10. The connection unit 202 outputs the instructions from the control unit 103 to the speech recognition unit 201. The speech recognition unit 201 can perform speech recognition processing using its built-in speech recognition dictionary, based on instructions from the control unit 103 of the speech recognition device 10.

Some connected devices 20 can provide "music information data" to the speech recognition device 10 via the connection unit 202, and some cannot. The reasons include (1) restrictions in the specifications of the connected device 20 and (2) restrictions in the specifications of the interface connecting the speech recognition device 10 and the connected device 20.

In the following, the operation of the speech recognition device 10 is described for two cases: <Case 1>, in which the speech recognition device 10 has acquired the "music information data" of the connected device 20, and <Case 2>, in which it has not.

<Case 1>
As Case 1, the operation of each unit of the speech recognition device 10 is described for the situation in which the speech recognition device 10 has acquired the "music information data" of the connected device 20.

The connection unit 102 connects to the connected device 20 and sends and receives information (such as "music information data").

The storage unit 104 stores the "character data for speech recognition" acquired from the connected device 20. A specific example of "character data for speech recognition" is the "music information data" stored in the connected device 20. This "music information data" includes song titles, album names, artist names, genre names, and the like. When the speech recognition device 10 acquires "music information data" from the connected device 20, the control unit 103 associates the "music information data" with the connected device 20 from which it was acquired and holds the association in the storage unit 104.

FIG. 3 shows an example of the "music information data" stored in the storage unit 104 of the speech recognition device 10. As shown in FIG. 3, in addition to "music data (music files)", the storage unit 104 contains a song title list ("Song Title 1", "Song Title 2", ... "Song Title N") as music information data. As FIG. 3 also shows, in the present embodiment the "music information data" stored in the storage unit 104 is associated with the connected device 20 from which it was acquired.
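The association between acquired records and their source device can be sketched as a mapping keyed by device. This is only an illustration under assumptions: the class names `MusicInfo` and `StorageUnit`, the method names, and the device identifier string are all hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MusicInfo:
    """One music information record; fields other than title are optional."""
    title: str
    album: str = ""
    artist: str = ""
    genre: str = ""


@dataclass
class StorageUnit:
    """Hypothetical stand-in for storage unit 104."""
    # Key: identifier of the connected device the records were acquired from.
    records_by_device: Dict[str, List[MusicInfo]] = field(default_factory=dict)

    def store(self, device_id: str, records: List[MusicInfo]) -> None:
        # Keep each record associated with its source device.
        self.records_by_device.setdefault(device_id, []).extend(records)

    def has_records_from(self, device_id: str) -> bool:
        # True only if at least one record was acquired from this device.
        return bool(self.records_by_device.get(device_id))

    def titles(self, device_id: str) -> List[str]:
        return [r.title for r in self.records_by_device.get(device_id, [])]
```

Keying by device keeps the question "has anything been acquired from this device?" a cheap lookup, which is the check the control unit 103 is described as making.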

The G2P conversion unit 105 converts the "character data for speech recognition" stored in the storage unit 104 into phonemes for speech recognition processing. This is generally a conversion from graphemes (characters or written symbols) to phonemes, and is called G2P (Grapheme To Phoneme) conversion.
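As a rough illustration of what a G2P step produces, the sketch below maps each letter to a phoneme symbol through a lookup table. The table is a toy assumption and not a real pronunciation model; practical G2P converters rely on a pronunciation lexicon or a trained model, and the patent does not specify which method the G2P conversion unit 105 uses.

```python
# Toy letter-to-phoneme table (hypothetical; not real pronunciations).
LETTER_TO_PHONEME = {
    "a": "AE", "e": "EH", "o": "AA", "u": "AH",
    "p": "P", "s": "S", "t": "T",
}


def g2p(text):
    """Convert text to a phoneme list, skipping characters not in the table."""
    return [LETTER_TO_PHONEME[ch] for ch in text.lower() if ch in LETTER_TO_PHONEME]
```

An empty input yields an empty phoneme list, which corresponds to the situation where no character data is available to convert.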

The speech recognition dictionary 106 describes the dictionary and grammar used in speech recognition processing; essentially, the contents of this dictionary determine the recognizable vocabulary. The speech recognition dictionary 106 can also be changed dynamically while the speech recognition device 10 is running. Once a set of character strings to serve as the source for G2P conversion is obtained, this is achieved by converting that set with G2P and dynamically updating the speech recognition dictionary 106, thereby changing the vocabulary to be recognized.

For example, when "music information data" has been acquired and G2P-converted, the speech recognition dictionary 106 holds, as shown in FIG. 4, the vocabulary "Play Music", "Stop", and "Pause" as examples of fixed commands. In addition to these fixed commands, the speech recognition dictionary 106 may contain variable commands, that is, the vocabulary "Song Title 1", "Song Title 2", ... "Song Title N" from the song title list, which is character data based on information acquired from the connected device 20. Here N is a number, which for recent portable audio devices can run from several thousand to several tens of thousands.
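The split between fixed and variable commands can be sketched as follows. The function names and the `spell_out` placeholder (standing in for G2P conversion) are assumptions made for illustration; only the three fixed command strings come from the text.

```python
FIXED_COMMANDS = ["Play Music", "Stop", "Pause"]


def spell_out(text):
    """Placeholder for G2P conversion: one symbol per letter or digit."""
    return [ch for ch in text.lower() if ch.isalnum()]


def build_dictionary(song_titles, to_phonemes=spell_out):
    """Return vocabulary -> phoneme sequence, fixed commands first."""
    entries = {command: to_phonemes(command) for command in FIXED_COMMANDS}
    # Variable commands: one entry per song title acquired from the device.
    for title in song_titles:
        entries[title] = to_phonemes(title)
    return entries
```

Calling `build_dictionary([])` yields a dictionary with only the fixed commands, while passing a title list adds one variable command per title. Rebuilding the dictionary whenever the title list changes is one simple way to model the dynamic update described above.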

The speech recognition unit 101 performs speech recognition processing based on the dictionary and grammar described in the speech recognition dictionary 106. That is, from the vocabulary described in the speech recognition dictionary 106, the speech recognition unit 101 determines the word that best matches the speech captured by the microphone 107 to be what the user uttered, and proceeds on the assumption that this word was spoken. For example, when entries corresponding to "Play Music", "Stop", and "Pause" are registered in the speech recognition dictionary 106, the speech recognition unit 101 matches the user's utterance against these registered entries and selects the best-matching word. The speech recognition unit 101 then performs speech recognition processing on the assumption that the selected word was spoken.
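The "select the best-matching word" step can be illustrated with plain string similarity. This is a deliberately simplified stand-in: a real recognizer scores acoustic features against phoneme models, whereas the sketch below assumes the utterance has already been turned into text and uses Python's standard `difflib` for the comparison. The function name `recognize` and the cutoff value are assumptions.

```python
import difflib


def recognize(heard, vocabulary, cutoff=0.6):
    """Return the vocabulary entry closest to `heard`, or None if nothing is close."""
    by_lowercase = {word.lower(): word for word in vocabulary}
    matches = difflib.get_close_matches(
        heard.lower(), list(by_lowercase), n=1, cutoff=cutoff
    )
    return by_lowercase[matches[0]] if matches else None
```

Because only dictionary entries can be returned, restricting the vocabulary directly restricts what can be recognized, which is the resource-saving property the description relies on.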

The control unit 103 determines whether the "music information data" stored in the connected device 20 has been transferred to and stored in the storage unit 104 via the connection unit 102. When the control unit 103 determines that the speech recognition device 10 has acquired "music information data" from the connected device 20, it routes or transfers the speech captured by the microphone 107 to the speech recognition unit 101. The control unit 103 then instructs the speech recognition unit 101 to perform speech recognition processing on the transferred or routed speech. When the speech recognition device 10 acquires "music information data" from the connected device 20, the control unit 103 associates the "music information data" with the connected device 20 from which it was acquired and holds the association in the storage unit 104.

In Case 1, instead of determining whether the "music information data" stored in the connected device 20 has been transferred to and stored in the storage unit 104, the control unit 103 may make the determination based on whether vocabulary corresponding to the "music information data" of the connected device 20 has been added to the speech recognition dictionary 106.

<Case 2>
As Case 2, the operation of each unit of the speech recognition device 10 is described for the situation in which the speech recognition device 10 has not acquired the "music information data" of the connected device 20. In this case, the connection unit 102 of the speech recognition device 10 is connected to the connected device 20, but the "music information data" cannot be acquired from the connected device 20. In other words, the connected device 20 is a device that cannot provide "music information data" to the speech recognition device 10 via the connection unit 202.

FIG. 5 is a conceptual diagram of the storage unit 104 in Case 2. As described above, the speech recognition device 10 cannot acquire the "music information data" from the connected device 20. The storage unit 104 is therefore in a state where, as shown in FIG. 5, there is no song title list for the music data.

The G2P conversion unit 105 converts the "character data for speech recognition" stored in the storage unit 104 into phonemes for speech recognition processing. That is, the G2P conversion unit 105 converts whatever character data is stored in the storage unit 104, but because the source character data ("music information data") is absent from the storage unit 104 (an empty set), the resulting set of phonemes is also empty.

As a result, as shown for example in FIG. 6, the speech recognition dictionary 106 holds the vocabulary "Play Music", "Stop", and "Pause" as examples of fixed commands, but holds no vocabulary based on "music information data" (for example, a song title list) as examples of dynamically changed variable commands.

The control unit 103 determines whether the "music information data" stored in the connected device 20 has been transferred to and stored in the storage unit 104 via the connection unit 102. When the control unit 103 determines that the speech recognition device 10 has not acquired "music information data" from the connected device 20, it routes or transfers the speech captured by the microphone 107 not to the device's own speech recognition unit 101 but to the speech recognition unit 201 of the connected device 20, and controls the speech recognition unit 201 of the connected device 20 to perform the speech recognition processing.
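The routing decision of Case 1 and Case 2 can be summarized in a short sketch. All names here (`Recognizer`, `select_recognizer`, `route`) are hypothetical, and the two recognizers are passed in as plain callables; the only logic taken from the text is "acquired data present: use unit 101; otherwise: use unit 201 of the connected device".

```python
from enum import Enum


class Recognizer(Enum):
    LOCAL_101 = "speech recognition unit 101"
    REMOTE_201 = "speech recognition unit 201"


def select_recognizer(stored_music_info):
    """Case 1 (data acquired) selects the local unit; Case 2 selects the remote unit."""
    return Recognizer.LOCAL_101 if stored_music_info else Recognizer.REMOTE_201


def route(audio, stored_music_info, local_recognize, remote_recognize):
    """Forward the captured audio to the selected recognizer and return its result."""
    target = select_recognizer(stored_music_info)
    handler = local_recognize if target is Recognizer.LOCAL_101 else remote_recognize
    return target, handler(audio)
```

Note that the decision depends only on where the candidate strings are held, not on the scenario, which is the difference from the scenario-based selection of the conventional system.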

The operation of the speech recognition unit 201 of the connected device 20 is as follows. Under the control of the control unit 103, the speech recognition unit 201 performs speech recognition processing on the speech transferred or routed from the microphone 107 of the speech recognition device 10. That is, from the vocabulary described in its built-in speech recognition dictionary, the speech recognition unit 201 determines the word that best matches the speech captured by the microphone 107 of the speech recognition device 10 to be what the user uttered, and proceeds on the assumption that this word was spoken. For example, when entries corresponding to "Play Music", "Stop", and "Pause" are registered in the built-in speech recognition dictionary, the speech recognition unit 201 matches the user's utterance against these registered entries and selects the best-matching word. The speech recognition unit 201 then performs speech recognition processing on the assumption that the selected word was spoken.

In Case 2 as well, instead of determining whether the "music information data" stored in the connected device 20 has been transferred to and stored in the storage unit 104, the control unit 103 can make the determination based on whether vocabulary corresponding to the "music information data" of the connected device 20 has been added to the speech recognition dictionary 106.

As described above, the speech recognition device 10 according to the present embodiment includes the control unit 103, which determines whether character data has been acquired from the connected device 20 and, based on the determination result, controls whether the speech captured by the microphone 107 is recognized by the speech recognition unit 101. With this configuration, even for the same use (for example, selecting a song to play), the speech recognition device 10 can choose which speech recognition unit to use according to where the candidate character strings for speech recognition are located. The speech recognition device 10 according to the present embodiment can therefore select which device's speech recognition unit to use.

In the present embodiment, the "song title" within the "music information data" has been used as a specific example of "character data for speech recognition", but the data is not limited to song titles. It may also be, for example, an album name, an artist name, or a genre name.

Although "music information data" has been taken up as an example of "character data for speech recognition" in the present embodiment, the character data is clearly not limited to music information data. For example, by using "phone book data" as the "character data for speech recognition", the invention can readily be applied to speech recognition of "personal names" or "telephone numbers" when the connected device 20 is a telephone or a smartphone. Likewise, by using "place name data" as the "character data for speech recognition", the invention can be applied when the connected device 20 is a device containing place name data.

The phone book data may include "name" data and at least one item of telephone number information data accompanying that "name" data. The telephone number information data may include information such as attribute information (mobile / work / home / other) for the one or more telephone numbers linked to the name. The "name" data included in the phone book data is character string data such as a personal name, company name, or nickname. This "name" data is what is normally used as the "character data for speech recognition".
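The phone book structure described above (a name with one or more attributed numbers) might be modeled as in the following sketch. The class and function names are assumptions chosen for illustration; only the attribute values mobile, work, home, and other come from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PhoneNumber:
    number: str
    kind: str = "other"  # one of: mobile, work, home, other


@dataclass
class PhoneBookEntry:
    name: str  # personal name, company name, or nickname
    numbers: List[PhoneNumber] = field(default_factory=list)


def recognition_targets(entries):
    """Return the name strings, which serve as the character data for recognition."""
    return [entry.name for entry in entries]
```

Only the names are handed to G2P conversion in this sketch; the numbers stay attached to the entry so that a recognized name can be resolved to a number afterwards.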

In the present embodiment, the speech recognition dictionary 106 has been described as consisting of fixed commands, which are present from the start, and variable commands, which are based on information acquired from the connected device 20, but it is not limited to this. FIG. 7 shows another example of the configuration of the speech recognition dictionary 106. As shown in FIG. 7, when the speech recognition device 10 itself holds music data, for example, the speech recognition dictionary 106 may consist of the fixed commands "Play Music", "Stop", and "Pause" together with variable commands comprising a song title list (Song Title 1, Song Title 2, ..., Song Title N) based on the music information data of the connected device 20 and a song title list (Song Title A, Song Title B, ..., Song Title Z) based on the music information data held by the speech recognition device 10 itself.
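A dictionary that combines fixed commands with title lists from two sources can be sketched as a vocabulary that remembers where each entry came from. The function name, the source labels, and the rule that an earlier source wins on duplicate titles are all assumptions for illustration; the patent does not state how duplicates are handled.

```python
def build_vocabulary(fixed_commands, device_titles, own_titles):
    """Return vocabulary -> source label, keeping the first source seen for each word."""
    vocabulary = {command: "fixed" for command in fixed_commands}
    for title in device_titles:
        vocabulary.setdefault(title, "connected device 20")
    for title in own_titles:
        vocabulary.setdefault(title, "speech recognition device 10")
    return vocabulary
```

Recording the source alongside each entry lets a later step decide which device should actually play the recognized song.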

In the present embodiment, the interface connecting the speech recognition device 10 and the connected device 20 is not particularly limited, and multiple interfaces of different types may be used together. The types of information may also be mixed. The following describes, with reference to FIG. 8, an example in which a plurality of connected devices 20 are connected to the speech recognition device 10 according to the present embodiment.

In the present embodiment, the speech recognition device 10 includes the microphone 107 as a means of capturing speech from outside, but it is not limited to this. Any device that acquires speech from outside can be used in place of the microphone 107.

(Modification)
FIG. 8 is a block diagram showing a modification of the present embodiment. Hereinafter, the plurality of connected devices 20 shown in FIG. 8 are referred to as connected device 20A, connected device 20B, and connected device 20C to distinguish them from one another; apart from the storage units 204A, 204B, and 204C, the configurations of the connected devices 20A, 20B, and 20C are the same as that of the connected device 20 shown in FIG. 1.

As shown in FIG. 8, the connected device 20A includes a speech recognition unit 201, a connection unit 202, and a storage unit 204A. Similarly, the connected device 20B includes a speech recognition unit 201, a connection unit 202, and a storage unit 204B, and the connected device 20C includes a speech recognition unit 201, a connection unit 202, and a storage unit 204C. The speech recognition unit 201 and the connection unit 202 of the connected devices 20A, 20B, and 20C operate in the same way as the speech recognition unit 201 and the connection unit 202 of the connected device 20 described above, so a detailed description is omitted.

The configuration of the storage units of the connected devices 20A, 20B, and 20C is described with reference to FIG. 9, which is a diagram for explaining that configuration. For the sake of explanation, FIG. 9 omits part of the configuration of the connected devices 20A, 20B, and 20C and shows only the storage units 204A, 204B, and 204C.

The storage unit 204A of the connected device 20A holds, in addition to music data, "music information data A", which is the information that serves as "character data to be recognized by speech recognition". The "music information data A" includes a song title list (song title 1, song title 2, ..., song title N) based on the music information data A.

The storage unit 204B of the connected device 20B holds, in addition to music data, "music information data B", which is the information that serves as "character data to be recognized by speech recognition". The "music information data B" includes a song title list (song title A, song title B, ..., song title Z) based on the music information data B.

The storage unit 204C of the connected device 20C holds, in addition to telephone number data, "phone book data C", which is the information that serves as "character data to be recognized by speech recognition". The "phone book data C" includes a person name list (person name α, person name β, ..., person name ω) based on the phone book data C.

The connection unit 102 receives the information that serves as "character data to be recognized by speech recognition" ("music information data A" and "phone book data C") from the connected devices 20A, 20B, and 20C, and transmits instructions from the control unit 103 to the connected devices 20A, 20B, and 20C.

In the following, it is assumed that the speech recognition device 10 has already acquired "music information data A" from the connected device 20A and "phone book data C" from the connected device 20C, and that it has not yet acquired "music information data B" from the connected device 20B. The storage unit 104 of the speech recognition device 10 therefore holds "music information data A" and "phone book data C" (see FIG. 9). In other words, the connected devices 20A and 20C correspond to the connected device 20 of <Case 1> described above, and the connected device 20B corresponds to the connected device 20 of <Case 2> described above. The storage unit 104 holds "music information data A" in association with the connected device 20A from which it was acquired. Similarly, the storage unit 104 holds "phone book data C" in association with the connected device 20C from which it was acquired.
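The association between acquired data and its source device might be held as in the following sketch. The `Storage` class, its method names, and the string device identifiers are assumptions made for illustration; the specification does not define a data layout.

```python
class Storage:
    """Hypothetical stand-in for the storage unit 104."""

    def __init__(self):
        # Maps a connected-device identifier to the data acquired from it.
        self._by_device = {}

    def store(self, device_id, data_name, entries):
        """Hold acquired data in association with its source device."""
        self._by_device[device_id] = {"name": data_name, "entries": list(entries)}

    def has_data_from(self, device_id):
        """True if character data has been acquired from this device."""
        return device_id in self._by_device

    def source_of(self, entry):
        """Return the device an entry was acquired from, or None if unknown."""
        for device_id, record in self._by_device.items():
            if entry in record["entries"]:
                return device_id
        return None


storage = Storage()
# State assumed in the text: data from 20A and 20C acquired, 20B not yet.
storage.store("20A", "music information data A", ["song title 1", "song title 2"])
storage.store("20C", "phone book data C", ["person name α", "person name β"])
```

With this state, `storage.has_data_from("20B")` is false, which corresponds to <Case 2> above.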

The G2P conversion unit 105 converts the "character data to be recognized by speech recognition" stored in the storage unit 104 into phonemes for speech recognition processing.
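The specification does not say how the G2P (grapheme-to-phoneme) conversion is carried out. Purely as an illustration, the sketch below uses a tiny hand-written lexicon with ARPAbet-style symbols and a crude letter-by-letter fallback; the `LEXICON` table and the `g2p` function are hypothetical and far simpler than a practical converter.

```python
# Hypothetical pronunciation lexicon (ARPAbet-style symbols, stress omitted).
LEXICON = {
    "call": ["K", "AO", "L"],
    "stop": ["S", "T", "AA", "P"],
}


def g2p(text):
    """Convert character data into a flat list of phoneme-like symbols."""
    phonemes = []
    for word in text.lower().split():
        if word in LEXICON:
            phonemes.extend(LEXICON[word])
        else:
            # Crude fallback for unknown words: one symbol per letter.
            phonemes.extend(ch.upper() for ch in word if ch.isalpha())
    return phonemes
```

A real converter would apply language-specific pronunciation rules or a trained model to song titles and person names, rather than the per-letter fallback used here.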

The speech recognition dictionary 106 holds, as fixed commands, the word "call" relating to the telephone function in addition to the words "Play Music", "Stop", and "Pause". The speech recognition dictionary 106 of the speech recognition device 10 further holds, as variable commands, the song title list (song title 1, song title 2, ..., song title N) based on the music information data A of the connected device 20A and the person name list (person name α, person name β, ..., person name ω) based on the phone book data C of the connected device 20C.

The speech recognition unit 101 operates on instructions from the control unit 103, described later. When it determines that a word in the speech recognition dictionary 106 closely matches the speech captured by the microphone 107, it treats that word as having been uttered by the user and performs speech recognition processing accordingly.
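The idea of choosing the dictionary word that best matches the input can be illustrated at the text level, as sketched below. This is only an analogy: a plain string stands in for the captured speech, matching is done with Python's `difflib` rather than on phonemes or acoustic features, and the function name `recognize` and the threshold of 0.6 are assumptions.

```python
import difflib


def recognize(utterance, vocabulary, cutoff=0.6):
    """Return the vocabulary word closest to the utterance, or None."""
    # Compare case-insensitively while keeping the original spelling.
    lowered = {word.lower(): word for word in vocabulary}
    hits = difflib.get_close_matches(
        utterance.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None


vocabulary = ["Play Music", "Stop", "Pause", "call"]
best = recognize("play musik", vocabulary)
```

An input with no sufficiently similar word yields `None`, which corresponds to no dictionary word being judged as uttered.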

The control unit 103 determines whether the "music information data" or "phone book data" held in each of the connected devices 20A, 20B, and 20C has been transferred to the storage unit 104 via the connection unit 102. When the control unit 103 determines that the speech recognition device 10 has acquired "music information data" or "phone book data" from any of the connected devices 20A, 20B, and 20C, it routes or transfers the speech captured by the microphone 107 to the speech recognition unit 101. The control unit 103 then instructs the speech recognition unit 101 to perform speech recognition processing on the transferred or routed speech.
When the speech recognition device 10 acquires "music information data A" from the connected device 20A, the control unit 103 stores "music information data A" in the storage unit 104 in association with the connected device 20A from which it was acquired. Similarly, when the speech recognition device 10 acquires "phone book data C" from the connected device 20C, the control unit 103 stores "phone book data C" in the storage unit 104 in association with the connected device 20C from which it was acquired.

In this modification, the storage unit 104 holds the "character data to be recognized by speech recognition" relating to "music information data A", that is, the song title list (song title 1, song title 2, ..., song title N) (see FIG. 9). The control unit 103 therefore determines that the speech recognition device 10 has acquired "music information data" from the connected device 20A, and routes or transfers the speech captured by the microphone 107 to the speech recognition unit 101. The control unit 103 then instructs the speech recognition unit 101 to perform speech recognition processing on the transferred or routed speech.

Also in this modification, the storage unit 104 holds the "character data to be recognized by speech recognition" relating to "phone book data C", that is, the person name list (person name α, person name β, ..., person name ω) (see FIG. 9). The control unit 103 therefore determines that the speech recognition device 10 has acquired "phone book data C" from the connected device 20C, and routes or transfers the speech captured by the microphone 107 to the speech recognition unit 101. The control unit 103 then instructs the speech recognition unit 101 to perform speech recognition processing on the transferred or routed speech.

In this modification, the storage unit 104 does not hold the "character data to be recognized by speech recognition" relating to "music information data B", that is, the song title list (song title A, song title B, ..., song title Z) (see FIG. 9). The control unit 103 determines whether the "music information data B" stored in the connected device 20B has been transferred to and accumulated in the storage unit 104 via the connection unit 102. Having determined that the speech recognition device 10 has not acquired "music information data B" from the connected device 20B, the control unit 103 routes or transfers the speech captured by the microphone 107 to the speech recognition unit 201 of the connected device 20B, and performs control so that the speech recognition unit 201 of the connected device 20B carries out the speech recognition processing.
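The routing decision made by the control unit 103 for the three connected devices can be summarized in the following sketch. The function `route_speech`, the string labels for the recognition units, and the set-based bookkeeping are hypothetical; only the rule itself (recognize locally when the data has been acquired, otherwise forward to the connected device) comes from the description above.

```python
def route_speech(device_id, acquired_devices):
    """Decide which speech recognition unit should process the speech.

    device_id: identifier of the connected device whose data is targeted.
    acquired_devices: identifiers of devices whose character data is
        already held in the storage unit 104.
    """
    if device_id in acquired_devices:
        # Data acquired: recognize in the device 10's own unit 101.
        return "speech_recognition_unit_101"
    # Data not acquired: forward to the connected device's unit 201.
    return f"speech_recognition_unit_201@{device_id}"


# State assumed in this modification: 20A and 20C acquired, 20B not.
acquired = {"20A", "20C"}
routes = {d: route_speech(d, acquired) for d in ("20A", "20B", "20C")}
```

Under the assumed state, speech relating to the connected devices 20A and 20C is recognized by the speech recognition unit 101, and speech relating to the connected device 20B is forwarded to that device's speech recognition unit 201.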

In the above embodiment and its modification, when the speech captured by the microphone 107 is routed or transferred to the connected device 20, the speech recognition may in practice be performed by a remote device, such as a server, to which the connected device 20 forwards the speech. Needless to say, such forms also fall within the basic idea of the present invention.

The speech recognition device of the present invention has the effect that, even for the same use (for example, selecting a song to play), the speech recognition unit to be used can be chosen according to where the candidate character strings for speech recognition are located. It is useful as an in-vehicle terminal such as a navigation device or an in-vehicle audio device.

DESCRIPTION OF SYMBOLS
10 Speech recognition device
101 Speech recognition unit
102 Connection unit
103 Control unit
104 Storage unit
105 G2P conversion unit
106 Speech recognition dictionary
107 Microphone
20 Connected device
201 Speech recognition unit
202 Connection unit
204 Storage unit
20A, 20B, 20C Connected device

Claims

1. A speech recognition device that is connected to one or more connected devices and performs speech recognition, the speech recognition device comprising:
a microphone that captures speech;
a connection unit for connecting to the one or more connected devices;
a storage unit that stores character data to be recognized by speech recognition, acquired from the one or more connected devices;
a conversion unit that converts the character data to be recognized in the storage unit into phonemes for speech recognition processing;
a speech recognition dictionary that includes the phonemes converted by the conversion unit and is used for speech recognition;
a speech recognition unit that performs speech recognition using the speech recognition dictionary and the speech captured by the microphone; and
a control unit that determines whether the character data to be recognized has been acquired from the one or more connected devices and, based on the result of that determination, controls whether the speech captured by the microphone is recognized by the speech recognition unit.

2. The speech recognition device according to claim 1, wherein the control unit:
when the character data to be recognized has been acquired from the one or more connected devices, causes the speech recognition unit of the speech recognition device itself to recognize the speech captured by the microphone; and
when the character data to be recognized has not been acquired from the one or more connected devices, transmits the speech captured by the microphone to the one or more connected devices via the connection unit and causes the one or more connected devices to recognize it.

3. The speech recognition device according to claim 1 or 2, wherein the character data to be recognized is music information data that is attached to music data and includes at least the title of the music.

4. The speech recognition device according to claim 1 or 2, wherein the character data to be recognized is telephone number information data that is attached to name data and includes attribute information of at least one telephone number associated with the name data.