JP5426913B2

JP5426913B2 - Speech recognition dictionary editing device and speech recognition device

Info

Publication number: JP5426913B2
Application number: JP2009090143A
Authority: JP
Inventors: 千春武田
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 2009-04-02
Filing date: 2009-04-02
Publication date: 2014-02-26
Anticipated expiration: 2029-04-02
Also published as: JP2010243653A

Description

The present invention relates to a technique for editing a speech recognition dictionary used for speech recognition in a speech recognition device.

Speech recognition is generally performed by preparing in advance a speech recognition dictionary that accumulates, for each text to be recognized, speech recognition data in which the text and pronunciation data for that text are registered, and by taking as the recognized text the text registered in the speech recognition data whose pronunciation data matches the speech uttered by the user (for example, Patent Document 1).

As a technique for creating such a speech recognition dictionary, it is also known to create, for each text in a specified list of texts, speech recognition data in which pronunciation data of the text being spoken, generated using text-to-speech (TTS) technology, is registered together with the text, and to create a speech recognition dictionary storing the speech recognition data created for each text (for example, Patent Document 2).

As a technique for editing such a speech recognition dictionary, it is also known to correct the pronunciation data registered in the speech recognition data of a text designated by the user to pronunciation data that matches the speech uttered by the user (for example, Patent Document 3).

Patent Document 1: JP 2008-158511 A
Patent Document 2: JP 2004-53979 A
Patent Document 3: JP 2007-248886 A

When a function for editing part of the speech recognition data registered in a speech recognition dictionary is not available, an edit such as correcting the pronunciation data registered for a text designated by the user has to be made by recreating the entire speech recognition dictionary, so speech recognition data could not be edited efficiently.

Accordingly, an object of the present invention is to allow part of the speech recognition data registered in a speech recognition dictionary, which accumulates for each text to be recognized speech recognition data in which the text and its pronunciation data are registered, to be edited more efficiently even when no function for editing part of that data is provided.

To achieve this object, the present invention provides a speech recognition dictionary editing device that edits a speech recognition dictionary used in a speech recognition device that recognizes the text represented by speech uttered by a human, the dictionary storing, for each text to be recognized, speech recognition data representing the correspondence between the text and pronunciation data representing its pronunciation. The device comprises: a speech recognition dictionary storage unit that stores the speech recognition dictionary; a yomi (reading) conversion rule storage unit that stores yomi conversion rules defining how pronunciation data is generated from text; a speech recognition dictionary generation unit that divides the texts to be recognized into groups of n texts each (where n is an integer of 1 or more) and, for each group, creates as that group's speech recognition dictionary a dictionary storing speech recognition data representing the correspondence between each text in the group and the pronunciation data generated for it according to the yomi conversion rules, and stores it in the speech recognition dictionary storage unit; a yomi conversion rule correction unit that corrects the yomi conversion rules so that the pronunciation data for a designated text, that is, a text designated by the user, is generated by the method designated by the user; and a speech recognition dictionary correction unit that takes the group containing the designated text as a correction target group, deletes that group's speech recognition dictionary from the speech recognition dictionary storage unit, creates as the group's new speech recognition dictionary a dictionary storing speech recognition data representing the correspondence between each text in the group and the pronunciation data generated for it according to the yomi conversion rules as corrected by the yomi conversion rule correction unit, and stores the new dictionary in the speech recognition dictionary storage unit.

In such a speech recognition dictionary editing device, the speech recognition dictionary correction unit may be configured to treat every group that contains the same text as the designated text as a correction target group and, for each correction target group, to delete that group's speech recognition dictionary from the speech recognition dictionary storage unit, create as the group's new speech recognition dictionary a dictionary storing speech recognition data that represents the correspondence between each text in the group and the pronunciation data generated for it according to the yomi conversion rules as corrected by the yomi conversion rule correction unit, and store the new dictionary in the speech recognition dictionary storage unit.

Such a device may also be provided with a speech recognition dictionary addition unit that, when texts to be recognized are added, divides the added texts into groups of n texts each and, for each group, creates as that group's speech recognition dictionary a dictionary storing speech recognition data that represents the correspondence between each text in the group and the pronunciation data generated for it according to the yomi conversion rules, and additionally stores that dictionary in the speech recognition dictionary storage unit.

Further, when a text is excluded from the texts to be recognized, the speech recognition dictionary correction unit of such a device may take the group containing the excluded text as a correction target group, delete that group's speech recognition dictionary from the speech recognition dictionary storage unit, remove the excluded text from the group, create as the group's new speech recognition dictionary a dictionary storing speech recognition data that represents the correspondence between each remaining text in the group and the pronunciation data generated for it according to the yomi conversion rules, and store the new dictionary in the speech recognition dictionary storage unit.

Here, the text to be recognized may be, for example, any of a song title, an artist name, an album name, or a genre name.
With speech recognition dictionary editing devices such as these, the texts to be recognized are divided into groups of n texts each (where n is an integer of 1 or more), a speech recognition dictionary is created for each group, and correction or deletion of speech recognition data, which represents the correspondence between a text and the pronunciation data representing its pronunciation, is achieved by recreating only the speech recognition dictionaries that contain the data to be corrected or deleted. Speech recognition data can therefore be edited more efficiently than when a single speech recognition dictionary storing the speech recognition data for all texts to be recognized is provided and, lacking a function for editing part of the registered data, partial edits are achieved by recreating that single dictionary.
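The effect described above can be illustrated with a minimal sketch in Python. It is not the implementation of the embodiment: the function names, the representation of a dictionary as a list of (text, reading) pairs, and the `to_reading` callback standing in for the yomi conversion rules are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Entry = Tuple[str, str]  # (text, reading); hypothetical representation


def build_group_dictionaries(
    texts: List[str], n: int, to_reading: Callable[[str], str]
) -> Dict[int, List[Entry]]:
    """Split the texts into groups of n and build one dictionary per group."""
    dictionaries: Dict[int, List[Entry]] = {}
    for start in range(0, len(texts), n):
        group_id = start // n + 1
        dictionaries[group_id] = [(t, to_reading(t)) for t in texts[start:start + n]]
    return dictionaries


def rebuild_groups_containing(
    dictionaries: Dict[int, List[Entry]], target: str, to_reading: Callable[[str], str]
) -> List[int]:
    """Recreate only the dictionaries whose group contains the target text."""
    rebuilt = []
    for group_id, entries in dictionaries.items():
        if any(text == target for text, _ in entries):
            # Replace the whole group dictionary; other groups are untouched.
            dictionaries[group_id] = [(text, to_reading(text)) for text, _ in entries]
            rebuilt.append(group_id)
    return rebuilt


# Hypothetical rule table: a text reads as its lower-cased form unless overridden.
rules: Dict[str, str] = {}
to_reading = lambda text: rules.get(text, text.lower())

dictionaries = build_group_dictionaries(list("ABCDEFG"), 3, to_reading)
rules["E"] = "ee"  # the user corrects the reading of "E"
rebuilt = rebuild_groups_containing(dictionaries, "E", to_reading)
```

In this sketch only the second group (D, E, F) is regenerated after the rule for "E" changes; the cost of an edit is bounded by n rather than by the total number of texts.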

A speech recognition device may be configured from such a speech recognition dictionary editing device and a speech recognition unit that recognizes the text represented by speech uttered by a human, using the speech recognition dictionaries of the groups stored in the speech recognition dictionary storage unit.
An audio playback device may also be configured from such a speech recognition device, a music data storage unit that stores music data representing songs, and a music playback unit that plays back the music data stored in the music data storage unit. In this case, the texts to be recognized may be the titles of the songs represented by the music data stored in the music data storage unit, and the music playback unit may play back the music data of the song whose title is recognized by the speech recognition unit.

As described above, according to the present invention, part of the speech recognition data registered in a speech recognition dictionary, which accumulates for each text to be recognized speech recognition data in which the text and its pronunciation data are registered, can be edited efficiently even when no function for editing part of that data is provided.

FIG. 1 is a block diagram showing the configuration of an audio playback device according to an embodiment of the present invention.
FIG. 2 is a diagram showing recorded music information according to the embodiment of the present invention.
FIG. 3 is a flowchart showing the music DB creation process for HDD-recorded music and the speech recognition dictionary creation process for HDD-recorded music according to the embodiment of the present invention.
FIG. 4 is a diagram showing a processing example of the music DB creation process for HDD-recorded music and the speech recognition dictionary creation process for HDD-recorded music according to the embodiment of the present invention.
FIG. 5 is a flowchart showing the music DB creation process for connected-device-recorded music and the speech recognition dictionary creation process for connected-device-recorded music according to the embodiment of the present invention.
FIG. 6 is a diagram showing a processing example of the music DB creation process for connected-device-recorded music and the speech recognition dictionary creation process for connected-device-recorded music according to the embodiment of the present invention.
FIG. 7 is a flowchart showing the speech recognition dictionary editing process and the speech recognition dictionary correction process according to the embodiment of the present invention.
FIG. 8 is a diagram showing a display example of the audio playback device according to the embodiment of the present invention.
FIG. 9 is a diagram showing a processing example of the speech recognition dictionary editing process and the speech recognition dictionary correction process according to the embodiment of the present invention.

An embodiment of the present invention is described below, taking its application to an audio playback device as an example.
FIG. 1 shows the configuration of the audio playback device according to this embodiment.
As shown in the figure, the audio playback device includes a microphone 1, a speaker 2, an input device 3, a display device 4, an HDD 5, a CD drive 6, a portable audio player interface 7, a speech synthesis engine 8, a speech recognition engine 9, an audio output unit 10, a speech recognition dictionary editing unit 11, a ripping processing unit 12, a music management unit 13, and a playback control unit 14.

In terms of hardware, however, such an audio playback device may be built on a computer of general configuration having a microprocessor, memory, and other peripheral devices. In that case, each of the units listed above (the speech synthesis engine 8, speech recognition engine 9, audio output unit 10, speech recognition dictionary editing unit 11, ripping processing unit 12, music management unit 13, and playback control unit 14), or some of them, may be implemented as processes realized by the microprocessor executing programs stored in advance on the HDD 5.

The CD drive 6 reads the music data recorded on a loaded CD-DA disc 20.
A portable audio player 21 is selectively connected to the portable audio player interface 7. The portable audio player 21 is a device that stores audio files of songs and has a function for playing back the songs in those files. The portable audio player 21 also manages attribute information for the song in each stored audio file, such as the name of the album to which the song belongs, the song's genre name, its artist name, and its song title. When connected to the portable audio player interface 7, it responds to a request from the audio playback device to transfer music management information by outputting the attribute information it manages and the identifier of each song's audio file to the audio playback device as music management information. Also, when connected to the portable audio player interface 7, it responds to a playback request from the audio playback device that specifies an audio file identifier by playing back the specified audio file and outputting the resulting signal or data to the audio playback device.

The HDD 5 stores a CDDB, yomi conversion rules, HDD-recorded music information, connected-device-recorded music information, and audio files containing the audio data of songs.
The CDDB accumulates attribute information for each song recorded on the CD-DA discs 20 distributed on the market, such as the name of the album to which the song belongs, the song's genre name, its artist name, and its song title.
The yomi conversion rules define how the speech synthesis engine 8 generates, from various texts, yomi data representing the pronunciation of each text when read aloud.
The HDD-recorded music information and the connected-device-recorded music information have the same structure, each containing a music DB and speech recognition dictionaries. The music DB of the HDD-recorded music information, however, is created for the songs recorded as audio files on the HDD 5, whereas the music DB of the connected-device-recorded music information is created for the songs recorded on the portable audio player 21 connected to the portable audio player interface 7. The speech recognition dictionaries of the HDD-recorded music information correspond to the music DB of the HDD-recorded music information, and the speech recognition dictionaries of the connected-device-recorded music information correspond to the music DB of the connected-device-recorded music information.

The structure of the music DB and the speech recognition dictionaries is described below.
First, as shown in FIG. 2a, the music DB has one entry (each row in the figure) for each target song. Each entry registers a song ID that identifies the song within the table, the song title, the artist name, the album name, the genre name, and a file identifier that identifies the song's audio file. The entries of the music DB are arranged in ascending order of song ID.

Next, as shown in FIG. 2b, a speech recognition dictionary is created for each run of n consecutive song IDs (five in the figure), and the dictionaries are given distinct, sequentially numbered dictionary IDs.
Each speech recognition dictionary has an entry for each of its n song IDs, and each entry registers the song ID together with yomi data representing the pronunciation of the song title registered in the entry for that song ID in the corresponding music DB. The yomi data may be in any format, but is basically assumed to carry content equivalent to a phonetic symbol string.
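The two structures can be pictured with the following sketch. The class and field names are hypothetical, and the mapping from a dictionary ID to its range of song IDs is an assumption inferred from the processing example in this description (dictionary ID 1 covers song IDs 1 to 5, dictionary ID 2 covers 6 to 10, and so on when n is 5); the description does not state it as a formula.

```python
from dataclasses import dataclass


@dataclass
class SongEntry:
    """One row of the music DB (field names are illustrative)."""
    song_id: int
    title: str
    artist: str
    album: str
    genre: str
    file_id: str


@dataclass
class DictionaryEntry:
    """One row of a speech recognition dictionary."""
    song_id: int
    yomi: str  # reading data, e.g. a phonetic symbol string


def song_id_range(dictionary_id: int, n: int) -> range:
    """Song IDs covered by a dictionary, assuming fixed blocks of n IDs."""
    first = (dictionary_id - 1) * n + 1
    return range(first, first + n)
```

Under this assumption a dictionary may hold fewer than n entries when some of the song IDs in its block are not in use.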

The operations for creating the HDD-recorded music information and the connected-device-recorded music information are described below.
First, the operation for creating the HDD-recorded music information is described.
When the ripping processing unit 12 receives a ripping instruction from the user via the input device 3, it reads and encodes the music data of each song recorded on the CD-DA disc 20 loaded in the CD drive 6, and records each song on the HDD 5 as an audio file. At this time, it refers to the CDDB to determine the song title, artist name, album name, genre name, and so on of each audio file recorded on the HDD 5, and stores them in the audio file as the attribute information of that song. It then notifies the music management unit 13 that new songs have been ripped.

Meanwhile, in the music DB creation process for HDD-recorded music shown in FIG. 3a, when the music management unit 13 detects a notification from the ripping processing unit 12 that new songs have been ripped (step 302), it obtains the song ID of the last song registered in the music DB of the HDD-recorded music information and sets it as m (step 304). If no song is registered in that music DB, m = 0.

Then, with floor denoting the floor function, which returns the largest integer less than or equal to a real number x, p is obtained from
k = floor((m - 1) / n)
p = 1 + n + (k × n)
(step 306), where n is the number of song IDs covered by one speech recognition dictionary. In other words, p is the first song ID of the block of n consecutive IDs that follows the block containing m (and p = 1 when m = 0).
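The calculation of step 306 can be sketched as follows. The function name is hypothetical; the arithmetic follows the two formulas above, using Python's `//` operator, which rounds toward negative infinity and therefore behaves as the floor function even for m = 0.

```python
def first_new_song_id(m: int, n: int) -> int:
    """Return p, the first song ID to assign to newly ripped songs.

    m is the last song ID already registered (0 if the DB is empty) and
    n is the number of song IDs covered by one speech recognition dictionary.
    """
    k = (m - 1) // n      # floor((m - 1) / n)
    return 1 + n + k * n  # p = 1 + n + (k × n)
```

For example, `first_new_song_id(12, 5)` evaluates to 16 and `first_new_song_id(0, 5)` to 1, so new songs always begin on a dictionary boundary.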

Then, in the music DB of the HDD-recorded music information, a new entry is created for each audio file newly recorded on the HDD 5 by the ripping processing unit 12. Each entry registers a song ID that increases by 1 starting from p, together with the song title, artist name, album name, and genre name given by the attribute information of the corresponding audio file, and the identifier of the audio file (step 308). That is, the r-th newly created entry in the music DB registers the song ID p + r - 1 together with the attribute information and the identifier of the r-th audio file newly recorded on the HDD 5 by the ripping processing unit 12.

The music management unit 13 then requests the speech recognition dictionary editing unit 11 to create speech recognition dictionaries for the HDD-recorded music information (step 310).
On receiving this request, the speech recognition dictionary editing unit 11 starts the speech recognition dictionary creation process for HDD-recorded music shown in FIG. 3b and creates a dictionary source list in which pairs of song ID and song title, extracted from the entries of the music DB of the HDD-recorded music information whose song IDs are p or greater, are registered in song ID order (step 352).

The dictionary source list is then divided, in song ID order, into divided dictionary source lists of n songs each, and the divided lists are given division numbers that increase by 1 starting from 1, in ascending order of the range of song IDs they contain (step 354). That is, the divided dictionary source list whose first registered song ID is the t-th smallest is given division number t.

A speech recognition dictionary is then created for each divided dictionary source list and stored in the HDD-recorded music information (step 356). The speech recognition dictionary created from the divided dictionary source list with division number t is given the dictionary ID
t - 1 + floor((p + n - 1) / n).
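As a small illustration of this numbering, the dictionary ID can be computed as below. The function name is hypothetical; the expression is the one given above.

```python
def dictionary_id(t: int, p: int, n: int) -> int:
    """Dictionary ID for the divided dictionary source list with division number t.

    p is the first newly assigned song ID and n is the number of song IDs
    covered by one speech recognition dictionary.
    """
    return t - 1 + (p + n - 1) // n  # t - 1 + floor((p + n - 1) / n)
```

With p = 16 and n = 5, division numbers 1 and 2 map to dictionary IDs 4 and 5; with p = 1 the first dictionary receives ID 1.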

A speech recognition dictionary is created from a divided dictionary source list by requesting the speech synthesis engine 8 to create a speech recognition dictionary based on that list. When so requested, the speech synthesis engine 8 creates a speech recognition dictionary in which each song title in the divided dictionary source list is replaced with yomi data representing the pronunciation of that title, generated according to the yomi conversion rules stored on the HDD 5, and stores the dictionary on the HDD 5.

A processing example of the music DB creation process and the speech recognition dictionary creation process for HDD-recorded music follows.
Assume that the number of songs n covered by one speech recognition dictionary is 5 and that, as shown at 401 in FIG. 4, the music DB of the HDD-recorded music information contains entries for 12 songs with song IDs 1 to 12. Assume also that the HDD-recorded music information contains the speech recognition dictionary with dictionary ID 1, in which the yomi data of the titles of the songs with song IDs 1 to 5 are registered in association with their song IDs, the speech recognition dictionary with dictionary ID 2, which likewise covers song IDs 6 to 10, and the speech recognition dictionary with dictionary ID 3, which likewise covers song IDs 11 and 12.

Suppose that, in this state, audio files of seven songs are newly recorded on the HDD 5 by the ripping processing unit 12.
In this case, first, in the music DB creation process for HDD-recorded music, the song ID of the last song registered in the music DB is 12, so with m = 12,
k = floor((m - 1) / n) = 2, p = 1 + n + (k × n) = 16,
that is, p = 16 is obtained (steps 304 and 306). As indicated by arrow 402, seven entries with song IDs 16 to 22 are newly created in the music DB, and the attribute information and audio file identifier of each of the seven audio files newly recorded on the HDD 5 are registered in these entries (step 308). Song IDs 13 to 15 are left unused, so the new songs start at a dictionary boundary.

そして、次に、ＨＤＤ記録楽曲用音声認識辞書作成処理によって、矢印４０３に示すようにＨＤＤ記録楽曲情報の楽曲ＤＢの、楽曲IDが１６以降の各エントリから抽出した楽曲IDと曲名との組を、楽曲IDの順に登録した辞書元リストが作成される（ステップ３５２）。 Then, by the HDD recorded music voice recognition dictionary creation process, as shown by an arrow 403, a set of the music ID and the music title extracted from each entry of the music ID 16 or later in the music DB of the HDD recorded music information is shown. Then, a dictionary source list registered in the order of the music ID is created (step 352).

そして、矢印４０４に示すように、辞書元リストが、楽曲IDの順に５個の楽曲毎に分割され、楽曲IDが１６から２０の曲名が登録された分割番号１の分割辞書元リストと、楽曲IDが２１から２２の曲名が登録された分割番号２の分割辞書元リストが作成される（ステップ３５４）。 Then, as indicated by an arrow 404, the dictionary source list is divided into five music pieces in the order of the music IDs, and the divided dictionary source list of division number 1 in which the music titles 16 to 20 are registered, A division dictionary source list of division number 2 in which the song names with IDs 21 to 22 are registered is created (step 354).

そして、矢印４０５に示すように分割番号１の分割辞書元リストから、楽曲IDが１６から２０の曲名のヨミデータが楽曲IDと対応づけて登録された音声認識辞書が作成され、分割番号２の分割辞書元リストから、楽曲IDが２１から２２の曲名のヨミデータが楽曲IDと対応づけて登録された音声認識辞書が作成される（ステップ３５６）。 Then, as shown by an arrow 405, a voice recognition dictionary is created from the divided dictionary original list of division number 1 and the melody data of the song names 16 to 20 are registered in association with the song ID, and the division of division number 2 is created. From the dictionary source list, a speech recognition dictionary is created in which the reading data of song names 21 to 22 are registered in association with the song IDs (step 356).

ここで、p=１６であるので、分割番号ｔが１の分割辞書元リストから作成された音声認識辞書には、ｔ-1+floor((p+n-1)/n)に従って、辞書IDとして４が付与され、分割番号ｔが２の分割辞書元リストから作成された音声認識辞書には、ｔ-1+floor((p+n-1)/n)に従って、辞書IDとして５が付与される。 Here, since p = 16, the speech recognition dictionary created from the divided dictionary source list with division number t = 1 is assigned dictionary ID 4 according to t-1+floor((p+n-1)/n), and the speech recognition dictionary created from the divided dictionary source list with division number t = 2 is assigned dictionary ID 5 according to the same formula.
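
As a rough sketch of steps 354 to 356 and the dictionary ID rule, the following splits a list of new music IDs into groups of n and numbers the resulting dictionaries. The function name and the use of plain ID lists in place of divided dictionary source lists are assumptions made for illustration.

```python
def assign_dictionary_ids(music_ids: list[int], n: int) -> dict[int, list[int]]:
    """Split music IDs (ascending, starting at p) into groups of n and map
    each group to its dictionary ID, t - 1 + floor((p+n-1)/n)."""
    if not music_ids:
        return {}
    p = music_ids[0]              # first newly created music ID
    base = (p + n - 1) // n       # floor((p+n-1)/n)
    result = {}
    for t, start in enumerate(range(0, len(music_ids), n), start=1):
        result[t - 1 + base] = music_ids[start:start + n]
    return result


print(assign_dictionary_ids(list(range(16, 23)), 5))
# {4: [16, 17, 18, 19, 20], 5: [21, 22]}
```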

結果、リッピング処理によって新たに記録されたオーディオファイルの楽曲５曲毎に、当該楽曲の曲名の音声認識辞書が、既存の音声認識辞書に引き続く辞書IDが付与された形態で、ＨＤＤ記録楽曲情報に追加されたことになる。
次に、接続デバイス記録楽曲情報の作成の動作について説明する。
楽曲管理部１３は、図５ａに示す接続デバイス記録楽曲用楽曲ＤＢ作成処理において、ポータブルオーディオプレイヤインタフェース７へのポータブルオーディオプレイヤ２１の接続を監視し（ステップ５０２）、ポータブルオーディオプレイヤ２１の接続が発生したならば、接続されたポータブルオーディオプレイヤ２１が前回接続を検出したポータブルオーディオプレイヤ２１と同じポータブルオーディオプレイヤ２１であるかどうかをポータブルオーディオプレイヤ２１の識別情報に基づいて調べ（ステップ５０４）、前回接続を検出したポータブルオーディオプレイヤ２１と同じでなければステップ５０８に進む。 As a result, for every five songs of the audio file newly recorded by the ripping process, the voice recognition dictionary of the song title of the song is given a dictionary ID following the existing voice recognition dictionary in the HDD recorded song information. It has been added.
Next, the operation of creating connected device recorded music information will be described.
In the connected device recorded music DB creation process shown in FIG. 5a, the music management unit 13 monitors for connection of the portable audio player 21 to the portable audio player interface 7 (step 502). When a connection occurs, it checks, based on the identification information of the portable audio player 21, whether the connected portable audio player 21 is the same player as the one detected at the previous connection (step 504). If it is not the same player, the process proceeds to step 508.

一方、接続されたポータブルオーディオプレイヤ２１が前回接続を検出したポータブルオーディオプレイヤ２１と同じプレイヤであれば、ポータブルオーディオプレイヤ２１に記録されているオーディオファイルが前回の接続時以降に変更されているかどうかを調べ（ステップ５０６）、変更されていなければステップ５０２に戻り、変更されていればステップ５０８に進む。 On the other hand, if the connected portable audio player 21 is the same player as the portable audio player 21 that detected the previous connection, it is determined whether the audio file recorded in the portable audio player 21 has been changed since the previous connection. Inspect (step 506), if not changed, the process returns to step 502, and if changed, the process proceeds to step 508.

ここで、ポータブルオーディオプレイヤ２１に記録されているオーディオファイルが前回の接続時以降に変更されているかどうかは、ポータブルオーディオプレイヤ２１に楽曲管理情報の転送を要求することにより、ポータブルオーディオプレイヤ２１から楽曲管理情報を取得し、取得した楽曲管理情報と接続デバイス記録楽曲情報の楽曲ＤＢとの一致性を比較することにより行ってもよい。また、ポータブルオーディオプレイヤ２１からポータブルオーディオプレイヤ２１に記録されているオーディオファイルが最後に更新された日時の情報を取得できる場合には、この最後に更新された日時と、前回ポータブルオーディオプレイヤがオーディオ再生装置のポータブルオーディオプレイヤインタフェース７に接続された日時との比較により、ポータブルオーディオプレイヤ２１に記録されているオーディオファイルが前回の接続時以降に変更されているかどうかを判定するようにしてもよい。 Here, whether or not the audio file recorded in the portable audio player 21 has been changed since the last connection is determined by requesting the portable audio player 21 to transfer music management information, so that the music from the portable audio player 21 can be transmitted. The management information may be acquired, and the acquired music management information may be compared with the music DB of the connected device recorded music information. In addition, when information about the date and time when the audio file recorded in the portable audio player 21 was last updated can be acquired from the portable audio player 21, the date and time when the audio file was last updated and the previous portable audio player played the audio. It may be determined whether or not the audio file recorded in the portable audio player 21 has been changed since the previous connection by comparing with the date and time connected to the portable audio player interface 7 of the apparatus.
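
The decision made in steps 504 and 506 can be sketched as below. The `PlayerState` record, its field names, and the use of timestamps are assumptions for illustration; the description also allows comparing the music management information against the music DB instead.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PlayerState:
    player_id: str           # identification information of the player
    last_updated: datetime   # when its audio files were last updated


def needs_rebuild(current: PlayerState,
                  previous_id: Optional[str],
                  previous_connected_at: Optional[datetime]) -> bool:
    """Return True when the music DB and dictionaries should be rebuilt."""
    # Step 504: a different (or first-ever) player always triggers a rebuild.
    if previous_id is None or current.player_id != previous_id:
        return True
    # Step 506: same player; rebuild only if files changed since last time.
    if previous_connected_at is None:
        return True
    return current.last_updated > previous_connected_at
```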

そして、ステップ５０４またはステップ５０６からステップ５０８に進んだ場合には、接続デバイス記録楽曲情報の楽曲ＤＢを消去する。
そして、ポータブルオーディオプレイヤ２１から楽曲管理情報を取得し、当該楽曲管理情報が属性情報とオーディオファイルの識別子を表す各楽曲に対応するエントリを備えた、新たな楽曲ＤＢを、接続デバイス記録楽曲情報に格納する（ステップ５１２）。ここで、新たに作成した楽曲ＤＢの各エントリには、１から１ずつ増加する楽曲IDを登録すると共に、対応する楽曲の属性情報が表す曲名やアーティスト名やアルバム名やジャンル名と、オーディオファイルの識別子を登録する。すなわち、接続デバイス記録楽曲情報の楽曲ＤＢに新たに作成したｒ番目のエントリには、ｒの楽曲IDと、ポータブルオーディオプレイヤ２１から取得した楽曲管理情報が示す、ポータブルオーディオプレイヤ２１に記録されている楽曲のうちのｒ番目の楽曲の属性情報とオーディオファイルの識別子とを登録する。 When the process proceeds from step 504 or step 506 to step 508, the music DB of the connected device recording music information is deleted.
Then, the music management information is acquired from the portable audio player 21, and a new music DB, having an entry for each song whose attribute information and audio file identifier are given by the music management information, is stored in the connected device recorded music information (step 512). Here, each entry of the newly created music DB is registered with a music ID that starts at 1 and increases by 1, together with the song name, artist name, album name, and genre name given by the attribute information of the corresponding song, and the audio file identifier. That is, the r-th entry newly created in the music DB of the connected device recorded music information is registered with the music ID r and with the attribute information and audio file identifier of the r-th song recorded in the portable audio player 21, as indicated by the music management information acquired from the portable audio player 21.

そして、音声認識辞書編集部１１に、接続デバイス記録楽曲情報の音声認識辞書作成を要求する。
次に、接続デバイス記録楽曲情報の音声認識辞書作成を要求された、音声認識辞書編集部１１は、図５ｂに示す接続デバイス記録楽曲用音声認識辞書作成処理を開始し、まず、接続デバイス記録楽曲情報の音声認識辞書を全て消去する（ステップ５５２）。
そして、接続デバイス記録楽曲情報の楽曲ＤＢの各エントリから抽出した楽曲IDと曲名との組を、楽曲IDの順に登録した辞書元リストを作成する（ステップ５５４）。
そして、辞書元リストを、楽曲IDの順にｎ個の楽曲毎に分割した分割辞書元リストを作成し、作成した各分割辞書元リストに、先頭に登録されている楽曲IDの小さい順に、１から１ずつ増加する分割番号を与える（ステップ５５６）。すなわち、先頭に登録されている楽曲IDがｔ番目に大きい分割辞書元リストには、分割番号としてｔを与える。 Then, the voice recognition dictionary editing unit 11 is requested to create a voice recognition dictionary of the connected device recorded music information.
Next, the speech recognition dictionary editing unit 11, having been requested to create the speech recognition dictionaries of the connected device recorded music information, starts the connected device recorded music speech recognition dictionary creation process shown in FIG. 5b, and first deletes all speech recognition dictionaries of the connected device recorded music information (step 552).
Then, a dictionary source list is created in which pairs of song IDs and song names extracted from each entry in the song DB of the connected device recorded song information are registered in the order of song IDs (step 554).
Then, divided dictionary source lists are created by dividing the dictionary source list into groups of n songs in music ID order, and each divided dictionary source list is given a division number that starts at 1 and increases by 1, in ascending order of the music ID registered at its head (step 556). That is, the divided dictionary source list whose head music ID is the t-th smallest is given t as its division number.

そして、各分割辞書元リストの各々について、音声認識辞書を作成し、ＨＤＤ記録楽曲情報に格納する（ステップ５５８）。また、分割番号がｔの分割辞書元リストから作成した音声認識辞書には、ｔを辞書IDとして付与する。
ここで、分割辞書元リストからの音声認識辞書の作成は、上述のように、分割辞書元リストに基づく音声認識辞書の作成を音声合成エンジン８に要求することにより行う。音声合成エンジン８は、分割辞書元リストに基づく音声認識辞書の作成を要求されたならば、分割辞書元リスト中の各曲名を、ＨＤＤ５に格納されているヨミ変換ルールに従って生成した当該曲名の発音を表すヨミデータに置き換えた音声認識辞書を作成し、ＨＤＤ５に格納する。 Then, for each of the divided dictionary source lists, a voice recognition dictionary is created and stored in the HDD recorded music information (step 558). Further, t is assigned as a dictionary ID to the speech recognition dictionary created from the divided dictionary original list with the division number t.
Here, as described above, a speech recognition dictionary is created from a divided dictionary source list by requesting the speech synthesis engine 8 to create a speech recognition dictionary based on that list. When so requested, the speech synthesis engine 8 creates a speech recognition dictionary in which each song name in the divided dictionary source list is replaced with reading data representing the pronunciation of that song name, generated according to the reading conversion rule stored in the HDD 5, and stores it in the HDD 5.

ここで、このような接続デバイス記録楽曲用楽曲ＤＢ作成処理と、接続デバイス記録楽曲用音声認識辞書作成処理の処理例を示す。
いま、一つの音声認識辞書が対応する楽曲数ｎが５であるとして、図６に示すように、１２個のオーディオファイルが記録されているポータブルオーディオプレイヤ２１がオーディオ再生装置に接続され、当該ポータブルオーディオプレイヤ２１は前回オーディオ再生装置に接続されたポータブルオーディオプレイヤ２１と異なるか、前回の接続時以降に、記録されているオーディオファイルが更新されたポータブルオーディオプレイヤ２１であるものとする。 Here, a processing example of such a connected device recording music tune DB creation process and a connected device recorded music speech recognition dictionary creation process will be shown.
Now, assume that the number of songs n covered by one speech recognition dictionary is 5 and that, as shown in FIG. 6, a portable audio player 21 on which 12 audio files are recorded is connected to the audio playback device. Assume also that this portable audio player 21 either differs from the portable audio player 21 previously connected to the audio playback device, or has had its recorded audio files updated since the previous connection.

この場合、まず、ＨＤＤ記録楽曲用楽曲ＤＢ作成処理によって、既存の楽曲ＤＢが消去された後に、矢印６０１によって示すように、楽曲IDが１から１２のエントリを有する楽曲ＤＢが新たに作成され、各エントリに新たにポータブルオーディオプレイヤ２１に記録されている１２個のオーディオファイルの属性情報とオーディオファイルの識別子とが各々登録される（ステップ５１０）。 In this case, first, after the existing music DB is deleted by the HDD recorded music DB creation processing, a music DB having entries with music IDs 1 to 12 is newly created as indicated by an arrow 601. The attribute information of the 12 audio files and the identifier of the audio file newly recorded in the portable audio player 21 are registered in each entry (step 510).

そして、次に、接続デバイス記録楽曲用音声認識辞書作成処理によって、既存の音声認識辞書が消去された後に、矢印６０２に示すように接続デバイス記録楽曲情報の楽曲ＤＢの各エントリから抽出した楽曲IDと曲名との組を、楽曲IDの順に登録した辞書元リストが作成される（ステップ５５４）。 Then, after the existing voice recognition dictionary is deleted by the voice recognition dictionary creation process for connected device recorded music, the song ID extracted from each entry in the music DB of the connected device recorded music information as indicated by an arrow 602 A dictionary source list in which pairs of song names and song names are registered in the order of song IDs is created (step 554).

そして、矢印６０３に示すように、辞書元リストが、楽曲IDの順に５個の楽曲毎に分割され、楽曲IDが１から５の曲名が登録された分割番号１の分割辞書元リストと、楽曲IDが６から１０の曲名が登録された分割番号２の分割辞書元リストと、楽曲IDが１１から１２の曲名が登録された分割番号３の分割辞書元リストが作成される（ステップ５５６）。 Then, as indicated by an arrow 603, the dictionary source list is divided into five music pieces in the order of the music IDs, and the divided dictionary source list of division number 1 in which the music titles of music IDs 1 to 5 are registered, A division dictionary source list with division number 2 in which the song names with IDs 6 to 10 are registered and a division dictionary source list with division number 3 in which the song names with song IDs 11 to 12 are registered are created (step 556).

そして、矢印６０４に示すように分割番号１の分割辞書元リストから、楽曲IDが１から５の曲名のヨミデータが楽曲IDと対応づけて登録された、辞書IDが１の音声認識辞書が作成され、分割番号２の分割辞書元リストから、楽曲IDが６から１０の曲名のヨミデータが楽曲IDと対応づけて登録された、辞書IDが２の音声認識辞書が作成され、分割番号３の分割辞書元リストから、楽曲IDが１１から１２の曲名のヨミデータが楽曲IDと対応づけて登録された、辞書IDが３の音声認識辞書が作成される（ステップ５５８）。 Then, as shown by the arrow 604, a voice recognition dictionary with a dictionary ID of 1 is created from the divided dictionary source list with a division number of 1, in which Yomi data of the song names of the song IDs 1 to 5 are registered in association with the song ID. A voice recognition dictionary with a dictionary ID of 2 is created from the divided dictionary source list with a division number of 2 and the Yomi data with the song names of 6 to 10 are registered in association with the song ID. From the original list, a voice recognition dictionary with a dictionary ID of 3 is created, in which Yomi data with song names of 11 to 12 are registered in association with the song ID (step 558).
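
The worked example above can be reproduced with a small sketch of steps 554 to 558. The function name, the tuple layout, and the `to_reading` callback (standing in for the reading conversion performed by the speech synthesis engine) are assumptions, not part of the described device.

```python
from typing import Callable


def build_dictionaries(titles: list[str], n: int,
                       to_reading: Callable[[str], str]
                       ) -> dict[int, list[tuple[int, str]]]:
    """Assign music IDs 1..len(titles), split them into groups of n, and
    build one dictionary per group as (music ID, reading data) pairs.
    Dictionary IDs equal the division numbers 1, 2, 3, ..."""
    source = list(enumerate(titles, start=1))      # dictionary source list
    dictionaries = {}
    for t, start in enumerate(range(0, len(source), n), start=1):
        chunk = source[start:start + n]            # divided source list t
        dictionaries[t] = [(mid, to_reading(title)) for mid, title in chunk]
    return dictionaries


dicts = build_dictionaries([f"song{i}" for i in range(1, 13)], 5, str.lower)
print({d: [mid for mid, _ in entries] for d, entries in dicts.items()})
# {1: [1, 2, 3, 4, 5], 2: [6, 7, 8, 9, 10], 3: [11, 12]}
```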

結果、ポータブルオーディオプレイヤ２１に記録されているオーディオファイルの楽曲５曲毎に、当該楽曲の曲名の音声認識辞書が、１からの連番の辞書IDが付与された形態で、接続デバイス記録楽曲情報に作成されたことになる。
以上、ＨＤＤ記録楽曲情報と接続デバイス記録楽曲情報の作成の動作について説明した。
次に、再生制御部１４が行う再生制御処理について説明する。
再生制御部１４は、入力装置３を介してユーザからＨＤＤ記録楽曲の再生を指示されると、ＨＤＤ再生モードを設定し、ユーザの操作に応じてＨＤＤ５に記録されているオーディオファイルの再生を制御する。ここで、オーディオファイルの再生は、再生するオーディオファイルを再生対象オーディオファイルとしてオーディオ出力部１０に指示することにより行う。オーディオ出力部１０は、再生対象オーディオファイルとして指示されたオーディオファイルをＨＤＤ５から読み出して復号しスピーカ２に出力する。 As a result, for every five songs of the audio file recorded in the portable audio player 21, the connected device recording song information in a form in which the voice recognition dictionary of the song name is assigned with the consecutive dictionary ID from 1. It will be created.
The operation for creating HDD recorded music information and connected device recorded music information has been described above.
Next, the reproduction control process performed by the reproduction control unit 14 will be described.
When the user instructs playback of HDD-recorded music via the input device 3, the playback control unit 14 sets the HDD playback mode and controls playback of the audio files recorded on the HDD 5 according to the user's operations. An audio file is played by designating it to the audio output unit 10 as the playback target audio file. The audio output unit 10 reads the audio file designated as the playback target audio file from the HDD 5, decodes it, and outputs it to the speaker 2.

また、再生制御部１４は、ＨＤＤ再生モードを設定したならば、ＨＤＤ記録楽曲情報に含まれる全ての音声認識辞書を使用音声認識辞書として音声認識エンジン９に設定する。そして、音声認識エンジン９は、使用音声認識辞書として設定された各音声認識辞書を用いて、マイクロフォン１から入力する音声の音声認識を行い、入力音声にマッチするヨミデータと対応づけられて、いずれかの使用音声認識辞書に登録されている楽曲IDを再生制御部１４に通知する。そして、再生制御部１４は、音声認識エンジン９から楽曲IDが通知されたならば、ＨＤＤ記録楽曲情報の楽曲ＤＢの、通知された楽曲IDが登録されたエントリのファイル識別子が示すオーディオファイルを再生対象オーディオファイルとしてオーディオ出力部１０に設定することにより、当該楽曲IDのオーディオファイル、すなわち、ユーザが発話した曲名の楽曲の再生とスピーカ２への出力を行う。 Further, when the HDD playback mode is set, the playback control unit 14 sets all the voice recognition dictionaries included in the HDD recorded music information to the voice recognition engine 9 as used voice recognition dictionaries. Then, the voice recognition engine 9 performs voice recognition of the voice input from the microphone 1 using each voice recognition dictionary set as the use voice recognition dictionary, and is associated with the reading data that matches the input voice. The reproduction control unit 14 is notified of the music ID registered in the used voice recognition dictionary. Then, when the music ID is notified from the voice recognition engine 9, the playback control unit 14 plays the audio file indicated by the file identifier of the entry in which the notified music ID is registered in the music DB of the HDD recorded music information. By setting the audio output unit 10 as the target audio file, the audio file having the music ID, that is, the music having the song name spoken by the user is reproduced and output to the speaker 2.

次に、再生制御部１４は、ポータブルオーディオプレイヤ２１が接続されているときに、入力装置３を介してユーザから接続デバイス記録楽曲の再生を指示されると、接続デバイス再生モードを設定し、ユーザの操作に応じてポータブルオーディオプレイヤ２１に記録されているオーディオファイルの再生を制御する。ここで、ポータブルオーディオプレイヤ２１の再生は、再生するオーディオファイルの識別子を指定した再生要求をポータブルオーディオプレイヤ２１に発行して、ポータブルオーディオプレイヤ２１に、当該オーディオファイルの再生と、再生した信号/データの出力をおこなわせると共に、オーディオ出力部１０にポータブルオーディオプレイヤ２１から出力された信号/データの表す音声をスピーカ２に出力させることにより行う。 Next, when the portable audio player 21 is connected, the playback control unit 14 sets the connected device playback mode when instructed to play the connected device recorded music by the user via the input device 3. The playback of the audio file recorded in the portable audio player 21 is controlled in accordance with the operation. Here, the reproduction of the portable audio player 21 issues a reproduction request designating the identifier of the audio file to be reproduced to the portable audio player 21, and the portable audio player 21 reproduces the audio file and the reproduced signal / data. This is performed by causing the audio output unit 10 to output the sound represented by the signal / data output from the portable audio player 21 to the speaker 2.

また、再生制御部１４は、接続デバイス再生モードを設定したならば、接続デバイス記録楽曲情報に含まれる全ての音声認識辞書を使用音声認識辞書として音声認識エンジン９に設定する。そして、音声認識エンジン９は、使用音声認識辞書として設定された各音声認識辞書を用いて、マイクロフォン１から入力する音声の音声認識を行い、入力音声にマッチするヨミデータと対応づけられて、いずれかの使用音声認識辞書に登録されている楽曲IDを再生制御部１４に通知する。そして、再生制御部１４は、音声認識エンジン９から楽曲IDが通知されたならば、接続デバイス記録楽曲情報の楽曲ＤＢの、通知された楽曲IDが登録されたエントリのファイル識別子が示すオーディオファイルを再生するオーディオファイルとし、再生するオーディオファイルの識別子を指定した再生要求をポータブルオーディオプレイヤ２１に発行することにより、当該楽曲IDのオーディオファイル、すなわち、ユーザが発話した曲名の楽曲の再生とスピーカ２への出力を行う。 In addition, when the connected device playback mode is set, the playback control unit 14 sets all the voice recognition dictionaries included in the connected device recorded music information as the use voice recognition dictionary in the voice recognition engine 9. Then, the voice recognition engine 9 performs voice recognition of the voice input from the microphone 1 using each voice recognition dictionary set as the use voice recognition dictionary, and is associated with the reading data that matches the input voice. The reproduction control unit 14 is notified of the music ID registered in the used voice recognition dictionary. Then, when the music ID is notified from the voice recognition engine 9, the reproduction control unit 14 selects the audio file indicated by the file identifier of the entry in which the notified music ID is registered in the music DB of the connected device recording music information. By issuing a playback request designating the identifier of the audio file to be played back to the portable audio player 21 as a playback audio file, playback of the audio file with the song ID, that is, the song with the song name spoken by the user, and the speaker 2 is performed. Is output.
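
A minimal sketch of the lookup that follows recognition, assuming exact string comparison in place of acoustic matching and hypothetical names throughout (`find_music_id`, `file_for_utterance`, and the dictionary and music DB layouts):

```python
from typing import Optional


def find_music_id(reading: str,
                  dictionaries: dict[int, list[tuple[int, str]]]
                  ) -> Optional[int]:
    """Search every active dictionary for reading data equal to the
    recognized reading and return the associated music ID, if any."""
    for entries in dictionaries.values():
        for music_id, entry_reading in entries:
            if entry_reading == reading:
                return music_id
    return None


def file_for_utterance(reading: str,
                       dictionaries: dict[int, list[tuple[int, str]]],
                       music_db: dict[int, str]) -> Optional[str]:
    """Map a recognized reading to the audio file identifier in the music DB."""
    music_id = find_music_id(reading, dictionaries)
    return music_db.get(music_id) if music_id is not None else None
```

In HDD playback mode the returned identifier would be handed to the audio output unit, while in connected device playback mode it would be sent to the portable audio player in a playback request.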

次に、オーディオ再生装置において行う音声認識辞書の編集動作について説明する。
音声認識辞書編集部１１は、入力装置３を介してユーザから音声認識辞書の編集要求を受け付けると、図７ａに示す音声認識辞書編集処理を実行する。
図示するように、この音声認識辞書編集処理では、まず、ヨミデータを修正する曲名の指定と、当該曲名の新たなヨミデータの入力を受け付けて、指定された曲名に対するヨミデータとして、入力されたヨミデータが優先的に用いられるように、ＨＤＤ５のヨミ変換ルールを修正する（ステップ７０２）。 Next, the speech recognition dictionary editing operation performed in the audio playback device will be described.
When the speech recognition dictionary editing unit 11 receives a speech recognition dictionary editing request from the user via the input device 3, it executes the speech recognition dictionary editing process shown in FIG. 7a.
As shown in the figure, this speech recognition dictionary editing process first accepts the designation of a song name whose reading data is to be corrected and the input of new reading data for that song name, and then modifies the reading conversion rule on the HDD 5 so that the input reading data is used preferentially as the reading data for the designated song name (step 702).

ここで、このステップ７０２は、たとえば、次のように行う。
すなわち、音声認識辞書編集部１１は、図８ａに示すようなヨミ変更曲名選択ウインドウを表示装置４に表示する。図示するように、ヨミ変更曲名選択ウインドウには、ＨＤＤ記録楽曲情報の楽曲ＤＢと接続デバイス記録楽曲情報の楽曲ＤＢのいずれかに登録されている曲名の一覧８０１を表示し、一覧８０１上で曲名の選択を受け付ける。そして、「ヨミ変更」ボタン８０２が操作されたならば、図８ｂに示すようなヨミ変更ウインドウを表示し、ヨミ変更曲名選択ウインドウの一覧８０１上で選択された曲名を表示８１１すると共に、入力ボックス８１２への当該曲名の新たな発音を指定するヨミデータの入力を受け付ける。そして、ヨミ変更ウインドウの「確認ボタン」８１３が操作されたならば、入力ボックス８１２に入力されているヨミデータを伴う音声出力を音声合成エンジン８に指示する。音声合成エンジン８は、音声出力を指示されたならば、当該指示に伴うヨミデータが表す合成音声を生成し、オーディオ出力部１０を介してスピーカ２に出力する。 Here, this step 702 is performed as follows, for example.
That is, the speech recognition dictionary editing unit 11 displays a reading change song name selection window, as shown in FIG. 8a, on the display device 4. As shown in the figure, the reading change song name selection window displays a list 801 of the song names registered in either the music DB of the HDD recorded music information or the music DB of the connected device recorded music information, and accepts the selection of a song name on the list 801. When the “Change reading” button 802 is operated, a reading change window as shown in FIG. 8b is displayed, which shows the song name selected on the list 801 (display 811) and accepts, in an input box 812, the input of reading data specifying a new pronunciation for that song name. When the “Confirm” button 813 of the reading change window is operated, the speech synthesis engine 8 is instructed to produce speech output, with the reading data entered in the input box 812 attached to the instruction. When instructed to produce speech output, the speech synthesis engine 8 generates the synthesized speech represented by the attached reading data and outputs it to the speaker 2 via the audio output unit 10.

また、ヨミ変更ウインドウの「変更」ボタン８１４が操作されたならば、音声認識辞書編集部１１は、ヨミ変更曲名選択ウインドウの一覧７０１上で選択された曲名に対するヨミデータを、入力ボックス８１２に入力されているヨミデータに変更する。すなわち、ヨミ変更曲名選択ウインドウの一覧７０１上で選択された曲名に対するヨミデータとして、入力ボックス８１２に入力されているヨミデータが優先的に用いられるように、ＨＤＤ５のヨミ変換ルールを修正する。 When the “Change” button 814 of the reading change window is operated, the speech recognition dictionary editing unit 11 changes the reading data for the song name selected on the list 801 of the reading change song name selection window to the reading data entered in the input box 812. That is, the reading conversion rule on the HDD 5 is modified so that the reading data entered in the input box 812 is used preferentially as the reading data for the song name selected on the list 801.
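
One way to picture a reading conversion rule that prefers user-entered reading data is an override table consulted before a default conversion. The class, its methods, and the trivial default (lower-casing the song name) are assumptions; the description does not specify the format of the rule.

```python
class ReadingRules:
    """Default conversion plus per-title overrides that take priority."""

    def __init__(self) -> None:
        self._overrides: dict[str, str] = {}

    def set_override(self, title: str, reading: str) -> None:
        # Corresponds to operating "Change" with new reading data (step 702).
        self._overrides[title] = reading

    def to_reading(self, title: str) -> str:
        # User-entered reading data wins; otherwise use the default rule.
        if title in self._overrides:
            return self._overrides[title]
        return title.lower()   # placeholder for the real conversion


rules = ReadingRules()
rules.set_override("sade", "shaaday")
print(rules.to_reading("sade"))   # shaaday
print(rules.to_reading("Other"))  # other
```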

さて、図７ａに戻り、このようにしてヨミ変換ルールを修正したならば、ＨＤＤ記録楽曲情報を対象楽曲情報として（ステップ７０４）、音声認識辞書修正処理を実行する（ステップ７０６）。そして、次に、接続デバイス記録楽曲情報を対象楽曲情報として（ステップ７０８）、音声認識辞書修正処理を実行する（ステップ７１０）。 Now, returning to FIG. 7a, if the reading conversion rule is corrected in this way, the HDD-recorded music information is set as the target music information (step 704), and the speech recognition dictionary correcting process is executed (step 706). Then, the connected device recording music information is set as the target music information (step 708), and the speech recognition dictionary correction process is executed (step 710).

図７ｂに、ステップ７０６、７１０で行う音声認識辞書修正処理の手順を示す。
図示するように、この音声認識辞書修正処理では、まず、対象楽曲情報の楽曲ＤＢのエントリのうちの、音声認識辞書編集処理のステップ７０２でヨミデータが変更された曲名と同じ曲名が登録されているエントリに登録されている楽曲IDを全て抽出する（ステップ７５２）。 FIG. 7 b shows the procedure of the speech recognition dictionary correction process performed in steps 706 and 710.
As shown in the figure, in this speech recognition dictionary correction process, first, the same song name as the song name whose Yomi data was changed in step 702 of the speech recognition dictionary editing process is registered in the entry of the song DB of the target song information. All music IDs registered in the entry are extracted (step 752).

そして、抽出した各楽曲IDの各々について（ステップ７５４、７６２、７６４）、ステップ７５６-７６０の処理を行う。
ここで、ステップ７５６-７６０では、抽出した楽曲IDの値をｑとして（ステップ７５６）、辞書IDがfloor((q+n-1)/n)の音声認識辞書を、当該音声認識辞書として既に設定されていない場合には（ステップ７５８）、再作成音声認識辞書に設定する（ステップ７６０）処理を行う。 Then, for each extracted music ID (steps 754, 762, 764), the processing of steps 756-760 is performed.
Here, in steps 756-760, the value of the extracted music ID is taken as q (step 756), and the speech recognition dictionary whose dictionary ID is floor((q+n-1)/n) is set as a re-creation target dictionary (step 760), unless it has already been set as one (step 758).
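
Steps 752 to 760 amount to collecting a set of dictionary IDs. The sketch below assumes the music DB is a plain mapping from music ID to song name; the function name is hypothetical.

```python
def dictionaries_to_recreate(music_db: dict[int, str],
                             changed_title: str, n: int) -> set[int]:
    """Return the IDs of the dictionaries that contain the changed song name."""
    targets: set[int] = set()
    for q, title in music_db.items():      # step 752: find matching entries
        if title == changed_title:
            # floor((q+n-1)/n); adding to a set skips IDs already present,
            # which plays the role of the check in step 758.
            targets.add((q + n - 1) // n)
    return targets


db = {i: "x" for i in range(1, 13)}
db[2] = db[11] = "sade"
print(sorted(dictionaries_to_recreate(db, "sade", 5)))  # [1, 3]
```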

次に、以上のようにしてステップ７５２で抽出した各楽曲IDについてステップ７５６-７６０の処理を終了したならば、ステップ７６０で、再作成対象辞書とした音声認識辞書の各々について（ステップ７６６、７７６、７７８）、ステップ７６８-７７４の処理を行う。 Next, once the processing of steps 756-760 has been completed for each music ID extracted in step 752, the processing of steps 768-774 is performed for each speech recognition dictionary that was set as a re-creation target dictionary in step 760 (steps 766, 776, 778).

ここで、ステップ７６８-７７４では、再作成対象辞書とした音声認識辞書の辞書IDをｇとして、対象楽曲情報の辞書IDがｇの音声認識辞書を消去する（ステップ７７０）。そして、対象楽曲情報の楽曲ＤＢの、1-n+（ｇ×n）からｇ×nまでの楽曲IDが登録されている各エントリの楽曲IDと曲名を登録した辞書元リストを作成し（ステップ７７２）、作成した辞書元リストから辞書IDがｇの音声認識辞書を作成し、対象楽曲情報に格納する（ステップ７７４）。ここで、辞書元リストからの音声認識辞書の作成は、辞書元リストに基づく音声認識辞書の作成を音声合成エンジン８に要求することにより行う。音声合成エンジン８は、辞書元リストに基づく音声認識辞書の作成を要求されたならば、辞書元リスト中の各曲名を、ＨＤＤ５に格納されているヨミ変換ルールに従って生成した当該曲名の発音を表すヨミデータに置き換えた音声認識辞書を作成し、ＨＤＤ５の対象楽曲情報に格納する。 Here, in Steps 768-774, the dictionary ID of the speech recognition dictionary as the re-creation target dictionary is set as g, and the speech recognition dictionary whose dictionary ID of the target music information is g is deleted (Step 770). Then, a dictionary source list in which the song ID and the song name of each entry in which the song IDs from 1-n + (g × n) to g × n are registered in the song DB of the target song information is created (step 772). ), A speech recognition dictionary with a dictionary ID g is created from the created dictionary source list and stored in the target music information (step 774). Here, the speech recognition dictionary is created from the dictionary source list by requesting the speech synthesis engine 8 to create a speech recognition dictionary based on the dictionary source list. When the speech synthesis engine 8 is requested to create a speech recognition dictionary based on the dictionary source list, the speech synthesis engine 8 represents the pronunciation of the song name generated in accordance with the yomi conversion rule stored in the HDD 5 for each song name in the dictionary source list. A voice recognition dictionary replaced with reading data is created and stored in the target music information of the HDD 5.
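
Steps 770 to 774 can be sketched as follows, assuming dictionaries are held in a mapping keyed by dictionary ID and that `to_reading` stands in for the reading conversion rule; both are illustrative assumptions.

```python
from typing import Callable


def recreate_dictionary(g: int, n: int, music_db: dict[int, str],
                        dictionaries: dict[int, list[tuple[int, str]]],
                        to_reading: Callable[[str], str]) -> None:
    """Delete dictionary g and rebuild it from the music DB entries whose
    music IDs lie between 1-n+(g×n) and g×n inclusive."""
    dictionaries.pop(g, None)                        # step 770
    first, last = 1 - n + g * n, g * n
    source = [(q, music_db[q]) for q in range(first, last + 1)
              if q in music_db]                      # step 772
    dictionaries[g] = [(q, to_reading(t)) for q, t in source]  # step 774
```

For g = 3 and n = 5 the range is 11 to 15, so with 12 songs in the music DB only music IDs 11 and 12 end up in the rebuilt dictionary.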

ここで、以上のような音声認識辞書編集処理の処理例を示す。
いま、一つの音声認識辞書が対応する楽曲数ｎが５であるとして、図９ａの９０１に示すように、ＨＤＤ記録楽曲情報の楽曲ＤＢに１から１２の楽曲IDが付与された１２個の楽曲のエントリが存在し、楽曲IDが１から５の楽曲の曲名のヨミデータが楽曲IDと対応づけて登録された辞書ID１の音声認識辞書と、楽曲IDが６から１０の楽曲の曲名のヨミデータが楽曲IDと対応づけて登録された辞書ID２の音声認識辞書と、楽曲IDが１１から１２の楽曲の曲名のヨミデータが楽曲IDと対応づけて登録された辞書ID３の音声認識辞書とがＨＤＤ記録楽曲情報に含まれているものとする。 Here, a processing example of the speech recognition dictionary editing process as described above is shown.
Now, assume that the number of songs n covered by one speech recognition dictionary is 5 and that, as shown at 901 in FIG. 9a, the music DB of the HDD recorded music information contains entries for 12 songs with music IDs 1 to 12. Assume also that the HDD recorded music information includes the speech recognition dictionary with dictionary ID 1, in which the reading data of the song names of the songs with music IDs 1 to 5 is registered in association with the music IDs; the speech recognition dictionary with dictionary ID 2, covering music IDs 6 to 10 in the same way; and the speech recognition dictionary with dictionary ID 3, covering music IDs 11 to 12.

そして、このときに、ユーザによって、曲名「sade」のヨミデータが、「seydo」から「shaaday」に変更されたものとする。
すると、ヨミ変換ルールは、「sade」のヨミデータとして「shaaday」を優先して用いるように修正される（ステップ７０２）。
そして、楽曲ＤＢの曲名「sade」が登録されているエントリの楽曲IDとして、２と１１が抽出される（ステップ７５２）。また、抽出した楽曲IDの２に対して２をｑとしてfloor((q+n-1)/n)により辞書IDが１の音声認識辞書が再作成音声認識辞書に設定され、抽出した楽曲IDの１１に対して１１をｑとしてfloor((q+n-1)/n)により辞書IDが３の音声認識辞書が再作成音声認識辞書に設定される（ステップ７６０）。 At this time, it is assumed that the reading of the song name “sade” is changed from “seydo” to “shaaday” by the user.
Then, the reading conversion rule is modified so that “shaaday” is preferentially used as reading data for “sade” (step 702).
Then, 2 and 11 are extracted as the music IDs of the entries in the music DB in which the song name “sade” is registered (step 752). For the extracted music ID 2, taking q = 2, floor((q+n-1)/n) = 1, so the speech recognition dictionary with dictionary ID 1 is set as a re-creation target dictionary; for the extracted music ID 11, taking q = 11, floor((q+n-1)/n) = 3, so the speech recognition dictionary with dictionary ID 3 is set as a re-creation target dictionary (step 760).

そして、再作成音声認識辞書に設定された辞書IDが１と３の音声認識辞書が消去される（ステップ７７０）。
そして、次に、再作成音声認識辞書の辞書IDの１をｇとして、楽曲ＤＢの1-n+（ｇ×n）=１からｇ×n=５までのエントリの楽曲IDと曲名を登録した辞書元リストが矢印９０２に示すように作成され（ステップ７７２）、この辞書元リストから、矢印９０４に示すように、辞書IDが１の新たな音声認識辞書が作成される（ステップ７７４）。このとき、この新たな認識辞書は、ステップ７０２で修正された後のヨミ変換ルールに従ってヨミデータが作成されるので、曲名の「sade」の楽曲IDの２に対するヨミデータとしては「shaaday」が登録されたものとなる。 Then, the speech recognition dictionaries with dictionary IDs 1 and 3 set in the re-created speech recognition dictionary are deleted (step 770).
Next, taking the dictionary ID 1 of a re-creation target dictionary as g, a dictionary source list in which the music IDs and song names of the music DB entries from 1-n+(g × n) = 1 to g × n = 5 are registered is created as indicated by arrow 902 (step 772), and a new speech recognition dictionary with dictionary ID 1 is created from this dictionary source list as indicated by arrow 904 (step 774). Because the reading data of this new dictionary is generated according to the reading conversion rule as modified in step 702, “shaaday” is registered as the reading data for music ID 2, whose song name is “sade”.

また、もう一つの再作成音声認識辞書の辞書IDの３についても同様に処理が行われ、辞書IDの３をｇとして、楽曲ＤＢの1-n+（ｇ×n）=１１からｇ×n=１５までのエントリの楽曲IDと曲名を登録した辞書元リストが矢印９０３に示すように作成され、この辞書元リストから、矢印９０５に示すように、辞書IDが３の新たな音声認識辞書が作成される。また、このとき、この新たな認識辞書は、ステップ７０２で修正された後のヨミ変換ルールに従ってヨミデータが作成されるので、曲名の「sade」の楽曲IDの１１に対するヨミデータとしては「shaaday」が登録されたものとなる。 The same processing is performed for the dictionary ID 3 of another re-created speech recognition dictionary, and the dictionary ID 3 is g, and 1-n + (g × n) = 11 to g × n = of the music DB. A dictionary source list in which song IDs and song names of up to 15 entries are registered is created as indicated by an arrow 903, and a new speech recognition dictionary with a dictionary ID of 3 is created from this dictionary source list as indicated by an arrow 905. Is done. At this time, the new recognition dictionary is created according to the Yomi conversion rule after being corrected in Step 702. Therefore, “shaaday” is registered as the Yomi data for the song ID 11 of the song name “sade”. Will be.

結果、ＨＤＤ記録楽曲情報の音声認識辞書のうちの、「sade」の曲名の楽曲の楽曲IDを含む音声認識辞書のみを更新することにより、各音声認識辞書中の「sade」のヨミデータの全てが「shaaday」に修正されたことになる。
なお、図９では、ＨＤＤ記録楽曲情報の音声認識辞書を編集する場合について示したが、接続デバイス記録楽曲情報の音声認識の辞書修正も同様に行われ、接続デバイス記録楽曲情報の音声認識辞書のうちの「sade」の曲名の楽曲の楽曲IDを含む音声認識辞書のみを更新することにより、全ての「sade」のヨミデータが「shaaday」に修正されることになる。 As a result, by updating only the voice recognition dictionary including the song ID of the song having the song name “sade” in the voice recognition dictionary of the HDD recorded song information, all the reading data of “sade” in each voice recognition dictionary is obtained. It has been corrected to “shaaday”.
Although FIG. 9 shows the case of editing the voice recognition dictionary of the HDD recorded music information, the dictionary correction of the voice recognition of the connected device recorded music information is performed in the same manner, and the voice recognition dictionary of the connected device recorded music information is changed. By updating only the voice recognition dictionary including the song ID of the song with the song name “sade”, all the “sade” reading data is corrected to “shaaday”.

以上、本発明の実施形態について説明した。
ところで、以上の実施形態では、音声認識辞書編集処理のステップ７０２で特定の曲名のヨミデータの変更を受け付け、音声認識辞書修正処理で、各記録楽曲情報の楽曲ＤＢのエントリのうちの、音声認識辞書編集処理のステップ７０２でヨミデータが変更された曲名と同じ曲名が登録されているエントリに登録されている楽曲IDを全て抽出し（ステップ７５２）、抽出した楽曲IDが登録されている音声認識辞書を新たに作成した音声認識辞書に置き換える（ステップ７５６-７６０）処理を行ったが、このような処理に代えて以下の処理を行うようにしてもよい。 The embodiment of the present invention has been described above.
In the above embodiment, a change to the reading data of a specific song name is accepted in step 702 of the speech recognition dictionary editing process, and the speech recognition dictionary correction process extracts all music IDs registered in the music DB entries of each recorded music information set in which that song name is registered (step 752), and replaces each speech recognition dictionary in which an extracted music ID is registered with a newly created speech recognition dictionary (steps 756-760). The following processing may be performed instead.

That is, step 702 of the speech recognition dictionary editing process may accept a change to the reading data of the title of the song with a specific song ID in specific recorded music information, and the speech recognition dictionary correction process may then replace only the speech recognition dictionary of that recorded music information in which that song ID is registered with a newly created speech recognition dictionary.
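
As a rough illustration of the two ways of choosing which dictionaries to rebuild, the following Python sketch contrasts the title-wide selection of the embodiment (step 752) with the single-song variation. It assumes that the music DB can be modeled as a mapping from song ID to song title, and that the dictionary holding song ID q is g = floor((q + n - 1) / n), as in the deletion case described in this specification. The function names and sample data are hypothetical and do not appear in the specification.

```python
def dictionary_id(song_id: int, n: int) -> int:
    """Dictionary ID holding song_id when song IDs are grouped n per dictionary."""
    return (song_id + n - 1) // n  # floor((q + n - 1) / n)


def targets_by_title(music_db: dict[int, str], title: str, n: int) -> set[int]:
    """Embodiment: every dictionary holding a song whose title matches."""
    return {dictionary_id(sid, n) for sid, t in music_db.items() if t == title}


def target_by_song_id(song_id: int, n: int) -> set[int]:
    """Variation: only the dictionary holding the one selected song."""
    return {dictionary_id(song_id, n)}


# Hypothetical music DB: song ID -> song title
music_db = {1: "sade", 2: "hello", 3: "yesterday",
            4: "imagine", 5: "sade", 6: "help"}
n = 2  # texts per dictionary

print(sorted(targets_by_title(music_db, "sade", n)))  # [1, 3]
print(sorted(target_by_song_id(5, n)))                # [3]
```

With n = 2, the title-wide selection rebuilds two dictionaries, whereas the variation rebuilds only the one holding the selected song.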

Further, in the above embodiment, deletion of the song with a specific song ID in the HDD-recorded music information may be accepted from the user; in that case the audio file of that song is deleted from the HDD 5, and the entry for that song ID is deleted from the music DB of the HDD-recorded music information. The speech recognition dictionary containing the entry for that song ID is then replaced with a speech recognition dictionary created from the music DB after the entry has been deleted. That is, if the song ID of the deleted song is q, the entry with song ID q is deleted from the music DB, and the speech recognition dictionary with dictionary ID g = floor((q + n - 1) / n) is deleted. A dictionary source list is then created that registers the song ID and song title of each entry in the music DB of the HDD-recorded music information whose song ID lies between 1 - n + (g × n) and g × n, a speech recognition dictionary with dictionary ID g is created from that dictionary source list, and it is stored in the HDD-recorded music information.
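
The deletion procedure above can be sketched in Python as follows. This is a minimal sketch, assuming the music DB is a mapping from song ID to song title; `rebuild_after_delete` and the sample data are hypothetical, and generating pronunciation data from the dictionary source list is left out.

```python
def rebuild_after_delete(music_db: dict[int, str], q: int, n: int):
    """Delete song q; return the dictionary ID g and its new source list."""
    del music_db[q]                     # remove the entry with song ID q
    g = (q + n - 1) // n                # g = floor((q + n - 1) / n)
    first, last = 1 - n + g * n, g * n  # song-ID range covered by dictionary g
    source_list = [(sid, music_db[sid])
                   for sid in range(first, last + 1) if sid in music_db]
    return g, source_list


# Hypothetical music DB with song IDs 1..7, grouped 3 per dictionary
music_db = {i: f"song{i}" for i in range(1, 8)}
g, source_list = rebuild_after_delete(music_db, q=5, n=3)
print(g)            # 2
print(source_list)  # [(4, 'song4'), (6, 'song6')]
```

Only dictionary 2 is rebuilt here; the dictionaries covering song IDs 1 to 3 and 7 onward are left untouched.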

Although the above embodiment describes the case of recognizing song titles by speech, the techniques for creating and editing/correcting speech recognition dictionaries in this embodiment can be applied in the same way to creating and editing/correcting speech recognition dictionaries for artist names, album names, genre names, or any other text.

DESCRIPTION OF SYMBOLS 1 ... Microphone, 2 ... Speaker, 3 ... Input device, 4 ... Display device, 5 ... HDD, 6 ... CD drive, 7 ... Portable audio player interface, 8 ... Speech synthesis engine, 9 ... Speech recognition engine, 10 ... Audio output unit, 11 ... Speech recognition dictionary editing unit, 12 ... Ripping processing unit, 13 ... Music management unit, 14 ... Playback control unit, 20 ... CD-DA disc, 21 ... Portable audio player.

Claims

A speech recognition dictionary editing device for editing a speech recognition dictionary that is used in a speech recognition device for recognizing text represented by speech uttered by a person, and that stores, for each text to be recognized, speech recognition data representing the correspondence between the text and pronunciation data representing the pronunciation of the text, the device comprising:
a speech recognition dictionary storage unit that stores the speech recognition dictionary;
a Yomi conversion rule storage unit that stores Yomi conversion rules defining how pronunciation data is generated from text;
a speech recognition dictionary generation unit that divides the texts to be recognized into groups of n texts each (where n is an integer of 1 or more), creates, for each group, a speech recognition dictionary storing speech recognition data representing the correspondence between each text in the group and the pronunciation data generated for that text according to the Yomi conversion rules, as the speech recognition dictionary of that group, and stores it in the speech recognition dictionary storage unit;
a Yomi conversion rule correction unit that corrects the Yomi conversion rules so that pronunciation data for a designated text, which is a text designated by the user, is generated in the manner designated by the user; and
a speech recognition dictionary correction unit that sets the group containing the designated text as a correction target group, deletes the speech recognition dictionary of the correction target group from the speech recognition dictionary storage unit, creates, as a new speech recognition dictionary of that group, a speech recognition dictionary storing speech recognition data representing the correspondence between each text in the correction target group and the pronunciation data generated for that text according to the Yomi conversion rules as corrected by the Yomi conversion rule correction unit, and stores it in the speech recognition dictionary storage unit.

The speech recognition dictionary editing device according to claim 1,
wherein the speech recognition dictionary correction unit sets every group containing a text identical to the designated text as a correction target group and, for each correction target group, deletes the speech recognition dictionary of the correction target group from the speech recognition dictionary storage unit, creates, as a new speech recognition dictionary of that group, a speech recognition dictionary storing speech recognition data representing the correspondence between each text in the correction target group and the pronunciation data generated for that text according to the Yomi conversion rules as corrected by the Yomi conversion rule correction unit, and stores it in the speech recognition dictionary storage unit.

The speech recognition dictionary editing device according to claim 1,
further comprising a speech recognition dictionary addition unit that, when texts to be recognized are added, divides the added texts into groups of n texts each, creates, for each group, a speech recognition dictionary storing speech recognition data representing the correspondence between each text in the group and the pronunciation data generated for that text according to the Yomi conversion rules, as the speech recognition dictionary of that group, and additionally stores it in the speech recognition dictionary storage unit.

The speech recognition dictionary editing device according to claim 1,
wherein, when a text is excluded from the texts to be recognized, the speech recognition dictionary correction unit sets the group containing the excluded text as a correction target group, deletes the speech recognition dictionary of the correction target group from the speech recognition dictionary storage unit, removes the excluded text from the correction target group, creates, as a new speech recognition dictionary of that group, a speech recognition dictionary storing speech recognition data representing the correspondence between each text of the correction target group and the pronunciation data generated for that text according to the Yomi conversion rules, and stores it in the speech recognition dictionary storage unit.

A speech recognition apparatus comprising: the speech recognition dictionary editing device according to claim 1; and
a speech recognition unit that recognizes text represented by speech uttered by a person, using the speech recognition dictionary of each group stored in the speech recognition dictionary storage unit.

A speech recognition apparatus comprising: the speech recognition dictionary editing device according to claim 3; and
a speech recognition unit that recognizes text represented by speech uttered by a person, using the speech recognition dictionary of each group stored in the speech recognition dictionary storage unit.

A speech recognition apparatus comprising: the speech recognition dictionary editing device according to claim 4; and
a speech recognition unit that recognizes text represented by speech uttered by a person, using the speech recognition dictionary of each group stored in the speech recognition dictionary storage unit.

The speech recognition dictionary editing device according to claim 1,
wherein the text to be recognized is any one of a song title, an artist name, an album name, and a genre name.

An audio playback apparatus comprising: the speech recognition apparatus according to claim 5; a music data storage unit that stores music data representing songs; and a music playback unit that plays back the music data stored in the music data storage unit,
wherein the text to be recognized is the song title of a song represented by the music data stored in the music data storage unit, and
the music playback unit plays back the music data of the song whose title has been recognized by the speech recognition unit.

A computer program that is read and executed by a computer,
the computer program causing the computer to function as:
a speech recognition dictionary storage unit that stores a speech recognition dictionary that is used for speech recognition for recognizing text represented by speech uttered by a person, and that stores, for each text to be recognized, speech recognition data representing the correspondence between the text and pronunciation data representing the pronunciation of the text;
a Yomi conversion rule storage unit that stores Yomi conversion rules defining how pronunciation data is generated from text;
a speech recognition dictionary generation unit that divides the texts to be recognized into groups of n texts each (where n is an integer of 1 or more), creates, for each group, a speech recognition dictionary storing speech recognition data representing the correspondence between each text in the group and the pronunciation data generated for that text according to the Yomi conversion rules, as the speech recognition dictionary of that group, and stores it in the speech recognition dictionary storage unit;
a Yomi conversion rule correction unit that corrects the Yomi conversion rules so that pronunciation data for a designated text, which is a text designated by the user, is generated in the manner designated by the user; and
a speech recognition dictionary correction unit that sets the group containing the designated text as a correction target group, deletes the speech recognition dictionary of the correction target group from the speech recognition dictionary storage unit, creates, as a new speech recognition dictionary of that group, a speech recognition dictionary storing speech recognition data representing the correspondence between each text in the correction target group and the pronunciation data generated for that text according to the Yomi conversion rules as corrected by the Yomi conversion rule correction unit, and stores it in the speech recognition dictionary storage unit.

The computer program according to claim 10,
wherein the speech recognition dictionary correction unit sets every group containing a text identical to the designated text as a correction target group and, for each correction target group, deletes the speech recognition dictionary of the correction target group from the speech recognition dictionary storage unit, creates, as a new speech recognition dictionary of that group, a speech recognition dictionary storing speech recognition data representing the correspondence between each text in the correction target group and the pronunciation data generated for that text according to the Yomi conversion rules as corrected by the Yomi conversion rule correction unit, and stores it in the speech recognition dictionary storage unit.

The computer program according to claim 10,
the computer program further causing the computer to function as
a speech recognition dictionary addition unit that, when texts to be recognized are added, divides the added texts into groups of n texts each, creates, for each group, a speech recognition dictionary storing speech recognition data representing the correspondence between each text in the group and the pronunciation data generated for that text according to the Yomi conversion rules, as the speech recognition dictionary of that group, and additionally stores it in the speech recognition dictionary storage unit.

The computer program according to claim 10,
wherein, when a text is excluded from the texts to be recognized, the speech recognition dictionary correction unit sets the group containing the excluded text as a correction target group, deletes the speech recognition dictionary of the correction target group from the speech recognition dictionary storage unit, removes the excluded text from the correction target group, creates, as a new speech recognition dictionary of that group, a speech recognition dictionary storing speech recognition data representing the correspondence between each text of the correction target group and the pronunciation data generated for that text according to the Yomi conversion rules, and stores it in the speech recognition dictionary storage unit.

The computer program according to claim 10,
the computer program further causing the computer to function as a speech recognition unit that recognizes text represented by speech uttered by a person, using the speech recognition dictionary of each group stored in the speech recognition dictionary storage unit.

The computer program according to claim 14,
wherein the text to be recognized is any one of a song title, an artist name, an album name, and a genre name.

The computer program according to claim 14,
the computer program further causing the computer to function as a music playback unit that plays back music data stored in a music data storage unit that stores music data representing songs,
wherein the text to be recognized is the song title of a song represented by the music data stored in the music data storage unit, and
the music playback unit plays back the music data of the song whose title has been recognized by the speech recognition unit.

A speech recognition dictionary editing method for editing a speech recognition dictionary that is used in a speech recognition device for recognizing text represented by speech uttered by a person, and that stores, for each text to be recognized, speech recognition data representing the correspondence between the text and pronunciation data representing the pronunciation of the text, the method comprising:
a speech recognition dictionary generation step of dividing the texts to be recognized into groups of n texts each (where n is an integer of 1 or more), and creating and storing, for each group, a speech recognition dictionary storing speech recognition data representing the correspondence between each text in the group and the pronunciation data generated for that text according to Yomi conversion rules that define how pronunciation data is generated from text, as the speech recognition dictionary of that group;
a Yomi conversion rule correction step of correcting the Yomi conversion rules so that pronunciation data for a designated text, which is a text designated by the user, is generated in the manner designated by the user; and
a speech recognition dictionary correction step of setting the group containing the designated text as a correction target group, deleting the speech recognition dictionary of the correction target group, and creating and storing, as a new speech recognition dictionary of that group, a speech recognition dictionary storing speech recognition data representing the correspondence between each text in the correction target group and the pronunciation data generated for that text according to the corrected Yomi conversion rules.

The speech recognition dictionary editing method according to claim 17,
wherein, in the speech recognition dictionary correction step, every group containing a text identical to the designated text is set as a correction target group and, for each correction target group, the speech recognition dictionary of the correction target group is deleted, and a speech recognition dictionary storing speech recognition data representing the correspondence between each text in the correction target group and the pronunciation data generated for that text according to the corrected Yomi conversion rules is created and stored as a new speech recognition dictionary of that group.

The speech recognition dictionary editing method according to claim 17,
further comprising a speech recognition dictionary addition step of, when texts to be recognized are added, dividing the added texts into groups of n texts each, and creating and storing, for each group, a speech recognition dictionary storing speech recognition data representing the correspondence between each text in the group and the pronunciation data generated for that text according to the Yomi conversion rules, as the speech recognition dictionary of that group.

The speech recognition dictionary editing method according to claim 17,
wherein, in the speech recognition dictionary correction step, when a text is excluded from the texts to be recognized, the group containing the excluded text is set as a correction target group, the speech recognition dictionary of the correction target group is deleted, the excluded text is removed from the correction target group, and a speech recognition dictionary storing speech recognition data representing the correspondence between each text of the correction target group and the pronunciation data generated for that text according to the Yomi conversion rules is created and stored as a new speech recognition dictionary of that group.