JP2002366190A

JP2002366190A - Statistical language model generation device and statistical language model generation program

Info

Publication number: JP2002366190A
Application number: JP2001172260A
Authority: JP
Inventors: Akio Kobayashi; 彰夫小林; Shinichi Honma; 真一本間; Akio Ando; 彰男安藤
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2001-06-07
Filing date: 2001-06-07
Publication date: 2002-12-20
Anticipated expiration: 2021-06-07
Also published as: JP4340024B2

Abstract

(57) [Abstract] [Problem] To provide a statistical language model generation device and a statistical language model generation program that generate a statistical language model capable of improving speech recognition performance and raising the appearance probability of words likely to occur in utterances.

[Solution] The statistical language model generation device comprises: speech recognition means 7 that recognizes the most recent text data and the past mass text data as speech; accumulation means 9 that accumulates the most recent text data (accumulation means 5), the past mass text data (accumulation means 3), and the recognition results produced by the speech recognition means 7; probability weight calculation means that calculates a first n-gram probability weight from the past mass text data, a second n-gram probability weight from the most recent text data, and a third n-gram probability weight from the recognition results; and language model generation means 11 that generates a statistical language model from the first, second, and third probability weights.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a statistical language model generation device and a statistical language model generation program that generate a statistical language model for use in a speech recognition device.

[0002]

[Description of the Related Art] Methods that use a statistical (probabilistic) language model have been proposed for improving speech recognition performance in speech recognition devices; typical examples are listed below. A statistical language model is a model in which the relationships between words or phonemes in a language are modeled on the basis of statistics.

[0003] (1) Cache-model method (R. Kuhn and R. De Mori, "A Cache-Based Natural Language Model for Speech Recognition," IEEE Trans. PAMI, vol. 12, no. 6, 1990, pp. 570-583). This method improves speech recognition performance by combining, through linear interpolation or the like, n-gram probabilities learned from a large volume of past manuscripts (text data) with the appearance probabilities of words in recent speech recognition results. For clarity: an n-gram probability is the occurrence probability in a word n-gram model, which models a word sequence as a Markov chain, so that the occurrence probability of a word depends on the immediately preceding (n-1) words. Linear interpolation means linearly interpolating an n-gram probability with lower-order m-gram probabilities (m < n).
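The cache idea above can be illustrated with a minimal sketch. It assumes a simple unigram cache over the last `size` recognized words and a fixed interpolation weight `mu`; `WordCache`, `cache_interpolated`, `size=200`, and `mu=0.1` are hypothetical choices for illustration, not the exact model of the cited paper.

```python
from collections import Counter, deque


class WordCache:
    """Hypothetical unigram cache of the most recently recognized words."""

    def __init__(self, size=200):
        self.window = deque(maxlen=size)  # the last `size` recognized words
        self.counts = Counter()           # word counts inside the window

    def add(self, word):
        if len(self.window) == self.window.maxlen:
            self.counts[self.window[0]] -= 1  # this word is evicted by append()
        self.window.append(word)
        self.counts[word] += 1

    def prob(self, word):
        """Relative frequency of `word` in the cache (0.0 while the cache is empty)."""
        return self.counts[word] / len(self.window) if self.window else 0.0


def cache_interpolated(p_ngram, cache, word, mu=0.1):
    """(1 - mu) * P_ngram(word | history) + mu * P_cache(word); mu is an assumed constant."""
    return (1.0 - mu) * p_ngram + mu * cache.prob(word)
```

Words that were just recognized get a boost through `cache.prob`, which is why this approach helps with topic-local repetition but cannot help with words missing from the vocabulary.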

[0004] (2) Method based on MAP (maximum a posteriori) estimation (Kobayashi, Imai, and Ando, "Time Dependent Language Model for Broadcast News Transcription and Its Post-Correction," ICSLP-1998). This method obtains the n-gram probabilities for a given task by adding a small amount of task-specific manuscripts to a large amount of task-independent manuscripts with an appropriate weight obtained by MAP estimation, thereby raising the statistical accuracy of the language model and improving recognition performance. The vocabulary (corpus) used to generate the language model consists of all the words in the small manuscript set plus some of the words in the large manuscript set. For clarity: a task generally means a job, that is, the object being processed; an appropriate weight is a numerical value added so as to raise the appearance probability of a given word in a statistical (probabilistic) language model; and a vocabulary (corpus) is the source data from which a language model is generated, typically a text database containing several hundred thousand or more words.
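The weighted addition described above can be sketched as count merging, assuming the small task corpus's bigram counts are simply multiplied by a scalar `weight` before being added to the large corpus's counts. Here `weight` is a given constant rather than a value actually derived by MAP estimation, and `merged_bigram_prob` is a hypothetical name.

```python
from collections import Counter


def merged_bigram_prob(h, w, large, small, weight):
    """P(w | h) from a large corpus plus a small task corpus scaled by `weight`.

    large, small: (bigram Counter keyed by (h, w), history Counter keyed by h).
    `weight` is assumed to be given; the MAP derivation of it is not shown.
    """
    big_bi, big_hist = large
    small_bi, small_hist = small
    num = big_bi[(h, w)] + weight * small_bi[(h, w)]
    den = big_hist[h] + weight * small_hist[h]
    return num / den if den > 0 else 0.0
```

With `weight = 0` this reduces to the large corpus alone; larger weights pull the estimate toward the task corpus.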

[0005]

[Problems to Be Solved by the Invention] However, because the conventional cache-model method relies on past speech recognition results, it does not take into account words that are not registered in the vocabulary used to generate the language model (for example, words that have only recently come into wide use). Consequently, in tasks such as news programs, where a single topic is often covered in only a few sentences and proper nouns (new words) such as personal names, place names, and organization names appear very frequently, speech recognition performance cannot be expected to improve unless a language model based on a vocabulary that is continually updated with new words is used.

[0006] In the method based on MAP (maximum a posteriori) estimation, manuscripts written in written language are used instead of actual utterance content, so the appearance probability of words likely to occur in utterances cannot be raised.

[0007] An object of the present invention is to solve the above problems of the prior art by providing a statistical language model generation device and a statistical language model generation program that generate a statistical language model capable of improving speech recognition performance and raising the appearance probability of words likely to occur in utterances.

[0008]

[Means for Solving the Problems] The statistical language model generation device according to claim 1 is a device for generating a statistical language model, comprising: text data acquisition means for acquiring the most recent text data, which contains words predicted to appear with high frequency; speech recognition means for recognizing, as speech, the most recent text data and earlier past mass text data that is larger in volume than the most recent text data; storage means for storing the most recent text data, the past mass text data, and the recognition results of the speech recognition means; probability weight calculation means for calculating a first n-gram probability weight based on the past mass text data, a second n-gram probability weight based on the most recent text data, and a third n-gram probability weight based on the recognition results; and language model generation means for generating a statistical language model based on the first, second, and third probability weights.

[0009] With this configuration, the text data acquisition means acquires the most recent text data containing words predicted to appear with high frequency; the speech recognition means recognizes the most recent text data and the earlier mass text data as speech; and the storage means stores the most recent text data, the past mass text data, and the recognition results. The probability weight calculation means then calculates the probability weight for each n-gram model, and the language model generation means generates a language model based on the calculated weights.

[0010] Examples of the most recent text data expected to contain frequently appearing words include manuscripts for a broadcast program immediately before or after it is aired, and articles in newspapers or magazines immediately before or after publication. Examples of the past mass text data include manuscripts used in broadcast programs over several years to several decades, or corpora such as the Brown Corpus and the LOB Corpus.

[0011] The statistical language model generation device according to claim 2 is the statistical language model generation device according to claim 1, further comprising recognition result correction means for correcting the recognition results produced by the speech recognition means, wherein the probability weight calculation means calculates the third n-gram probability weight based on the corrected recognition results.

[0012] With this configuration, the recognition result correction means corrects the results of recognizing the text data as speech, and the probability weight calculation means calculates the third n-gram probability weight based on the corrected recognition results.

[0013] The statistical language model generation program according to claim 3 causes a computer to function as: text data acquisition means for acquiring the most recent text data, which contains words predicted to appear with high frequency; speech recognition means for recognizing, as speech, the most recent text data and earlier past mass text data that is larger in volume than the most recent text data; storage means for storing the most recent text data, the past mass text data, and the recognition results of the speech recognition means; probability weight calculation means for calculating a first n-gram probability weight based on the past mass text data, a second n-gram probability weight based on the most recent text data, and a third n-gram probability weight based on the recognition results; and language model generation means for generating a statistical language model based on the first, second, and third probability weights.

[0014] With this configuration, the text data acquisition means acquires the most recent text data containing words predicted to appear with high frequency; the speech recognition means recognizes the most recent text data and the earlier mass text data as speech; and the storage means stores the most recent text data, the past mass text data, and the recognition results. The probability weight calculation means then calculates the probability weight for each n-gram model, and the language model generation means generates a language model based on the calculated weights.

[0015]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings.

(Statistical language model generation device: first embodiment) FIG. 1 is a functional diagram of the first embodiment of the statistical language model generation device. As shown in FIG. 1, the statistical language model generation device 1 includes a main control unit, a storage unit, a display output unit, an input unit, an external connection unit, and the like (not shown), and functionally implements past news manuscript accumulation means 3, latest reporter manuscript accumulation means 5, speech recognition means 7, recognition result accumulation means 9, and language model calculation means 11.

[0016] The statistical language model generation device 1 generates, from a large amount of text data, a statistical language model for use in speech recognition by a speech recognition device (not shown). In this embodiment, the statistical language model generation device 1 is a general-purpose computer, and the main control unit, storage unit, display output unit, input unit, and external connection unit (not shown) consist of a CPU, memory, a hard disk, a keyboard, and the like.

[0017] The past news manuscript accumulation means 3 is a database stored (accumulated) in the storage unit (not shown), in which the past mass text data recited in the claims is accumulated. It holds a large number of past news manuscripts in text file format (text data). In these text files, the individual words making up each manuscript are separated by spaces.

[0018] In this embodiment, each punctuation mark in a news manuscript is merged into, and handled together with, the word immediately preceding it. Note also that in this specification the terms "storing," "accumulating," and "collecting" are used with substantially the same meaning.

[0019] The latest reporter manuscript accumulation means 5 consists of a program loaded into the main control unit (not shown) and a database stored in the storage unit; it combines the text data acquisition means, which acquires the most recent text data containing words predicted to appear with high frequency, with the accumulated most recent text data. The latest reporter manuscript accumulation means 5 first acquires the reporter manuscripts for the most recent broadcast programs (particularly news programs). Acquisition is performed either by an operator entering the news manuscripts into the statistical language model generation device 1, or by reading them with OCR or the like and inputting the result through the external connection unit.

[0020] After acquiring a reporter manuscript, the latest reporter manuscript accumulation means 5 applies certain corrections automatically, or has an operator proofread it, and stores it as text data in the latest-text database of the storage unit (not shown). The sentences of each reporter manuscript are stored as one text file per topic, and, as with the past news manuscript accumulation means 3, the words making up the manuscript are separated by spaces.

[0021] The speech recognition means 7 recognizes a text file as speech (reads the text file aloud). It is a general text-to-speech engine or the like, equipped with a dictionary of several hundred thousand words. It first identifies the words contained in the text files of the past news manuscript accumulation means 3 and the latest reporter manuscript accumulation means 5, then synthesizes the identified words by, for example, a waveform superposition method, and outputs them in PCM format.

[0022] The recognition result accumulation means 9 is a database stored (accumulated) in the storage unit (not shown). The recognition results produced by the speech recognition means 7 are matched against the text files of the past news manuscript accumulation means 3 and the latest reporter manuscript accumulation means 5 and stored, with the date and time attached to each sentence of the recognition results as a time stamp.

[0023] The language model calculation means 11 is a program that generates a statistical language model from the past mass text data of the past news manuscript accumulation means 3, the text data of the most recent reporter manuscripts of the latest reporter manuscript accumulation means 5, and the recognition results of the speech recognition means 7. In this embodiment, a bigram model is used as the language model (for n-gram models, including the bigram model, see, for example, Seiichi Nakagawa, "Speech Recognition by Probabilistic Models," IEICE, p. 109).

[0024] The language model calculation means 11 performs the following calculations in the order shown, based on the equations given below. First, the bigram models P₀, P₁, and P₂ of the language model, derived from the past mass text data, the text data of the most recent reporter manuscripts, and the recognition results, are combined by linear interpolation (for linear interpolation, see, for example, Kita, Nakamura, and Nagata, "Spoken Language Processing," Morikita Publishing, p. 29); the weighted language model is then expressed by Equation 1.

[0025]

[Equation 1]
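The image of Equation 1 is not reproduced in this text. A standard form of three-way linear interpolation that is consistent with the surrounding definitions (bigrams P₀, P₁, P₂, weights λ, vocabulary V) is given below as an assumption, not as a transcription of the original equation:

```latex
P(y_n \mid y_{n-1}) = \lambda_0 P_0(y_n \mid y_{n-1}) + \lambda_1 P_1(y_n \mid y_{n-1}) + \lambda_2 P_2(y_n \mid y_{n-1}),
\qquad \sum_{i=0}^{2} \lambda_i = 1, \quad \lambda_i \ge 0, \quad y_n, y_{n-1} \in V
```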

[0026] In Equation 1, y_n and y_{n-1} are words registered in the vocabulary, and the probability P(y_n | y_{n-1}) is the probability that word y_n is uttered after word y_{n-1} has been uttered. In general, the larger n is in an n-gram language model, the longer the word sequences it handles and the higher the accuracy of predicting the next word; in exchange, it requires text data containing an enormous number of words (growing roughly as the n-th power of the vocabulary size). λ denotes the probability weight of each language model, and V denotes the vocabulary.
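The bigram and interpolation steps can be sketched as follows, assuming unsmoothed maximum-likelihood bigrams P_i(w | h) = c(h, w) / c(h), with c(h) approximated by the unigram count of h. The function names are hypothetical, and a practical system would also smooth each component.

```python
from collections import Counter


def bigram_mle(bigrams, unigrams):
    """Return P(w | h) = c(h, w) / c(h), an unsmoothed maximum-likelihood bigram."""
    def p(w, h):
        return bigrams[(h, w)] / unigrams[h] if unigrams[h] else 0.0
    return p


def interpolate(models, lambdas):
    """Linear interpolation: P(w | h) = sum_i lambda_i * P_i(w | h)."""
    if abs(sum(lambdas) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")

    def p(w, h):
        return sum(lam * m(w, h) for lam, m in zip(lambdas, models))
    return p
```

Each component here would be trained on one of G₀, G₁, or G₂; a bigram that is unseen in one corpus can still receive probability from the others through interpolation.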

[0027] If the bigram probability for the words y_n and y_{n-1} in the weighted language model is large, then, when the language model generated by the statistical language model generation device 1 is supplied to a speech recognition device (not shown), the combination of y_n and y_{n-1} becomes more likely to appear during that device's speech recognition. In other words, the values of the probability weights λ should be determined so that the product of the bigram probabilities over the sentences to be read aloud (the sentences to be recognized by speech) is maximized. Alternatively, and equivalently, the values of the probability weights λ should be determined so that the entropy of the evaluation data (the sentences to be recognized by speech) is minimized (see, for example, Seiichi Nakagawa, "Speech Recognition by Probabilistic Models," IEICE, p. 111, and Equation 2).
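The equivalence between maximizing the product of bigram probabilities and minimizing entropy can be seen in a short sketch: the per-word entropy is minus the log of that product divided by N. This assumes a hypothetical sentence-start token `<s>` so that all N words are scored; `entropy_bits` is a hypothetical name.

```python
import math


def entropy_bits(words, p, start="<s>"):
    """H = -(1/N) * sum_{n=1..N} log2 P(y_n | y_{n-1}) for an evaluation text.

    `p(w, h)` is any bigram model; y_0 is an assumed sentence-start token.
    Since H = -(1/N) * log2(product of the N bigram probabilities), lowering H
    is the same as raising that product.
    """
    history = [start] + words[:-1]
    log_prob = sum(math.log2(p(w, h)) for h, w in zip(history, words))
    return -log_prob / len(words)


# A model that gives every word probability 1/4 has entropy 2 bits (perplexity 4).
print(entropy_bits(["a", "b", "c"], lambda w, h: 0.25))  # 2.0
```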

[0028]

[Equation 2]

[0029] In Equation 2, N is the total number of words in the evaluation text (text data), and the evaluation text is represented as the word sequence y = y₁y₂...y_N of the evaluation data. The weights λ in this equation are obtained with the expectation-maximization algorithm (EM algorithm; see, for example, Kita, Nakamura, and Nagata, "Spoken Language Processing," Morikita Publishing, p. 31), by iterative calculation using Equation 3.
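The images of Equations 2 and 3 are not reproduced in this text. Standard forms consistent with the definitions above (entropy of the interpolated bigram model over the N-word evaluation text, and the usual EM re-estimation of interpolation weights) are given below as assumptions rather than transcriptions:

```latex
H = -\frac{1}{N} \sum_{n=1}^{N} \log_2 \sum_{i=0}^{2} \lambda_i P_i(y_n \mid y_{n-1})

\lambda_i \leftarrow \frac{1}{N} \sum_{n=1}^{N}
  \frac{\lambda_i P_i(y_n \mid y_{n-1})}{\sum_{j=0}^{2} \lambda_j P_j(y_n \mid y_{n-1})}
```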

[0030]

[Equation 3]

[0031] In Equation 3, λ_i is updated repeatedly until the entropy of the evaluation text converges. This calculation yields the probability weight λ for each language model automatically. However, it is usually difficult to obtain the optimal probability weights λ for the text that will actually be read aloud, because the content of the evaluation text is not known in advance. Therefore, a transcript (converted into text data) of known utterance content relating to the evaluation text is prepared in advance and used to determine the values of λ experimentally.
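The iterative estimation can be sketched with the standard EM update for interpolation weights, assumed here because the original Equation 3 is not reproduced. `em_weights` and `mixture_entropy` are hypothetical names; `pairs` stands for the (previous word, word) pairs of a held-out transcript, and at least one component is assumed to give every pair a nonzero probability.

```python
import math


def mixture_entropy(lambdas, probs):
    """Entropy in bits of the interpolated model; probs[n][i] = P_i(y_n | y_{n-1})."""
    total = sum(math.log2(sum(l * p for l, p in zip(lambdas, row))) for row in probs)
    return -total / len(probs)


def em_weights(components, pairs, max_iter=100, tol=1e-6):
    """Estimate interpolation weights by EM on held-out (h, w) bigram pairs.

    components: list of functions P_i(w, h). Returns (lambdas, entropy_bits).
    Iteration stops when the entropy decreases by less than `tol` bits.
    """
    k = len(components)
    lambdas = [1.0 / k] * k                                   # uniform start
    probs = [[c(w, h) for c in components] for h, w in pairs]  # cache P_i values
    h_old = mixture_entropy(lambdas, probs)
    for _ in range(max_iter):
        expected = [0.0] * k
        for row in probs:
            mix = sum(l * p for l, p in zip(lambdas, row))
            for i in range(k):
                expected[i] += lambdas[i] * row[i] / mix      # E-step: posterior of model i
        lambdas = [e / len(probs) for e in expected]          # M-step: normalized counts
        h_new = mixture_entropy(lambdas, probs)
        if h_old - h_new < tol:                               # EM never raises the entropy
            break
        h_old = h_new
    return lambdas, h_new
```

In the setting described above, `components` would be the three bigram models trained on G₀, G₁, and G₂, and the returned weights play the role of λ₀, λ₁, and λ₂.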

[0032] Next, the text weights w are obtained. The text weights w give weighted word frequencies. Let m₀ be the total number of words in the past mass text data G₀, m₁ the total number of words in the most recent text data G₁, and m₂ the total number of words in the recognition results G₂. The text weights w are then calculated by Equation 4 using the converged probability weights λ₀, λ₁, and λ₂ (the first, second, and third n-gram probability weights).

[0033]

[Equation 4]

[0034] In Equation 4, the numbers of times (text weights) w₁ and w₂ that the most recent text data G₁ and the recognition results G₂ are added to the past mass text data G₀ are calculated from the probability weights λ₀, λ₁, and λ₂. Equation 4 thus normalizes the probability weights λ of the statistical language models into weights over the collection of text data sets.
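Equation 4 itself is not reproduced here. One normalization consistent with this description, assumed for illustration only, is w_i = (λ_i / λ₀)(m₀ / m_i): with it, the relative frequency of a word in the merged, weighted text equals λ₀f₀/m₀ + λ₁f₁/m₁ + λ₂f₂/m₂, the interpolated unigram probability (given λ₀ + λ₁ + λ₂ = 1). The sketch below computes these hypothetical weights and checks that identity.

```python
def text_weights(lambdas, sizes):
    """Hypothetical form of Equation 4: w_i = (lambda_i / lambda_0) * (m_0 / m_i).

    lambdas: converged (lambda_0, lambda_1, lambda_2); sizes: (m_0, m_1, m_2),
    the total word counts of G_0, G_1, G_2. Returns [w_1, w_2]; G_0 keeps weight 1.
    """
    lam0, m0 = lambdas[0], sizes[0]
    return [(lam * m0) / (lam0 * m) for lam, m in zip(lambdas[1:], sizes[1:])]


def merged_relative_freq(freqs, sizes, weights):
    """Relative frequency of one word after adding G_1 and G_2 to G_0 with the text weights."""
    f0, m0 = freqs[0], sizes[0]
    num = f0 + sum(w * f for w, f in zip(weights, freqs[1:]))
    den = m0 + sum(w * m for w, m in zip(weights, sizes[1:]))
    return num / den


w = text_weights((0.5, 0.3, 0.2), (1_000_000, 10_000, 5_000))  # approximately [60.0, 80.0]
```

For example, a word seen 100, 5, and 2 times in G₀, G₁, and G₂ gets a merged relative frequency of 560 / 2,000,000 = 0.00028, which matches 0.5·100/10⁶ + 0.3·5/10⁴ + 0.2·2/5000.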

[0035] Based on the calculated text weights w₁ and w₂, the most recent text data, weighted by w₁, and the recognition results, weighted by w₂, are added to the past mass text data, and a new vocabulary is obtained. That is, letting f₀, f₁, and f₂ be the frequencies of a word in the past mass text data G₀, the most recent text data G₁, and the recognition results G₂, respectively, the frequency f of the word is given by Equation 5.

[0036]

[Equation 5]
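The image of Equation 5 is not reproduced in this text. Paragraph [0046] describes f as the frequencies f₁ and f₂ multiplied by the text weights and added to G₀; a form consistent with that description, stated as an assumption, is:

```latex
f = f_0 + w_1 f_1 + w_2 f_2
```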

[0037] Words are then registered in the vocabulary V in descending order of frequency f. An upper limit (V_max) on the number of registered words is set in advance, and words are registered so as not to exceed it. As a result, while the total number of registered words stays within this limit, words contained in the most recent text data G₁ that previously had low appearance frequencies are weighted up and registered in the vocabulary.
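The vocabulary construction step can be sketched as follows, assuming the merged frequency takes the form f = f₀ + w₁f₁ + w₂f₂ and that ties are broken alphabetically (an arbitrary choice made here for determinism). `build_vocabulary` and the example counts are hypothetical.

```python
from collections import Counter


def build_vocabulary(counts, weights, v_max):
    """Merge word counts with text weights and keep the top v_max words.

    counts: [Counter for G_0, Counter for G_1, Counter for G_2]
    weights: [1.0, w_1, w_2]  (G_0 is taken with weight 1)
    """
    merged = Counter()
    for c, w in zip(counts, weights):
        for word, f in c.items():
            merged[word] += w * f  # assumed form: f = f_0 + w_1 f_1 + w_2 f_2
    # Descending weighted frequency; ties broken alphabetically for determinism.
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:v_max]]


g0 = Counter({"the": 100, "news": 50, "old": 10})
g1 = Counter({"newname": 2})
g2 = Counter({"newname": 1, "the": 1})
print(build_vocabulary([g0, g1, g2], [1.0, 20.0, 15.0], v_max=3))
# ['the', 'newname', 'news']
```

In this toy example, a name that never occurs in G₀ enters the vocabulary ahead of an older, rarer word, which is the effect the paragraph above describes.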

[0038] That is, the language model calculation means 11 of the statistical language model generation device 1 raises the appearance frequency of new words (words not contained in the past mass text data) in the most recent text data (such as the latest news manuscripts). Moreover, because the new vocabulary is determined with the recognition results of the speech recognition means 7 also taken into account, a speech recognition device (not shown) that uses the language model generated by the statistical language model generation device 1 achieves improved recognition performance. In this embodiment, the generated language model is also fed back to the speech recognition means 7 and reused for speech recognition.

[0039] (Statistical language model generation device: second embodiment) FIG. 2 is a functional diagram of the second embodiment of the statistical language model generation device. In the statistical language model generation device 1A, components identical to those of the statistical language model generation device 1 are given the same reference numerals, and their description is omitted.

[0040] The recognition result correction means 13 of the statistical language model generation device 1A is a program that corrects the recognition results of the speech recognition means 7. For example, suppose the text data contains 「あめがふる」 (ame ga furu) written in hiragana, and the speech recognition means 7 reads it aloud as 「雨が降る」 ("rain falls"), that is, with the accent on "a". If the intended meaning was actually 「飴が降る」 ("candy falls", with the accent on "me"), the recognition result correction means 13 corrects the recognition result by inference from the context before and after 「あめがふる」.
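The patent does not specify how the context-based inference works. As one hypothetical illustration (not the patent's method), a corrector could pick the homophone candidate that best fits its left and right neighbors under add-alpha-smoothed bigram counts. `count_bigrams`, `choose_homophone`, `alpha`, and the toy corpus are all assumptions.

```python
import math
from collections import Counter


def count_bigrams(lines):
    """Unigram and bigram counts from space-delimited sentences."""
    unigrams, bigrams = Counter(), Counter()
    for line in lines:
        words = line.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def choose_homophone(prev_word, candidates, next_word, unigrams, bigrams, alpha=1.0):
    """Pick the candidate that best fits its left and right context.

    Score = log P(cand | prev_word) + log P(next_word | cand), each estimated
    with add-alpha smoothing over the observed vocabulary.
    """
    v = len(unigrams)

    def logp(w, h):
        return math.log((bigrams[(h, w)] + alpha) / (unigrams[h] + alpha * v))

    return max(candidates, key=lambda c: logp(c, prev_word) + logp(next_word, c))


u, b = count_bigrams(["空 から 雨 が 降る", "空 から 雨 が 降る", "お菓子 の 飴 が 甘い"])
print(choose_homophone("から", ["雨", "飴"], "が", u, b))  # 雨
print(choose_homophone("の", ["雨", "飴"], "が", u, b))    # 飴
```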

[0041] The corrected recognition result accumulation means 15 is a database stored (accumulated) in the storage unit (not shown) that accumulates the recognition results corrected by the recognition result correction means 13. The corrected recognition result accumulation means 15 also temporarily holds the uncorrected recognition results of the speech recognition means 7.

[0042] Like the language model calculation means 11, the language model calculation means 11A performs the following calculations in order. In this embodiment, the language model calculation means 11A performs them based on the past mass text data, the most recent text data, and the corrected recognition results (evaluation data) held in the corrected recognition result accumulation means 15, which are obtained by recognizing these text data with the speech recognition means 7 and then correcting the results.

[0043] First, the bigram language models P₀, P₁, and P₂′, estimated respectively from the past mass text data, the text data of the most recent reporters' manuscripts, and the corrected recognition results, are combined by linear interpolation (see Equation 1). This defines the probability weights λ of the weighted language model.
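Equation 1 is not reproduced in this excerpt. Assuming it has the usual linear-interpolation form, P(w_t | w_{t-1}) = λ₀P₀(w_t | w_{t-1}) + λ₁P₁(w_t | w_{t-1}) + λ₂′P₂′(w_t | w_{t-1}) with the weights summing to 1, the interpolated bigram can be sketched as follows. Representing each model as a dictionary keyed by (previous word, word) and the function name `interpolated_prob` are illustrative choices, not details taken from the patent.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical representation: (previous word, word) -> P(word | previous word)
Bigram = Dict[Tuple[str, str], float]


def interpolated_prob(models: Sequence[Bigram], lambdas: Sequence[float],
                      prev: str, word: str) -> float:
    """Linearly interpolate component bigrams: sum_i lambda_i * P_i(word | prev)."""
    if len(models) != len(lambdas):
        raise ValueError("expected one weight per component model")
    if abs(sum(lambdas) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    # A bigram missing from a component contributes probability 0 for that component.
    return sum(lam * model.get((prev, word), 0.0)
               for lam, model in zip(lambdas, models))


if __name__ == "__main__":
    p0 = {("the", "typhoon"): 0.001}  # past mass text data (G0)
    p1 = {("the", "typhoon"): 0.050}  # most recent manuscripts (G1)
    p2 = {("the", "typhoon"): 0.030}  # corrected recognition results (G2')
    print(interpolated_prob([p0, p1, p2], [0.6, 0.3, 0.1], "the", "typhoon"))
```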

[0044] It suffices to find the values of the probability weights λ that minimize the entropy of the evaluation data (the past mass text data, the text data of the most recent reporters' manuscripts, and the corrected recognition results) (see Equation 2). These weights are obtained iteratively with the expectation-maximization (EM) algorithm (see Equation 3).
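Equations 2 and 3 are likewise not shown here. The sketch below uses the standard EM update for mixture weights, one common way to minimize the cross-entropy of evaluation data with respect to interpolation weights: each iteration apportions every evaluation bigram among the component models in proportion to λᵢPᵢ, then renormalizes. The names `em_weights` and `cross_entropy`, the uniform initialization, and the convergence tolerance are assumptions; the component models are assumed to be smoothed so that every evaluation bigram has nonzero probability under at least one of them.

```python
import math
from typing import Dict, List, Sequence, Tuple

Bigram = Dict[Tuple[str, str], float]  # (previous word, word) -> probability


def cross_entropy(models: Sequence[Bigram], lambdas: Sequence[float],
                  heldout: Sequence[Tuple[str, str]]) -> float:
    """Per-bigram cross-entropy (bits) of the interpolated model on evaluation data."""
    total = 0.0
    for pair in heldout:
        p = sum(lam * m.get(pair, 0.0) for lam, m in zip(lambdas, models))
        total += math.log2(p)  # assumes p > 0 (smoothed components)
    return -total / len(heldout)


def em_weights(models: Sequence[Bigram], heldout: Sequence[Tuple[str, str]],
               max_iter: int = 100, tol: float = 1e-8) -> List[float]:
    """Estimate interpolation weights with the EM algorithm."""
    k = len(models)
    lambdas = [1.0 / k] * k  # uniform starting point
    for _ in range(max_iter):
        expected = [0.0] * k
        for pair in heldout:
            # E-step: posterior responsibility of each component for this bigram.
            parts = [lam * m.get(pair, 0.0) for lam, m in zip(lambdas, models)]
            z = sum(parts)
            if z == 0.0:
                continue  # no component covers this bigram; it cannot inform the weights
            for i, part in enumerate(parts):
                expected[i] += part / z
        n = sum(expected)
        if n == 0.0:
            raise ValueError("no evaluation bigram is covered by any model")
        # M-step: new weights are the normalized expected counts.
        new = [e / n for e in expected]
        done = max(abs(a - b) for a, b in zip(new, lambdas)) < tol
        lambdas = new
        if done:
            break
    return lambdas
```

Because each EM iteration never decreases the likelihood of the evaluation data, the cross-entropy of the returned weights is no higher than that of the uniform starting point.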

[0045] Using the converged probability weights λ₀, λ₁, and λ₂′ (the first, second, and third n-gram probability weights), the text weights w₁ and w₂′ are calculated (see Equation 4). The most recent text data are then weighted by w₁ and the corrected recognition results by w₂′, the weighted data are added to the past mass text data, and a new vocabulary is determined.
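Equation 4, which converts the probability weights λ into text weights, is not reproduced in this excerpt either. One conversion consistent with the description, offered here purely as an assumption, is wᵢ = (λᵢ / λ₀)(N₀ / Nᵢ), where Nᵢ is the number of tokens in corpus i. With this choice, adding wᵢ-weighted counts to the past corpus reproduces the interpolated unigram probability exactly (for bigrams it is an approximation), because each weighted count wᵢcᵢ(w) equals (N₀ / λ₀)λᵢPᵢ(w). The function name and the token-count inputs are hypothetical.

```python
from typing import List, Sequence


def text_weights(lambdas: Sequence[float], corpus_sizes: Sequence[int]) -> List[float]:
    """Hypothetical conversion of interpolation weights into count multipliers.

    lambdas[0] and corpus_sizes[0] refer to the past mass text data (G0);
    the remaining entries refer to G1, G2', ... in the same order.
    Returns [w1, w2', ...]; G0 itself keeps weight 1.
    """
    if len(lambdas) != len(corpus_sizes):
        raise ValueError("expected one corpus size per weight")
    lam0, n0 = lambdas[0], corpus_sizes[0]
    if lam0 <= 0.0:
        raise ValueError("the past-corpus weight lambda_0 must be positive")
    return [(lam / lam0) * (n0 / n) for lam, n in zip(lambdas[1:], corpus_sizes[1:])]
```

For example, with λ = (0.5, 0.3, 0.2) and corpora of 1,000,000, 10,000, and 5,000 tokens, this yields w₁ ≈ 60 and w₂′ ≈ 80: the small recent corpora are scaled up so that they carry the share of probability mass that EM assigned to them.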

[0046] That is, for each word, let f₀ be its frequency in the past mass text data G₀, f₁ its frequency in the most recent text data G₁, and f₂′ its frequency in the corrected recognition results G₂′. Its combined frequency f is obtained by multiplying f₁ and f₂′ by the text weights w₁ and w₂′ and adding them to f₀ (see Equation 5). Words are then registered in the vocabulary V in descending order of f. An upper limit V_max on the number of registered words is set in advance, and registration stops once this limit is reached.
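Reading Equation 5 as the weighted sum f = f₀ + w₁f₁ + w₂′f₂′ (an interpretation of the description above, since the equation itself is not shown), the vocabulary-selection step can be sketched as below. Alphabetical tie-breaking and the function name `select_vocabulary` are assumptions.

```python
from collections import Counter
from typing import List, Sequence


def select_vocabulary(counts: Sequence[Counter], weights: Sequence[float],
                      v_max: int) -> List[str]:
    """Rank words by f = f0 + w1*f1 + w2'*f2' + ... and keep at most v_max of them.

    counts[0] holds word frequencies in the past mass text data G0 (weight 1);
    counts[1:] hold frequencies in G1, G2', ... matching `weights` in order.
    """
    if len(counts) != len(weights) + 1:
        raise ValueError("expected one text weight per additional corpus")
    combined: Counter = Counter()
    for weight, corpus_counts in zip([1.0, *weights], counts):
        for word, freq in corpus_counts.items():
            combined[word] += weight * freq
    # Highest combined frequency first; alphabetical order breaks ties deterministically.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:v_max]]
```

With V_max = 3, a word such as "typhoon" that is absent from the past data but frequent in the latest manuscripts can displace a word like "old" that was only marginally frequent in the past data.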

[0047] Thus, the language model calculating means 11A of the statistical language model generating apparatus 1A raises the frequency of new words in the most recent text data (such as the latest news manuscripts) that do not appear in the past mass text data. Moreover, because the recognition results of the speech recognition means 7 are corrected by the recognition result correcting means 13 and the new vocabulary is determined with these corrected results taken into account, a speech recognition device (not shown) that uses the language model generated by the apparatus 1A achieves better recognition performance. In this embodiment, the generated language model is fed back to the speech recognition means 7 and used again for speech recognition.

[0048] In the statistical language model generating apparatus 1A, the most recent reporter manuscript accumulating means 5 acquires and accumulates text for the most recent news programs and similar material that will be the target of speech recognition, and the speech recognition means 7 and the recognition result correcting means 13 together correct the speech recognition output, producing the correct character string for the recognized speech. A language model generated by the apparatus 1A therefore exploits correct character strings from broadcast programs (the recognition targets) aired very close in time, which improves speech recognition performance.

[0049] In addition, by referring to the correct character strings for broadcast programs (the recognition targets) aired very close in time, recognition errors in the speech recognition output for the past mass database can be detected and then corrected by the recognition result correcting means 13.
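The patent does not say how recognition errors are located. One simple possibility, sketched here as an assumption, is to align the recognized word sequence with the correct word sequence and report every non-matching region. The sketch uses Python's standard `difflib.SequenceMatcher`, whose matching heuristic does not always produce a minimum-edit-distance alignment; a Levenshtein alignment would be the stricter choice. The function name is hypothetical.

```python
import difflib
from typing import List, Tuple

# (edit type, recognized words, correct words) for each mismatching region
ErrorSpan = Tuple[str, List[str], List[str]]


def find_recognition_errors(recognized: List[str], correct: List[str]) -> List[ErrorSpan]:
    """Align a recognized word sequence with the correct one and list the differences."""
    matcher = difflib.SequenceMatcher(a=recognized, b=correct, autojunk=False)
    errors: List[ErrorSpan] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            errors.append((tag, recognized[i1:i2], correct[j1:j2]))
    return errors


if __name__ == "__main__":
    # The "ame" example: candy (飴) recognized where rain (雨) was spoken.
    print(find_recognition_errors(["candy", "falls", "today"],
                                  ["rain", "falls", "today"]))
```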

[0050] (Operation of the statistical language model generating apparatus) Next, the operation of the statistical language model generating apparatus 1 is described with reference to the flowchart in FIG. 3. First, the past news manuscript accumulating means 3 accumulates (or has already accumulated) a large amount of past text data, and an initial vocabulary is determined according to the frequency of each word in this data (S1). The initial vocabulary typically comprises several hundred thousand words or more. In general, the number of words registered in a language model's vocabulary is fixed in advance according to the storage capacity of a storage unit (not shown) or the processing capacity of a main control unit (not shown), and words in the accumulated or training data are registered in descending order of frequency until this number is reached.

[0051] Meanwhile, the most recent reporter manuscript accumulating means 5 accumulates text data to be used in the most recent broadcast programs and the like (the most recent text data). The past mass text data and the most recent text data are recognized as speech by the speech recognition means 7, and the resulting recognition results are accumulated in the recognition result accumulating means 9.

[0052] Next, the language model calculating means 11 first creates the individual language models (bigrams P₀, P₁, P₂) and combines them by linear interpolation (see Equation 1) (S2). The probability weights λ₀, λ₁, λ₂ of these language models are calculated with the EM algorithm (see Equation 3) (S3), and the text weights w₁ and w₂ are calculated from these probability weights (see Equation 4) (S4).

[0053] The language model calculating means 11 then calculates the frequency f of each word from the text weights w₁ and w₂ (see Equation 5) (S5), and a new vocabulary is determined by taking words in descending order of f until the registered word limit is reached (S6). Finally, a language model is generated from the new vocabulary (S7).
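Step S7 is described only as generating a language model from the new vocabulary. A minimal sketch, assuming maximum-likelihood bigram estimates over text-weighted counts, with out-of-vocabulary words mapped to a hypothetical `<unk>` token and hypothetical sentence-boundary markers `<s>` and `</s>`, could look like this. A practical system would also apply smoothing (for example back-off), which is omitted here for brevity.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set, Tuple

UNK, BOS, EOS = "<unk>", "<s>", "</s>"  # hypothetical special tokens


def build_bigram(corpora: Sequence[Tuple[Iterable[List[str]], float]],
                 vocab: Set[str]) -> Dict[Tuple[str, str], float]:
    """Estimate P(word | prev) from weighted corpora restricted to `vocab`.

    Each entry of `corpora` is (sentences, text weight), e.g. weight 1 for G0,
    w1 for G1 and w2' for G2'. Estimates are unsmoothed maximum likelihood.
    """
    pair_counts: Counter = Counter()
    history_counts: Counter = Counter()
    for sentences, weight in corpora:
        for sentence in sentences:
            # Map words outside the selected vocabulary to <unk> and add boundaries.
            tokens = [BOS] + [w if w in vocab else UNK for w in sentence] + [EOS]
            for prev, word in zip(tokens, tokens[1:]):
                pair_counts[(prev, word)] += weight
                history_counts[prev] += weight
    return {(prev, word): count / history_counts[prev]
            for (prev, word), count in pair_counts.items()}
```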

[0054] Although the present invention has been described based on the embodiments, the invention is not limited to them.

[0055] For example, each component of the statistical language model generating apparatuses 1 and 1A may be implemented as a program stored on a particular storage medium. Furthermore, for n-grams of higher order than the bigram (trigrams, 4-grams), each probability weight λ can be calculated as in the bigram case, converted into a text weight, and used to build a new vocabulary V from which a statistical language model is generated.
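For a general n-gram model, and again assuming Equation 1 is a standard linear interpolation, the bigram formula extends by conditioning on the preceding n − 1 words. A trigram corresponds to n = 3 and a 4-gram to n = 4; only the history changes, while the EM estimation of the weights λ and their conversion into text weights proceed as in the bigram case.

```latex
P\bigl(w_t \mid w_{t-n+1}^{t-1}\bigr)
  = \lambda_0 P_0\bigl(w_t \mid w_{t-n+1}^{t-1}\bigr)
  + \lambda_1 P_1\bigl(w_t \mid w_{t-n+1}^{t-1}\bigr)
  + \lambda_2' P_2'\bigl(w_t \mid w_{t-n+1}^{t-1}\bigr),
\qquad \lambda_0 + \lambda_1 + \lambda_2' = 1
```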

[0056]

[Effects of the Invention]

According to the invention of claim 1, the text data obtaining means obtains the most recent text data, which contain words predicted to appear frequently; the speech recognition means recognizes the most recent text data and the earlier mass text data as speech; and the storage means stores the most recent text data, the past mass text data, and the recognition results. The probability weight calculating means then calculates the probability weight of each n-gram, and the language model generating means generates a language model from the results. When this language model is used by a speech recognition device, recognition performance improves.

[0057] Also, because the most recent text data are given a certain probability weight and included in the vocabulary used to generate the language model, the occurrence probability of words likely to appear in recent utterances can be increased.

[0058] According to the invention of claim 2, the recognition result correcting means corrects the results of recognizing the text data as speech, and the probability weight calculating means calculates the third n-gram probability weight from the corrected recognition results. When a language model obtained from the corrected recognition results is used by a speech recognition device, recognition performance improves further.

[0059] According to the invention of claim 3, the text data obtaining means of the statistical language model generation program obtains the most recent text data, which contain words predicted to appear frequently; the speech recognition means recognizes the most recent text data and the earlier mass text data as speech; and the storage means stores the most recent text data, the past mass text data, and the recognition results. The probability weight calculating means then calculates the probability weight of each n-gram, and the language model generating means generates a language model from the results. When this language model is used by a speech recognition device, recognition performance improves.

[0060] The statistical language model generation program can also be distributed commercially on a storage medium on which it is stored.

[Brief description of the drawings]

FIG. 1 is a functional diagram of a statistical language model generating apparatus according to a first embodiment of the present invention.

FIG. 2 is a functional diagram of a statistical language model generating apparatus according to a second embodiment of the present invention.

FIG. 3 is a flowchart illustrating the operation of the statistical language model generating apparatus.

[Explanation of symbols]

1, 1A Statistical language model generating apparatus
3 Past news manuscript accumulating means
5 Most recent reporter manuscript accumulating means
7 Speech recognition means
9 Recognition result accumulating means
11, 11A Language model calculating means
13 Recognition result correcting means
15 Corrected recognition result accumulating means

Continuation of the front page: (72) Inventor: Akio Ando, c/o Japan Broadcasting Corporation (NHK) Science and Technical Research Laboratories, 1-10-11 Kinuta, Setagaya-ku, Tokyo. F-terms (reference): 5D015 HH00

[Claims]

1. A statistical language model generating apparatus for generating a statistical language model, comprising: text data obtaining means for obtaining most recent text data containing words whose frequency of appearance is predicted to be high; speech recognition means for recognizing, as speech, the most recent text data and past mass text data having a larger data volume than the most recent text data; storage means for storing the most recent text data, the past mass text data, and the recognition results obtained by the speech recognition means; probability weight calculating means for calculating a first n-gram probability weight based on the past mass text data, a second n-gram probability weight based on the most recent text data, and a third n-gram probability weight based on the recognition results; and language model generating means for generating a statistical language model based on the first, second, and third probability weights.

2. The statistical language model generating apparatus according to claim 1, further comprising recognition result correcting means for correcting the recognition results obtained by the speech recognition means, wherein the probability weight calculating means calculates the third n-gram probability weight based on the corrected recognition results.

3. A statistical language model generation program causing a computer to function as: text data obtaining means for obtaining most recent text data containing words whose frequency of appearance is predicted to be high; speech recognition means for recognizing, as speech, the most recent text data and past mass text data having a larger data volume than the most recent text data; storage means for storing the most recent text data, the past mass text data, and the recognition results obtained by the speech recognition means; probability weight calculating means for calculating a first n-gram probability weight based on the past mass text data, a second n-gram probability weight based on the most recent text data, and a third n-gram probability weight based on the recognition results; and language model generating means for generating a statistical language model based on the first, second, and third probability weights.