
CN100568222C - Divergence elimination language model - Google Patents


Info

Publication number
CN100568222C
CN100568222C · CNB021065306A · CN02106530A
Authority
CN
China
Prior art keywords
character
phrase
language model
word
word phrase
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB021065306A
Other languages
Chinese (zh)
Other versions
CN1369830A
Inventor
朱云正
F·A·阿列瓦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Corp
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/773,342 external-priority patent/US6507453B2/en
Application filed by Microsoft Corp filed Critical Microsoft Corp
Publication of CN1369830A publication Critical patent/CN1369830A/en
Application granted granted Critical
Publication of CN100568222C publication Critical patent/CN100568222C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

A language model for a language processing system, such as a speech recognition system, is constructed as a function of associated characters, word phrases, and context markers. A method and apparatus for generating a training corpus used to train the language model, and a system or module that uses the disclosed language model, are also provided.

Figure 02106530

Description

Disambiguation language model

Background of the invention

The present invention relates to language modeling. More particularly, the invention relates to creating and using a language model for minimizing ambiguity during recognition of characters, such as in input speech.

Accurate speech recognition requires more than an acoustic model to select the correct word spoken by the user. In other words, if a speech recognizer must choose or determine which word was uttered, and all candidate words have the same pronunciation, the recognizer clearly cannot perform satisfactorily. A language model provides a method or means of specifying which word sequences in a vocabulary are possible or, more generally, provides information about the likelihood of various word sequences.

Speech recognition is often viewed as a form of top-down language processing. Two common forms of language processing are "top-down" and "bottom-up". Top-down language processing begins with the largest unit of language to be recognized, such as a sentence, and processes it by classifying it into smaller units, such as phrases, which in turn are divided into yet smaller units, such as words. In contrast, bottom-up language processing begins with words and builds larger phrases and/or sentences from them. Both forms of language processing can benefit from a language model.

One well-known technique uses an N-gram language model. Because N-grams can be trained on large amounts of data, the N-word dependencies usually capture shallow syntactic and semantic structure. However, although N-gram language models perform well for general dictation, homophones can cause serious errors. A homophone is an element of a language, such as a character or syllable, that is one of two or more elements pronounced alike but spelled differently. For example, when a user is spelling out characters, the speech recognition module may output the wrong character because some characters are pronounced identically. Likewise, the module may output wrong characters for different characters that sound similar when pronounced (for example, "m" and "n").

The problem of ambiguity is especially prevalent in languages such as Japanese or Chinese, which are written primarily with a character-based writing system. The characters of these languages are numerous complex pictographs representing sounds and meanings. The characters form a limited set of syllables, which in turn produces a large number of homophones, greatly increasing the time required to generate documents by dictation. In particular, incorrect homophone characters must be identified in the document and the correct characters inserted.

There is thus a continuing need to develop new methods for minimizing the ambiguity among homophones and among similar-sounding speech elements with different meanings. As technology develops and speech recognition is provided in more applications, more accurate language models are required.

Summary of the invention

Speech recognizers typically use a language model, such as an N-gram language model, to improve accuracy. A first aspect of the invention includes generating a language model that is particularly useful when a speaker is identifying a character or characters (for example, a syllable), such as when spelling a word. The language model helps disambiguate homophones and different characters that sound alike. The language model is constructed from a training corpus containing associated elements: a character string (which can be a single character), a word phrase containing the character string (which can be a single word), and a context marker. Using a word list or dictionary, the training corpus can be generated automatically by forming, for each word phrase, a partial sentence or phrase containing the word phrase, the context marker, and a character string of the word phrase. In another embodiment, a phrase is generated for each character of the word phrase.

Another aspect of the present invention is a system or module that uses the language model described above for recognizing spoken characters. When a character string is spoken together with a context marker and an associated word phrase, the speech recognition module determines that the user is spelling or identifying characters. The speech recognition module then outputs only the recognized character, not the context marker or the associated word phrase. In yet another embodiment, the speech recognition module compares the recognized character with a recognized word phrase to verify that the correct character has been recognized. If the recognized character is not in the recognized word phrase, the output character is taken from the recognized word phrase.
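As a rough illustration of this optional verification step (the function name and the fall-back to the phrase's first character are assumptions for illustration, not details fixed by the patent), the check might look like:

```python
def verify_character(recognized_char, recognized_phrase):
    """Check a spelled character against the recognized word phrase.

    If the recognized character does not appear in the recognized word
    phrase, fall back to a character of that phrase (assumed here to be
    its first character).
    """
    if recognized_char.lower() in recognized_phrase.lower():
        return recognized_char
    # Mis-recognition (e.g. "M" heard for "N as in Nancy"): take the
    # character from the more reliable word phrase instead.
    return recognized_phrase[0]
```

For example, `verify_character("M", "Nancy")` would return `"N"`, while `verify_character("N", "Nancy")` confirms the character as recognized.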

Brief description of the drawings

FIG. 1 is a block diagram of a language processing system.

FIG. 2 is a block diagram of an exemplary computing environment.

FIG. 3 is a block diagram of an exemplary speech recognition system.

FIG. 4 is a flowchart of a method of the present invention.

FIG. 5 is a block diagram of modules for implementing the method of FIG. 4.

FIG. 6 is a block diagram of a speech recognition module and an optional character verification module.

Detailed Description of Illustrative Embodiments

FIG. 1 shows a language processing system 10 that receives a language input 12 and processes it to provide a language output 14. For example, the language processing system 10 can be embodied as a speech recognition system or module that receives, as the language input 12, language spoken or recorded by a user. The system 10 processes the spoken language and provides recognized words and/or characters as output, in the form of text.

During processing, the speech recognition system or module 10 can access a language model 16 to determine which word and, in particular, which homophone or other similar-sounding element of the spoken language was uttered. The language model 16 encodes a particular language, such as English, Chinese, Japanese, and so on. In the illustrated embodiment, the language model 16 can be a statistical language model, such as an N-gram language model, a context-free grammar, or a hybrid of the two, all of which are well known in the art. One broad aspect of the present invention is a method of creating and constructing the language model 16; another broad aspect is its use in speech recognition.

Before discussing the present invention in detail, an overview of an operating environment may be useful. FIG. 2 and the related discussion provide a brief, general description of a suitable computing environment 20 in which the invention may be implemented. The computing environment 20 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 20 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 20.

The invention is operational with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. In addition, the invention may be used in telephone systems.

The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices. The tasks performed by the programs and modules are described below with the aid of the figures.

With reference to FIG. 2, an exemplary system for implementing the invention includes a general-purpose computing device in the form of a computer 30. Components of the computer 30 include, but are not limited to, a processing unit 40, a system memory 50, and a system bus 41 that couples various system components, including the system memory, to the processing unit 40. The system bus 41 may be any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus, also known as the add-in board bus.

The computer 30 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 30, and include both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the computer 30. Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.

The system memory 50 includes computer storage media in the form of volatile and/or nonvolatile memory, such as read-only memory (ROM) 51 and random access memory (RAM) 52. A basic input/output system (BIOS) 53, containing the basic routines that help to transfer information between components within the computer 30, such as during start-up, is typically stored in the ROM 51. The RAM 52 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by the processing unit 40. By way of example, and not limitation, FIG. 2 illustrates an operating system 54, application programs 55, other program modules 56, and program data 57.

The computer 30 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 2 illustrates a hard disk drive 61 that reads from or writes to non-removable, nonvolatile magnetic media; a magnetic disk drive 71 that reads from or writes to a removable, nonvolatile magnetic disk 72; and an optical disk drive 75 that reads from or writes to a removable, nonvolatile optical disk 76, such as a CD-ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile discs, digital video tape, solid-state RAM, solid-state ROM, and the like. The hard disk drive 61 is typically connected to the system bus 41 through a non-removable memory interface such as interface 60, and the magnetic disk drive 71 and the optical disk drive 75 are typically connected to the system bus 41 through a removable memory interface, such as interface 70.

The drives and their associated computer storage media, discussed above and illustrated in FIG. 2, provide storage of computer-readable instructions, data structures, program modules, and other data for the computer 30. In FIG. 2, for example, the hard disk drive 61 is illustrated as storing an operating system 64, application programs 65, other program modules 66, and program data 67. Note that these components can either be the same as or different from the operating system 54, application programs 55, other program modules 56, and program data 57. The operating system 64, application programs 65, other program modules 66, and program data 67 are given different reference numbers here to illustrate that, at a minimum, they are different versions.

A user may enter commands and information into the computer 30 through input devices such as a keyboard 82, a microphone 83, and a pointing device 81, such as a mouse, trackball, or touch pad. Other input devices (not shown) may include a joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 40 through a user input interface 80 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port, or universal serial bus (USB). A monitor 84 or other type of display device is also connected to the system bus 41 via an interface, such as a video interface 85. In addition to the monitor, computers may also include other peripheral output devices, such as speakers 87 and a printer 86, which are connected through an output peripheral interface 88.

The computer 30 operates in a networked environment using logical connections to one or more remote computers, such as a remote computer 94. The remote computer 94 may be a personal computer, a hand-held device, a server, a router, a network PC, a peer device, or other common network node, and typically includes many or all of the components described above relative to the computer 30. The logical connections depicted in FIG. 2 include a local area network (LAN) 91 and a wide area network (WAN) 93, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet.

When used in a LAN networking environment, the computer 30 is connected to the LAN 91 through a network interface or adapter 90. When used in a WAN networking environment, the computer 30 typically includes a modem 92 or other means for establishing communications over the WAN 93, such as the Internet. The modem 92, which may be internal or external, may be connected to the system bus 41 via the user input interface 80 or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 30, or portions thereof, may be stored in a remote memory storage device. By way of example, and not limitation, FIG. 2 illustrates remote application programs 95 as residing on the remote computer 94. It will be appreciated that the network connections shown are exemplary, and other means of establishing a communications link between the computers may be used.

An exemplary embodiment of a speech recognition system 100 is illustrated in FIG. 3. The speech recognition system 100 includes the microphone 83, an analog-to-digital (A/D) converter 104, a training module 105, a feature extraction module 106, a lexicon storage module 110, an acoustic model 112 with senone trees, a tree search engine 114, the language model 16, and a general-purpose language model 111. It should be noted that the entire system 100, or part of the speech recognition system 100, can be implemented in the environment illustrated in FIG. 2. For example, the microphone 83 can serve as an input device to the computer 30 through an appropriate interface and through the A/D converter 104. The training module 105 and the feature extraction module 106 can be either hardware modules in the computer 30 or software modules stored in any of the information storage devices disclosed in FIG. 2 and accessible by the processing unit 40 or another suitable processor. In addition, the lexicon storage module 110, the acoustic model 112, and the language models 16 and 111 are also preferably stored in any of the memory devices shown in FIG. 2. Furthermore, the tree search engine 114 can be implemented in the processing unit 40 (which can include one or more processors) or performed by a dedicated speech recognition processor employed by the computer 30.

In the illustrated embodiment, during speech recognition, speech is provided by the user as an input to the system 100, in the form of an audible voice signal, to the microphone 83. The microphone 83 converts the audible speech signal into an analog electrical signal, which is provided to the A/D converter 104. The A/D converter 104 converts the analog speech signal into a sequence of digital signals, which is provided to the feature extraction module 106. In one embodiment, the feature extraction module 106 is a conventional array processor that performs spectral analysis on the digital signals and computes a magnitude value for each frequency band of a frequency spectrum. In one illustrative embodiment, the signals are provided to the feature extraction module 106 by the A/D converter 104 at a sample rate of approximately 16 kHz.

The feature extraction module 106 divides the digital signal received from the A/D converter 104 into frames that include a plurality of digital samples. The duration of each frame is approximately 10 milliseconds. The frames are then encoded by the feature extraction module 106 into feature vectors reflecting the spectral characteristics of a plurality of frequency bands. In the case of discrete and semi-continuous hidden Markov modeling, the feature extraction module 106 also encodes the feature vectors into one or more code words using vector quantization techniques and a codebook derived from training data. Thus, the feature extraction module 106 provides, at its output, a feature vector (or code word) for each spoken utterance. The feature extraction module 106 provides the feature vectors (or code words) at a rate of approximately one feature vector (or code word) every 10 milliseconds.
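The framing step described above can be sketched in plain Python (a minimal illustration; the non-overlapping frames and the function name are simplifying assumptions, since real front ends typically use overlapping, windowed frames):

```python
def frame_signal(samples, sample_rate=16000, frame_ms=10):
    """Split a sample sequence into non-overlapping ~10 ms frames
    (160 samples per frame at 16 kHz), as in the stage described above."""
    frame_len = sample_rate * frame_ms // 1000
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len]
            for i in range(n_frames)]

frames = frame_signal([0.0] * 16000)  # one second of audio -> 100 frames
```

Each frame would then be passed to the spectral-analysis and vector-quantization steps to yield one feature vector (or code word) per 10 ms.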

Output probability distributions are then computed against hidden Markov models using the feature vectors (or code words) of the particular frame being analyzed. These probability distributions are later used in executing a Viterbi decoding process or a similar type of processing technique.

Upon receiving the code words from the feature extraction module 106, the tree search engine 114 accesses information stored in the acoustic model 112. The model 112 stores acoustic models, such as hidden Markov models, which represent speech units to be detected by the speech recognition system 100. In one embodiment, the acoustic model 112 includes a senone tree associated with each Markov state in a hidden Markov model. The hidden Markov models represent, in one illustrative embodiment, phonemes. Based upon the senones in the acoustic model 112, the tree search engine 114 determines the most likely phonemes represented by the feature vectors (or code words) received from the feature extraction module 106, and hence representative of the utterance received from the user of the system.

The tree search engine 114 also accesses the lexicon stored in module 110. The information received by the tree search engine 114, based on its accessing of the acoustic model 112, is used in searching the lexicon storage module 110 to determine a word that most likely represents the code words or feature vectors received from the feature extraction module 106. Also, the search engine 114 accesses the language models 16 and 111. In one embodiment, the language model 16 is a word N-gram used in identifying the most likely character or characters represented by the input speech, where the input comprises a character (or characters), a context marker, and a word phrase that identifies the character. For example, the input speech can be "N as in Nancy", where "N" (which can also be lowercase) is the desired character, "as in" is the context marker, and "Nancy" is a word phrase associated with the character "N" to clarify or identify the desired character. For the phrase "N as in Nancy", the output of the speech recognition system 100 can be just the character "N". In other words, having analyzed the input speech data for the phrase "N as in Nancy", the speech recognition system 100 determines that the user has chosen to spell a character. Accordingly, the context marker and the associated word phrase are omitted from the output text. The search engine 114 can remove the context marker and the associated word phrase as necessary.

It should be noted that, in this embodiment, the language model 111 is a word N-gram for identifying the most likely words represented by input speech for general dictation. For example, when the speech recognition system 100 is embodied as a dictation system, the language model 111 provides an indication of the most likely words for general dictation; however, when the user utters a phrase with a context marker, the output from the language model 16 will have a higher value than that of the language model 111 for the same phrase. The higher value from the language model 16 serves as an indication to the system 100 that the user is identifying a character with a context marker and a word phrase. Thus, for an input phrase having a context marker, the search engine 114 or another processing component of the speech recognition system 100 ignores the context marker and the word phrase and outputs only the desired character. Use of the language model 16 is discussed further below.
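A minimal sketch of this decision logic (the function, the score inputs, and the string-level stripping are assumptions for illustration; the actual system compares model scores inside the decoder rather than on finished text):

```python
def choose_output(utterance, spelling_lm_score, dictation_lm_score,
                  context_marker="as in"):
    """Decide between dictation output and spelled-character output.

    When the spelling language model (model 16) scores the utterance
    higher than the general dictation model (model 111), the context
    marker and word phrase are stripped and only the character is kept.
    """
    if spelling_lm_score > dictation_lm_score and context_marker in utterance:
        char, _, _ = utterance.partition(" " + context_marker + " ")
        return char  # e.g. "N" from "N as in Nancy"
    return utterance  # ordinary dictation: keep the full text
```

For example, `choose_output("N as in Nancy", 0.9, 0.2)` yields just `"N"`, whereas an utterance scored higher by the dictation model passes through unchanged.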

Although the speech recognition system 100 described herein uses HMM modeling and senone trees, it should be understood that this is but one illustrative embodiment. Those of ordinary skill in the art will recognize that the speech recognition system 100 can take many forms; all that is required is that it use the features of the language model 16 and provide, as an output, the text spoken by the user.

As is well known, a statistical N-gram language model produces a probability estimate for a word given the word sequence up to that word (i.e., given the word history H). An N-gram language model considers only the (n-1) prior words in the history H as having any influence on the probability of the next word. For example, a bigram (or 2-gram) language model considers the previous word as having an influence on the next word. Therefore, in an N-gram language model, the probability of a word occurring is represented as follows:

P(w/H)=P(w/w1,w2,…w(n-1))    (1)

where w is the word of interest;

w1 is the word located n-1 positions before word w;

w2 is the word located n-2 positions before word w; and

w(n-1) is the word immediately preceding word w in the sequence.

Further, the probability of a word sequence is determined by multiplying the probabilities of each word given its history. Thus, the probability of a word sequence (w1…wm) is expressed as follows:

P(w1…wm) = Π(i=1 to m) P(wi/Hi)    (2)

An N-gram model is obtained by applying an N-gram algorithm to a corpus of textual training data (a collection of phrases, sentences, sentence fragments, paragraphs, and so on). The N-gram algorithm can use, for example, known statistical techniques such as the Katz technique or the binomial posterior distribution backoff technique. In applying these techniques, the algorithm estimates the probability that a word w(n) will follow a sequence of words w1, w2, …, w(n-1). These probability values collectively form the N-gram language model. Certain aspects of the present invention described below can be used in constructing a standard statistical N-gram model.
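As a concrete illustration of how such probabilities can be estimated from a training corpus, the following is a minimal Python sketch using maximum-likelihood bigram counts with a crude unigram fallback; it is not the Katz or binomial backoff technique the text mentions, and all names in it are illustrative:

```python
from collections import Counter

def train_bigram(sentences):
    """Estimate bigram probabilities P(w | w_prev) by maximum likelihood.
    Production systems smooth these counts (e.g., Katz backoff); this
    sketch simply falls back to the unigram distribution for unseen pairs."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words          # sentence-start marker
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    total = sum(unigrams.values())

    def prob(w, w_prev):
        if (w_prev, w) in bigrams:
            return bigrams[(w_prev, w)] / unigrams[w_prev]
        return unigrams[w] / total        # crude unigram fallback

    return prob

# Toy corpus of spelled-character phrases
corpus = [["n", "as", "in", "nancy"], ["p", "as", "in", "paul"]]
p = train_bigram(corpus)
print(p("in", "as"))     # -> 1.0 ("in" always follows "as" here)
print(p("nancy", "in"))  # -> 0.5
```

A real training corpus would contain many such phrases, so that frequent patterns like "as in" receive high conditional probabilities while unattested sequences receive only the smoothed fallback mass.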

A first broad aspect of the present invention is illustrated in FIG. 4 as a method 140 of creating a language model for use in a language processing system to indicate characters. Referring also to FIG. 5, a system or apparatus 142 includes modules having instructions for implementing method 140. Generally, method 140 includes, for each word phrase of a word phrase list, associating at step 144 a character string of the word phrase and the word phrase with a context marker indicating identification of the character string. It should be noted that the character string can comprise a single character. Likewise, a word phrase can comprise a single word. For example, for a character string equal to one character and a word phrase equal to one word, step 144 associates a character of the word with a context marker for each word in word list 141. A context marker is typically a word or word phrase used by speakers of a particular language to identify a linguistic element of a word phrase. Examples of context markers in English include "as in", "for example", "as found in", "like", "such as", and the like. Similar words or word phrases can be found in other languages, for example, の in Japanese and 的 in Chinese. In one embodiment, step 144 includes constructing a corpus of word phrases 143. Each word phrase includes a character string, a word phrase and a context marker. Typically, when a single character is associated with a word, the first character is used, although another character of the word can also be used. Examples of such word phrases include "N as in Nancy", "P as in Paul", and "Z as in zebra".

In another embodiment, another character of the word is associated with the word and the context marker, while in some languages, such as Chinese, where many words comprise only one, two or three characters, it can be helpful to associate each character of the word with the word in the context marker. As indicated above, a simple way to associate a desired character with the corresponding word and context marker is to form the word phrase itself. Thus, given a word list 141, a corpus of word phrases 143 containing all the desired context markers for training the language model can easily be generated.
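The corpus-generation step described above can be sketched as follows. The function name and the fixed English marker "as in" are illustrative assumptions, not the patent's implementation:

```python
def build_training_corpus(word_list, marker="as in"):
    """Generate 'X as in Word' training phrases from a word list,
    associating the first character of each word with the word via
    a context marker (hypothetical helper)."""
    corpus = []
    for word in word_list:
        corpus.append(f"{word[0].upper()} {marker} {word}")
    return corpus

phrases = build_training_corpus(["Nancy", "Paul", "zebra"])
print(phrases)  # -> ['N as in Nancy', 'P as in Paul', 'Z as in zebra']
```

For a language such as Chinese, the loop would instead emit one phrase per character of each word, since each character can be disambiguated by the word it appears in.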

Based on corpus 143, language model 16 is constructed using a conventional language model builder 146, such as an N-gram builder, implementing known techniques for constructing language model 16. Block 148 represents construction of language model 16 in method 140, wherein language model 16 includes, but is not limited to, an N-gram language model, a context-free grammar, or a hybrid of the same.

The generated phrases can be assigned a suitable count so that formation of the language model yields a suitable probability value. In the example above, "N as in Nancy" is more likely to be spoken than the phrase "N as in notch". Thus, a further feature of the invention includes adjusting the probability score for each associated character string and word phrase in the language model. The probability score can be adjusted upon creation of language model 16. In another embodiment, the probability score can be adjusted by including a sufficient number of identical word phrases in corpus 143 so as to yield a suitable probability value in the language model for the associated character and word phrase. The probability value can also be a function of the likelihood that the word phrase will be used. Generally, some word phrases are used more frequently than others to identify a character or characters. Such word phrases can be assigned, or otherwise provided with, a higher probability value in the language model.

FIG. 6 illustrates a speech recognition module 180 and language model 16. Speech recognition module 180 can be of the type described above; however, it should be understood that speech recognition module 180 is not limited to that embodiment, but rather can take many forms. As indicated above, speech recognition module 180 receives data indicative of input speech and accesses language model 16 to ascertain whether the input speech includes a phrase having a context marker. Upon detecting a word phrase having a context marker, speech recognition module 180 provides as an output only the character or characters associated with the context marker and word phrase, rather than the context marker or word phrase themselves. In other words, although the speech recognition module detects the complete phrase "N as in Nancy", the speech recognition module provides only "N" as the output. This output is particularly useful in a dictation system, where the speaker has individually chosen to indicate the desired character or characters.
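The output behavior described here — detecting a context-marker phrase and emitting only its character — can be sketched as a simple post-processing step. The regular expression, the fixed "as in" marker, and the function name are hypothetical illustrations, not the module's actual logic:

```python
import re

# Matches a context-marker phrase of the form "X as in Word"
MARKER_PHRASE = re.compile(r"^(\w) as in (\w+)$", re.IGNORECASE)

def emit_text(recognized):
    """If the recognized text is a context-marker phrase, emit only
    the character; ordinary dictation passes through unchanged."""
    match = MARKER_PHRASE.match(recognized.strip())
    if match:
        return match.group(1).upper()  # output the character only
    return recognized

print(emit_text("N as in Nancy"))  # -> N
print(emit_text("hello world"))    # -> hello world
```

In the system described above this decision is driven by the higher score language model 16 assigns to marker phrases, rather than by pattern matching on the final text; the sketch only illustrates the input/output contract.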

At this point, it should be noted that language model 16 described above is constructed essentially of associated character strings, word phrases and context markers, thereby allowing language model 16 to be responsive to input speech of this form. In the embodiment of FIG. 3, the general-purpose language model 111 can be used for input speech that does not take the particular form of character string, word phrase and context marker. However, it should be understood that in both embodiments language models 16 and 111 can be combined, if desired.

Upon receipt of the input speech and access to language model 16, speech recognition module 180 determines a recognized character string and a recognized word phrase for the input speech. In many cases, the recognized character string will be correct due to use of language model 16. However, in a further embodiment, a character verification module 182 can be included to correct at least some of the errors made by speech recognition module 180. Character verification module 182 accesses the recognized character string and the recognized word phrase determined by speech recognition module 180 and compares them; in particular, it verifies that the recognized character string is present in the recognized word phrase. If the recognized character string is not present in the recognized word phrase, an error has clearly occurred, although the error could originate either with the speaker having dictated an incorrect phrase such as "M as in Nancy", or with speech recognition module 180 having misrecognized the character string or the word phrase. In one embodiment, character verification module 182 can assume that the error is more likely in the recognized character string and, therefore, substitute a character present in the recognized word phrase for the recognized character string.
Substituting a character of the recognized word phrase for the recognized character string can be performed as a function of a comparison of acoustic similarity between the recognized character string and the characters of the recognized word phrase. To this end, character verification module 182 can access stored data pertaining to the sounds of individual characters when spoken. Using the characters present in the recognized word phrase, character verification module 182 compares the stored sound data for each character of the recognized word phrase with the recognized character string, and provides the closest character as the output. As appreciated by those of ordinary skill in the art, character verification module 182 can be included in speech recognition module 180; however, for purposes of explanation, character verification module 182 is illustrated separately.
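A minimal sketch of this verification logic, assuming a pluggable acoustic-similarity score in place of the stored sound data the text describes; the similarity table and all names here are toy assumptions:

```python
def verify_character(recognized_char, recognized_word, similarity):
    """If the recognized character is absent from the recognized word
    phrase, substitute the character of the word that sounds most like
    it. `similarity(a, b)` stands in for a comparison against stored
    per-character sound data (higher score = more alike)."""
    chars = set(recognized_word.lower())
    if recognized_char.lower() in chars:
        return recognized_char  # consistent: keep the recognized character
    closest = max(chars, key=lambda c: similarity(recognized_char.lower(), c))
    return closest.upper()

# Toy similarity: exact match > known confusable pair > anything else.
CONFUSABLE = {("m", "n"), ("n", "m"), ("b", "p"), ("p", "b")}
def toy_similarity(a, b):
    if a == b:
        return 1.0
    return 0.5 if (a, b) in CONFUSABLE else 0.0

print(verify_character("M", "Nancy", toy_similarity))  # -> N
```

Given the misdictated phrase "M as in Nancy", the sketch corrects the output to "N" because "m" is acoustically closest to the "n" actually present in the word.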

Although the present invention has been described with reference to preferred embodiments, workers skilled in the art will recognize that changes can be made in form and detail without departing from the spirit and scope of the invention.

Claims (24)

1. A method of using a computer system to recognize input speech and indicate characters by means of a language model, the method comprising:
creating the language model, including:
constructing a training corpus, including:
for each word phrase in a word phrase dictionary based on Chinese characters, associating a character of the word phrase and the word phrase with a context marker indicating identification of the character, to automatically generate context-marker phrases for the training corpus; and
creating the language model using the training corpus;
recognizing a character as spoken, including:
receiving input speech having a character string, wherein the character string comprises a context-marker phrase that includes a character based on Chinese characters, a word phrase based on Chinese characters, and a context marker, the context marker indicating elimination of ambiguity of the character;
detecting, without a prompt, the context-marker phrase in the received input speech;
executing instructions that access the language model, wherein the language model comprises an N-gram language model having probability information for the generated context-marker phrases; and
outputting the character as text, without the word phrase and context marker of the detected context-marker phrase.
2. The method of claim 1 wherein the language model comprises a context-free grammar.
3. The method of claim 1 wherein associating comprises associating a first character of each word phrase with that word phrase.
4. The method of claim 1 wherein associating comprises associating another character of at least some of the word phrases, other than the first character, with the corresponding word phrase.
5. The method of claim 1 wherein associating comprises associating each character of at least some of the word phrases with the corresponding word phrase.
6. The method of claim 1 wherein associating comprises associating each character of each word phrase with the corresponding word phrase.
7. The method of claim 1 and further comprising adjusting a probability score in the language model for each associated character and word phrase.
8. The method of claim 10 wherein the context marker comprises の in Japanese.
9. The method of claim 1 wherein the context marker comprises 的 in Chinese.
10. The method of claim 1 wherein each word phrase is a word comprising at least one character.
11. The method of claim 1 wherein outputting the character comprises outputting the character string as a function of probabilities stored in the language model.
12. The method of claim 11 wherein outputting the character comprises outputting the character as a function of N-gram probabilities for the received input speech.
13. The method of claim 11 wherein outputting the character comprises outputting the character as a function of the received input speech alone.
14. The method of claim 11 wherein, when the recognized character is not present in the recognized word phrase, the character output is a character of the recognized word phrase.
15. A computer system for recognizing input speech, comprising:
a language model indicating context-marker phrases and probability information for the context-marker phrases, wherein the context-marker phrases consist essentially of associated character strings based on Chinese characters, word phrases, and context markers for the character strings; and
a recognition module for receiving data indicative of input speech, the recognition module detecting the presence of a context-marker phrase in the received input speech without a prompt indicating the character string as text, accessing the language model, and, according to the probability information in the language model, outputting as text the character string based on Chinese characters of at least some of the context-marker phrases spoken by the user.
16. The computer system of claim 15 wherein the recognition module processes a detected context-marker phrase differently from other input speech, said processing comprising outputting only the character string of the detected context-marker phrase.
17. The computer system of claim 15 wherein the language model comprises a statistical language model.
18. The computer system of claim 15 wherein the language model comprises an N-gram language model.
19. The computer system of claim 15 wherein the language model comprises a context-free language model.
20. The computer system of claim 15 wherein the language model outputs the character string as a function of a comparison of a recognized character string with a recognized word phrase.
21. The computer system of claim 20 wherein, when the recognized character string is not present in the recognized word phrase, the character string output is a character string of the recognized word phrase.
22. The computer system of claim 15 wherein each word phrase is a word.
23. The computer system of claim 21 wherein each character string is a single character.
24. The computer system of claim 15 wherein each character string is a single character.
CNB021065306A 2001-01-31 2002-01-29 Disambiguation language model Expired - Fee Related CN100568222C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/773,342 2001-01-31
US09/773,342 US6507453B2 (en) 2000-02-02 2001-01-31 Thin floppy disk drive capable of preventing an eject lever from erroneously operating

Publications (2)

Publication Number Publication Date
CN1369830A CN1369830A (en) 2002-09-18
CN100568222C true CN100568222C (en) 2009-12-09

Family

ID=25097940

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB021065306A Expired - Fee Related CN100568222C (en) 2001-01-31 2002-01-29 Disambiguation language model

Country Status (1)

Country Link
CN (1) CN100568222C (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1727024A1 (en) * 2005-05-27 2006-11-29 Sony Ericsson Mobile Communications AB Automatic language selection for text input in messaging context
CN1940915B (en) * 2005-09-29 2010-05-05 国际商业机器公司 Corpus expansion system and method
CN101256624B (en) * 2007-02-28 2012-10-10 微软公司 Method and system for establishing HMM topological structure being suitable for recognizing hand-written East Asia character
JP5638948B2 (en) * 2007-08-01 2014-12-10 ジンジャー ソフトウェア、インコーポレイティッド Automatic correction and improvement of context-sensitive languages using an Internet corpus
US8326333B2 (en) 2009-11-11 2012-12-04 Sony Ericsson Mobile Communications Ab Electronic device and method of controlling the electronic device
WO2014203370A1 (en) * 2013-06-20 2014-12-24 株式会社東芝 Speech synthesis dictionary creation device and speech synthesis dictionary creation method
CN103943109A (en) * 2014-04-28 2014-07-23 深圳如果技术有限公司 Method and device for converting voice to characters
JP2016024212A (en) * 2014-07-16 2016-02-08 ソニー株式会社 Information processing device, information processing method and program
US10229687B2 (en) * 2016-03-10 2019-03-12 Microsoft Technology Licensing, Llc Scalable endpoint-dependent natural language understanding
CN113034995B (en) * 2021-04-26 2023-04-11 读书郎教育科技有限公司 Method and system for generating dictation content by student tablet

Also Published As

Publication number Publication date
CN1369830A (en) 2002-09-18

Similar Documents

Publication Publication Date Title
US6934683B2 (en) Disambiguation language model
US7590533B2 (en) New-word pronunciation learning using a pronunciation graph
JP5040909B2 (en) Speech recognition dictionary creation support system, speech recognition dictionary creation support method, and speech recognition dictionary creation support program
CN100568223C (en) Method and apparatus for multimodal input of ideographic languages
US6539353B1 (en) Confidence measures using sub-word-dependent weighting of sub-word confidence scores for robust speech recognition
US7124080B2 (en) Method and apparatus for adapting a class entity dictionary used with language models
US6694296B1 (en) Method and apparatus for the recognition of spelled spoken words
US6910012B2 (en) Method and system for speech recognition using phonetically similar word alternatives
US6973427B2 (en) Method for adding phonetic descriptions to a speech recognition lexicon
JP5014785B2 (en) Phonetic-based speech recognition system and method
US7890325B2 (en) Subword unit posterior probability for measuring confidence
US8135590B2 (en) Position-dependent phonetic models for reliable pronunciation identification
US20080221890A1 (en) Unsupervised lexicon acquisition from speech and text
US7844457B2 (en) Unsupervised labeling of sentence level accent
US20080027725A1 (en) Automatic Accent Detection With Limited Manually Labeled Data
US6449589B1 (en) Elimination of left recursion from context-free grammars
US6502072B2 (en) Two-tier noise rejection in speech recognition
CN100568222C (en) Disambiguation language model
KR20130126570A (en) Apparatus for discriminative training acoustic model considering error of phonemes in keyword and computer recordable medium storing the method thereof
KR102299269B1 (en) Method and apparatus for building voice database by aligning voice and script
Bühler et al. Using language modelling to integrate speech recognition with a flat semantic analysis
JP2005221752A (en) Speech recognition apparatus, speech recognition method and program
Reddy et al. Incorporating knowledge of source language text in a system for dictation of document translations
Reddy Speech based machine aided human translation for a document translation task
Shinozaki et al. A new lexicon optimization method for LVCSR based on linguistic and acoustic characteristics of words.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20091209

Termination date: 20150129

EXPY Termination of patent right or utility model