CN103154974B - Character recognition device, character recognition method and character recognition system

Info

Publication number: CN103154974B
Application number: CN201280003349.XA
Authority: CN
Inventors: 山添隆文; 荣藤稔; 吉村健; 辻野孝辅
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2011-03-07
Filing date: 2012-02-24
Publication date: 2017-05-24
Anticipated expiration: 2032-02-24
Also published as: US8965126B2; WO2012121033A1; US20130108160A1; EP2685405A4; KR20130029430A; KR101463499B1; EP2685405A1; JP5647919B2; JP2012185722A; CN103154974A

Abstract

A character recognition device (1) includes: an image reading unit (101) that inputs an image; a character region detection unit (103) that detects a character region in the image; a character region segmentation unit (104) that divides the character region into individual characters; a character recognition unit (105) that performs character recognition, character by character, on the characters in the divided regions and outputs one or more candidate recognition results for each character; a first character string transition data generation unit (106) that receives the candidates, calculates a weight value for the transition to each candidate, and generates first character string transition data from the pairs of candidate and weight value; and a WFST processing unit (109) that performs state transitions sequentially according to the first character string transition data, accumulates the weight values of the individual transitions to obtain a cumulative weight value for each state transition, and outputs one or more state-transition results according to the cumulative weight values.

Description

Character recognition device, character recognition method and character recognition system

Technical field

The present invention relates to a character recognition device, a character recognition method, a character recognition system, and a character recognition program.

Background art

Techniques for recognizing, from a scene image, characters that exist in three-dimensional space, such as those on billboards, are known, as exemplified in Patent Documents 1 and 2 and Non-Patent Documents 1 to 3 below. To cope with problems specific to scene images, such as variations in lighting and character distortion, these techniques improve recognition accuracy by using external data such as word knowledge or position information on the shooting location.

For example, in Patent Document 1, a character string extracted from a billboard or the like in an image is collated with a telephone directory database, and whether the extracted character string is associated with an advertiser is judged from its degree of agreement with the advertiser data or telephone number data contained in that database.

In Patent Document 2, a position information acquisition unit and an orientation information acquisition unit provided in the character recognition device are used to determine the position and direction of shooting. The determined position and orientation are collated with a map database, and the matching shop names or place names are used as word knowledge, thereby improving recognition accuracy.

In addition, the weighted finite state transducer (hereinafter "WFST"), which represents a set of symbol-string conversions and weights as state transitions, is used in the fields of speech recognition and language processing as a fast method with high versatility and extensibility. In connection with the WFST, methods have been proposed in the field of character recognition, as exemplified in Non-Patent Documents 1 and 2, that obtain character recognition results on the premise that character strings are obtained in units of words, as in languages such as English that are written with separators between words. Non-Patent Document 3 proposes a method for Japanese in which errors are corrected with a WFST after the character recognition result has been output.

Prior art documents

Patent documents

Patent Document 1: Japanese Patent No. 3360030

Patent Document 2: Japanese Patent No. 4591353

Non-patent documents

Non-Patent Document 1: "A Weighted Finite-State Framework for Correcting Errors in Natural Scene OCR", ICDAR 2007, Vol. 2, pp. 889-893

Non-Patent Document 2: "The image Text Recognition Graph (iTRG)", ICME 2009, pp. 266-269

Non-Patent Document 3: "重み付き有限状態トランスヂューサを用いた文字誤り訂正" (Character error correction using a weighted finite state transducer), Proceedings of the Annual Meeting of the Association for Natural Language Processing (言語処理学会年次大会発表論文集), C2-5, pp. 332-335, 2009

Summary of the invention

Problem to be solved by the invention

However, the method described in Patent Document 1 requires collation with the large amount of word knowledge contained in a telephone directory database or the like, so character recognition processing may not be made sufficiently fast. The method described in Patent Document 2 additionally requires a position information acquisition unit or an orientation information acquisition unit, which may complicate the device configuration.

Non-Patent Documents 1 and 2 presuppose a language written with separators between words. That is, they presuppose that the words handled in the WFST processing have already been cut out in advance. Furthermore, in Non-Patent Document 2, character-by-character segmentation performs character recognition using overlapping segmentation positions and represents them with a WFST, but it may be unable to cope when misrecognition occurs in the character recognition result.

Non-Patent Documents 1 and 3 deal with misrecognition caused by overlapping segmentation positions by merging and splitting characters. However, Japanese has many character types, and a wide variety of character designs exist in real environments, so an enormous number of combinations would have to be covered. Moreover, Non-Patent Document 3 uses the result of character recognition that has already been performed, on the premise that this result is obtained with reasonably high accuracy. Consequently, when many corrections based on language processing have been applied in the original character recognition, correction based on character shape may become difficult. It may also be impossible to correct for character regions that the original character recognition failed to detect.

The present invention has been made in view of the above problems. Its object is to provide a character recognition device, a character recognition method, a character recognition system, and a character recognition program that can recognize characters in scene images accurately and quickly, using a simplified device configuration and without using an external database such as a telephone directory.

Means for solving the problem

To solve the above problems, a character recognition device of the present invention comprises: an image input unit that inputs an image containing characters to be recognized; a character region detection unit that detects a character region, which is a region of the image in which the characters exist; a character region segmentation unit that divides the character region into individual characters; a character recognition unit that performs character recognition processing, character by character, on the characters in the divided regions produced by the character region segmentation unit, and outputs one or more candidate recognition results for each character; a first character string transition data generation unit that receives the candidates, calculates a weight value for the transition to each candidate, and generates first character string transition data, which is character string transition data based on the pairs of candidate and weight value; and a finite state transformation unit that performs state transitions sequentially according to the first character string transition data, accumulates the weight values of the individual state transitions to calculate a cumulative weight value for each state transition, and outputs one or more state-transition results according to the cumulative weight values.
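By way of illustration only, the accumulation of weight values over candidate transitions may be sketched as follows. The function name `best_strings`, the integer weights, and the convention that a smaller cumulative weight ranks higher are assumptions made for this sketch and are not taken from the specification. The exhaustive enumeration stands in for a proper WFST shortest-path search and is only practical for short strings.

```python
import heapq
from itertools import product


def best_strings(candidates_per_char, n_best=1):
    """Rank candidate strings by cumulative weight.

    candidates_per_char: one list per segmented character, each holding
    (character, weight) pairs output by the recognizer.
    Assumption: a lower cumulative weight means a better result.
    """
    scored = []
    # Every path picks exactly one candidate per character position.
    for path in product(*candidates_per_char):
        cumulative = sum(weight for _, weight in path)
        text = "".join(char for char, _ in path)
        scored.append((cumulative, text))
    return heapq.nsmallest(n_best, scored)


# Hypothetical recognizer output for a three-character region.
lattice = [
    [("c", 1), ("e", 4)],
    [("a", 1), ("o", 3)],
    [("t", 2), ("f", 5)],
]
print(best_strings(lattice, n_best=2))
```

Returning several results rather than one mirrors the statement that one or more state-transition results are output according to the cumulative weight values.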

A character recognition method of the present invention comprises: an image input step in which an image input unit inputs an image containing characters to be recognized; a character region detection step in which a character region detection unit detects a character region, which is a region of the image in which the characters exist; a character region segmentation step in which a character region segmentation unit divides the character region into individual characters; a character recognition step in which a character recognition unit performs character recognition processing, character by character, on the characters in the divided regions produced by the character region segmentation unit, and outputs one or more candidate recognition results for each character; a first character string transition data generation step in which a first character string transition data generation unit receives the candidates, calculates a weight value for the transition to each candidate, and generates first character string transition data, which is character string transition data based on the pairs of candidate and weight value; and a finite state transformation step in which a finite state transformation unit performs state transitions sequentially according to the first character string transition data, accumulates the weight values of the individual state transitions to calculate a cumulative weight value for each state transition, and outputs one or more state-transition results according to the cumulative weight values.

A character recognition system of the present invention includes a terminal and a server. The terminal comprises: an image input unit that inputs an image containing characters to be recognized; a character region detection unit that detects a character region, which is a region of the image in which the characters exist; a character region segmentation unit that divides the character region into individual characters; and a character recognition unit that performs character recognition processing, character by character, on the characters in the divided regions produced by the character region segmentation unit, and outputs one or more candidate recognition results for each character. The server comprises: a first character string transition data generation unit that receives the candidates, calculates a weight value for the transition to each candidate, and generates first character string transition data, which is character string transition data based on the pairs of candidate and weight value; and a finite state transformation unit that performs state transitions sequentially according to the first character string transition data, accumulates the weight values of the individual state transitions to calculate a cumulative weight value for each state transition, and outputs one or more state-transition results according to the cumulative weight values.

A character recognition program of the present invention causes a computer to operate as: an image input unit that inputs an image containing characters to be recognized; a character region detection unit that detects a character region, which is a region of the image in which the characters exist; a character region segmentation unit that divides the character region into individual characters; a character recognition unit that performs character recognition processing, character by character, on the characters in the divided regions produced by the character region segmentation unit, and outputs one or more candidate recognition results for each character; a first character string transition data generation unit that receives the candidates, calculates a weight value for the transition to each candidate, and generates first character string transition data, which is character string transition data based on the pairs of candidate and weight value; and a finite state transformation unit that performs state transitions sequentially according to the first character string transition data, accumulates the weight values of the individual state transitions to calculate a cumulative weight value for each state transition, and outputs one or more state-transition results according to the cumulative weight values.

According to the character recognition device, character recognition method, character recognition system, and character recognition program of the present invention, no external database such as a telephone directory is used, so collation with the large amount of word knowledge contained in such a database is unnecessary and character recognition processing can be made faster. Since no position information acquisition unit or orientation information acquisition unit is needed, the device configuration can also be simplified. With such a device configuration, characters can be recognized from scene images accurately and quickly.

In the present invention, the character recognition device may further comprise a second character string transition data generation unit that receives a keyword from a user and generates second character string transition data, which is character string transition data for the keyword, and the finite state transformation unit may determine whether the keyword is present in the image by performing a composition operation on the first character string transition data and the second character string transition data.

According to this aspect, the first character string transition data of the character recognition candidates can itself be used as a search table for the image, so the character recognition device of the present invention can be applied effectively as a device that determines whether a keyword entered by a user is present in the image.
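As a rough sketch of what the keyword determination amounts to, the following code slides a keyword over the per-character candidate lists and reports the match with the smallest cumulative weight. This is a simplified stand-in for WFST composition, not the composition algorithm itself. The name `find_keyword`, the sample data, and the lower-weight-is-better convention are assumptions for illustration.

```python
def find_keyword(candidates_per_char, keyword):
    """Return (start_index, cumulative_weight) of the best match, or None.

    candidates_per_char: one list of (character, weight) pairs per position.
    A keyword matches at a start position when each of its characters
    appears among the candidates of the corresponding position.
    Assumption: a lower cumulative weight means a better match.
    """
    best = None
    n, k = len(candidates_per_char), len(keyword)
    for start in range(n - k + 1):
        total = 0
        for offset, char in enumerate(keyword):
            weights = dict(candidates_per_char[start + offset])
            if char not in weights:
                break  # this start position cannot spell the keyword
            total += weights[char]
        else:
            if best is None or total < best[1]:
                best = (start, total)
    return best


# Hypothetical candidates for a four-character region.
sign = [
    [("s", 1), ("5", 2)],
    [("a", 1), ("o", 2)],
    [("l", 1), ("1", 3)],
    [("e", 1), ("c", 2)],
]
print(find_keyword(sign, "sole"))  # match built partly from second-ranked candidates
print(find_keyword(sign, "tax"))   # no match
```

Because lower-ranked candidates remain in the table, a keyword can still be found when the top-ranked recognition result for some character is wrong.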

In the present invention, the character recognition device may further comprise a third character string transition data generation unit that generates third character string transition data, which is character string transition data for each word in a vocabulary database, and the finite state transformation unit may detect words present in the image by performing a composition operation on the first character string transition data and the third character string transition data.

According to this aspect, performing a composition operation on the first character string transition data of the character recognition candidates and the third character string transition data of the vocabulary database allows the character recognition device of the present invention to be applied effectively as a word detection device.

In the present invention, the character recognition unit may assign a priority order to the plurality of candidates when outputting them, and the first character string transition data generation unit may calculate the weight values according to the priority order.

This aspect provides a concrete method by which the first character string transition data generation unit calculates the weight values.

In the present invention, the character recognition unit may perform the character recognition processing using at least two different recognition methods, and the first character string transition data generation unit may calculate the weight values according to the number of outputs of the candidates in the different recognition methods and the priority order.

In the present invention, the first character string transition data generation unit may also calculate the weight values taking into account the character string transitions of words registered in a language database.

These aspects provide concrete methods by which the first character string transition data generation unit calculates the weight values.
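One way the priority order and the outputs of several recognition methods might be combined into a weight is sketched below. The specification does not give a formula in this passage, so the scoring rule (summed rank divided by the squared number of methods that output the candidate), the name `combine_recognizers`, and the lower-weight-is-better convention are all hypothetical.

```python
from collections import defaultdict


def combine_recognizers(ranked_outputs):
    """Merge ranked candidate lists from several recognition methods.

    ranked_outputs: one list per recognition method, best candidate first.
    Returns (character, weight) pairs, lowest weight first.
    Hypothetical rule: weight = summed rank / (number of methods that
    output the candidate) ** 2, so candidates that many methods agree on
    and rank highly receive a small weight.
    """
    rank_sum = defaultdict(int)
    votes = defaultdict(int)
    for ranking in ranked_outputs:
        for rank, char in enumerate(ranking, start=1):
            rank_sum[char] += rank
            votes[char] += 1
    weights = {char: rank_sum[char] / votes[char] ** 2 for char in rank_sum}
    # Ties are broken by character so the order is deterministic.
    return sorted(weights.items(), key=lambda item: (item[1], item[0]))


# Three hypothetical recognition methods looking at the same character.
merged = combine_recognizers([["B", "8", "3"], ["8", "B"], ["8"]])
print([char for char, _ in merged])
```

Here "8" is output by all three methods and ends up ahead of "B", which only two methods output, even though one method ranks "B" first.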

In the present invention, the first character string transition data generation unit may correct the weight values according to the position of the candidate within the image or the character size of the candidate.

This aspect provides a concrete method by which the first character string transition data generation unit corrects the weight values. Correcting the weight values can also improve word detection accuracy.
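A weight correction based on position and size could look like the following sketch. The thresholds (half the median character height, a 10% border margin), the penalty values, and the name `adjust_weight` are illustrative assumptions; the passage only states that position or size may be used, not how.

```python
def adjust_weight(weight, box, image_size, median_height,
                  size_penalty=1, edge_penalty=1):
    """Correct a candidate's weight from its size and position.

    box: (x, y, width, height) of the candidate in pixels.
    image_size: (width, height) of the whole image.
    median_height: median character height in the same character region.
    Assumption: a larger weight makes the candidate less preferred.
    """
    x, y, w, h = box
    img_w, img_h = image_size
    adjusted = weight
    # Characters much smaller than their neighbours are penalized.
    if h < 0.5 * median_height:
        adjusted += size_penalty
    # Characters centred in the outer 10% border of the image are penalized.
    cx, cy = x + w / 2, y + h / 2
    inside_x = 0.1 * img_w <= cx <= 0.9 * img_w
    inside_y = 0.1 * img_h <= cy <= 0.9 * img_h
    if not (inside_x and inside_y):
        adjusted += edge_penalty
    return adjusted


print(adjust_weight(2, (100, 100, 20, 40), (640, 480), median_height=40))
print(adjust_weight(2, (0, 0, 10, 10), (640, 480), median_height=40))
```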

In the present invention, when the character region segmentation unit divides the character region using a plurality of segmentation patterns and thereby generates several kinds of divided regions, the character recognition unit may perform the character recognition processing on each kind of divided region, the first character string transition data generation unit may generate the first character string transition data for the candidates of each kind of divided region, and the finite state transformation unit may output, as the result, the state-transition results whose cumulative weight values rank highest across all the kinds of divided regions.

According to this aspect, the case where the character region segmentation unit has performed over-segmentation can also be handled appropriately.
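The selection across segmentation patterns can be illustrated with the sketch below, in which the same region is segmented once as two narrow characters and once as a single wide character. The name `best_over_patterns`, the sample weights, and the use of an unnormalized sum in which the lowest value ranks highest are assumptions made for this sketch.

```python
def best_over_patterns(patterns):
    """Pick the best reading across several segmentation patterns.

    patterns: one entry per segmentation pattern; each entry is a list of
    per-character candidate lists of (character, weight) pairs.
    Returns (cumulative_weight, text) for the overall best reading.
    Assumption: a lower cumulative weight ranks higher.
    """
    results = []
    for candidates_per_char in patterns:
        # Within one pattern, the best path takes the cheapest candidate
        # at every position because positions are independent here.
        picks = [min(cands, key=lambda cw: cw[1]) for cands in candidates_per_char]
        cumulative = sum(weight for _, weight in picks)
        text = "".join(char for char, _ in picks)
        results.append((cumulative, text))
    return min(results)


over_segmented = [[("r", 3), ("n", 4)], [("n", 3), ("h", 5)]]  # region split in two
single_char = [[("m", 2), ("n", 6)]]                           # region kept whole
print(best_over_patterns([over_segmented, single_char]))
```

An unnormalized sum favours patterns with fewer characters. Whether and how the weights are normalized across patterns is not stated in this passage.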

In the present invention, the first character string transition data generation unit may generate the first character string transition data so as to include a first empty (ε) transition, which is an empty transition from the initial state of the character string transition to a candidate; a second empty transition, which is an empty transition from a candidate to the final state of the character string transition; and a third empty transition, which is an empty transition for skipping a candidate in units of single characters.

According to this aspect, including the first, second, and third empty transitions in the first character string transition data improves the accuracy of the composition operation between the first character string transition data and the second or third character string transition data.
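The three kinds of empty transition can be pictured as extra arcs in the candidate graph. The sketch below builds a plain arc list. The state numbering, the `"start"` and `"final"` state names, the `<eps>` label, and the zero weights on empty transitions are assumptions for illustration and do not reproduce the figures of the specification.

```python
EPS = "<eps>"  # label used here for an empty transition


def build_arcs(candidates_per_char, skip_weight=0):
    """Build (source, target, label, weight) arcs for the candidate graph.

    State i sits just before character position i, so the candidates of
    position i are arcs from state i to state i + 1.
    """
    arcs = []
    for i, candidates in enumerate(candidates_per_char):
        # First empty transition: initial state to a candidate, so a
        # match may begin at any character position.
        arcs.append(("start", i, EPS, 0))
        for char, weight in candidates:
            arcs.append((i, i + 1, char, weight))
        # Third empty transition: skip this character position.
        arcs.append((i, i + 1, EPS, skip_weight))
        # Second empty transition: after a candidate to the final state,
        # so a match may end at any character position.
        arcs.append((i + 1, "final", EPS, 0))
    return arcs


arcs = build_arcs([[("a", 1), ("o", 2)], [("b", 1), ("h", 3)]])
for arc in arcs:
    print(arc)
```

With the first and second empty transitions, a keyword or vocabulary word can match a substring of the recognized text. The third lets a match tolerate a spurious character, such as noise recognized as a character.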

In the present invention, when outputting the candidates of the character recognition processing result, the character recognition unit may also output identification information indicating a separator between words, the first character string transition data generation unit may generate the first character string transition data with the identification information attached, and the finite state transformation unit may perform the state transitions in units of the parts delimited by two pieces of the identification information.

According to this aspect, using identification information that indicates separators allows characters to be recognized accurately even in languages written with separators between words.

In the present invention, when outputting the candidates of the character recognition processing result, the character recognition unit may also output position information of each candidate within the image, the first character string transition data generation unit may generate the first character string transition data with the position information attached, and the finite state transformation unit may output the result with the position information attached.

According to this aspect, using the position information makes it possible to determine where in the image the character recognition result is located.
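A minimal sketch of carrying position information through to the result is given below. Representing positions as `(left, top, right, bottom)` boxes and reporting the enclosing box of the chosen characters are assumptions. The names `merge_boxes` and `locate` are hypothetical.

```python
def merge_boxes(boxes):
    """Return the smallest (left, top, right, bottom) box enclosing all boxes."""
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))


def locate(path):
    """Attach a position to a recognized string.

    path: the chosen (character, box) pairs in reading order, where each
    box is the position information output with that candidate.
    """
    text = "".join(char for char, _ in path)
    return text, merge_boxes([box for _, box in path])


result = locate([("A", (10, 20, 30, 60)), ("B", (32, 18, 52, 60))])
print(result)
```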

In the present invention, the vocabulary database may hold classification information for words, the second character string transition data generation unit or the third character string transition data generation unit may generate the second character string transition data or the third character string transition data with the classification information attached, and the finite state transformation unit may output the result with the classification information attached.

According to this aspect, using the classification information makes it possible to determine which category the character recognition result belongs to.

In the present invention, the character recognition device may comprise a word-category relevance vector storage unit that stores word-category relevance vectors indicating the relevance between words and the classification information, and the first character string transition data generation unit may add the candidates and weight values in the first character string transition data to the values of the word-category relevance vectors, take the classification information with the largest resulting value as the classification information corresponding to the candidates, and correct the weight values for the candidates based on that classification information.
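The following sketch shows one possible reading of the relevance-vector correction: the relevance values of all candidate words are summed per category, the category with the largest total is chosen, and candidates related to that category are favoured. The data layout, the name `rescore_by_category`, the fixed `bonus`, and the lower-weight-is-better convention are assumptions. The passage does not specify the arithmetic.

```python
def rescore_by_category(candidates, relevance, bonus=1):
    """Choose the dominant category and correct candidate weights.

    candidates: {word: weight} for the word candidates found in the image.
    relevance: {word: {category: value}}, standing in for the
    word-category relevance vectors.
    Returns (category, corrected_weights).
    Assumption: a lower weight means a more preferred candidate.
    """
    totals = {}
    for word in candidates:
        for category, value in relevance.get(word, {}).items():
            totals[category] = totals.get(category, 0) + value
    if not totals:
        return None, dict(candidates)
    # Sorting first makes the choice deterministic when totals tie.
    top = max(sorted(totals), key=totals.get)
    corrected = {
        word: weight - bonus if relevance.get(word, {}).get(top, 0) > 0 else weight
        for word, weight in candidates.items()
    }
    return top, corrected


candidates = {"ramen": 3, "raven": 3, "sushi": 4}
relevance = {"ramen": {"food": 2}, "sushi": {"food": 3}, "raven": {"animal": 2}}
print(rescore_by_category(candidates, relevance))
```

In this example the two food-related words outvote "raven", so the tie between "ramen" and "raven" is resolved in favour of "ramen".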

Effects of the invention

According to the present invention, it is possible to provide a character recognition device, a character recognition method, a character recognition system, and a character recognition program that can recognize characters in scene images accurately and quickly, using a simplified device configuration and without using an external database such as a telephone directory.

Brief description of the drawings

FIG. 1 is a schematic configuration diagram showing the functional components of the character recognition device 1.

FIG. 2 is a hardware configuration diagram of the character recognition device 1.

FIG. 3 is a flowchart showing the overall processing flow of the present embodiment.

FIG. 4 is a diagram for explaining the operation of the character region segmentation unit 104.

FIG. 5 is a diagram showing an example of the first WFST data generated by the first character string transition data generation unit 106.

FIG. 6 is a diagram showing the processing performed when the character region segmentation unit 104 has performed over-segmentation.

FIG. 7 is a diagram for explaining how the first character string transition data generation unit 106 adjusts the weight values according to character size, position, and the like.

FIG. 8 is a flowchart showing the WFST operation processing.

FIG. 9 is a conceptual diagram of the WFST composition operation.

FIG. 10 shows an example of processing in Variation 1 of the WFST composition operation.

FIG. 11 shows an example of processing in Variation 1 of the WFST composition operation.

FIG. 12 shows an example of processing in Variation 1 of the WFST composition operation.

FIG. 13 is a schematic configuration diagram showing the functional components of the character recognition device 1 in Variation 2 of the WFST composition operation.

FIG. 14 shows an example of processing in Variation 2 of the WFST composition operation.

FIG. 15 shows an example of processing in Variation 2 of the WFST composition operation.

FIG. 16 shows an example of processing in Variation 3 of the WFST composition operation.

FIG. 17 shows an example of processing in Variation 4 of the WFST composition operation.

FIG. 18 shows an example of processing in Variation 4 of the WFST composition operation.

FIG. 19 is a schematic configuration diagram showing the functional components of the character recognition system 100.

Detailed description of the embodiments

Preferred embodiments of the character recognition device, character recognition method, character recognition system, and character recognition program of the present invention are described in detail below with reference to the drawings. In the description of the drawings, the same elements are given the same reference numerals, and duplicate descriptions are omitted.

(Overall configuration of the character recognition device 1)

The character recognition device 1 according to the embodiment of the present invention detects character regions in a scene image and performs character recognition (for example, keyword detection or generation of a search table). FIG. 1 is a schematic configuration diagram showing the functional components of the character recognition device 1, and FIG. 2 is a hardware configuration diagram of the character recognition device 1. As shown in FIG. 2, the character recognition device 1 is physically configured as an ordinary computer system that includes a CPU 11; main storage devices such as a ROM 12 and a RAM 13; an input device 14, which covers a keyboard and a mouse as well as devices for reading in images, such as a camera, and devices for reading in data from external devices; an output device 15 such as a display; a communication module 16, such as a network card, for sending data to and receiving data from other devices; and an auxiliary storage device 17 such as a hard disk. An image read in by the input device 14 may be one captured by the device itself or one captured by another device. The functions of the character recognition device 1 described below are realized by loading predetermined computer software onto hardware such as the CPU 11, the ROM 12, and the RAM 13, operating the input device 14, the output device 15, and the communication module 16 under the control of the CPU 11, and reading and writing data in the main storage devices 12 and 13 and the auxiliary storage device 17.

As shown in FIG. 1, the character recognition device 1 has the following functional components: an image reading unit 101 (corresponding to the "image input unit" in the claims), an image binarization unit 102, a character region detection unit 103 (corresponding to the "character region detection unit" in the claims), a character region segmentation unit 104 (corresponding to the "character region segmentation unit" in the claims), a character recognition unit 105 (corresponding to the "character recognition unit" in the claims), a first character string transition data generation unit 106 (corresponding to the "first character string transition data generation unit" in the claims), a second character string transition data generation unit 107 (corresponding to the "second character string transition data generation unit" in the claims), a third character string transition data generation unit 108 (corresponding to the "third character string transition data generation unit" in the claims), a WFST processing unit 109 (corresponding to the "finite state conversion unit" in the claims), a character string detection unit 110 (corresponding to the "character string detection unit" in the claims), and a vocabulary DB 111 (corresponding to the "vocabulary database" in the claims). The operation of each component of the character recognition device 1 is described below with reference to the flowchart of FIG. 3.

(1) Image reading

The image reading unit 101 inputs an image containing the characters to be recognized (step S1, corresponding to the "image input step" in the claims). For document images, such as printed documents captured with a scanner, existing technology already achieves fast and accurate recognition, so character recognition as a document image is first performed with an existing document OCR engine (step S2). The first character string transition data generation unit 106 then generates data expressed as a WFST (hereinafter "first WFST data"; corresponding to the "first character string transition data" in the claims) from the candidate group of the recognition results (step S3, corresponding to the "first character string transition data generation step" in the claims). When the recognition result obtained by the existing document OCR engine contains at least a predetermined number of characters and its recognition accuracy is at least a predetermined value, the image is judged to be a document and the WFST operation processing of step S10 is not performed. Images whose resolution is too low or too high are resized to a size suitable for character recognition.
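The document decision described above can be sketched as a simple two-threshold test. This is only an illustration: the function names and the threshold values (50 characters, accuracy 0.9) are assumptions, since the text speaks only of a "predetermined number" and a "predetermined value".

```python
def is_document_image(text, accuracy, min_chars=50, min_accuracy=0.9):
    """Return True when a document OCR result is long and reliable enough
    to treat the input as a document image (hypothetical thresholds)."""
    return len(text) >= min_chars and accuracy >= min_accuracy


def choose_pipeline(text, accuracy):
    # Document images keep the document OCR result and skip the WFST step (S10);
    # all other images continue with the scene-image steps from binarization (S4).
    return "document" if is_document_image(text, accuracy) else "scene"
```

For example, a long, confidently recognized result is routed to the document path, while a short sign such as "EXIT" continues through the scene-image steps.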

(2) Image binarization

When the image input in step S1 is not a document image, the image binarization unit 102 binarizes the image (step S4). Binarization is performed based on local brightness, so it can also cope with low-contrast conditions. Black characters are detected on a white background, and the brightness of the original image may also be inverted so that white characters are detected on a black background. In addition, noise in areas that are clearly not characters is removed by mask processing such as dilation and erosion.
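One common way to binarize by local brightness is a local-mean threshold. The following is a minimal pure-Python sketch under that assumption; the text does not state which local method the device uses, and the names `binarize_local`, `radius`, and `offset` are illustrative only.

```python
def binarize_local(gray, radius=1, offset=0, invert=False):
    """Binarize a grayscale image (list of rows of ints 0-255).

    A pixel becomes 1 (foreground) when it is darker than the mean of its
    (2*radius+1)-square neighbourhood minus `offset`. With invert=True the
    brightness is flipped first, so light characters on a dark background
    are detected instead.
    """
    h, w = len(gray), len(gray[0])
    if invert:
        gray = [[255 - v for v in row] for row in gray]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [gray[j][i] for j in ys for i in xs]
            mean = sum(window) / len(window)
            out[y][x] = 1 if gray[y][x] < mean - offset else 0
    return out
```

Because the threshold follows the neighbourhood mean, a stroke that is only slightly darker than its surroundings (low contrast) is still separated, which a single global threshold could miss.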

(3) Character region detection

The character region detection unit 103 detects character regions (step S5, corresponding to the "character region detection step" in the claims). A "character region" is an area of the image input in step S1 in which characters to be recognized exist or may exist. A known way to detect such regions is to statistically learn shape features, as in Reference 1 below. In this device, labeling is performed to attach a label to each region, and each region is judged to be a character region or not from its shape features (circularity, number of holes, number of constituent regions, size and aspect ratio of the bounding rectangle, area ratio of labeled to unlabeled area, and so on).
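The labeling-plus-shape-feature idea can be sketched as follows. This is a simplified illustration, not the device's actual rule set: it uses 4-connected labeling and only two of the listed features (bounding-rectangle aspect ratio and fill ratio), and the function names and thresholds are assumptions.

```python
from collections import deque


def label_regions(binary):
    """4-connected component labeling; returns one pixel list per region."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(pixels)
    return regions


def shape_features(pixels):
    """Bounding-rectangle size, aspect ratio, and fill ratio of one region."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return {"width": width, "height": height,
            "aspect": width / height,
            "fill": len(pixels) / (width * height)}


def looks_like_character(f, max_aspect=3.0, min_fill=0.2, min_size=2):
    # Hypothetical thresholds: reject very thin, very elongated, or very sparse regions.
    return (f["height"] >= min_size
            and 1 / max_aspect <= f["aspect"] <= max_aspect
            and f["fill"] >= min_fill)
```

In keeping with the embodiment's preference for over-detection, such thresholds would be set loosely so that doubtful regions are kept and filtered later.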

<Reference 1> "A learning-based method to detect and segment text from scene images", JOURNAL OF ZHEJIANG UNIVERSITY-SCIENCE A, Volume 8, Number 4, pp. 568-574

In this embodiment, because filtering is performed later by the WFST processing, priority is given to detecting as many potential character regions as possible up front so that nothing is missed, rather than trying to exclude non-character noise regions from the outset. For this reason, variants in which neighboring regions are joined by dilation and erosion, in which joined regions are split apart, and in which noise around characters is removed are also used as character region detection modes. Various further methods (methods using edges or hue, advanced character region joining, and so on) can be added to these detection modes.

(4) Character string candidate detection and cutting out of individual characters

The character region segmentation unit 104 detects character string candidates within the character regions and divides the character regions into individual characters (hereinafter "cutting out") (step S6, corresponding to the "character region segmentation step" in the claims). Specifically, the character region segmentation unit 104 first detects character lines. A character line is assumed to consist of three or more characters and is detected from the progression of region size, spacing, and angle of the character regions. Each detected character line is labeled, and the character lines are narrowed down using the median, mean, mode, or similar statistics of the angles of the labeled regions. FIG. 4 illustrates the operation of the character region segmentation unit 104. As shown in FIG. 4, for each character line L, shear deformation in the horizontal and vertical directions is applied through a search based on the angle of the character line, correcting both the shear and the rotation distortion of the characters. In FIG. 4, image A1 is the uncorrected image containing a rotated character string, and image A2 is the image after the tilt in the string direction has been corrected by shearing the character line in the vertical direction.

The character region segmentation unit 104 removes noise from the distortion-corrected image A2, then determines the character spacing along the character line and cuts out individual characters. To cut out individual characters, a histogram obtained by summing the pixels in the direction perpendicular to the string direction is used to find candidate gaps between characters, and multiple overlapping cut positions are determined using the median, mean, mode, or similar statistics of the region sizes obtained during character line detection as a reference. FIG. 4 shows the character string M in the corrected image A2 being sheared in the horizontal direction while the angle is changed little by little, producing multiple character strings M1, M2, and M3, each of which is cut into individual characters. Character string Y2 is the result of cutting character string M2 into individual characters; in this case the number of blank regions is 4. A "blank region" is a region between characters and is indicated by reference sign K in FIG. 4. Character string Y3 is the result of cutting character string M3 into individual characters; in this case the number of blank regions is 7. In this embodiment, the character region segmentation unit 104 adopts the variant with the largest number and area of blank regions as the segmentation result. In the example of FIG. 4, character string Y3 is the finally selected segmented character string. In addition, character position detection and character recognition based on multiple methods and parameters, such as recognition of a single character line by an existing OCR engine, are performed, individual characters are cut out at every position that may be a character boundary, and over-segmented state transitions that allow overlapping positions are obtained.
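The histogram-based cutting and the "most blank regions" selection can be sketched as below. This is a minimal illustration that assumes a horizontal character line given as a binary image; the shear search itself is not implemented, and the function names are hypothetical. Ties in the number of blank regions are broken here by total blank width, which is one possible reading of "largest number and area".

```python
def column_histogram(binary):
    """Sum foreground pixels in each column (perpendicular to the string direction)."""
    return [sum(col) for col in zip(*binary)]


def find_gaps(hist, max_ink=0):
    """Return (start, end) column ranges, end exclusive, whose ink count is <= max_ink."""
    gaps, start = [], None
    for x, v in enumerate(hist):
        if v <= max_ink:
            if start is None:
                start = x
        elif start is not None:
            gaps.append((start, x))
            start = None
    if start is not None:
        gaps.append((start, len(hist)))
    return gaps


def split_characters(binary):
    """Return (character column ranges, blank column ranges) for one character line."""
    hist = column_histogram(binary)
    gaps = find_gaps(hist)
    cuts, prev = [], 0
    for s, e in gaps:
        if s > prev:
            cuts.append((prev, s))
        prev = e
    if prev < len(hist):
        cuts.append((prev, len(hist)))
    return cuts, gaps


def blank_score(binary):
    """Score a sheared variant by (number of blank regions, total blank width)."""
    _, gaps = split_characters(binary)
    return (len(gaps), sum(e - s for s, e in gaps))


def pick_best_variant(variants):
    # Adopt the sheared variant with the most, and widest, blank regions.
    return max(variants, key=blank_score)
```

A slanted line tends to smear ink across the gaps, so the variant whose shear angle best straightens the characters shows the most blank columns and is selected.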

(5) Character recognition

The character recognition unit 105 performs character recognition on each character present in the divided regions (indicated by reference sign D in FIG. 4) produced by the character region segmentation unit 104 in step S6, and outputs one or more candidate recognition results for each character (hereinafter the "character recognition candidate group", or simply "candidates") (step S7, corresponding to the "character recognition step" in the claims). Recognition of individual characters is performed so that recognition results are obtained from multiple engines.

(6) WFST data generation

The first character string transition data generation unit 106 consolidates duplicate candidates in the candidate group of recognition results obtained in step S7 and generates WFST data (hereinafter also "first WFST data"; corresponding to the "first character string transition data" in the claims) (step S8, corresponding to the "first character string transition data generation step" in the claims). That is, the first character string transition data generation unit 106 receives the candidate recognition results (one or more candidates per character) from the character recognition unit 105, calculates a weight value for the transition to each candidate, and generates the first WFST data based on the pairs of candidates and weight values.

When the character recognition unit 105 assigns a priority order to the multiple candidate recognition results and outputs them, the first character string transition data generation unit 106 calculates the weight values based on that priority order. When the character recognition unit 105 performs character recognition using at least two different recognition methods, the first character string transition data generation unit 106 calculates the weight values from the number of candidates output by the different recognition methods and from the priority order. Here, the weight values of duplicate candidates are combined by product or sum, so that the more often the same candidate appears across the recognition results, the smaller its weight value becomes. In other words, in this embodiment, a smaller weight value indicates a candidate closer to the actual correct result. The first character string transition data generation unit 106 may also calculate the weight values taking into account the character string transitions of words registered in a language database.
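A minimal sketch of rank-based weighting with duplicate merging is given below. The rank-to-weight mapping is entirely hypothetical (the text gives no formula), and only the product variant is shown; a product shrinks the weight with each repeated occurrence as long as every individual weight is below 1.

```python
def rank_weight(rank, base=0.3, step=0.2):
    """Hypothetical mapping: rank 1 -> 0.5, rank 2 -> 0.7, rank 3 -> 0.9, capped below 1."""
    return min(base + step * rank, 0.99)


def merge_candidates(engine_outputs):
    """Merge ranked candidate lists from several engines for one character position.

    Each element of `engine_outputs` is one engine's candidates in priority order.
    Duplicate candidates are combined by multiplying their weights, so a
    candidate proposed by more engines ends up with a smaller weight.
    """
    merged = {}
    for ranked in engine_outputs:
        for rank, ch in enumerate(ranked, start=1):
            merged[ch] = merged.get(ch, 1.0) * rank_weight(rank)
    return merged
```

For instance, with two engines returning `["ド", "ト"]` and `["ド", "人"]`, the candidate "ド" is ranked first by both and receives the smallest merged weight.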

FIG. 5 shows an example of the first WFST data generated by the first character string transition data generation unit 106. As shown in FIG. 5, the first WFST data is a set of state transitions to which multiple candidates and their weight values are assigned. When there are multiple character recognition results, state transitions sharing the same initial state are arranged in parallel. The example of FIG. 5 shows a case where the actual correct result of character recognition is "ドコモ", the candidates produced by character recognition are, for example, "ド", "ト", "人", "コ", "二", "口", "モ", "毛", and "t", and their respective weight values are "0.2", "0.4", "0.6", "0.2", "0.5", "0.6", "0.2", "0.4", and "0.5".

To detect keywords in the middle of text, the first WFST data generated from the character recognition candidate group includes ε transitions (empty transitions with no input or output; corresponding to the "first empty transition" in the claims) from the initial state of the character string transition to each character candidate, ε transitions from each character candidate to the final state of the character string transition (corresponding to the "second empty transition" in the claims), and weighted ε transitions that skip each character candidate one character at a time so that noise is not picked up as a character (corresponding to the "third empty transition" in the claims). In FIG. 5, the first empty transition is indicated by reference sign E1, the second empty transition by reference sign E2, and the third empty transition by reference sign E3; the weight value of the third empty transition is shown as "2.0", for example. In addition, so that operations can be performed at the most suitable processing size, the first WFST data can be divided into units of several lines or of a fixed number of characters, operated on, and the results combined for use.
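The structure of the first WFST data in FIG. 5, including the three kinds of ε transition, can be sketched as a plain list of arcs. This is an illustrative data layout, not the device's internal format: the state names, the tuple form `(source, destination, label, weight)`, and the zero weight on the first and second empty transitions are assumptions; only the skip weight 2.0 and the candidate weights come from the text.

```python
EPS = ""  # empty label: a transition that consumes no character


def build_first_wfst(slots, skip_weight=2.0):
    """Build arcs (src, dst, label, weight) for a sequence of candidate slots.

    `slots` is a list of {character: weight} dicts, one per character position.
    State i sits before slot i; "start" and "final" are the overall initial
    and final states.
    """
    arcs = []
    for i, slot in enumerate(slots):
        arcs.append(("start", i, EPS, 0.0))           # E1: enter at any position
        for ch, w in slot.items():
            arcs.append((i, i + 1, ch, w))            # weighted character candidate
        arcs.append((i, i + 1, EPS, skip_weight))     # E3: skip this character at a cost
        arcs.append((i + 1, "final", EPS, 0.0))       # E2: leave after any position
    return arcs


# Candidates and weights as listed for FIG. 5.
fig5_slots = [
    {"ド": 0.2, "ト": 0.4, "人": 0.6},
    {"コ": 0.2, "二": 0.5, "口": 0.6},
    {"モ": 0.2, "毛": 0.4, "t": 0.5},
]
fig5_arcs = build_first_wfst(fig5_slots)
```

The E1 and E2 arcs let a keyword begin and end anywhere in the line, while the costly E3 arcs allow a noisy position to be passed over without being read as a character.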

When over-segmentation has occurred in step S6, character recognition of individual characters is performed for each overlapping position, as shown in FIG. 6, and the transitions for the overlapping cut positions are expressed as a single piece of first WFST data. In other words, when the character region segmentation unit 104 divides a character region using multiple division patterns and produces multiple kinds of divided regions (that is, in the case of over-segmentation), the character recognition unit 105 performs character recognition on each of the multiple kinds of divided regions, and the first character string transition data generation unit 106 generates the first WFST data for the character candidates in each of the multiple kinds of divided regions.

The example of FIG. 6 shows a case where the actual correct result of character recognition is, for example, "Forum" (FIG. 6(A)), division positions are determined with multiple methods and cut-out parameters, and single-character recognition is performed at the multiple division positions (FIG. 6(B) and (C)). The result shown in FIG. 6(B) is the recognition result "fbnim", and the result shown in FIG. 6(C) is the recognition result "石rurn". In the "b" portion of the result in FIG. 6(B), noise makes "b" the first candidate and "o" the second candidate. The noise is thought to arise because part of the upper right of "F" was included when cutting out. From these two results, the first character string transition data generation unit 106 generates a single piece of first WFST data as shown in FIG. 6(D). In the example of FIG. 6, the ε transitions from the initial state to the intermediate states, the ε transitions from the intermediate states to the final state, and the weighted ε transitions for skipping characters are omitted. The generated first WFST data is subsequently used in the WFST composition operation with the vocabulary data (see FIG. 6(E) and (F)), and, as described later, the WFST processing unit 109 outputs as the result the transition whose cumulative weight value ranks highest across all of the multiple kinds of divided regions (in the example of FIG. 6, "forum", which matches the vocabulary data).
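How two overlapping segmentations can share one lattice may be easier to see in a small sketch. Everything numeric here is invented for illustration: the cut positions 0 to 7 and all weights are assumptions, because FIG. 6 is not reproduced in the text. Each edge is `(start position, end position, character, weight)`, and a word matches if its characters can be read along edges that tile the line from the first to the last cut position.

```python
def match_word(edges, word, start, end):
    """Minimum cumulative weight of a path from `start` to `end` spelling `word`, or None."""
    # best[(pos, k)] = smallest weight reaching cut position `pos` after matching k characters
    best = {(start, 0): 0.0}
    for k, ch in enumerate(word):
        for (pos, matched), w in list(best.items()):
            if matched != k:
                continue
            for s, e, label, ew in edges:
                if s == pos and label == ch:
                    key = (e, k + 1)
                    if key not in best or w + ew < best[key]:
                        best[key] = w + ew
    return best.get((end, len(word)))


# Hypothetical cut positions: 0 |F| 1 |o| 2 |r| 3 |u (left half)| 4 |u (right half)| 5 |m| 6 |m| 7
edges_b = [(0, 1, "f", 0.2), (1, 2, "b", 0.3), (1, 2, "o", 0.5),
           (2, 4, "n", 0.4), (4, 5, "i", 0.4), (5, 7, "m", 0.2)]   # reads "fbnim"
edges_c = [(0, 2, "石", 0.4), (2, 3, "r", 0.2), (3, 5, "u", 0.2),
           (5, 6, "r", 0.4), (6, 7, "n", 0.4)]                      # reads "石rurn"
lattice = edges_b + edges_c

forum_weight = match_word(lattice, "forum", 0, 7)
```

Neither segmentation alone spells "forum", but the merged lattice contains the path f, o, r, u, m by switching between the two segmentations at shared cut positions, which is why a lexicon word can be recovered from two individually wrong readings.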

Furthermore, to improve the accuracy of detecting meaningful words in scene images and the like, the first character string transition data generation unit 106 corrects the weight values according to the position of the candidate recognition results within the image, the character size of the candidate recognition results, and so on. In the example of FIG. 7, FIG. 7(A) shows an image A3 containing characters. FIG. 7(B) shows the weight values initially calculated by the first character string transition data generation unit 106. For the character candidate "この先", the weight value "0.13" is calculated. Similarly, the weight values "0.15", "0.15", "0.20", and "0.21" are calculated for "株式会社", "10km", "清水寺", and "旅館", respectively.

Here, the first character string transition data generation unit 106 adjusts the initially calculated weight values using the information shown in FIG. 7(C) and (D). FIG. 7(C) is information showing the statistical spatial distribution of value as a keyword. In this example, the center, upper left, lower right, and similar parts of the image have high value as keywords, and FIG. 7(C) expresses keyword value by depth of color. In darkly colored parts the keyword value is high, so "1" is assigned as the weight coefficient. In lightly colored parts the keyword value is low, so "2.5" is assigned as the weight coefficient. FIG. 7(D) shows a table of weight coefficients by character size. Characters of size "24" are large and are therefore assumed to have high value as keywords, so "1" is assigned as the weight coefficient. Characters of size "8" are small and are therefore assumed to have low value as keywords, so "2.2" is assigned as the weight coefficient.

FIG. 7(E) shows the result of the first character string transition data generation unit 106 adjusting the initially calculated weight values using the information shown in FIG. 7(C) and (D). The weighting is performed by multiplying the initially calculated weight value by the weight coefficients of FIG. 7(C) and (D), so that words in larger character regions or at positions of higher keyword value are given higher priority. For example, for the word "清水寺", the initially calculated weight value "0.20" is multiplied by the spatial distribution weight coefficient "1.5" of FIG. 7(C) and the character size weight coefficient "1.0", giving "0.3" as the adjusted weight value. As a result of this processing, the word "この先", which had a smaller weight value than the word "清水寺" before adjustment, has a larger weight value than "清水寺" after adjustment. In other words, the adjustment gives smaller weight values to words that actually have value as keywords.
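The adjustment is a plain product of the initial weight and the two coefficients. The sketch below reproduces only the "清水寺" figures stated in the text; the function and parameter names are illustrative, and the coefficients for the other words are not given in the text, so they are not guessed here.

```python
def adjust_weight(initial_weight, position_coeff, size_coeff):
    """Scale an initial weight by the spatial-distribution and character-size coefficients.

    Smaller results mean higher priority, so coefficients near 1 preserve a
    word's priority and larger coefficients demote it.
    """
    return initial_weight * position_coeff * size_coeff


# "清水寺": initial weight 0.20, spatial coefficient 1.5, size coefficient 1.0
kiyomizu = round(adjust_weight(0.20, 1.5, 1.0), 2)  # 0.3
```

A word in a low-value position with small characters would be multiplied by both a large spatial coefficient and a large size coefficient, which is how an initially favorable weight can end up larger than that of a prominent word.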

(7) WFST operation processing

(Overall flow of WFST operation processing)

The WFST processing unit 109 and the character string detection unit 110 merge the first WFST data generated in steps S3 and S8 into a single piece of first WFST data (step S9) and then perform WFST operation processing (step S10, corresponding to the "finite state conversion step" in the claims). The "WFST operation processing" performed by the WFST processing unit 109 and the character string detection unit 110 includes the WFST composition operation (corresponding to the "composition operation" in the claims) and comprises the following series of processes: the WFST processing unit 109 performs state transitions in sequence according to the WFST data, accumulates the weight values of the individual state transitions to calculate a cumulative weight value for each state transition, and outputs the results of one or more state transitions to the character string detection unit 110 according to the cumulative weight values; the character string detection unit 110 then detects one or more character strings as character string recognition results according to the cumulative weight values. FIG. 8 is a flowchart showing the WFST operation processing. Besides being used for word detection through WFST operation processing with the vocabulary DB 111 (see FIG. 1), the first WFST data generated from the character recognition candidate group can itself be used as a search table for the image.

In FIG. 8, the processing flow consisting of steps S10-1, S10-2, S10-3, and S10-4 is the flow for the case where the first WFST data of the character recognition candidate group is itself used as a search table for the image, to determine whether a keyword input by the user exists in the image. In this case, the WFST processing unit 109 performs WFST operation processing on the first WFST data generated by the series of processes in steps S1 to S9 and the WFST data for the keyword input by the user (corresponding to the "second character string transition data" in the claims; hereinafter "second WFST data"), and thereby determines whether the keyword exists in the image.

Specifically, a keyword is first input by the user, and the second character string transition data generation unit 107 generates the second WFST data for that keyword (step S10-1). FIG. 8(A) illustrates the second WFST data generated for the keyword (search word) input by the user. Next, the WFST processing unit 109 performs a WFST composition operation using the second WFST data generated in step S10-1 and the first WFST data generated by the series of processes in steps S1 to S9 (step S10-2). The WFST processing unit 109 then performs an operation to find the best path from the result of the WFST composition operation in step S10-2 (step S10-3). Finally, the character string detection unit 110 outputs, based on the result of the best-path operation, the determination of whether the keyword input by the user is present, or the weight associated with that determination (step S10-4).

Also in FIG. 8, the processing flow consisting of steps S10-5, S10-6, S10-7, S10-8, and S10-9 is the flow for word detection through WFST operation processing with the vocabulary DB 111. In this case, the WFST processing unit 109 performs WFST operation processing on the first WFST data generated by the series of processes in steps S1 to S9 and the WFST data of each word in the vocabulary DB 111 (corresponding to the "third character string transition data" in the claims; hereinafter "third WFST data"), and thereby detects the words present in the image.

Specifically, the third character string transition data generation unit 108 first generates the third WFST data for each word in the vocabulary DB 111 (step S10-5). Next, the WFST processing unit 109 performs a WFST composition operation using the third WFST data generated in step S10-5 and the first WFST data generated by the series of processes in steps S1 to S9 (step S10-6). The WFST processing unit 109 then performs an operation to find the best path from the result of the WFST composition operation in step S10-6 (step S10-7). Finally, the character string detection unit 110 outputs words in order of their weight values on the best path (step S10-8). When the dictionary is organized by category, or when a category information dictionary exists, category information is also output (step S10-9).

(WFST composition operation)

FIG. 9 illustrates the WFST composition operation (steps S10-2 and S10-6 of FIG. 8). The WFST composition operation compares the state transitions expressed by two pieces of WFST data and extracts the word transitions that the two share character by character. In the result of the WFST composition operation, the weight value of each transition is recalculated from the weight values of the two composed transitions, and the result consists of the top-ranked best paths (transitions with small weights) calculated from the weight values of the state transitions. In the case of over-segmentation, the WFST processing unit 109 outputs, as the result of the WFST composition operation, the state transitions whose cumulative weight values rank highest across all of the multiple kinds of divided regions.

In word detection (the processing flow consisting of steps S10-5 to S10-9), a WFST composition operation is performed between the first WFST data of the character recognition candidate group shown in FIG. 9(A) (the same as that shown in FIG. 5) and the third WFST data of the vocabulary data in the vocabulary DB 111 shown in FIG. 9(B), and the words whose transition weights rank highest are extracted (that is, only the paths that match the vocabulary data are extracted), so that words are detected in order of weight value. FIG. 9(C) shows a case where "ドコモ", "人毛", and "人口" are obtained as the result of the WFST composition operation, with weight values of "0.2+0.2+0.2=0.6", "0.6+2.0+0.4=3.0", and "0.6+0.6=1.2", respectively. Accordingly, "ドコモ", which has the smallest weight value, is detected as the best path, and the character string detection unit 110 outputs "ドコモ" as the result of word detection. Because ε transitions for skipping characters are present, abbreviation detection is also possible, such as detecting "天ぷらおむすび" as "天むす". In addition, when the vocabulary DB 111 is large, part of a word may be extracted as the correct word even when there is no exactly matching word.
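The effect of composing the candidate lattice with a lexicon and taking the best path can be approximated with a small dynamic program. This is a simplified stand-in for a real WFST composition, written under stated assumptions: each lexicon word is matched separately, entering and leaving the line is free (the first and second empty transitions), and skipping a position costs 2.0 (the third empty transition). The function name `best_match` and the extra lexicon entry "東京" are illustrative. Summing the stated terms for "人毛" (0.6 + 2.0 + 0.4) gives 3.0.

```python
def best_match(slots, word, skip_weight=2.0):
    """Smallest cumulative weight with which `word` can be read from `slots`, or None."""
    inf = float("inf")
    n, m = len(slots), len(word)
    best = inf
    for start in range(n):                      # free entry at any position (E1)
        cost = [inf] * (m + 1)                  # cost[k]: k characters matched so far
        cost[0] = 0.0
        for i in range(start, n):
            nxt = [inf] * (m + 1)
            for k in range(m + 1):
                if cost[k] == inf:
                    continue
                nxt[k] = min(nxt[k], cost[k] + skip_weight)          # skip slot i (E3)
                if k < m and word[k] in slots[i]:
                    nxt[k + 1] = min(nxt[k + 1], cost[k] + slots[i][word[k]])
            cost = nxt
            best = min(best, cost[m])           # free exit after any position (E2)
    return None if best == inf else best


slots = [
    {"ド": 0.2, "ト": 0.4, "人": 0.6},
    {"コ": 0.2, "二": 0.5, "口": 0.6},
    {"モ": 0.2, "毛": 0.4, "t": 0.5},
]
lexicon = ["ドコモ", "人毛", "人口", "東京"]

hits = []
for w in lexicon:
    c = best_match(slots, w)
    if c is not None:
        hits.append((w, round(c, 2)))
hits.sort(key=lambda item: item[1])  # smallest weight first
```

Calling `best_match` with a single user-supplied keyword gives the presence test of the search-table flow: a `None` result means the keyword cannot be read from the image, and otherwise the returned weight can be used to rank images.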

In the case of the search table (the processing flow consisting of steps S10-1 to S10-4), the search keyword to be found in the image is represented by the second WFST data, and a WFST composition operation is performed with the first WFST data of the character recognition candidate group. In this operation, it is determined whether a transition from the initial state to the final state is obtained through the transitions of the two composed pieces of WFST data. This makes it possible to determine whether the search keyword exists in the character recognition candidate group, that is, whether the keyword entered by the user exists in the image. Multiple images can also be ranked according to the weights of the transitions. In the example of FIG. 9, when the keyword entered by the user is any one of "ドコモ", "人毛", and "人口", a transition from the initial state to the final state is obtained through the transitions of the two composed pieces of WFST data, so the keyword is determined to exist in the image. However, the weights of "ドコモ", "人毛", and "人口" are "0.2+0.2+0.2=0.6", "0.6+2.0+0.4=2.8", and "0.6+0.6=1.2", respectively, so "ドコモ", which has the smallest weight, is detected as the best path. When the keyword entered by the user is "ドコモ", the character string detection unit 110 outputs this smallest weight as the result of the word search.
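A simplified Python sketch of the keyword search follows. It assumes, for illustration only, that each image has already been reduced to a list of per-position candidate dictionaries (`{character: weight}`), and it ignores over-segmentation and ε transitions. The names `keyword_weight` and `rank_images` and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One dict of {candidate character: weight} per character position.
Candidates = List[Dict[str, float]]


def keyword_weight(candidates: Candidates, keyword: str) -> Optional[float]:
    """Smallest cumulative weight with which `keyword` can be read from
    consecutive positions, or None if the keyword cannot be read at all."""
    best = None
    for start in range(len(candidates) - len(keyword) + 1):
        total = 0.0
        for offset, ch in enumerate(keyword):
            weight = candidates[start + offset].get(ch)
            if weight is None:
                break          # this start position does not match
            total += weight
        else:                  # every character matched
            if best is None or total < best:
                best = total
    return best


def rank_images(images: Dict[str, Candidates], keyword: str) -> List[Tuple[str, float]]:
    """Images containing the keyword, ordered by increasing weight."""
    hits = []
    for name, candidates in images.items():
        weight = keyword_weight(candidates, keyword)
        if weight is not None:
            hits.append((name, weight))
    return sorted(hits, key=lambda hit: hit[1])


IMAGES = {
    "image_a": [{"ド": 0.2, "人": 0.6}, {"コ": 0.2}, {"モ": 0.2}],
    "image_b": [{"ド": 0.5}, {"コ": 0.4}, {"モ": 0.3}],
    "image_c": [{"人": 0.6}, {"口": 0.6}],
}
print([name for name, _ in rank_images(IMAGES, "ドコモ")])
```

Returning `None` for a missing keyword corresponds to the case in which no transition from the initial state to the final state is obtained.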

As shown in (C) of FIG. 9, the word data illustrated in (B) of FIG. 9 is represented by character-by-character transitions of words that share the same initial state. Weights may also be assigned using frequency information obtained by separate statistical processing or the like, learning information entered by the user, the character string length of each word, and so on. The first WFST data and the second WFST data, and likewise the first WFST data and the third WFST data, that are compared with each other need not be in the same format; it suffices that they are data representing state transitions of characters in a form that permits comparison.

(WFST composition operation, Variation 1)

In the present embodiment, various variations of the WFST composition operation are envisaged; Variation 1 is described below. In Variation 1, the vocabulary DB 111 holds classification information for words, the second character string transition data generation unit 107 or the third character string transition data generation unit 108 generates the second WFST data or the third WFST data with this classification information added, and the WFST processing unit 109 performs the WFST composition operation with the classification information added and outputs the result. That is, in Variation 1, the vocabulary DB 111 is expressed as WFST data whose input is a word and whose output is classification information, and by performing the composition operation with this classification-annotated WFST data of the vocabulary DB 111 it is possible to acquire information for classification, or to attach classification information to a keyword, at the same time as the keyword is detected. In this case, so that the same word can have multiple pieces of classification information, a serial number of the classification information is added to the input at the final state of the vocabulary DB 111, and the classification information itself (its content) is added to the output. In addition, at the final state of the first WFST data generated from character recognition, transitions are added for as many serial numbers as the maximum number of pieces of classification information used for a single word in the vocabulary DB 111.

FIG. 10 shows an example of the processing in Variation 1. (A) of FIG. 10 shows an example of word data to which classification information has been added. In the case of the search table, (A) of FIG. 10 shows the second WFST data with classification information generated by the second character string transition data generation unit 107. In the case of word detection, (A) of FIG. 10 shows the third WFST data with classification information generated by the third character string transition data generation unit 108. The classification information identifies the multiple categories of a single word. For example, the word "つばめ" has two pieces of classification information, "Shinkansen" and "birds", with serial numbers "0000" and "0001". In (A) of FIG. 10, "<eps>" is the label for an empty transition in WFST processing, and it is the output when each character (for example "つ", "ば", or "め") is the input. (B) of FIG. 10 shows the character recognition result converted into first WFST data with serial numbers of classification information added. For example, for the character recognition result "つばめ", transitions are added at the final state of its WFST data for as many serial numbers as the maximum number of pieces of classification information used for the word "つばめ" in the vocabulary DB 111 (in the example of FIG. 10, the two serial numbers "0000" and "0001"). A composition operation is performed between the second or third WFST data shown in (A) of FIG. 10 and the first WFST data shown in (B) of FIG. 10, and (C) of FIG. 10 shows the result. After the two pieces of WFST data are compared, exactly two matching paths are extracted; because of the <eps> empty transitions, however, only the classification information appears in the result in (C) of FIG. 10.

FIG. 11 shows the same situation as FIG. 10, except that the character recognition result is "すずめ". (C) of FIG. 11 shows the result of the composition operation. After the two pieces of WFST data are compared, only one matching path is extracted; as in (C) of FIG. 10, because of the <eps> empty transitions, only the classification information appears in the result.

FIG. 12 shows the same situation as FIG. 10, except that there are no <eps> transitions. (C) of FIG. 12 shows the result of the composition operation. After the two pieces of WFST data are compared, exactly two matching paths are extracted; because there are no <eps> transitions, however, both the word and the classification information appear in the result.
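The behaviour described for FIG. 10 to FIG. 12 can be sketched in Python as follows. This is a toy composition over hand-built arc lists, not the patented implementation; the function names, the arc layout, and the `keep_word` flag (which switches between the <eps> outputs of FIG. 10 and the character outputs of FIG. 12) are assumptions introduced for illustration.

```python
EPS = "<eps>"


def word_transducer(word, categories, keep_word=False):
    """Arcs (src, dst, input, output) that spell `word`, followed by one
    arc per category whose input is a serial number such as "0000"."""
    arcs = [(i, i + 1, ch, ch if keep_word else EPS)
            for i, ch in enumerate(word)]
    end = len(word)
    for number, category in enumerate(categories):
        arcs.append((end, end + 1, f"{number:04d}", category))
    return arcs, end + 1          # (arcs, final state)


def recognition_acceptor(text, max_categories):
    """Arcs (src, dst, label) for the recognised text, followed by one
    serial-number arc per possible category index."""
    arcs = [(i, i + 1, ch) for i, ch in enumerate(text)]
    end = len(text)
    arcs += [(end, end + 1, f"{n:04d}") for n in range(max_categories)]
    return arcs, end + 1


def compose(acceptor, transducer):
    """Output sequence of every path accepted by both machines."""
    (a_arcs, a_final), (t_arcs, t_final) = acceptor, transducer
    results = []

    def walk(a_state, t_state, outputs):
        if a_state == a_final and t_state == t_final:
            results.append(outputs)
            return
        for a_src, a_dst, label in a_arcs:
            if a_src != a_state:
                continue
            for t_src, t_dst, t_in, t_out in t_arcs:
                if t_src == t_state and t_in == label:
                    walk(a_dst, t_dst,
                         outputs if t_out == EPS else outputs + [t_out])

    walk(0, 0, [])
    return results


swallow = word_transducer("つばめ", ["Shinkansen", "birds"])
print(compose(recognition_acceptor("つばめ", 2), swallow))
```

With `keep_word=True` the characters of the word are emitted as well, so each result contains both the word and its category, as in FIG. 12.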

(WFST composition operation, Variation 2)

Next, Variation 2 is described. In Variation 2, as shown in FIG. 13, the character recognition device 1 further includes a word-category relevance vector storage unit 112 (corresponding to the "word-category relevance vector storage means" in the claims). The word-category relevance vector storage unit 112 stores word-category relevance vectors, which indicate the relevance between words and classification information. The first character string transition data generation unit 106 adds the values of the word-category relevance vectors to the candidates of the character recognition processing results, and the weights of those candidates, in the first WFST data that it has generated. The first character string transition data generation unit 106 then takes the classification information with the largest value as the classification information corresponding to the candidates and corrects the weights of the candidates on the basis of that classification information. The WFST processing unit 109 then performs the WFST composition operation using the corrected weights.

That is, in Variation 2, the relevance between classification information and words is prepared in advance as a table using a synonym database, which makes it possible to change the priority of the classification information. For example, as in FIG. 14, suppose the words are food menu items and the classification information is an array that expresses the relationship to food categories as vectors ((B) of FIG. 14, the word-category relevance vectors). The vectors of the detected words are then added together, and the food category with the largest vector value can be detected as the food category of the detected words. Conversely, the priority of the detected food menu items can also be changed by re-determining the weights of the words according to the order or the vector values of the categories obtained.

(A) of FIG. 14 shows the words detected by character recognition (food menu items such as "餃子" and "スープ"), and (B) of FIG. 14 shows the correspondence table between each food menu item and the food categories (the word-category relevance vectors). (C) of FIG. 14 shows an example in which the vector values corresponding to the food menu items in (A) of FIG. 14 are calculated by referring to the correspondence table in (B) of FIG. 14. In this example, the highest vector value is calculated for "Chinese food", so the category of the words shown in (A) of FIG. 14 is determined to be "Chinese food". Finally, (D) of FIG. 14 shows the weights of the food menu items in (A) of FIG. 14 after they have been corrected to reflect the category "Chinese food" determined in (C) of FIG. 14.

FIG. 15 shows how the corrected weights shown in (D) of FIG. 14 are calculated. The corrected weights are calculated through steps (A) to (D) of FIG. 15. (A) of FIG. 15 shows the sums of the category weights of the detected words and corresponds to (B) and (C) of FIG. 14. (B) of FIG. 15 shows the reciprocal of the word weight shown in (A) of FIG. 14, that is, (1 / word weight), multiplied by the category weights of each word. For example, for "餃子", the reciprocal of the word weight "0.3" shown in (A) of FIG. 14, namely "1/0.3", is multiplied by each of the category weights "0, 1.0, 0" shown in (A) of FIG. 15, giving "0, 3.33, 0". Similarly, for "スープ", the reciprocal of the word weight "0.45" shown in (A) of FIG. 14, namely "1/0.45", is multiplied by each of the category weights "0, 0.3, 0.7" shown in (A) of FIG. 15, giving "0, 0.67, 1.56".

(C) of FIG. 15 shows the result of (B) of FIG. 15 multiplied by the sums in (A) of FIG. 15. For example, for "餃子", the result of (B) of FIG. 15, "0, 3.33, 0", is multiplied category by category by the sums in (A) of FIG. 15, "0.5, 2.8, 0.7", giving "0, 9.33, 0". Similarly, for "スープ", the result of (B) of FIG. 15, "0, 0.67, 1.56", is multiplied category by category by the sums in (A) of FIG. 15, "0.5, 2.8, 0.7", giving "0, 1.87, 1.09".

Finally, (D) of FIG. 15 shows, for each word, the values of (C) of FIG. 15 summed over the categories, with the reciprocal of that sum taken as the corrected weight. For example, for "餃子", the values of (C) of FIG. 15, "0, 9.33, 0", are summed over the categories to give "9.33", and the reciprocal gives the corrected weight "0.11". Similarly, for "スープ", the values of (C) of FIG. 15, "0, 1.87, 1.09", are summed over the categories to give "2.96", and the reciprocal gives the corrected weight "0.34".
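The arithmetic of steps (B) to (D) of FIG. 15 can be checked with a short Python sketch. The function name `corrected_weight` is an assumption; the word weights, category weights, and category sums are the values quoted in the text for "餃子" and "スープ".

```python
def corrected_weight(word_weight, category_weights, category_sums):
    """Corrected weight of one word, following steps (B) to (D) of FIG. 15."""
    # (B) category weights scaled by the reciprocal of the word weight
    scaled = [c / word_weight for c in category_weights]
    # (C) multiplied, category by category, by the summed category weights
    boosted = [s * total for s, total in zip(scaled, category_sums)]
    # (D) reciprocal of the sum over all categories
    # (assumes the word has a non-zero weight in at least one category)
    return 1.0 / sum(boosted)


CATEGORY_SUMS = [0.5, 2.8, 0.7]   # sums quoted for (A) of FIG. 15

gyoza = corrected_weight(0.3, [0, 1.0, 0], CATEGORY_SUMS)
soup = corrected_weight(0.45, [0, 0.3, 0.7], CATEGORY_SUMS)
print(round(gyoza, 2), round(soup, 2))  # prints 0.11 0.34
```

Because a smaller weight means a better path, a word that is strongly associated with the dominant category ends up with a smaller corrected weight and therefore a higher priority.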

(WFST composition operation, Variation 3)

Next, Variation 3 is described. In Variation 3, "position information" is added to the WFST data. That is, when outputting a candidate of the character recognition processing result, the character recognition unit 105 also outputs the position of that candidate within the image. A separate unit may be provided to acquire the position information within the image, or the character recognition unit 105 may acquire it. The first character string transition data generation unit 106 then generates the first WFST data with the position information added, and the WFST processing unit 109 and the character string detection unit 110 perform the WFST composition operation with the position information added and output the result. In Variation 3, because the position of a detected word cannot be determined by the WFST composition operation itself, a separate table (see (C) of FIG. 16 described below) is prepared to store the original position information, and the table number is added to the state transitions. The detected result therefore carries the table number as well, so the original position information can be identified (see (D) and (E) of FIG. 16).

FIG. 16 illustrates the operation of each functional element in Variation 3. (C) of FIG. 16 shows the position information output by the character recognition unit 105. The position information is output as a position information table, in which x1 and y1 give the coordinates of the upper left corner of each character and x2 and y2 give the coordinates of the lower right corner. Each piece of position information is identified by a serial number such as "0000" or "0001". When outputting a candidate of the character recognition processing result, the character recognition unit 105 also outputs the serial number as shown in (C) of FIG. 16. (B) of FIG. 16 shows the first WFST data with position information generated by the first character string transition data generation unit 106. It differs from the first WFST data of FIG. 5 in that the serial numbers of the position information have been added. The weighted ε transitions for skipping characters are omitted.

(A) of FIG. 16 shows an example of word data to which position information (more precisely, the serial numbers of the position information) has been added. In the case of the search table, (A) of FIG. 16 shows the second WFST data with position information generated by the second character string transition data generation unit 107. In the case of word detection, (A) of FIG. 16 shows the third WFST data with position information generated by the third character string transition data generation unit 108. As shown in (A) of FIG. 16, serial numbers of position information are added at the beginning and at the end of the transitions, and the number of serial numbers added equals the maximum number of pieces of position information shown in (C) of FIG. 16. In this example, the maximum number of pieces of position information is 10,000, from "0000" to "9999".

(D) of FIG. 16 shows the result of the composition operation between the second or third WFST data in (A) of FIG. 16 and the first WFST data in (B) of FIG. 16. The serial numbers of the position information are attached to the result of the composition operation. As shown in (E) of FIG. 16, by checking the serial numbers attached to the result against the position information table in (C) of FIG. 16, it is possible to determine where in the image character recognition results such as "ライン" and "スソ" are located.
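The lookup in (E) of FIG. 16 amounts to resolving serial numbers through the position table. A minimal Python sketch follows; the coordinates in `POSITION_TABLE` and the function name `locate` are hypothetical, and the sketch assumes that a detected word carries the serial numbers of its first and last characters.

```python
# Hypothetical position table in the style of (C) of FIG. 16:
# serial number -> (x1, y1, x2, y2), upper left and lower right corners.
POSITION_TABLE = {
    "0000": (10, 20, 30, 40),
    "0001": (30, 20, 50, 40),
    "0002": (50, 22, 70, 42),
}


def locate(serial_numbers, table):
    """Bounding box that encloses the characters named by `serial_numbers`."""
    boxes = [table[serial] for serial in serial_numbers]
    return (
        min(box[0] for box in boxes),
        min(box[1] for box in boxes),
        max(box[2] for box in boxes),
        max(box[3] for box in boxes),
    )


# A detected word whose first and last characters carry "0000" and "0002".
print(locate(["0000", "0002"], POSITION_TABLE))  # prints (10, 20, 70, 42)
```

Keeping the coordinates outside the transducer and passing only serial numbers through the composition keeps the label alphabet small.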

When only one of the words at overlapping positions is to be used, a separate determination array recording which character positions have already been detected is generated, and, starting from the top-ranked best path, the array positions corresponding to the character positions of each detected word are marked as detected. If an array position has already been marked, the words are judged to overlap, and only the word with the higher priority among the keywords detected at the same position is used. By arranging words so as to fill the gaps in this way, the character recognition result can be corrected using high-priority words.
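The overlap check can be sketched as a greedy selection over a boolean array. This is an illustrative Python sketch, not the patented implementation; the tuple layout `(word, start, end, weight)`, the function name, and the sample detections are assumptions.

```python
def select_non_overlapping(detections, text_length):
    """Keep the best-ranked word at each position.

    `detections` is a list of (word, start, end, weight) tuples, where
    `start` and `end` are character positions (end exclusive) and a
    smaller weight means a higher priority.
    """
    occupied = [False] * text_length       # the determination array
    kept = []
    for word, start, end, _weight in sorted(detections, key=lambda d: d[3]):
        if any(occupied[start:end]):
            continue                       # overlaps a higher-priority word
        for position in range(start, end):
            occupied[position] = True
        kept.append(word)
    return kept


DETECTIONS = [
    ("ドコモ", 0, 3, 0.6),
    ("人口", 0, 2, 1.2),       # overlaps "ドコモ" and has a larger weight
    ("ショップ", 3, 7, 0.9),
]
print(select_non_overlapping(DETECTIONS, 7))
```

Processing the words in order of increasing weight guarantees that a position, once claimed, belongs to the highest-priority word covering it.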

(WFST composition operation, Variation 4)

Next, Variation 4 is described. In a language written with spaces between words, the constituent characters show less variety than in Japanese. When WFST data of a character recognition candidate group such as that in FIG. 5 is used, only part of a word may therefore be detected, for example only "pen" from part of "pencil". In Variation 4, "delimiter identification information" (corresponding to the "identification information" in the claims) is therefore added to the WFST data.

That is, when outputting a candidate of the character recognition processing result, the character recognition unit 105 also outputs delimiter identification information indicating a boundary between words. In character recognition, the delimiter identification information is output when a character is recognized as a delimiter, that is, when the recognized character is a symbol such as a space, a period, or a parenthesis. The first character string transition data generation unit 106 then generates the first WFST data with the delimiter identification information added. The second character string transition data generation unit 107 and the third character string transition data generation unit 108 likewise generate the second WFST data and the third WFST data with delimiter identification information added. In addition to being added for symbols such as the spaces, periods, and parentheses mentioned above, the delimiter identification information is added at the beginning and end of each word. When performing the state transitions for WFST processing, the WFST processing unit 109 performs them in units of the parts delimited by two pieces of delimiter identification information.

FIG. 17 illustrates the operation of each functional element in Variation 4. (B) of FIG. 17 shows the first WFST data with delimiter identification information generated by the first character string transition data generation unit 106 when the character string in the image is "{two pens}". On recognizing the symbol "{", the character recognition unit 105 outputs that character recognition result together with delimiter identification information. On receiving this information, the first character string transition data generation unit 106 generates first WFST data with identification information in which the symbol "{" is the input and the delimiter identification information "<sp>" is the output. The same applies to the symbol "}". For the space between "two" and "pens", the character recognition unit 105 outputs that the recognition result is a space, and the first character string transition data generation unit 106 generates first WFST data with delimiter identification information in which the delimiter identification information is assigned to that space. In FIG. 17, the ε transitions from the initial state to the intermediate states, the ε transitions from the intermediate states to the final state, and the weighted ε transitions for skipping characters are omitted.

(A) of FIG. 17 shows an example of word data for a space-delimited language to which delimiter identification information has been added. In the case of the search table, (A) of FIG. 17 shows the second WFST data with delimiter identification information generated by the second character string transition data generation unit 107. In the case of word detection, (A) of FIG. 17 shows the third WFST data with delimiter identification information generated by the third character string transition data generation unit 108. As shown in (A) of FIG. 17, the delimiter identification information "<sp>" is added at the beginning and end of each word. In addition, for the "s" that marks the plural form in English, the output is set to "<esp>". This prevents the plural "s" from affecting the result of the composition operation.

(C) of FIG. 17 shows the result of the composition operation between the first WFST data shown in (B) of FIG. 17 and the second or third WFST data shown in (A) of FIG. 17. When performing state transitions, the WFST processing unit 109 performs them in units of the parts delimited by two pieces of delimiter identification information, that is, in units of "two" or "pens" shown in (B) of FIG. 17, and performs the composition operation with the second or third WFST data in (A) of FIG. 17. As a result, "pen" is output.

In contrast, (D) of FIG. 17 shows the first WFST data with delimiter identification information generated by the first character string transition data generation unit 106 when the character string in the image is "pencil.". On recognizing the symbol ".", the character recognition unit 105 outputs that character recognition result together with delimiter identification information. On receiving this information, the first character string transition data generation unit 106 generates first WFST data with identification information in which the symbol "." is the input and the delimiter identification information "<sp>" is the output. (E) of FIG. 17 shows the result of the composition operation between the first WFST data shown in (D) of FIG. 17 and the second or third WFST data shown in (A) of FIG. 17. When performing state transitions, the WFST processing unit 109 performs them in units of the parts delimited by two pieces of delimiter identification information, that is, in units of "pencil" shown in (D) of FIG. 17, and performs the composition operation with the second or third WFST data in (A) of FIG. 17, so no matching word is detected. This prevents only part of a word's spelling from being detected, such as "pen" from part of "pencil".
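The effect of matching only whole delimited units can be illustrated without any transducer machinery. The Python sketch below is a simplification under stated assumptions: the delimiter set `DELIMITERS` is a hypothetical choice covering the symbols mentioned in the text, and the plural handling (accepting a unit that ends in "s" when the remainder is a vocabulary word) stands in for the special output assigned to the plural "s".

```python
# Hypothetical delimiter set: spaces, periods, parentheses, and braces.
DELIMITERS = set(" .(){}")


def split_units(text):
    """Split recognised text into the units between delimiters."""
    units, current = [], ""
    for ch in text:
        if ch in DELIMITERS:
            if current:
                units.append(current)
            current = ""
        else:
            current += ch
    if current:
        units.append(current)
    return units


def detect(text, vocabulary):
    """Vocabulary words that match a whole delimited unit of `text`."""
    found = []
    for unit in split_units(text):
        if unit in vocabulary:
            found.append(unit)
        elif unit.endswith("s") and unit[:-1] in vocabulary:
            found.append(unit[:-1])   # plural "s" does not affect the result
    return found


print(detect("{two pens}", {"pen"}))  # prints ['pen']
print(detect("pencil.", {"pen"}))     # prints []
```

Because "pencil" is compared as a whole unit, the prefix "pen" is never reported for it.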

FIG. 18 illustrates the operation of each functional element in Variation 4 when words of a space-delimited language are combined with words of a language that is not space-delimited. In the method described below, a transition carrying delimiter identification information and an ε transition are both added at each transition between an alphabetic character and a non-alphabetic character. As a result, even when alphabetic and non-alphabetic characters appear without any separation, words consisting of alphabetic characters and words that combine alphabetic and non-alphabetic characters can be detected at the same time.

(A) of FIG. 18 shows the first WFST data with delimiter identification information as initially generated by the first character string transition data generation unit 106. The first WFST data is generated in the same way as in FIG. 17, with the delimiter identification information "<sp>" added at the beginning and end of each word. (B) of FIG. 18 shows the first WFST data with delimiter identification information that the first character string transition data generation unit 106 generates by correcting (A) of FIG. 18. Delimiter identification information has been added at the transitions between alphabetic and non-alphabetic characters, that is, at the transitions between words of a space-delimited language and words of a language that is not space-delimited.

In addition, by adding the ε transition "<eps>" together with the delimiter identification information, words that combine alphabetic and non-alphabetic characters can also be handled. That is, the transitions made up of alphabetic characters and the transitions made up of non-alphabetic characters are regarded as arranged in parallel, and state transitions such as those shown in (C) of FIG. 18 are added at the transitions between the characters. This yields a structure in which a transition carrying the delimiter identification information "<sp>" is added at each transition between an alphabetic character and a non-alphabetic character. In FIG. 18, the ε transitions from the initial state to the intermediate states, the ε transitions from the intermediate states to the final state, and the weighted ε transitions for skipping characters are omitted.
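The consequence of offering both a "<sp>" transition and an "<eps>" transition at each script boundary is that the text can be read either as separate units or as one combined unit. The Python sketch below enumerates those readings directly; it is an illustration only, the function names are hypothetical, alphabetic characters are assumed to be ASCII letters, and the sample string "docomoショップ" does not come from the source.

```python
from itertools import product


def is_alphabetic(ch):
    """Treat ASCII letters as alphabetic (an assumption of this sketch)."""
    return ch.isascii() and ch.isalpha()


def candidate_units(text):
    """Every unit obtainable by choosing, at each boundary between an
    alphabetic and a non-alphabetic character, either a delimiter
    transition (split) or an epsilon transition (no split)."""
    boundaries = [i for i in range(1, len(text))
                  if is_alphabetic(text[i - 1]) != is_alphabetic(text[i])]
    units = set()
    # One True/False choice per boundary; exponential in the number of
    # boundaries, which is acceptable for a short illustration.
    for choice in product((True, False), repeat=len(boundaries)):
        cuts = [0] + [b for b, split in zip(boundaries, choice) if split]
        cuts.append(len(text))
        units.update(text[a:b] for a, b in zip(cuts, cuts[1:]))
    return units


print(sorted(candidate_units("docomoショップ")))
```

Each unit in the returned set can then be matched against the vocabulary as a whole, so both the alphabetic word alone and the mixed-script word are candidates.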

(Configuration example as the character recognition system 100)

Next, another configuration example of the present embodiment is described. The case where the present invention is configured as the character recognition device 1 has been described above, but the invention is not limited to this. As shown in FIG. 19, the present invention may also be configured as a character recognition system 100 that includes a terminal 200 and a server 300. FIG. 19 is a schematic configuration diagram for this case, in which the terminal 200 and the server 300 are connected through a communication network so that they can communicate with each other.

The terminal 200 has the following functional components: an image reading unit 101 (corresponding to the "image input unit" in the claims), an image binarization unit 102, a character region detection unit 103 (corresponding to the "character region detection unit" in the claims), a character region segmentation unit 104 (corresponding to the "character region segmentation unit" in the claims), a character recognition unit 105 (corresponding to the "character recognition unit" in the claims), a first character string transition data generation unit 106 (corresponding to the "first character string transition data generation unit" in the claims), and a second character string transition data generation unit 107 (corresponding to the "second character string transition data generation unit" in the claims).

The server 300 has the following functional components: a third character string transition data generation unit 108 (corresponding to the "third character string transition data generation unit" in the claims), a WFST processing unit 109 (corresponding to the "finite state transition unit" in the claims), a character string detection unit 110 (corresponding to the "character string detection unit" in the claims), and a vocabulary DB 111 (corresponding to the "vocabulary database" in the claims).

The functional components of the terminal 200 and the server 300 are the same as those already described for the character recognition device 1, so their descriptions are omitted here. This embodiment gives a configuration example in which the first character string transition data generation unit 106 and the second character string transition data generation unit 107 reside in the terminal 200 and the third character string transition data generation unit 108 resides in the server 300, but the configuration is not limited to this: the first character string transition data generation unit 106, the second character string transition data generation unit 107, and the third character string transition data generation unit 108 may each reside in either the terminal 200 or the server 300.
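The division of work between the terminal 200 and the server 300 can be sketched as follows. This is only an illustration under stated assumptions: the description does not specify a wire format, so the JSON payload, the field name "candidates", and both function names are hypothetical, and the server side is reduced to following the lowest-weight candidate at each position rather than running full WFST processing.

```python
import json
from typing import Dict, List, Tuple


def terminal_encode(candidates: List[Dict[str, int]]) -> str:
    """Terminal side: serialize per-character candidates and their weights
    (a stand-in for the first character string transition data)."""
    return json.dumps({"candidates": candidates}, ensure_ascii=False)


def server_decode_best(payload: str) -> Tuple[str, int]:
    """Server side: pick the smallest-weight candidate at every position
    and return the resulting string with its cumulative weight.
    Assumes every position has at least one candidate."""
    candidates = json.loads(payload)["candidates"]
    text, total = "", 0
    for slot in candidates:
        char, weight = min(slot.items(), key=lambda item: item[1])
        text += char
        total += weight
    return text, total
```

Because only the serialized candidate data crosses the network in this sketch, moving a generation unit from one side to the other would amount to changing where `terminal_encode` is called.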

FIG. 2 can be referred to as a hardware configuration diagram of the terminal 200. As shown in FIG. 2, the terminal 200 is physically configured as an ordinary computer system that includes a CPU 21; main storage devices such as a ROM 22 and a RAM 23; an input device 24 such as a keyboard, a mouse, a camera or other device for reading in images, or a device for reading in data from an external device; an output device 25 such as a display; a communication module 26 such as a network card for sending data to and receiving data from other devices; and an auxiliary storage device 27 such as a hard disk. An image read in by the input device 24 may be one captured by the terminal itself or one captured by another device. The functions of the terminal 200 described above are realized by loading predetermined computer software onto hardware such as the CPU 21, the ROM 22, and the RAM 23, operating the input device 24, the output device 25, and the communication module 26 under the control of the CPU 21, and reading and writing data in the main storage devices 22 and 23 or the auxiliary storage device 27.

FIG. 2 can likewise be referred to as a hardware configuration diagram of the server 300. As shown in FIG. 2, the server 300 is physically configured as an ordinary computer system that includes a CPU 31; main storage devices such as a ROM 32 and a RAM 33; an input device 34 such as a keyboard, a mouse, or a device for reading in data from an external device; an output device 35 such as a display; a communication module 36 such as a network card for sending data to and receiving data from other devices; and an auxiliary storage device 37 such as a hard disk. The functions of the server 300 described above are realized by loading predetermined computer software onto hardware such as the CPU 31, the ROM 32, and the RAM 33, operating the input device 34, the output device 35, and the communication module 36 under the control of the CPU 31, and reading and writing data in the main storage devices 32 and 33 or the auxiliary storage device 37.

(Configuration example as a character recognition program)

The present invention can also be configured as a character recognition program, and the above description of the character recognition device 1 can be read as a description of a character recognition program that causes a computer to operate as the character recognition device 1. Although the overlapping description is omitted, the character recognition program causes a computer to function as the image reading unit 101, the image binarization unit 102, the character region detection unit 103, the character region division unit 104, the character recognition unit 105, the first character string transition data generation unit 106, the second character string transition data generation unit 107, the third character string transition data generation unit 108, the WFST processing unit 109, and the character string detection unit 110 described above. The character recognition program is provided, for example, stored on a recording medium. Examples of the recording medium include recording media such as flexible disks, CDs, and DVDs, recording media such as ROMs, and semiconductor memories.

(Operation and effects of this embodiment)

Next, the operation and effects of the character recognition device 1 of this embodiment will be described. Because the character recognition device 1 of this embodiment does not use an external database such as a telephone directory, it does not need to collate against the large amount of word knowledge contained in such a database, and character recognition processing can be made faster. That is, instead of detecting words from the character recognition result and collating them against an external word database, this embodiment expresses both the word/classification information database held in the character recognition device 1 and the character recognition candidate group as weighted finite-state transducers (WFSTs) and performs a WFST synthesis operation, so that word extraction, classification information extraction, and character position extraction can be performed at high speed. In addition, because no position information acquisition unit or orientation information acquisition unit is needed, the device configuration can be simplified. That is, character recognition can be performed using only the information in the character recognition device 1, without a position information acquisition device or an orientation information acquisition device. With this device configuration, characters can be recognized from scene images with high accuracy and at high speed.

Furthermore, in this embodiment, even if the character region is extracted in a form that lets image noise through, strong linguistic constraints can be imposed by the WFST operations. As a result, such noise can be removed or given a lower priority. Recognition accuracy can therefore be improved even under problems peculiar to scene images, such as variations in brightness and distortion of characters.

Furthermore, by applying the vocabulary detected by this embodiment to character recognition results obtained by an existing method, the character recognition device 1 and the like of this embodiment can be used as a device for correcting errors in the existing method's character recognition results.

Furthermore, according to this embodiment, the first WFST data of the character recognition candidate group can itself be used as a search table for the image, so the character recognition device 1 and the like of this embodiment can be used effectively as a device that determines whether a keyword entered by a user is present in the image.

Furthermore, according to this embodiment, performing a synthesis operation on the first WFST data of the character recognition candidate group and the third WFST data in the vocabulary DB 111 allows the character recognition device 1 and the like of this embodiment to be applied effectively as a vocabulary detection device.
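The effect of combining the candidate group with a vocabulary can be sketched without a WFST library. The following is a simplified stand-in, not the synthesis operation itself: the candidate group is assumed to be a list with one dictionary per character position (candidate character mapped to an integer weight, smaller meaning more plausible), and the vocabulary is a plain list of words. The names `Lattice` and `detect_vocabulary` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the first WFST data: one dict per character
# position, mapping each candidate character to its weight.
Lattice = List[Dict[str, int]]


def detect_vocabulary(lattice: Lattice,
                      vocabulary: List[str]) -> List[Tuple[int, int, str]]:
    """Return (cumulative_weight, start_position, word) for every word that
    can be read from consecutive positions of the lattice, sorted so that
    the smallest cumulative weight comes first."""
    hits = []
    for word in vocabulary:
        if not word:
            continue  # ignore empty dictionary entries
        for start in range(len(lattice) - len(word) + 1):
            slots = lattice[start:start + len(word)]
            if all(ch in slot for ch, slot in zip(word, slots)):
                total = sum(slot[ch] for ch, slot in zip(word, slots))
                hits.append((total, start, word))
    return sorted(hits)
```

For example, with candidates {c: 2, e: 6}, {a: 3, o: 5}, {f: 1, t: 7}, {e: 2, c: 4} and the vocabulary "cafe", "cat", "dog", the word "cafe" is found with cumulative weight 8 and "cat" with 12, while "dog" is not found. The start position returned with each hit plays the role of the character position information.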

Furthermore, this embodiment provides a specific method by which the first character string transition data generation unit 106 calculates the weight values.

Furthermore, this embodiment provides a specific method by which the first character string transition data generation unit 106 corrects the weight values. Correcting the weight values also improves the accuracy of vocabulary detection.

Furthermore, according to this embodiment, even the case where the character region division unit 104 has over-segmented a character region can be handled appropriately.

Furthermore, according to this embodiment, including the first null transition, the second null transition, and the third null transition in the first WFST data improves the accuracy of the synthesis operation between the first WFST data and the second WFST data or the third WFST data.
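How the three null transitions help matching can be sketched with a small dynamic program. This is an interpretation rather than the embodiment's WFST construction: entering the candidate sequence at any position for free stands in for the first null transition, leaving after the last matched character for free stands in for the second, and skipping one candidate position inside a match at a fixed penalty stands in for the third. The lattice layout, the `skip_weight` parameter, and the function name are assumptions.

```python
import math
from functools import lru_cache
from typing import Dict, List, Optional


def best_match(lattice: List[Dict[str, int]], word: str,
               skip_weight: int) -> Optional[float]:
    """Smallest cumulative weight for reading word somewhere in lattice,
    or None if the word cannot be read at all."""
    n = len(lattice)

    @lru_cache(maxsize=None)
    def read(pos: int, k: int) -> float:
        # Cost of reading word[k:] starting at lattice position pos.
        if k == len(word):
            return 0  # "second null transition": go to the final state
        if pos == n:
            return math.inf
        best = math.inf
        weight = lattice[pos].get(word[k])
        if weight is not None:
            best = weight + read(pos + 1, k + 1)
        if k > 0:
            # "third null transition": skip one candidate position
            # (for example a noise blob) inside the match, at a penalty.
            best = min(best, skip_weight + read(pos + 1, k))
        return best

    # "first null transition": a match may begin at any position.
    result = min((read(start, 0) for start in range(n)), default=math.inf)
    return None if result == math.inf else result
```

With candidates x, c, a, |, t (weights 1, 2, 3, 1, 2) and a skip penalty of 5, the keyword "cat" is still found with cumulative weight 12 even though a noise candidate sits between "a" and "t".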

Furthermore, according to this embodiment, using identification information that indicates a separator allows characters to be recognized with high accuracy even in languages written with separators between words. In addition, dictionaries for languages written with word separation, such as English, and for languages written without it, such as Japanese, can be handled with common vocabulary processing.

Furthermore, according to this embodiment, using the position information makes it possible to determine where in the image a character recognition result is located.

Furthermore, according to this embodiment, using the classification information makes it possible to determine which category a character recognition result belongs to.

Description of reference numerals

1… character recognition device, 100… character recognition system, 101… image reading unit, 102… image binarization unit, 103… character region detection unit, 104… character region division unit, 105… character recognition unit, 106… first character string transition data generation unit, 107… second character string transition data generation unit, 108… third character string transition data generation unit, 109… WFST processing unit, 110… character string detection unit, 111… vocabulary DB, 200… terminal, 300… server.

Industrial applicability

The present invention provides a character recognition device, a character recognition method, a character recognition system, and a character recognition program that can recognize characters with high accuracy and at high speed using a simplified device configuration, without using an external database such as a telephone directory.

Claims

1. A character recognition device, characterized in that it has:

an image input unit that inputs an image including a recognition object character;

a character area detection unit that detects a character area in the image where the character exists;

a character area dividing unit that divides the character area in units of individual characters;

a character recognition unit that performs character recognition processing, one character at a time, on the characters present in the divided regions produced by the character area dividing unit, and outputs one or more candidates as the character recognition processing result for each single character;

a first character string transition data generation unit that receives the candidates as input, calculates a weight value for the transition to each candidate, and generates first character string transition data, which is character string transition data based on sets of the candidates and the weight values; and

a finite state transition unit that sequentially performs state transitions according to the first character string transition data, accumulates the weight values over the state transitions to calculate a cumulative weight value for each state transition, and outputs the result of a state transition whose cumulative weight value is small as the result of character recognition.

2. The character recognition device according to claim 1, wherein:

The character recognition device further includes a second character string transition data generation unit configured to input a keyword from a user and generate second character string transition data that is character string transition data of the keyword,

The finite state transition unit performs a synthesis operation on the first character string transition data and the second character string transition data, thereby determining whether the keyword exists in the image.

3. The character recognition device according to claim 1, wherein:

The character recognition device further includes a third character string transition data generating unit that generates third character string transition data that is character string transition data for each vocabulary existing in the vocabulary database,

The finite state transition unit detects vocabulary existing in the image by performing a synthesis operation on the first character string transition data and the third character string transition data.

4. The character recognition device according to any one of claims 1 to 3, wherein:

The character recognition unit assigns priority to a plurality of the candidates and outputs them,

The first character string transition data generation unit calculates the weight value according to the priority order.

5. The character recognition device according to claim 4, wherein:

The character recognition unit uses at least two different recognition methods to perform the character recognition process,

The first character string transition data generation unit calculates the weight value based on the number of times each candidate is output by the different recognition methods and on the priority order.

6. The character recognition device according to any one of claims 1 to 3, wherein:

The first character string transition data generation unit calculates the weight value in consideration of the character string transitions of words registered in a language database.

7. The character recognition device according to any one of claims 1 to 3, wherein:

The first character string transition data generating unit corrects the weight value according to the position of the candidate in the image or the character size of the candidate.

8. The character recognition device according to any one of claims 1 to 3, wherein:

In the case where the character area dividing unit divides the character area using a plurality of division patterns to generate a plurality of kinds of the divided regions,

The character recognition unit performs the character recognition processing on the plurality of divided regions respectively,

the first character string transition data generating unit generates the first character string transition data for each of the candidates of the plurality of divided regions,

The finite state transition unit outputs, as the result, the state transition result whose cumulative weight value is small across all of the plurality of kinds of divided regions.

9. The character recognition device according to any one of claims 1 to 3, wherein:

The first character string transition data generation unit generates the first character string transition data so as to include a first null transition, which is a null transition from the initial state of the character string transition to the candidate; a second null transition, which is a null transition from the candidate to the final state of the character string transition; and a third null transition, which is a null transition for skipping the candidate in units of single characters.

10. The character recognition device according to any one of claims 1 to 3, wherein:

When the character recognition unit outputs the candidates of the character recognition processing result, it also outputs identification information representing a separation between words,

the first character string transition data generating unit adds the identification information to generate the first character string transition data,

When the finite state transition unit performs the state transition, the state transition is performed in units of parts separated by two pieces of the identification information.

11. The character recognition device according to any one of claims 1 to 3, wherein:

When the character recognition unit outputs the candidate of the character recognition processing result, it also outputs the position information of the candidate in the image,

the first character string transition data generation unit adds the position information to generate the first character string transition data,

The finite state transition unit appends the position information to output the result.

12. The character recognition device according to claim 2, wherein:

The vocabulary database has classification information for vocabulary,

the second character string transition data generation unit adds the classification information to generate the second character string transition data,

The finite state transition unit appends the classification information to output the result.

13. The character recognition device according to claim 3, wherein:

The vocabulary database has classification information on vocabulary,

the third character string transition data generation unit adds the classification information to generate the third character string transition data,

14. The character recognition device according to claim 12 or 13, characterized in that,

The character recognition device has a vocabulary classification correlation vector storage unit, which stores a vocabulary classification correlation vector representing the correlation between vocabulary and the classification information,

The first character string transition data generation unit adds the candidates in the first character string transition data and their weight values to the values of the vocabulary classification correlation vector, takes the classification information with the largest value as the classification information corresponding to the candidates, and corrects the weight values for the candidates based on that classification information.

15. A character recognition method, characterized in that it has:

an image input step in which an image input unit inputs an image containing characters to be recognized;

a character area detection step in which a character area detection unit detects a character area, which is an area of the image where the characters exist;

a character area dividing step in which a character area dividing unit divides the character area in units of single characters;

a character recognition step in which a character recognition unit performs character recognition processing, one character at a time, on the characters present in the divided regions produced by the character area dividing unit, and outputs one or more candidates as the character recognition processing result for each single character;

a first character string transition data generation step in which a first character string transition data generation unit receives the candidates as input, calculates a weight value for the transition to each candidate, and generates first character string transition data, which is character string transition data based on sets of the candidates and the weight values; and

a finite state transition step in which a finite state transition unit sequentially performs state transitions according to the first character string transition data, accumulates the weight values over the state transitions to calculate a cumulative weight value for each state transition, and outputs the result of a state transition whose cumulative weight value is small as the result of character recognition.

16. A character recognition system comprising a terminal and a server, characterized in that

The terminal has:

an image input unit which inputs an image containing characters as recognition objects;

a character recognition unit that performs character recognition processing for each individual character on the characters present in the segmented regions divided by the character region segmenting unit, and outputs one or more candidates for character recognition processing results for the individual characters; and

a first character string transition data generation unit that receives the candidates as input, calculates a weight value for the transition to each candidate, and generates first character string transition data, which is character string transition data based on sets of the candidates and the weight values,

Said server has: