JP3666909B2

JP3666909B2 - Character recognition apparatus and method

Info

Publication number: JP3666909B2
Application number: JP26958994A
Authority: JP
Inventors: 朋紀工藤
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1994-11-02
Filing date: 1994-11-02
Publication date: 2005-06-29
Anticipated expiration: 2020-06-29
Also published as: JPH08129617A

Description

【０００１】
【産業上の利用分野】
本発明は文字認識装置及び方法、特に認識した文字候補群の中の最適な候補文字を決定するための文字認識装置及び方法に関するものである。
【０００２】
【従来の技術】
従来の文字認識を説明する。図３の処理の流れを表すフローチャートのように、入力装置（例えばスキャナ）或いは入力ペン等を使用して文字列を入力すると、文字認識辞書を検索して文字列が特定できたかどうかを判断する。特定できたかどうかは、例えば個々の文字画像の特徴ベクトルと認識辞書にある標準特徴ベクトルとの距離を演算し、その演算した結果が所定値と比較することで判定する。
【０００３】
さて、特定できたと判断した場合には、認識結果（文字コード列）をそのまま出力する。
【０００４】
また、特定できなかった場合、認識結果が誤っている可能性が高くなり、候補群を作成する。そして、別途用意された単語辞書を検索することを繰り返して単語を決定する。
【０００５】
例えは、図４のように「電子」と入力した場合、文字認識で得られた候補文字列の組み合わせが「電千」、「電子」、「電干」、「電于」等である場合、図５のようにこれらの候補を単語の辞書を検索する。この結果、「電子」のみ単語辞書に格納されているので、「電子」が決定し、それを認識結果として出力する。
【０００６】
【発明が解決しようとする課題】
ところで、前述した従来の文字認識では、入力された文字が「漢字仮名交じり文字列」の場合に適応させようとすると、単語の辞書に漢字仮名交じりの単語をすべて格納する必要がある。例えば、「電子」という単語の場合では、「でんし」、「電し」、「でん子」、「電子」の４種類の単語のすべてを単語の辞書に格納しなければならない。そのため、漢字のみの単語辞書に比較して漢字仮名混じりの文字認識の単語辞書は容量が著しく大きくなる。かといって、漢字仮名交じり単語すべてを単語の辞書に格納しない場合には、文字の高い認識率は望めない。
【０００７】
【課題を解決するための手段】
及び
【作用】
本発明はかかる問題点に鑑みなされたものであり、認識処理にかかる辞書の規模を小さくしながらも、認識対象の文字列が漢字仮名混じり単語である場合における認識率を向上させることを可能とする文字認識装置及び方法を提供しようとするものである。
【０００８】
この課題を解決するため、例えば本発明の文字認識装置は以下の構成を備える。すなわち、
漢字仮名混じり文字列を入力する入力手段と、
前記入力手段により入力された漢字仮名混じり文字列の文字を文字認識し、各文字に対して候補文字を１つまたは複数出力する文字認識手段と、
前記文字認識手段で文字認識して得られた各文字の候補文字を組み合わせることにより、前記漢字仮名混じり文字列に対する候補単語を１つまたは複数形成する候補単語形成手段と、
前記候補単語形成手段で形成された候補単語が複数あると判断した場合、当該複数の候補単語のうちから処理対象とされた候補単語に含まれる漢字の候補文字の読みを１つまたは複数取り出す処理を行う取り出し手段と、
前記取り出し手段で取り出された読みと当該候補単語に含まれる仮名の候補文字とを組み合わせることにより前記漢字仮名混じり文字列の読み候補を複数生成し、当該生成された複数の読み候補を仮名漢字変換する処理を行う仮名漢字変換手段と、
前記仮名漢字変換手段で得られた変換結果の漢字と前記候補単語に含まれる漢字部分の候補文字とを照合する処理を行う照合手段と、
前記複数あると判断された候補単語のうちの最初の候補単語から順に前記処理対象の候補単語として、前記取り出し手段と前記仮名漢字変換手段と前記照合手段とによるそれぞれの処理を行っていき、前記照合手段の照合が成功した時点で、前記認識手段で得られた複数の候補文字の中から認識結果とする候補文字を特定し、当該特定された候補文字を前記漢字仮名混じり文字列の認識結果として出力する第１の出力手段と、
前記候補単語形成手段で形成された候補単語が１つであると判断した場合、当該候補単語を構成している候補文字を特定し、当該特定された候補文字を前記漢字仮名混じり文字列の認識結果として出力する第２の出力手段とを備える。
【０００９】
ここで、本発明に好適な実施態様に従えば、前記入力手段は、文字をストローク情報として入力することが望ましく、特に、そのストローク情報は、所定のペンによる手書き文字入力によることが望ましい。これによって、ストローク入力面の大きさが限られていたり、画数の多い漢字であったり、或いは、その漢字そのもの忘れてしまったりした場合に、その部分を認識率が高い仮名で入力することで認識させることが可能になる。
【００１２】
【実施例】
以下、添付図面に従って本発明に係る実施例を詳細に説明する。
【００１３】
図２は、本発明の文字処理方法が実施される情報処理システムの構成を表したブロック図である。
【００１４】
入力装置１（手書きパッド，スキャナなど）から入力された文字列は、中央演算処理装置２（ＣＰＵ）によって処理され、辞書やその処理結果は主記憶装置４（ＲＡＭなど）や補助記憶装置５（フロッピーディスク，ハードディスクなど）に記憶し、出力装置３（ＣＲＴ，プリンタなど）によって出力される。
【００１５】
図１は本発明の処理の流れを表すフローチャートである。尚、同フローチャートに基づくプログラムは、補助記憶装置５に記憶保持されており、電源投入時に主記憶装置２にロードされ、実行される。
【００１６】
本実施例では、以下の工程を備える。すなわち、
入力装置１から読み文字列の入力処理を行なうステップＳ１１と、かな漢字変換するステップＳ１２と、候補の読みを取得するステップＳ１４と、取得した読みをかな漢字変換するステップＳ１５と、かな漢字変換の結果と読みを取り出した文字を一致するかを判定するステップＳ１６と、一致しなかった場合、次のデータを取得するステップＳ１７と、一致した場合、決定した文字を出力装置３または、主記憶装置４、補助記憶装置５に出力するステップＳ１８とからなる。
【００１７】
次に、図１と図６，図７をもとに処理の流れを説明する。ここでは、具体例として「でん子」と入力した場合について説明する。
【００１８】
図６のように、文字列「でん子」を入力する（ステップＳ１１）。主記憶装置４、または補助記憶装置５に格納されている辞書より公知の認識処理を行う（ステップＳ１２）。
【００１９】
実施例における文字認識は、認識しようとする文字画像或いは文字ストロークの特徴ベクトルと、認識辞書に格納されている各標準ベクトルとの距離を演算し、その演算値が小さい文字を認識結果の文字として出力する。このとき、距離演算値が小さい順に第１候補文字、第２候補文字、…となるが、第１候補文字の距離演算値が所定値以下、すなわち、確からしさが高く、第２候補文字に対して充分な差がある場合、候補文字を１つだけ出力する。これに該当しない場合には、第２候補以下の所定数の候補文字群（但し、演算距離値が所定閾値以下のもの）として出力するものとする。
【００２０】
さて、上記文字認識処理で得られた認識文字で文字列候補を作成する。ここで、認識処理で１個の文字に対して複数の候補文字が出力されてくると、候補文字列はその組み合わせの数だけ作成されることになる。そこで、ステップＳ１３では、候補文字列が１つかそれ以上かを判断し、入力された文字列が特定できたかどうかを判定する。
【００２１】
図６の場合、候補文字列が４つ存在するため、特定できていないことになる。この場合、処理はステップＳ１４に進み、主記憶装置４、または補助記憶装置５に格納されている読み辞書を検索し、文字列候補の読みを取得する。「千」に対しては「せん」、「ち」、「ぢ」を取得する。
【００２２】
次に、図７のように、ステップ１４で取得した読みに対してかな漢字変換を行う。「でんせん」、「でんち」、「でんぢ」の読みに対して「電線」、「伝染」、「電池」、「田地」、「伝」「ぢ」等の変換結果を取得する。このとき、変換結果の文節数が最小になるように、または読みが最長になように変換結果を残す。
【００２３】
ステップＳ１６では、上記ステップＳ１７の結果である「電線」、「伝染」、「電池」、「田地」に文字「千」が含まれているかどうかを判定する。この場合、それがないわけであるから、処理はステップＳ１７に進み、次の候補文字列、例えば「でん子」を着目し、ステップＳ１４に戻る。
【００２４】
さて、今度は、「でん子」の「子」について同様の処理を行う「子」に対しては読み辞書を検索し、「おとこ」「こ」、「ご」、「し」、「じ」、「す」、「ね」、「み」の読みを取得する。取得した読みから手書き文字列の読み候補「でんおとこ」、「でんこ」、「でんご」、「でんし」、「でんじ」、「でんす」、「でんね」、「でんみ」の読みに対して「電」「男」、「電故」、「電」「五」、「電子」、「出んし」、「電磁」、「田地」、「伝」「巣」、「伝ね」、「伝」「身」等を変換結果として取得する。この結果、「電故」、「電子」、「出んし」、「電磁」が残る。「子」と一致する「電子」があるわけであるから、処理はステップＳ１６からステップＳ１８に進む。「子」を認識文字として決定し、「でん子」を出力装置３または、主記憶装置４、補助記憶装置５に出力する（ステップ１８）。
【００２５】
以上のように、本実施例によれば、認識対象の漢字仮名混じり文字列を入力した場合、認識して得られた候補の組み合わせ中に漢字がある場合、その漢字を「読み」に変換し、全てを平仮名にして仮名漢字変換を行う。そして、この仮名漢字変換と先に文字認識して得られた結果とを照合することで、認識候補文字を高認識率で、しかも、辞書の規模を大きくしないで認識することが可能になる。
【００２６】
【他の実施例】
次に第２の実施例を説明する。本第２の実施例では、上記実施例のごとく、文字列「でん子」と入力した場合において、文字認識候補による文字列候補として、「でん千」、「でん子」、「でん于」、「でん干」が生成された場合に、出力として「電子」を出力するものである。
【００２７】
以下、図１、図６及び図８をもとに処理の流れを説明する。
【００２８】
図６のように、文字列「でん子」を入力する（ステップ１１）。主記憶装置４、または補助記憶装置５に格納されている辞書より認識文字候補「でん千」、「でん子」、「でん干」、「でん于」を作成する（ステップ１２）。特定できるかの判定で４つ候補が存在するため、この場合は特定できない（ステップ１３）。主記憶装置４、または補助記憶装置５に格納されている読み辞書を検索し、文字候補の読みを取得する。「千」に対しては「せん」、「ち」、「ぢ」を取得する（ステップ１４）。
【００２９】
図８のようにステップ１４で取得した読みに対してかな漢字変換を行う。「でんせん」、「でんち」、「でんぢ」の読みに対して「電線」、「伝染」、「電池」、「田地」、「伝」「ぢ」等の変換結果を取得する（ステップ１５）。このとき変換結果の文節数が最小になるように、または読みが最長になるように変換結果を残す。この結果、「電線」、「伝染」、「電池」、「田地」が残り、変換結果に「千」が候補に含まれない（ステップ１６）。次のデータ「子」を取得する（ステップ１７）。
【００３０】
「子」について同様の処理を行う「子」に対しては読み辞書を検索し、「おとこ」「こ」、「ご」、「し」、「じ」、「す」、「ね」、「み」の読みを取得する。取得した読みから手書き文字列の読み候補「でんおとこ」、「でんこ」、「でんご」、「でんし」、「でんじ」、「でんす」、「でんね」、「でんみ」の読みに対して「電」「男」、「電故」、「電」「五」、「電子」、「出んし」、「電磁」、「田地」、「伝」「巣」、「伝ね」、「伝」「身」等を変換結果として取得する。この結果、「電故」、「電子」、「出んし」、「電磁」が残り、「子」と一致する「電子」があると判定する（ステップ１６）。「子」を認識文字として、かな漢字変換の結果である最適な表記形態として決定した文字「電子」を出力装置３または、主記憶装置４、補助記憶装置５に出力する（ステップ１８）。
【００３１】
以上説明したように本第２の実施例によれば、本来、漢字文字列で表記されることが望ましいが、その漢字の一部を平仮名で入力した場合、意図した漢字文字列の単語が出力されることになる。
【００３２】
尚、実施例では、入力される文字列は、文字をイメージとして取り込んで認識する場合（ＯＣＲ等）、及び、手書きパッド（例えば専用のペン）で入力する場合のいずれにも適応可能ではあるが、例えばイメージを記憶している記憶媒体から画像を読み取って認識したり、回線を介して受信した画像（例えばファクシミリ）を直接認識するようにしても良い。いずれにしても、上記説明から容易に推察できるように、直接文字を手で記入してそれを認識する装置に適応した場合に、その効果が大きい。特に、ペン入力による入力領域がある程度狭くて画数の多い文字が入力しずらい場合、或いは、漢字そのものが思い出せない場合にその威力を発揮する。
【００３３】
また、入力が手書きパッドのような手書き入力の場合、認識用の辞書も手書き認識用の辞書となり、文字を認識する詳細な処理は異なるが、その後の読みに戻し、かな漢字変換をし、認識文字列を決定する処理は同様にできるので、効果が大きい。
【００３４】
更にまた、実施例では特に説明しなかったが、候補単語の読みに対する仮名漢字変換を行って照合処理を行う場合、その候補単語を構成する文字は、認識処理で得られた候補文字の第１候補から順に使用することが望ましい。理由は、図１のステップＳ１４〜ステップＳ１７のループ回数が、全体として少ない方向に作用するからである。
【００３５】
更には、上記説明から容易に推察されるように、本発明は、複数の機器から構成されるシステムに適用しても、１つの機器から成る装置に適用しても良い。また、本発明はシステム或は装置にプログラムを供給することによって達成される場合にも適用できることは言うまでもない。
【００３６】
【発明の効果】
以上説明したように本発明のデータ処理方法は、文字列を入力するステップと、文字を認識するステップと、辞書から漢字の読みを取り出すステップと、その読みをかな漢字変換するステップと、変換した文字と認識した文字の候補を画するステップと、認識した文字列を出力するステップと、を設けることにより、そのことによって、文字認識の認識率の向上、かな漢字変換辞書との共用等の記憶容量の減少等の効果がある。
【００３７】
【図面の簡単な説明】
【図１】実施例における文字認識処理の処理手順を示すフローチャートである。
【図２】実施例のシステムのブロック構成図である。
【図３】従来の文字認識処理手順を示すフローチャートである。
【図４】従来の文字認識の概要を説明するための図である。
【図５】従来の文字認識の概要を説明するための図である。
【図６】実施例における文字認識処理の概要を示す図である。
【図７】実施例における文字認識処理の概要を示す図である。
【図８】第２の実施例における文字認識処理の概要を示す図である。
【符号の説明】
１中央処理装置
２主記憶装置
３入力装置
４表示装置
５補助記憶装置

[0001]
[Industrial application fields]
The present invention relates to a character recognition apparatus and method, and more particularly to a character recognition apparatus and method for determining an optimum candidate character in a recognized character candidate group.
[0002]
[Prior art]
Conventional character recognition will be described. As shown in the flowchart of FIG. 3, when a character string is input using an input device (for example, a scanner) or an input pen, the character recognition dictionary is searched to determine whether the character string could be identified. Whether it could be identified is determined, for example, by calculating the distance between the feature vector of each character image and the standard feature vectors in the recognition dictionary and comparing the result with a predetermined value.
[0003]
If it is determined that the character string has been identified, the recognition result (a character code string) is output as is.
[0004]
If the character string could not be identified, the recognition result is likely to be incorrect, so a candidate group is created. A word is then determined by repeatedly searching a separately prepared word dictionary.
[0005]
For example, suppose “電子” (denshi, “electron”) is input as shown in FIG. 4 and the candidate character string combinations obtained by character recognition are “電千”, “電子”, “電干”, “電于”, and so on. The word dictionary is then searched for these candidates as shown in FIG. 5. Since only “電子” is stored in the word dictionary, “電子” is selected and output as the recognition result.
[0006]
[Problems to be solved by the invention]
If the conventional character recognition described above is to be applied to input that is a mixed kanji-kana character string, every mixed kanji-kana spelling of each word must be stored in the word dictionary. For example, for the word “電子”, all four spellings “でんし”, “電し”, “でん子”, and “電子” must be stored. As a result, a word dictionary for recognizing mixed kanji-kana text is significantly larger than a kanji-only word dictionary. Yet unless all mixed kanji-kana spellings are stored in the word dictionary, a high character recognition rate cannot be expected.
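The dictionary growth described here can be illustrated with a small sketch. It assumes, purely for illustration, that a word is given as a list of (kanji, kana reading) pairs; the function name `mixed_spellings` is hypothetical and not part of the patent.

```python
from itertools import product


def mixed_spellings(segments):
    """Enumerate every kanji/kana spelling of one word.

    segments: list of (kanji, kana_reading) pairs, one per kanji character.
    Each character is written either as the kanji or as its kana reading.
    """
    return ["".join(choice) for choice in product(*segments)]


# The word 電子 (でん + し) has 2 ** 2 = 4 spellings to store.
spellings = mixed_spellings([("電", "でん"), ("子", "し")])
print(spellings)
```

Under this simplified model, a word of n kanji characters needs 2 ** n dictionary entries, which is the growth the conventional approach suffers from.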
[0007]
[Means for Solving the Problems] and [Action]
The present invention has been made in view of these problems, and aims to provide a character recognition apparatus and method that can improve the recognition rate when the character string to be recognized is a mixed kanji-kana word, while keeping the dictionaries used for recognition processing small.
[0008]
To solve this problem, the character recognition apparatus of the present invention has, for example, the following configuration. That is, it comprises:
input means for inputting a mixed kanji-kana character string;
character recognition means for recognizing the characters of the mixed kanji-kana character string input by the input means and outputting one or more candidate characters for each character;
candidate word forming means for forming one or more candidate words for the mixed kanji-kana character string by combining the candidate characters obtained for each character by the character recognition means;
extraction means for, when it is determined that the candidate word forming means has formed a plurality of candidate words, extracting one or more readings of the kanji candidate characters contained in the candidate word selected for processing from among the plurality of candidate words;
kana-kanji conversion means for generating a plurality of reading candidates for the mixed kanji-kana character string by combining the readings extracted by the extraction means with the kana candidate characters contained in the candidate word, and performing kana-kanji conversion on the generated reading candidates;
collation means for collating the kanji in the conversion results obtained by the kana-kanji conversion means against the candidate characters of the kanji portion contained in the candidate word;
first output means for taking the plurality of candidate words, in order from the first, as the candidate word to be processed, applying the processing of the extraction means, the kana-kanji conversion means, and the collation means to each, and, as soon as the collation by the collation means succeeds, identifying the candidate characters to be used as the recognition result from among the plurality of candidate characters obtained by the character recognition means and outputting the identified candidate characters as the recognition result for the mixed kanji-kana character string; and
second output means for, when it is determined that the candidate word forming means has formed only one candidate word, identifying the candidate characters constituting that candidate word and outputting them as the recognition result for the mixed kanji-kana character string.
[0009]
Here, according to a preferred embodiment of the present invention, the input means preferably inputs characters as stroke information, and the stroke information is preferably obtained by handwritten character input with a predetermined pen. Thus, when the stroke input surface is limited in size, when a kanji has many strokes, or when the user has forgotten the kanji itself, that part can be entered in kana, which has a high recognition rate, and still be recognized.
[0012]
[Embodiments]
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
[0013]
FIG. 2 is a block diagram showing the configuration of an information processing system in which the character processing method of the present invention is implemented.
[0014]
A character string input from the input device 1 (handwriting pad, scanner, etc.) is processed by the central processing unit 2 (CPU). The dictionaries and the processing results are stored in the main storage device 4 (RAM, etc.) or the auxiliary storage device 5 (floppy disk, hard disk, etc.), and the results are output by the output device 3 (CRT, printer, etc.).
[0015]
FIG. 1 is a flowchart showing the processing flow of the present invention. The program based on the flowchart is stored and held in the auxiliary storage device 5, and is loaded into the main storage device 2 and executed when the power is turned on.
[0016]
In this embodiment, the following steps are provided. That is,
step S11 of inputting a character string to be read from the input device 1; step S12 of performing kana-kanji conversion; step S14 of acquiring the readings of a candidate; step S15 of performing kana-kanji conversion on the acquired readings; step S16 of determining whether the kana-kanji conversion result matches the character from which the readings were taken; step S17 of acquiring the next data if they do not match; and step S18 of outputting the determined characters to the output device 3, the main storage device 4, or the auxiliary storage device 5 if they match.
[0017]
Next, the flow of processing will be described with reference to FIGS. 1, 6, and 7. Here, the case where “でん子” is input will be described as a specific example.
[0018]
As shown in FIG. 6, the character string “でん子” is input (step S11). A known recognition process is performed using the dictionary stored in the main storage device 4 or the auxiliary storage device 5 (step S12).
[0019]
In the character recognition of the embodiment, the distance between the feature vector of the character image or character strokes to be recognized and each standard vector stored in the recognition dictionary is calculated, and the characters with small distance values are output as recognition results. The candidates are ranked first, second, and so on in order of increasing distance. If the distance value of the first candidate is at or below a predetermined value, that is, its likelihood is high, and it differs sufficiently from that of the second candidate, only that one candidate character is output. Otherwise, a predetermined number of candidates, including the second and lower-ranked ones, are output as a candidate character group (limited to those whose distance value is at or below a predetermined threshold).
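The candidate-output rule in this paragraph can be sketched as follows. The patent gives no concrete values, so the threshold parameters (`accept_max`, `min_gap`, `group_max`, `group_size`), the function name, and the sample distances are all assumptions chosen for illustration.

```python
def select_candidates(distances, accept_max=0.2, min_gap=0.1,
                      group_max=0.5, group_size=4):
    """Return the candidate characters for one input character.

    distances: dict mapping a dictionary character to its distance from
    the input feature vector (smaller means more likely).
    All threshold values are hypothetical.
    """
    ranked = sorted(distances.items(), key=lambda item: item[1])
    if not ranked:
        return []
    best_char, best_dist = ranked[0]
    # "Sufficient difference" from the second candidate, if there is one.
    gap_ok = len(ranked) == 1 or ranked[1][1] - best_dist >= min_gap
    if best_dist <= accept_max and gap_ok:
        return [best_char]  # confident: output a single candidate
    # Otherwise output a bounded group of candidates under the threshold.
    return [ch for ch, d in ranked[:group_size] if d <= group_max]


clear = select_candidates({"で": 0.05, "て": 0.40})
ambiguous = select_candidates(
    {"千": 0.28, "子": 0.30, "干": 0.33, "于": 0.35, "手": 0.90})
```

With these made-up distances, the clearly written “で” yields one candidate, while the ambiguous third character yields a group of four.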
[0020]
Next, character string candidates are created from the characters obtained by the character recognition process. If the recognition process outputs a plurality of candidate characters for a single character, as many candidate character strings are created as there are combinations. In step S13, it is therefore determined whether there is exactly one candidate character string or more than one, that is, whether the input character string has been identified.
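A minimal sketch of forming the candidate character strings and making the step S13 decision, assuming the per-character candidates are already available as lists (the function name `candidate_strings` is hypothetical):

```python
from itertools import product


def candidate_strings(per_char_candidates):
    """Combine per-character candidates into every candidate string."""
    return ["".join(chars) for chars in product(*per_char_candidates)]


# One candidate each for the first two characters, four for the third.
words = candidate_strings([["で"], ["ん"], ["千", "子", "干", "于"]])
is_identified = len(words) == 1  # step S13: identified only if unique
```

Here four candidate strings result, so the input is not yet identified and processing would continue with the reading lookup.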
[0021]
In the case of FIG. 6, there are four candidate character strings, so the input has not been identified. In this case, the process proceeds to step S14, where the reading dictionary stored in the main storage device 4 or the auxiliary storage device 5 is searched and the readings for the character string candidate are acquired. For “千”, the readings “せん”, “ち”, and “ぢ” are acquired.
[0022]
Next, as shown in FIG. 7, kana-kanji conversion is performed on the readings acquired in step S14. For the readings “でんせん”, “でんち”, and “でんぢ”, conversion results such as “電線”, “伝染”, “電池”, “田地”, and “伝” + “ぢ” are acquired. The conversion results kept are those with the fewest clauses or the longest reading.
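The filtering of conversion results can be sketched as below. This sketch implements only the fewest-clauses criterion, and it assumes each conversion result is represented as a list of clause strings; both the representation and the function name are illustrative assumptions.

```python
def keep_fewest_clauses(conversions):
    """Keep only the conversion results split into the fewest clauses.

    conversions: list of results, each a list of clause strings.
    Returns the surviving results joined into plain strings.
    """
    if not conversions:
        return []
    fewest = min(len(clauses) for clauses in conversions)
    return ["".join(c) for c in conversions if len(c) == fewest]


kept = keep_fewest_clauses(
    [["電線"], ["伝染"], ["電池"], ["田地"], ["伝", "ぢ"]])
```

The two-clause result “伝” + “ぢ” is dropped, leaving the four single-clause words.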
[0023]
In step S16, it is determined whether the character “千” is contained in “電線”, “伝染”, “電池”, or “田地”, the results of step S15. Since it is not, the process proceeds to step S17, where attention moves to the next candidate character string, for example “でん子”, and the process returns to step S14.
[0024]
This time, the same processing is performed for “子” in “でん子”. The reading dictionary is searched for “子”, and the readings “おとこ”, “こ”, “ご”, “し”, “じ”, “す”, “ね”, and “み” are acquired. From these, the reading candidates “でんおとこ”, “でんこ”, “でんご”, “でんし”, “でんじ”, “でんす”, “でんね”, and “でんみ” are formed for the handwritten character string, and kana-kanji conversion of them yields results such as “電” + “男”, “電故”, “電” + “五”, “電子”, “出んし”, “電磁”, “田地”, “伝” + “巣”, “伝ね”, and “伝” + “身”. Of these, “電故”, “電子”, “出んし”, and “電磁” remain. Since “電子” contains a match for “子”, the process proceeds from step S16 to step S18. “子” is determined as the recognized character, and “でん子” is output to the output device 3, the main storage device 4, or the auxiliary storage device 5 (step S18).
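The loop over steps S14 to S17 for the “でん子” example can be sketched as follows. The two dictionaries are toy stand-ins holding only the entries quoted in the text, the kana-kanji conversion is reduced to a table lookup of already-filtered results, and all names (`READINGS`, `KANA_KANJI`, `recognize`) are hypothetical rather than taken from the patent.

```python
from itertools import product

# Toy reading dictionary (step S14); only the entries from the example.
READINGS = {
    "千": ["せん", "ち", "ぢ"],
    "子": ["おとこ", "こ", "ご", "し", "じ", "す", "ね", "み"],
}
# Toy kana-kanji conversion (step S15): reading -> surviving results.
KANA_KANJI = {
    "でんせん": ["電線", "伝染"],
    "でんち": ["電池", "田地"],
    "でんこ": ["電故"],
    "でんし": ["電子", "出んし"],
    "でんじ": ["電磁"],
}


def is_kanji(ch):
    return "\u4e00" <= ch <= "\u9fff"


def reading_candidates(word):
    """Replace each kanji by each of its readings; keep kana as is."""
    parts = [READINGS.get(ch, []) if is_kanji(ch) else [ch] for ch in word]
    return ["".join(p) for p in product(*parts)]


def recognize(candidate_words):
    """Return (recognized word, matching conversion result or None)."""
    if len(candidate_words) == 1:           # step S13: already identified
        return candidate_words[0], None
    for word in candidate_words:            # step S17: next candidate
        kanji = [ch for ch in word if is_kanji(ch)]
        for reading in reading_candidates(word):
            for converted in KANA_KANJI.get(reading, []):
                if all(k in converted for k in kanji):   # step S16
                    return word, converted
    return None, None


result = recognize(["でん千", "でん子", "でん干", "でん于"])
```

The first element of the result corresponds to the recognized string that is output in step S18; the second element is the conversion result that produced the match.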
[0025]
As described above, according to this embodiment, when a mixed kanji-kana character string to be recognized is input and a candidate combination obtained by recognition contains a kanji, that kanji is converted to its reading so that the whole string is in hiragana, and kana-kanji conversion is performed. By collating the result of this kana-kanji conversion with the result of the earlier character recognition, the candidate characters can be resolved with a high recognition rate and without enlarging the dictionary.
[0026]
[Other embodiments]
Next, a second embodiment will be described. In the second embodiment, when the character string “でん子” is input as in the above embodiment and the character string candidates “でん千”, “でん子”, “でん于”, and “でん干” are generated from the character recognition candidates, “電子” is produced as the output.
[0027]
Hereinafter, the flow of processing will be described with reference to FIGS. 1, 6, and 8.
[0028]
As shown in FIG. 6, the character string “でん子” is input (step S11). Using the dictionary stored in the main storage device 4 or the auxiliary storage device 5, the recognized character string candidates “でん千”, “でん子”, “でん干”, and “でん于” are created (step S12). Because four candidates exist, the character string cannot be identified in this case (step S13). The reading dictionary stored in the main storage device 4 or the auxiliary storage device 5 is searched to acquire the readings of the candidate characters. For “千”, the readings “せん”, “ち”, and “ぢ” are acquired (step S14).
[0029]
As shown in FIG. 8, kana-kanji conversion is performed on the readings acquired in step S14. For the readings “でんせん”, “でんち”, and “でんぢ”, conversion results such as “電線”, “伝染”, “電池”, “田地”, and “伝” + “ぢ” are acquired (step S15). The conversion results kept are those with the fewest clauses or the longest reading. As a result, “電線”, “伝染”, “電池”, and “田地” remain, and none of them contains “千” (step S16). The next data, “子”, is acquired (step S17).
[0030]
The same processing is performed for “子”. The reading dictionary is searched for “子”, and the readings “おとこ”, “こ”, “ご”, “し”, “じ”, “す”, “ね”, and “み” are acquired. From these, the reading candidates “でんおとこ”, “でんこ”, “でんご”, “でんし”, “でんじ”, “でんす”, “でんね”, and “でんみ” are formed for the handwritten character string, and kana-kanji conversion of them yields results such as “電” + “男”, “電故”, “電” + “五”, “電子”, “出んし”, “電磁”, “田地”, “伝” + “巣”, “伝ね”, and “伝” + “身”. As a result, “電故”, “電子”, “出んし”, and “電磁” remain, and it is determined that “電子” contains a match for “子” (step S16). With “子” as the recognized character, the string “電子”, determined from the kana-kanji conversion result as the optimal written form, is output to the output device 3, the main storage device 4, or the auxiliary storage device 5 (step S18).
[0031]
As described above, according to the second embodiment, when a word that should properly be written entirely in kanji is input with part of it in hiragana, the intended all-kanji word is output.
[0032]
In the embodiments, the input character string may come either from characters captured as an image and recognized (OCR, etc.) or from input on a handwriting pad (for example, with a dedicated pen). In addition, an image may be read from a storage medium and recognized, or an image received over a communication line (for example, a facsimile) may be recognized directly. In any case, as can easily be inferred from the above description, the effect is greatest when the invention is applied to a device in which characters are written directly by hand and then recognized. It is particularly effective when the pen input area is rather small and characters with many strokes are difficult to write, or when the user cannot recall the kanji itself.
[0033]
When the input is handwritten, as with a handwriting pad, the recognition dictionary is a handwriting recognition dictionary and the details of character recognition differ. However, the subsequent processing of converting back to readings, performing kana-kanji conversion, and determining the recognized character string can be performed in the same way, so the effect is great.
[0034]
Furthermore, although not specifically described in the embodiments, when kana-kanji conversion is performed on the readings of a candidate word and the collation process is carried out, the characters constituting the candidate word are preferably taken in order from the first candidate among the candidate characters obtained by the recognition process. The reason is that this tends to reduce the overall number of iterations of the loop from step S14 to step S17 in FIG. 1.
[0035]
Furthermore, as can be easily inferred from the above description, the present invention may be applied to a system composed of a plurality of devices or an apparatus composed of a single device. Needless to say, the present invention can also be applied to a case where the present invention is achieved by supplying a program to a system or apparatus.
[0036]
[Effects of the Invention]
As described above, the data processing method of the present invention provides a step of inputting a character string, a step of recognizing characters, a step of extracting the readings of kanji from a dictionary, a step of performing kana-kanji conversion on those readings, a step of checking the converted characters against the recognized character candidates, and a step of outputting the recognized character string. This improves the recognition rate of character recognition and reduces the required storage capacity, for example by sharing dictionaries with kana-kanji conversion.
[0037]
[Brief description of the drawings]
FIG. 1 is a flowchart showing a processing procedure of character recognition processing in an embodiment.
FIG. 2 is a block diagram of a system according to the embodiment.
FIG. 3 is a flowchart showing a conventional character recognition processing procedure.
FIG. 4 is a diagram for explaining an outline of conventional character recognition.
FIG. 5 is a diagram for explaining an outline of conventional character recognition.
FIG. 6 is a diagram showing an outline of character recognition processing in the embodiment.
FIG. 7 is a diagram showing an outline of character recognition processing in the embodiment.
FIG. 8 is a diagram showing an outline of character recognition processing in the second embodiment.
[Explanation of symbols]
1 Central processing unit 2 Main storage device 3 Input device 4 Display device 5 Auxiliary storage device

Claims

A character recognition apparatus comprising:
input means for inputting a mixed kanji-kana character string;
character recognition means for recognizing the characters of the mixed kanji-kana character string input by the input means and outputting one or more candidate characters for each character;
candidate word forming means for forming one or more candidate words for the mixed kanji-kana character string by combining the candidate characters obtained for each character by the character recognition means;
extraction means for, when it is determined that the candidate word forming means has formed a plurality of candidate words, extracting one or more readings of the kanji candidate characters contained in the candidate word selected for processing from among the plurality of candidate words;
kana-kanji conversion means for generating a plurality of reading candidates for the mixed kanji-kana character string by combining the readings extracted by the extraction means with the kana candidate characters contained in the candidate word, and performing kana-kanji conversion on the generated reading candidates;
collation means for collating the kanji in the conversion results obtained by the kana-kanji conversion means against the candidate characters of the kanji portion contained in the candidate word;
first output means for taking the plurality of candidate words, in order from the first, as the candidate word to be processed, applying the processing of the extraction means, the kana-kanji conversion means, and the collation means to each, and, as soon as the collation by the collation means succeeds, identifying the candidate characters to be used as the recognition result from among the plurality of candidate characters obtained by the character recognition means and outputting the identified candidate characters as the recognition result for the mixed kanji-kana character string; and
second output means for, when it is determined that the candidate word forming means has formed only one candidate word, identifying the candidate characters constituting that candidate word and outputting them as the recognition result for the mixed kanji-kana character string.

The character recognition apparatus according to claim 1, wherein the character string input by the input unit is a character string input by handwriting.

The character recognition apparatus according to claim 1, wherein the character string input by the input unit is a character image.

A character recognition method comprising:
a character recognition step of recognizing the characters of an input mixed kanji-kana character string and outputting one or more candidate characters for each character;
a candidate word forming step of forming one or more candidate words for the mixed kanji-kana character string by combining the candidate characters obtained for each character in the character recognition step;
an extraction step of, when the candidate word forming step has formed a plurality of candidate words, extracting one or more readings of the kanji candidate characters contained in the candidate word selected for processing from among the plurality of candidate words;
a kana-kanji conversion step of generating a plurality of reading candidates for the mixed kanji-kana character string by combining the readings extracted in the extraction step with the kana candidate characters contained in the candidate word, and performing kana-kanji conversion on the generated reading candidates;
a collation step of collating the kanji in the conversion results obtained in the kana-kanji conversion step against the candidate characters of the kanji portion contained in the candidate word;
a first output step of taking the plurality of candidate words, in order from the first, as the candidate word to be processed, applying the processing of the extraction step, the kana-kanji conversion step, and the collation step to each, and, as soon as the collation in the collation step succeeds, identifying the candidate characters to be used as the recognition result from among the plurality of candidate characters obtained in the character recognition step and outputting the identified candidate characters as the recognition result for the mixed kanji-kana character string; and
a second output step of, when the candidate word forming step has formed only one candidate word, identifying the candidate characters constituting that candidate word and outputting them as the recognition result for the mixed kanji-kana character string.

The character recognition method according to claim 4, wherein the input character string mixed with kanji characters is a character string input by handwriting.

The character recognition method according to claim 4, wherein the inputted character string mixed with kanji characters is a character image.