EP0896704A1 - Method and device for handwritten character recognition - Google Patents
- Publication number
- EP0896704A1 (application EP97915155A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- characters
- character
- combined
- possible characters
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Definitions
- This invention relates generally to handwriting recognition by a character recognizer, and more particularly to improving recognition of handwritten characters using a post-processing method and device.
- Conventional character recognizers have approximately a 70 to 80 percent accuracy rate when attempting to correctly recognize handwritten characters from a digitizing tablet or other input device, yielding a 15 to 30 percent error rate. This accuracy rate is not good enough for the average user to feel confident in the ability of the recognizer.
- Character recognizers can be useful and valuable. For instance, character recognizers can be useful in conferences or seminars where a user does not bring a keyboard but wants to take notes electronically. A character recognizer would then be used. If the character recognizer does not have a fairly high rate of accuracy, the notes taken during the seminar may become misleading.
- Character recognizers may be valuable in hospitals if the character recognizer has a high rate of accuracy.
- Hand-held character recognizers would allow hospital personnel to check on patients and enter by hand reports which may be life saving. Without a high recognition rate, lives may be endangered.
- One very useful application for character recognizers is inputting Chinese characters for electronic processing and storage. Chinese characters do not lend themselves well to keyboard entry, making word processing in the Chinese language difficult. Chinese characters are complex, and changing a small portion of a character may entirely change the meaning of the character or word. A high rate of accuracy is necessary for Chinese character recognition. Unfortunately, conventional character recognizers and recognition processes have not achieved the high accuracy necessary for these varying applications.
- A method comprising the steps of: choosing a number of template characters from a template character set which are likely to resemble a handwritten character, thereby providing a set of possible characters, each of the possible characters having a value representing a degree of similarity with the handwritten character; and processing the possible characters according to a language model to determine which of the possible characters most resembles the handwritten character.
- The step of processing the possible characters according to a language model preferably includes: combining each of the possible characters with a surrounding character to form combined characters; assigning a combined value to each of the combined characters, where the combined value represents a probability that the surrounding character would be in combination with a respective one of the possible characters; and resorting the possible characters.
- It includes comparing each of the possible characters with a surrounding character to determine a probability that the surrounding character would be in combination with a respective one of the possible characters; and determining from the probability for each of the possible characters which of the possible characters most resembles the handwritten character.
- The value for each of the possible characters and the combined value of the combined characters may be weighted to determine a weighted value for each of the possible characters; and these may be ordered for each of the possible characters to determine a sequential order for resorting the possible characters.
- A recognizer comprising: a character recognizer coupled to a handwriting input device, to choose a number of template characters from a template character set which are likely to resemble a handwritten character (possible characters), each of the possible characters having a value representing a degree of similarity with the handwritten character; a post-processor coupled to the character recognizer to process the possible characters according to a language model to determine which of the possible characters most resembles the handwritten character; and a display device coupled to the post-processor to receive the one of the possible characters most resembling the handwritten character.
- FIG. 1 is a block diagram illustrating a preferred embodiment of the present invention.
- FIG. 2 is a flow chart illustrating a method of performing the present invention.
- FIG. 3 is a flow chart illustrating the method of performing the present invention according to the preferred embodiment.
- FIG. 4 shows an example of the operation of a language modeling post-processor according to the present invention.
- FIG. 1 illustrates, with reference also to FIG. 2, a device and method, according to the present invention, for improving the accuracy of handwritten character recognition.
- Handwritten character recognizing devices, such as character recognizing device 100, generally include a handwriting input device or tablet 110 that allows a user to enter handwritten characters into character recognizing device 100. Character recognizing devices may also receive input through devices other than tablets. For instance, handwritten characters may be input to character recognizing device 100 via facsimile or any other media in addition to tablet 110.
- Handwritten characters are input from tablet 110 to a character recognizer 120 (step 200 of FIG. 2).
- The character recognizer 120 chooses characters from a predetermined template character set 125 (step 210) for comparison with the handwritten character.
- The predetermined template characters of template character set 125 are the characters used in the language for which character recognizing device 100 is designed. For instance, if English handwritten characters are being input to character recognizing device 100, template character set 125 will contain information representing English characters in some form, such as longhand, print, or a combination of styles. If, for instance, character recognizing device 100 is designed for Chinese character input, template character set 125 will contain information representing Chinese characters in such styles (cursive or printed) as character recognizing device 100 is designed for.
- Character recognizer 120 compares each input handwritten character to the characters stored in template character set 125 and chooses a number of the characters, or possible characters, which most closely resemble the input character. In a preferred embodiment, character recognizer 120 chooses 10 characters from the template character set 125. To each of these possible characters (10 in the preferred embodiment), character recognizer 120 assigns a score (or value) that represents the degree of similarity between the respective possible character and the input character (step 220 of FIG. 2). Character recognizer 120 then prioritizes the possible characters according to their respective scores (step 240), ordering them sequentially with the possible character whose score indicates the nearest similarity at the top of the list.
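The candidate selection performed by character recognizer 120 can be sketched roughly as follows. This is a minimal illustration, not the recognizer described in the patent: the feature vectors, the `similarity` measure, and the function name `top_candidates` are all assumptions introduced here, since the text does not say how similarity is computed.

```python
from typing import Dict, List, Tuple


def similarity(a: List[float], b: List[float]) -> float:
    # Hypothetical similarity measure: 1 / (1 + squared Euclidean distance),
    # so identical feature vectors score 1.0 and distant ones approach 0.
    dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + dist)


def top_candidates(
    input_features: List[float],
    templates: Dict[str, List[float]],
    n: int = 10,
) -> List[Tuple[str, float]]:
    """Score every template against the input and return the n best, best first."""
    scored = [(ch, similarity(input_features, feats)) for ch, feats in templates.items()]
    # Highest similarity first, mirroring the prioritized list of possible characters.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Toy template set with made-up two-dimensional features.
templates = {"a": [0.0, 1.0], "o": [0.1, 0.9], "e": [0.5, 0.5], "l": [1.0, 0.0]}
candidates = top_candidates([0.0, 1.0], templates, n=3)
```

With the preferred embodiment's list size, `n` would be 10; the toy call uses 3 only because the example template set is small.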
- Handwritten characters which are processed simply by a character recognizer have approximately a 15 to 30 percent error rate when choosing the top prioritized possible character.
- The probability that the top prioritized possible character chosen by character recognizer 120 is actually the same as the handwritten input character is about 80 to 85 percent.
- There is a 92 to 96 percent probability that the actual handwritten input character is one of the possible characters chosen by character recognizer 120 when the total number of possible characters is 10, pursuant to the preferred embodiment. This accuracy is nearly the same as the accuracy most people achieve when reading handwritten characters, which is about 95 to 97 percent.
- The accuracy of the 10 chosen possible characters is capitalized upon through the method described below to increase the probability that the character chosen as the top prioritized possible character is the same as the handwritten character.
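The gap between top-choice accuracy and top-10 accuracy is what post-processing exploits. A small sketch of how the two figures could be measured is shown below; the function name `top_k_accuracy` and the toy data are assumptions for illustration and do not reproduce the percentages quoted in the text.

```python
from typing import List, Sequence


def top_k_accuracy(truth: Sequence[str], candidate_lists: Sequence[List[str]], k: int) -> float:
    """Fraction of inputs whose true character appears among the first k candidates."""
    hits = sum(1 for t, cands in zip(truth, candidate_lists) if t in cands[:k])
    return hits / len(truth)


# Toy data: four handwritten inputs, each with a prioritized candidate list.
truth = ["a", "b", "c", "d"]
candidate_lists = [["a", "o"], ["h", "b"], ["c", "e"], ["o", "q"]]

top1 = top_k_accuracy(truth, candidate_lists, k=1)
top2 = top_k_accuracy(truth, candidate_lists, k=2)
```

Whenever the correct character is in the list but not at the top (the second input here), a later re-ranking step has a chance to recover it.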
- The present invention contemplates further analyzing and processing the possible characters generated by character recognizer 120 to improve recognition accuracy.
- The additional analysis and processing (post-processing) focuses on the 10 possible characters.
- Character recognizer 120 outputs the list of 10 possible characters to a post-processor 130 (step 250).
- Post-processor 130 processes the 10 possible characters according to a language model to select which of the 10 possible characters is a best-fit character (step 260).
- The language model post-processing chooses one of the 10 possible characters for output. This yields approximately a 90 to 92 percent probability that the character which is output, or best-fit character, is the same as the input handwritten character.
- Language modeling is a process where each possible character processed is compared with a surrounding character to determine the probability that the possible character could be properly used in combination with such surrounding characters in the language being used. This process will be described in detail later.
- After post-processor 130 has chosen a best-fit character from the 10 possible characters, it outputs the best-fit character (step 270). In the preferred embodiment shown in FIG. 1, the best-fit character is output to digitizing display 110 and displayed to the user.
- The flow chart of FIG. 3 shows a preferred embodiment of the post-processing method and is described in conjunction with the preferred embodiment of FIG. 1.
- The top prioritized possible character is chosen as the best-fit character (step 325) and output to the digitizing display 110 of the preferred embodiment (step 380). Choosing the top prioritized possible character as the best-fit character simply means that no further processing is performed and character recognizer 120 operates in a conventional manner, with the output (top prioritized possible character) sent directly to the digitizing display 110.
- language model processor 140 which includes combiner 142, scoring device 144, and language model library 145.
- Language model processor 140 compares each of the possible characters from character recognizer 120 with surrounding characters to determine the probability that the possible character could be in combination with the surrounding characters.
- The surrounding characters are usually characters which have already been recognized and are stored by the computing device (Surrounding Character 141), but may also be numbers, indications of the beginning of a sentence or word, words from a different language (such as English company names used while writing Chinese characters), etc.
- The surrounding characters may also be characters which have not been recognized, such as a character subsequent in sequence to the handwritten character currently being recognized.
- FIG. 4 illustrates the language model post-processing method using an example of two letters.
- Two letters are assumed to have been input as handwritten characters.
- The first character, in slot b, is assumed to have been processed previously and correctly by character recognizer 120 and confirmed as the letter "h".
- The second character, in slot a, is the character to be recognized.
- Character recognizer 120 generates a number of possible characters, which for the preferred embodiment is 10 possible characters, listed in FIG.
- Scoring device 144 obtains from language model library 145, for each of the combinations, a predetermined probability (combined score) that the adjacent character in slot b, "h", will be combined with each of the possible characters a_1 through a_n.
- Each of the combinations is assigned its respective combined score (step 340 and column 420). For instance, if character recognizer 120 determined a_1 to be "a", and the letter in slot b was already determined to be "h", the probability that these two letters would be combined in sequence would be very high, since "h" and "a" are combined in sequence in many different words.
- The combined score representing this probability found in language model library 145 would be high, and an appropriate combined score would be assigned to "ha".
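The lookup performed by scoring device 144 can be pictured as a table keyed on character pairs. The sketch below is only an illustration: the dictionary `BIGRAM_LIBRARY` stands in for language model library 145, its probabilities are invented rather than measured, and the `floor` value for unseen pairs is an assumption the text does not mention.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for language model library 145: the probability that
# the second character follows the first. Values are illustrative only.
BIGRAM_LIBRARY: Dict[Tuple[str, str], float] = {
    ("h", "a"): 0.20,
    ("h", "e"): 0.45,
    ("h", "o"): 0.10,
    ("h", "q"): 0.0001,
}


def combined_scores(previous: str, candidates: List[str], floor: float = 1e-6) -> Dict[str, float]:
    """Look up a combined score for each candidate following the previous character."""
    # Pairs missing from the library get a small floor value instead of zero.
    return {c: BIGRAM_LIBRARY.get((previous, c), floor) for c in candidates}


# Slot b holds the confirmed letter "h"; slot a has three candidate characters.
scores = combined_scores("h", ["a", "q", "z"])
```

Here "ha" receives a far higher combined score than "hq", matching the reasoning in the example that "h" and "a" occur together in many words.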
- Resorter 150 obtains the combined scores from scoring device 144, generates an order from the combined scores, and resorts the possible characters based upon that order. Specifically, a weighting element 152 of resorter 150 weights each of the combined scores from scoring device 144 with the score of its corresponding possible character to determine a weighted score for each of the possible characters (step 350). The weighting is calculated for each of the possible characters by: (i) multiplying the score (see previous discussion with respect to step 220 of FIG.
- The weighting factors for the character recognizer score and the language model score, written here as w_CR and w_LM, combined equal 1. Further, at optimum values for the weighting factors, w_CR is greater than w_LM, and w_LM is equal to 0.33. A user may choose a value for w_LM which is greater than or less than the optimum value, depending upon the desired output, and the choice may be input manually into weighting element 152.
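Because the weighting formula itself is cut off in the text above, the following is only one plausible reading, assuming the weighted score is a linear combination of the two scores. The symbols are introduced here for illustration: S_i is the character recognizer's similarity score for possible character i, C_i is its combined score from the language model, and w_CR and w_LM are the two weighting factors.

```latex
\[
  W_i = w_{CR}\, S_i + w_{LM}\, C_i, \qquad w_{CR} + w_{LM} = 1 .
\]
% With the stated optimum w_{LM} = 0.33, it follows that w_{CR} = 0.67 > w_{LM}.
% Worked example with made-up scores S_i = 0.8 and C_i = 0.5:
\[
  W_i = 0.67 \times 0.8 + 0.33 \times 0.5 = 0.536 + 0.165 = 0.701 .
\]
```

Under this reading, raising w_LM above 0.33 lets the language context override the shape match more often, and lowering it does the opposite.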
- Reorderer 154 of resorter 150 receives the weighted scores from weighting element 152 and places them in sequential order. In the preferred embodiment, the weighted scores are ordered from highest to lowest. Reorderer 154 then resorts the possible characters according to this order and chooses the best-fit character from the reordered possible characters (steps 360 and 370). The best-fit character is then output (step 380).
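The weighting and reordering steps can be sketched together as follows. This is a hedged illustration, not the patented implementation: it assumes the weighted score is a linear combination of the recognizer score and the combined score (the exact formula is truncated in the text), assumes both scores lie on a comparable 0 to 1 scale, and uses the hypothetical name `resort` and made-up scores.

```python
from typing import Dict, List, Tuple


def resort(
    candidates: List[Tuple[str, float]],
    combined: Dict[str, float],
    w_lm: float = 0.33,
) -> List[Tuple[str, float]]:
    """Weight recognizer and language-model scores, then order highest first."""
    w_cr = 1.0 - w_lm  # the two weighting factors sum to 1
    weighted = [
        (ch, w_cr * score + w_lm * combined.get(ch, 0.0))
        for ch, score in candidates
    ]
    # Highest weighted score first, as in the preferred embodiment.
    weighted.sort(key=lambda pair: pair[1], reverse=True)
    return weighted


# Recognizer output for slot a: "q" narrowly beats "a" on shape alone.
candidates = [("q", 0.80), ("a", 0.78), ("o", 0.60)]
# Illustrative combined scores for each candidate following "h".
combined = {"q": 0.01, "a": 0.90, "o": 0.50}

ranked = resort(candidates, combined)
best_fit = ranked[0][0]
```

In this toy case the recognizer alone would output "q", while the weighted order promotes "a" to best-fit character because "ha" is a far more likely combination than "hq".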
- Post-processing of the output of character recognizers is necessary in order to improve the rate of accuracy of selecting a single possible character representing an input handwritten character. Without the additional accuracy of post-processing, character recognizers will probably not become commercially viable.
- With post-processing, the probability of selecting a single possible character which is the same as a handwritten character increases from roughly 84 percent to approximately 90 to 92 percent. This recognition accuracy brings handwriting recognition into an acceptable range for consumer use.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Discrimination (AREA)
Abstract
A handwritten character recognition device (100) and a method of handwritten character recognition improve the accuracy of conventional character recognizers by using a post-processor (130). The handwritten character recognition device (100) comprises a character recognizer (120), a language model processor (140) that compares the outputs of the character recognizer (120) with surrounding handwritten characters to determine a probability that the outputs of the character recognizer (120) form a combination with the surrounding handwritten characters, and a resorter (150) that weights the probabilities determined in the language model processor (140) with the character recognizer's scores and resorts the outputs of the character recognizer (120).
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US61484696A | 1996-03-08 | 1996-03-08 | |
US614846 | 1996-03-08 | | |
PCT/US1997/004349 WO1997033249A1 (fr) | 1996-03-08 | 1997-03-06 | Method and device for handwritten character recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
EP0896704A1 (fr) | 1999-02-17 |
Family
ID=24462951
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP97915155A (Withdrawn) EP0896704A1 (fr) | Method and device for handwritten character recognition | 1996-03-08 | 1997-03-06 |
Country Status (6)
Country | Link |
---|---|
EP (1) | EP0896704A1 (fr) |
CN (1) | CN1181827A (fr) |
AU (1) | AU726852B2 (fr) |
CA (1) | CA2247359A1 (fr) |
IL (1) | IL125648A0 (fr) |
WO (1) | WO1997033249A1 (fr) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AUPR824601A0 (en) | 2001-10-15 | 2001-11-08 | Silverbrook Research Pty. Ltd. | Methods and system (npw004) |
AU2003900865A0 (en) | 2003-02-26 | 2003-03-13 | Silverbrook Research Pty Ltd | Methods, systems and apparatus (NPW010) |
US8411958B2 (en) * | 2004-05-04 | 2013-04-02 | Nokia Corporation | Apparatus and method for handwriting recognition |
CN100356392C (zh) * | 2005-08-18 | 2007-12-19 | 北大方正集团有限公司 | A post-processing method for character recognition |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5131053A (en) * | 1988-08-10 | 1992-07-14 | Caere Corporation | Optical character recognition method and apparatus |
US5151950A (en) * | 1990-10-31 | 1992-09-29 | Go Corporation | Method for recognizing handwritten characters using shape and context analysis |
US5313527A (en) * | 1991-06-07 | 1994-05-17 | Paragraph International | Method and apparatus for recognizing cursive writing from sequential input information |
US5343537A (en) * | 1991-10-31 | 1994-08-30 | International Business Machines Corporation | Statistical mixture approach to automatic handwriting recognition |
US5502774A (en) * | 1992-06-09 | 1996-03-26 | International Business Machines Corporation | Automatic recognition of a consistent message using multiple complimentary sources of information |
US5392363A (en) * | 1992-11-13 | 1995-02-21 | International Business Machines Corporation | On-line connected handwritten word recognition by a probabilistic method |
US5465309A (en) * | 1993-12-10 | 1995-11-07 | International Business Machines Corporation | Method of and apparatus for character recognition through related spelling heuristics |
-
1997
- 1997-03-06 EP EP97915155A patent/EP0896704A1/fr not_active Withdrawn
- 1997-03-06 IL IL12564897A patent/IL125648A0/xx unknown
- 1997-03-06 AU AU22168/97A patent/AU726852B2/en not_active Ceased
- 1997-03-06 CN CN97190161A patent/CN1181827A/zh active Pending
- 1997-03-06 CA CA002247359A patent/CA2247359A1/fr not_active Abandoned
- 1997-03-06 WO PCT/US1997/004349 patent/WO1997033249A1/fr not_active Application Discontinuation
Non-Patent Citations (1)
Title |
---|
See references of WO9733249A1 * |
Also Published As
Publication number | Publication date |
---|---|
AU2216897A (en) | 1997-09-22 |
WO1997033249A1 (fr) | 1997-09-12 |
CN1181827A (zh) | 1998-05-13 |
IL125648A0 (en) | 1999-04-11 |
CA2247359A1 (fr) | 1997-09-12 |
AU726852B2 (en) | 2000-11-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7129932B1 (en) | Keyboard for interacting on small devices | |
US6173253B1 (en) | Sentence processing apparatus and method thereof,utilizing dictionaries to interpolate elliptic characters or symbols | |
US6513005B1 (en) | Method for correcting error characters in results of speech recognition and speech recognition system using the same | |
US6005973A (en) | Combined dictionary based and likely character string method of handwriting recognition | |
JP4463795B2 (ja) | Reduced keyboard disambiguating system | |
CA2477637C (fr) | Component-based, adaptive stroke-order system | |
US20050089226A1 (en) | Apparatus and method for letter recognition | |
KR20070043673A (ko) | Character input system that predicts the user's next string input, and character input method thereof | |
US20060083431A1 (en) | Electronic device and method for visual text interpretation | |
AU726852B2 (en) | Method and device for handwritten character recognition | |
US6320985B1 (en) | Apparatus and method for augmenting data in handwriting recognition system | |
JPH08263587A (ja) | Document input method and document input device | |
TW409213B (en) | Method and device for handwritten character recognition | |
El-Nasan et al. | Ink-link [character recognition] | |
JP2765712B2 (ja) | Character recognition input device | |
JP3022790B2 (ja) | Handwritten character input device | |
JPH10232864A (ja) | Text input device and computer-readable recording medium storing a text input program | |
US20030110451A1 (en) | Practical chinese classification input method | |
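JP2874815B2 (ja) | Japanese character reading device | |
JP2990734B2 (ja) | Method for controlling output of recognition candidate characters in a character recognition device | |
JPH10320107A (ja) | Handwritten character input device having a handwritten character recognition function | |
JP2000036008A (ja) | Character recognition device and storage medium | |
JP2639314B2 (ja) | Character recognition method | |
JPS61131159A (ja) | Misread character correction device | |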
TW511039B (en) | Apparatus for encoding and defining symbols and, assembling text in ideographic languages |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 19981008 |
AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
AX | Request for extension of the European patent | Free format text: AL PAYMENT 981008; LT PAYMENT 981008; LV PAYMENT 981008; RO PAYMENT 981008; SI PAYMENT 981008 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
18D | Application deemed to be withdrawn | Effective date: 20001002 |
P01 | Opt-out of the competence of the Unified Patent Court (UPC) registered | Effective date: 20230525 |