CA2002542C - Imaged symbol classification - Google Patents
Imaged symbol classification
- Publication number
- CA2002542C
- Authority
- CA
- Canada
- Prior art keywords
- image
- substep
- feature
- feature maps
- templates
- Prior art date
- Legal status
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/94—Hardware or software architectures specially adapted for image or video understanding
- G06V10/955—Hardware or software architectures specially adapted for image or video understanding using specific electronic processors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
A highly effective OCR system that combines various techniques to remove meaningless variability and capture meaningful variability in the image before attempting to classify the characters within the image. Specifically, characters to be recognized are scanned to form a pixel image, and the image is scaled and "cleansed." The scaled and cleansed image is deskewed and skeletonized to form a thinned image, and the thinned image is applied to a neural network that extracts features of the image and forms an image map. Features are then combined to form "super features." Thereafter, the image is coarse blocked to reduce the dimensionality of the resulting feature vector. The feature vector is, lastly, applied to a classifier that identifies the character.
Description
Imaged Symbol Classification

Background of the Invention

This invention relates to pattern analysis and recognition and, more particularly, to systems for pattern or symbol recognition in an image composed of an array of picture elements (pixels).
A wide variety of applications exist in which it is desirable for a machine to automatically recognize, analyze and classify character patterns in a given image. The explosion of computer-based information gathering, handling, manipulation, storage, and transmission systems offers the technology that makes the realization of these desires possible. Elaborate programs have been written for general purpose computers to perform pattern recognition, but they have experienced a limited level of success. That success was achieved mostly in the area of recognizing standard printed fonts.
One character recognition technique that dates back to the early 1960's involves following the curve of the characters to be recognized. It has an intuitive appeal but, unfortunately, it often fails when the characters are misshapen or have extraneous strokes.
Bakis et al. (IBM) reported on an approach for recognizing hand-printed numerals in an article titled "An Experimental Study of Machine Recognition of Hand Printed Numerals," IEEE Transactions on Systems Science and Cybernetics, Vol. SSC-4, No. 2, July 1968. In the described system, the numerals are converted into a 25x32 binary matrix. Features are extracted to reduce the dimensionality of the 800 bit vector (25x32) to about 100, and the 100 bit vector is applied to several categorizers. Some "normalization" of the characters is also performed. The authors reported a recognition rate of between 86 and 99.7 percent, depending on the handwriting samples employed. Because of the low recognition rate relative to the desired level for commercial applications, the authors concluded that "it would seem that the course to follow is to combine curve-following type measurements ... with automatic feature selection and parallel decision logic."
In what appears to be a follow-up effort, R. G. Casey described an experiment that expanded the "normalization" of Bakis et al. to a process of deskewing of the subject characters. "Moment Normalization of Handprinted Characters", IBM Journal of Research and Development, September 1970, pp. 548-557.
He used feature recognition in combination with curve following, as suggested by Bakis et al., and decision methodologies which included template matching, clustering, autocorrelation, weighted cross correlation, and zoned n-tuples.
In a subsequent article, Naylor (also of IBM) described an OCR (Optical Character Recognition) system that employs a computer, an interactive graphics console, and skew normalization. "Some Studies in the Interactive Design of Character Recognition Systems", IEEE Transactions on Computers, September 1971, pp. 1075-1086. The objective of his system was to develop the appropriate logic for identifying the features to be extracted.
In U.S. Patent 4,259,661 issued March 31, 1981, another extracted-features approach was described by Todd. In accordance with the Todd approach, a rectangular area defined by the character's extremities is normalized to a predefined size, and then divided into subareas. The "darkness" of the image within each of the subareas is evaluated, and the collection of the darkness evaluations is formed into a "feature vector". The feature vector is compared to a stored set of feature vectors that represent characters, and the closest match is selected as the recognized character.
In an article entitled "SPTA: A Proposed Algorithm for Thinning Binary Patterns", IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-14, No.
3, May/June 1984, pp. 409-418, Naccache et al. present a different approach to the OCR problem. This approach addresses the fact that patterns are often made up of strokes that are wide, and that it may be of benefit to skeletonize the patterns. As described by Naccache et al., "skeletonization consists of iterative deletions of the dark points (i.e., changing them to white) along edges of a pattern until the pattern is thinned to a line drawing". Ideally, the original pattern is thinned to its medial axis.
The article briefly describes fourteen different known skeletonization algorithms, and then proposes its own algorithm (SPTA). All of the described skeletonization algorithms, including SPTA, are based on the concept of passing over the image a square window of three rows and three columns (commonly referred to as a 3x3 window). As the square 3x3 window is passed across the image, the algorithms evaluate the 8 pixel neighborhood surrounding the center pixel and, based on the evaluation, either convert a black center point to white, or leave it unaltered.

Pattern classification received a boost from another direction with recent advances in the field of connectionism. Specifically, highly parallel computation networks ("neural networks") have come to the fore with the work by Hopfield, disclosed in U.S. Patent 4,660,166, issued April 21, 1987. Also, work continued on robust learning algorithms for multi-layered networks in which "hidden" layers of neural elements permit separation of arbitrary regions of the feature space. This work, reported on, inter alia, by Gullichsen et al. in "Pattern Classification by Neural Networks: An Experimental System for Icon Recognition", Proceedings of the IEEE
First International Conference on Neural Networks, pp. IV-725-732, Caudill et al., Editors, concentrates on the character classification process. The system they describe uses some image preprocessing but not feature extractions. Instead, they rely entirely on the inherent classification intelligence that the neural networks acquire through the "back propagation" training process. The reported system apparently works but, as suggested by the authors, many questions remained to be investigated. The system's performance is less than acceptable.
There exist many other character classification techniques, approaches, and algorithms. For purposes of this disclosure, however, the above references provide a reasonable description of the most relevant prior art. Suffice it to say that with all the efforts that have gone into solving the character recognition (i.e., classification) problem, the existing systems do not offer the accuracy that would permit a reliable automatic character recognition to be implemented.
Summary of the Invention

Our invention offers a highly effective OCR system by selecting the right combination of techniques, and modifying those techniques for increased performance in terms of speed and accuracy. The general approach of our system is to remove meaningless variability and capture meaningful variability before attempting to classify. Specifically, in accordance with the principles of our invention, characters to be recognized are scanned to form a pixel image, and the image is scaled and "cleansed." The scaled and cleansed image is deskewed and skeletonized to form a thinned image, and the thinned image is applied to a neural network that extracts features of the image and forms an image map. Features are then combined to form "super features". Thereafter, the image is coarse blocked to reduce the dimensionality of the resulting feature vector. The feature vector is, lastly, applied to a classifier that identifies the character.
In accordance with one aspect of the invention there is provided a method for classifying an image, said method including a step of processing signals corresponding to said image to develop a processed image, a step of feature extraction to develop features of said processed image, and a step of classification CHARACTERIZED BY: said step of processing includes a substep of skeletonization of said image and a substep of deskewing of said image.
In accordance with another aspect of the invention there is provided a method for classifying characters in an image COMPRISING THE STEPS OF:
capturing said image in the form of an array of picture elements; processing said image to scale said array of picture elements to a chosen size and to delete extraneous image portions from said image, developing thereby a processed image; deskewing and skeletonizing said processed image to develop a thinner and a more upright version of said image; extracting features from said thinner and more upright image that characterize said image and developing feature identifiers to represent features of said image and their locations; coarse blocking said feature maps to develop more granular feature maps; and classifying said granular feature maps as representing one of a known set of symbols, or representing an unknown symbol.
In accordance with yet another aspect of the invention there is provided apparatus for identifying characters embedded in an applied image comprising: first means for capturing said image in the form of a matrix of picture elements; second means responsive to said first means for identifying the presence of a character in said image and conditioning said character by deskewing and skeletonizing said image to develop a conditioned image; third means responsive to said second means for extracting features of said conditioned image to develop feature maps pertaining to specified features of said image; and fourth means responsive to said third means for developing coarse blocked versions of said feature maps; and means for classifying said coarse blocked version of said feature maps as representing one of a known number of characters or an unknown character.
Brief Description of the Drawings
FIG. 1 presents a general flow diagram of the classification method of our invention;
FIG. 2 presents an example of a problem resulting from use of independent 3x3 templates;
FIG. 3 shows the set of thinning templates used with our invention, which includes templates greater than 3x3;
FIG. 4 depicts a set of feature extraction templates;
FIG. 5 illustrates the structure of a neural network decision circuit used in connection with the templates of FIGS. 3 and 4;
FIG. 6 depicts the structure of a two-layer neural network with analog-valued connection weights; and

FIG. 7 illustrates one realization for an analog-valued connection weights neural network.
Detailed Description

FIG. 1 presents a flow chart of our process for character or symbol classification. In block 10, the character image is captured and, advantageously, stored in a frame buffer such as a semiconductor memory. The image may be obtained through electronic transmission from a remote location, or it may be obtained "locally" with a scanning camera. Regardless of the source, in accordance with conventional practice the image is represented by an ordered collection (array) of pixels. The value of each pixel corresponds to the light (brightness, color, etc.) emanating from a particular small area of the image. The pixel values are stored in the memory.
Smudges and extraneous strokes are often found in proximity to characters. Their presence cannot help but make the recognition process more difficult. In accordance with our invention, block 20 follows block 10 and its function is to cleanse the image. This is the first step in our effort to remove meaningless variability from the image.
Usually, an image of a symbol or a character, such as a digit, contains one large group of pixels (contiguous) and a small number, possibly zero, of smaller groups. Our cleansing algorithm basically identifies all such groups and deletes all but the largest one. If the deleted groups, together, constitute more than a certain percentage of the original image, this fact is noted for later use, since it indicates that the image is anomalous. In the context of this description, it is assumed that the image symbols are composed of dark strokes on a light background. A "reversed" image can of course be handled with equal facility. The above cleansing algorithm also assumes that the symbol set that is expected in the image does not contain symbols that call for disjoint strokes. The digits 0-9 and the Latin alphabet form such sets (save for lower case letters i and j), but most other alphabets (Hebrew, Chinese, Japanese, Korean, Arabic, etc.) contain many disjoint strokes. For such other sets a slightly different cleansing algorithm would have to be applied, such as looking at each disjoint area, rather than at the whole collection of such areas.
There are a number of processes that can be applied to detect and identify these extraneous areas. The process we use resembles a brush fire.
In accordance with our process, the image is raster-scanned from top to bottom in an effort to find "black" pixel groups. When such a group is found (i.e., when a black pixel is encountered that has not been considered before), the scanning is suspended and a "brush fire" is ignited. That is, the encountered pixel is marked with an identifier, and the marking initiates a spreading process. In the spreading process, each of the eight immediately neighboring pixels is considered. Those neighboring pixels that are black are similarly marked with the identifier, and each marking initiates its own spreading process. In this manner, the first encountered pixel of a "black" group causes the entire group to be quickly identified by the selected identifier. At this point in the process, the scanning of the image resumes so that other groups can be discovered and identified (with a different identifier). When scanning is completed and all of the "black" areas are identified, area calculations can be carried out. As indicated above, all but the largest group is deleted from the image (i.e., turned from dark to light, or turned OFF).
It may be noted at this point that in the character recognition art, it is more important to avoid identifying a character incorrectly than it is to avoid declining to make a decision. For that reason, in a system that is designed to identify numerals or other character sets that do not have disconnected strokes, the area removal threshold should be set to a fairly low level.
Ordinarily it is expected that the pixels comprising the meaningful part of the image will be contiguous in a strict sense (in the aforementioned 0-9 character set and the Latin alphabet). On the other hand, an exception should be made, perhaps, when areas are separated only slightly, and external information leads one to believe that it is possible for a character stroke to be inadvertently broken (such as when writing with a poor pen or on a rough writing surface). To provide for such contingencies, our process for spreading the "fire" includes an option for defining the neighborhood to include eight additional pixels that are somewhat removed from the eight immediate pixels (the eight pixels being corners of a larger window and center pixels of sides of the larger window). In effect, we permit the "fire" to jump over a "fire break."
The process of scaling the image to a given size, in block 25, follows the cleansing process. Scaling, of course, removes a meaningless variability of the image. The sequence of cleansing followed by scaling is imposed by the desire to not scale an image that includes smudges. The scaling process can use any one of a number of different algorithms. For example, in accordance with one algorithm the image can be scaled in both dimensions by an equal factor, until one of the image dimensions reaches a fixed size. Another algorithm can scale independently in the two dimensions, subject to some constraint on the largest difference in the scaling factors of the two dimensions. Both approaches work well and, therefore, the choice of the algorithm and its implementation are left to the reader. We scale each character image into a convenient array size, such as 18x30.
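As an illustration, the following is a minimal sketch of the first (equal-factor) scaling approach. The nearest-neighbor sampling, the function name, and the raw pixel-array parameters are assumptions for the sake of a self-contained example; they are not the patent's actual code.

// Minimal sketch of uniform scaling: both dimensions are scaled by one
// common factor until one dimension reaches the fixed target size
// (18x30 in our system). Nearest-neighbor sampling is an assumption.
void scale_uniform(char** src, int sw, int sh,    // source pixels and size
                   char** dst, int dw, int dh) {  // target pixels and size
    float fx = float(dw) / sw;
    float fy = float(dh) / sh;
    float f  = (fx < fy) ? fx : fy;    // one factor for both dimensions
    for (int y = 0; y < dh; y++)
        for (int x = 0; x < dw; x++) {
            int sx = int(x / f);       // nearest source pixel
            int sy = int(y / f);
            dst[y][x] = (sx < sw && sy < sh) ? src[sy][sx] : 0;
        }
}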
People generally write characters at a slant. The slant is different from one person to another. The slant, or skew, of the characters is another variability of written characters that carries no information. We wish to remove this variability.
Returning to FIG. 1, block 30, which follows block 25, deskews the image. Stated differently, it is the function of block 30 to make all characters more uniformly upright.
Block 30 can use any one of a number of conventional procedures for deskewing an image. One such procedure subjects the image to a transformation of the form

    [u]   [1   -m_xy/m_yy] [x - x0]
    [v] = [0        1    ] [y - y0]

where x and y are the original coordinates of the image, x0 and y0 define an origin point, u and v are the coordinates in the transformed image, and m_xy and m_yy are the image moments calculated by

    m_xy = Σ_{x,y} (x - x0)(y - y0) B(x,y)

and

    m_yy = Σ_{x,y} (y - y0)^2 B(x,y).

In the above, B(x,y) assumes the value 1 when the pixel at position x,y is "black", and the value 0 otherwise. The effect of this transformation is to reduce the xy moment to essentially 0.
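The computation behind this transformation is small. The sketch below accumulates the two moments over the black pixels and applies the implied horizontal shear; the function name and the pixel representation (nonzero = black) are assumptions, not the patent's code.

// Sketch of the deskewing step: compute m_xy and m_yy over black
// pixels, then shear each row horizontally by m_xy/m_yy so that the
// xy moment of the result is essentially zero.
void deskew(char** src, char** dst, int w, int h, float x0, float y0) {
    float mxy = 0.f, myy = 0.f;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            if (src[y][x]) {                   // B(x,y) == 1
                mxy += (x - x0) * (y - y0);
                myy += (y - y0) * (y - y0);
            }
    float slope = (myy != 0.f) ? mxy / myy : 0.f;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            // u = (x - x0) - slope*(y - y0); sample the source at the inverse
            int sx = int(x + slope * (y - y0) + 0.5f);
            dst[y][x] = (sx >= 0 && sx < w) ? src[y][sx] : 0;
        }
}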
Scaling (block 25) and deskewing (block 30) are both linear transformations. Therefore, the composition of the two is also a linear transformation. It may be advantageous to apply the compound transformation to the cleansed image to produce the deskewed image directly. This combined operation allows us to avoid an explicit representation of the scaled image as an array of pixels. This eliminates a source of (computation) noise.
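Because both operations are affine, the composition can be carried out on the coefficients themselves before any pixel is touched. The sketch below assumes a 6-coefficient convention (u = a*x + b*y + c, v = d*x + e*y + f); the layout and function name are illustrative.

// Compose two linear (affine) transforms so that out(p) = t2(t1(p)).
// Applying the composed transform once avoids materializing the
// intermediate scaled image -- one less source of rounding noise.
void compose(const float t2[6], const float t1[6], float out[6]) {
    out[0] = t2[0]*t1[0] + t2[1]*t1[3];            // a
    out[1] = t2[0]*t1[1] + t2[1]*t1[4];            // b
    out[2] = t2[0]*t1[2] + t2[1]*t1[5] + t2[2];    // c
    out[3] = t2[3]*t1[0] + t2[4]*t1[3];            // d
    out[4] = t2[3]*t1[1] + t2[4]*t1[4];            // e
    out[5] = t2[3]*t1[2] + t2[4]*t1[5] + t2[5];    // f
}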
Block 40, which in FIG. 1 follows block 30, thins the image. Thinning of the image also removes meaningless variability of the image. As indicated above, the prior art methods for skeletonization use a 3x3 window that is passed over the image. The center point of the 3x3 window is turned OFF if certain conditions are met; and those conditions, in most of the methods, involve repeated tests with different predefined window conditions. For example, the Ben-Lan and Montoto algorithm states that a dark center point is deleted (i.e., turned OFF or turned light) if it satisfies the following conditions:

1) the pixel has at least one light 4-neighbor, and

2) the neighborhood does not match any of 8 predefined 3x3 windows.

A 4-neighbor is a pixel that is east, north, west, or south of the pixel under consideration.
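A minimal sketch of testing condition 1 follows; the boundary handling (treating off-image pixels as light) is an assumption, and the 8 predefined windows of condition 2 are not reproduced here.

// Condition 1 of the Ben-Lan and Montoto rule quoted above: a dark
// center pixel qualifies for deletion only if at least one of its
// 4-neighbors (east, north, west, south) is light.
bool hasLight4Neighbor(char** pix, int w, int h, int x, int y) {
    const int dx[4] = { 1, 0, -1, 0 };
    const int dy[4] = { 0, -1, 0, 1 };    // east, north, west, south
    for (int k = 0; k < 4; k++) {
        int nx = x + dx[k], ny = y + dy[k];
        if (nx < 0 || ny < 0 || nx >= w || ny >= h) return true;  // off-image = light
        if (pix[ny][nx] == 0) return true;                        // light neighbor
    }
    return false;
}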
Algorithms like the one described above are quite acceptable in software implementations because, until recently, processors were able to handle only one task at a time anyway. However, these algorithms are necessarily slow because of their sequential nature. Furthermore, each of these prior art tests zeroes in on a certain characteristic of the pattern, but not on other characteristics. To thin strokes of different character (i.e., vertical lines and horizontal lines), different tests must be applied. Additionally, with prior art tests there is a need to perform at least some of these tests sequentially before one can be sure that a particular pixel may be deleted; and the pixel cannot be turned OFF until these tests are performed. The example of FIG. 2 illustrates the problem.
In FIG. 2, templates 100 and 110 are two 3x3 pixel windows. The three top pixels in template 100 are circle-hatched to designate searching for ON pixels. The center pixel, and the pixel in the center of the bottom row, are crosshatched to designate searching for OFF pixels. The remaining pixels are blank, to designate a "don't care" condition. Template 100 searches for the edge condition of light space (pixels 101, 102, and 103) above dark space (pixels 104 and 105), with the caveat that the dark space must be at least two pixels thick. When such a condition is encountered, according to template 100 the center pixel (104) is turned from ON to OFF (dark to light). Thus, template 100 provides a mechanism to nibble away from an ON area, from the top, until there is only one ON row left.

Template 110 operates similarly, except that it has the bottom row looking for OFF pixels while the center pixels of the first and second row are looking for ON pixels. Template 110 nibbles ON (dark) areas from the bottom.
The above templates, which thin horizontal lines and do not thin vertical lines, illustrate the desirability of passing a number of different templates over the image, with the different templates being sensitive to different characteristics of the image. It is also desirable (from a speed standpoint) to pass the various templates concurrently. However, in the FIG. 2 image segment 106, templates 100 and 110 cannot be applied concurrently because, if that were done, the depicted 2-pixel wide horizontal line would be completely eliminated. The top row would be deleted by template 100, and the bottom row would be deleted by template 110.

If line thinning is to be performed efficiently, this interdependence between different templates must be broken.
We found that, unexpectedly, this interdependence can be broken by employing a window that is greater than 3x3. Hence, we use a template set which contains at least some templates that are greater than 3x3. Some are 3x3, some are 3x4, some are 4x3, and some are 5x5. The characteristic of the set is that the templates can be passed over the image concurrently. This capability comes about from the particular selection of templates, which allows the image to be altered in response to one template without having a deleterious effect on the ability of another template to independently alter the image. This fairly unique set of templates is shown in FIG. 3.

We discovered that the set of templates depicted in FIG. 3 is a sufficient set. Other sets are possible, of course, but, in accordance with our invention, such sets are characterized by the inclusion of at least one template that is greater than 3x3.
To describe the operation of the depicted templates, we start with templates 120 and 140. These templates correspond to templates 100 and 110 of FIG. 2. Template 120 is shown as a 5x5 array but, in essence, it forms a 3x3 window since the outer columns and rows are at a "don't care" condition. Template 120 differs from template 100 in that pixels 121 and 122 in template 120 test for ON pixels, whereas the correspondingly positioned pixels in template 100 are set to "don't care". That is, template 120 makes sure that the pixel nibbled away (turned light) is above a line that extends in both directions. Template 140, on the other hand, differs from template 110 in that, effectively, it is a 3x4 template. It includes a 3x3 portion that is similar to the 3x3 template 110 (other than pixels 141 and 142), and it also includes a pixel 143 at the center of the first row. Pixel 143, in effect, requires a horizontal line to be 3 pixels wide before a pixel is permitted to be nibbled away (from the bottom).
Templates 130 and 150 form a template pair like the template pair 120 and 140. Templates 130 and 150 thin vertical lines. Templates 160, 170, 180, and 190 thin "knees" pointing to the right, left, up and down, respectively; templates 200, 210, 220 and 230 thin slanted lines from above and from below; etc. It may be noted that templates 160-230 are all 5x5 templates.
Returning to FIG. 1, skeletonization block 40 is followed by feature extraction block 50. Although operationally similar, skeletonization is different from feature extraction from a functional standpoint. In the former, one identifies superfluous pixels and turns them from dark to light. In the latter, one identifies relatively macroscopic characteristics that help classify the character. The macroscopic characteristics identified are the kind of characteristics that are not dependent on the size or thickness of the character, but are the ones that give the character its particular "signature." Hence, it is these characteristics that block 50 seeks to identify.

Operationally, feature extraction is accomplished by passing a collection of windows over the image. Each window in our system is a 7x7 template, and each template detects the presence of a particular feature, such as an end point, diagonal lines, a horizontal line, a vertical line, etc. The detection works by a majority rule in the sense that when the majority of the 49 pixels (7x7) fit the template, it is concluded that the feature is present. In our system we employ 49 different 7x7 templates, as depicted in FIG. 4. For each of the templates we create a "feature map" which basically indicates the coordinates in the image array where the pattern of the template matches the image.
Having developed the 49 feature maps corresponding to the 49 templates of FIG. 4, we develop a number of super-feature maps in block 60 that are logical combinations (AND and OR) of the feature maps. We thus reduce the set from 49 to 18 maps (of 18x30 pixel arrays). This reduced number of maps has been determined heuristically.
We call the arrangements of the detected features "maps" because we structure an array (in the memory where we store them) and we place the feature detections in the appropriate locations in the array. In this manner we record the presence of a feature and its location. Other mechanisms for recording "hit" location designations can be used, but it is still conceptually simpler to think in terms of maps.
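A software sketch of the majority-rule match that fills one such map follows. The template encoding (+1 = want ON, -1 = want OFF, 0 = don't care) and the threshold of 25 are assumptions; the text specifies only "a majority of the 49 pixels".

// Sketch of building one feature map: a 7x7 template is passed over
// the image, and the map is marked wherever most of the 49 window
// positions agree with the template (don't-cares always agree).
void featureMap(char** img, int w, int h, const signed char tmpl[7][7], char** map) {
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int agree = 0;
            for (int ty = 0; ty < 7; ty++)
                for (int tx = 0; tx < 7; tx++) {
                    int iy = y + ty - 3, ix = x + tx - 3;   // window centered on (x,y)
                    int on = (iy >= 0 && iy < h && ix >= 0 && ix < w) ? img[iy][ix] : 0;
                    signed char want = tmpl[ty][tx];
                    if (want == 0 || (want > 0) == (on != 0)) agree++;
                }
            map[y][x] = (agree >= 25);   // feature present at this coordinate
        }
}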
It turns out that the 18x30 array is too detailed for classification purposes. The detail can actually mask the character and make the classification task more difficult (as in the saying "you can't see the forest for the trees"). Accordingly, block 70 performs coarse blocking to reduce the 18x30 feature maps to feature maps that are only 3x5. This results in a final map or vector of 270 bits, which corresponds to 18 3x5 maps.
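Since 18x30 divides evenly into 3x5, each coarse cell covers a 6x6 block of the fine map. One plausible reading of the coarse-blocking step, sketched below, is to OR the hits within each block; the OR rule itself is an assumption.

// Sketch of coarse blocking one 18x30 feature map down to 3x5: a
// coarse cell is set if any feature hit falls inside its 6x6 block.
// 18 such maps yield the 270-bit vector (18 maps x 15 cells).
void coarseBlock(const char fine[30][18], char coarse[5][3]) {
    for (int cy = 0; cy < 5; cy++)
        for (int cx = 0; cx < 3; cx++) {
            char hit = 0;
            for (int y = cy * 6; y < cy * 6 + 6; y++)
                for (int x = cx * 6; x < cx * 6 + 6; x++)
                    hit |= fine[y][x];
            coarse[cy][cx] = (hit != 0);
        }
}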
Lastly, block 80 performs the classification algorithms to determine, from the given 270 bits, the most likely classification candidate. A simple algorithm, such as determining the lowest Hamming distance, will suffice once it is known what templates most likely correspond to the characters that are to be identified. The key, of course, lies in determining these templates; and that aspect calls for the learning methodologies (such as back propagation) that the art is currently dealing with.
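A minimal sketch of the lowest-Hamming-distance rule over the 270-bit vectors follows; the single-reference-per-class table and the rejection threshold are assumptions added for illustration.

// Sketch of the simple classifier mentioned above: compare the 270-bit
// feature vector against one stored reference vector per class and
// report the class at the lowest Hamming distance. Returning -1 models
// declining to make a decision when no reference is close enough.
int classify(const char v[270], const char refs[10][270], int reject_above) {
    int best = -1, bestDist = 271;
    for (int c = 0; c < 10; c++) {            // ten digit classes
        int d = 0;
        for (int i = 0; i < 270; i++)
            d += (v[i] != refs[c][i]);        // Hamming distance
        if (d < bestDist) { bestDist = d; best = c; }
    }
    return (bestDist <= reject_above) ? best : -1;   // -1 = decline to decide
}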
Hardware Embodiment

Although FIG. 1 depicts the process of our OCR system, it is also quite representative of the hardware realization. The actual details of the signal flow would vary with the particular design, but that is perfectly well within the conventional circuit design arts. For purposes of the following discussion, it may be considered that our system operates in a pipelined fashion and that each electronic circuit block applies the necessary signals and controls to the following circuit block, together with the necessary identification as to which pixel is being considered.
As suggested earlier, block 10 comprises conventional apparatus that is tailored to the particular source of the image to be classified. It can simply be a video camera coupled to a commercial "frame grabber" and a memory. When the classification process begins, the memory is accessed to retrieve the center pixels and the 24 neighboring pixels, and the collection of retrieved signals is applied to block 20.
Blocks 20 and 30 are currently implemented on a SUN workstation with the simple programs presented in the appendix. Local memory is included with the microprocessors to store image signals and temporary computation results, as necessary. Practically any microprocessor can be similarly utilized, but if higher speed is required than is obtainable with a microprocessor, specific hardware can be designed in a conventional manner to carry out the needed calculations. In fact, since the operations required are merely additions, subtractions, comparisons, and rudimentary multiplications, a pipelined architecture can easily be designed that offers very high throughputs.
The output of block 30 is a sequence of signal sets, each having an associated center pixel and its neighboring pixels. Block 40 is implemented with the neural network of FIG. 5, which includes a series connection of a switch 400, a template match network 410, and a threshold network 420. The input signals, which correspond to the 25 pixel values of the image covered at any instant by the 5x5 window, are applied to switch 400. Switch 400 insures that these values are applied to network 410 simultaneously. Network 410 includes 25 input leads and a number of output leads that equals the number of templates stored.
Within network 410, all input leads are connected to each output lead through a column of preset connection nodes. Each such column of connection nodes (e.g., the column containing nodes 411-414) corresponds to a stored template. Thus, the signal of each output lead represents the affinity of the input signal to a different template. More specifically, the connection nodes are of three "varieties"; to wit, excitatory (E), inhibitory (I), and "don't care" (D). Response to a match or a mismatch differs with each of the varieties, in accordance with the truth table below.
input    synapse    output
1        E          excitatory contribution
0        E          none
1        I          inhibitory contribution
0        I          none
1 or 0   D          none

Nodes 411 that implement this truth table are easily realized with gated amplifiers.
The information of whether a node is an E, I, or D node can be stored in a two flip-flop set associated with each node (when variability is desired). Alternatively, the information can be "hardwired" with an array of links associated with the array of nodes. The programming of the templates (i.e., connections) can be achieved through a burn-through of the appropriate links. Of course, if the templates are completely unchanging, one can design the template information directly into the integrated circuit mask of the nodes' array.
The current of the output lines flows into an impedance, and the flow causes the voltage of each output line of network 410 to rise to a level that is proportional to the degree of match between 1's in the set of input signals and excitatory nodes. Of course, the voltage is also diminished by the degree of match between 1's in the set of input signals and the inhibitory nodes.
The output lines of network 410 are applied to threshold network 420, where that impedance can optionally be placed. Network 420 applies a set of thresholds to the output signals of network 410. Specifically, network 420 comprises a set of two-input amplifiers (e.g., 421-424) having one input responsive to the input leads of network 420, and a number of sources (e.g., 425-427) that connect to the second input of amplifiers 421-424. Each of the sources supplies a different current and, correspondingly, each amplifier 421-424 develops a voltage on its second lead that is related to the specific connection that its input has to sources 425-427. In this manner, different thresholds can be applied to the different amplifiers within network 420. The output leads of network 420 are the outputs of amplifiers 421-424, and they take on the logic value 1 or 0, depending on whether the signal input of an amplifier exceeds the threshold or not.
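In software terms, the match-and-threshold behavior of networks 410 and 420 amounts to the small routine sketched below; the numeric encoding (E=+1, I=-1, D=0) mirrors the truth table, but it and the threshold handling are simplifying assumptions about an analog circuit.

// Software model of one template column plus its threshold amplifier:
// excitatory nodes add to the score when their input is 1, inhibitory
// nodes subtract, don't-care nodes never contribute, and the result is
// compared against a per-template threshold.
bool templateFires(const char in[25], const signed char nodes[25], int threshold) {
    int score = 0;
    for (int i = 0; i < 25; i++)
        if (in[i]) score += nodes[i];    // +1 for E, -1 for I, 0 for D
    return score >= threshold;
}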
Block 50 is constructed with a neural network such as the one depicted in FIG. 5. However, since the block 50 neural network deals with 7x7 templates as compared to the 5x5 templates of block 40, a memory 55 is interposed between the two neural networks to buffer the data.
Block 60 generates the 18 feature maps. It simply takes the outputs of block 50 and, together with the signal that specifies the identity of the center pixel, stores the appropriate information in a memory. The result is 18 memory segments, with each segment containing information about the features found in the image.
Each such segment is, thus, one of our feature maps.
The coarse blocking of block 70 is achieved by using 18 additional smaller memory segments, perhaps in the same physical memory device. In those smaller memory segments, block 70 stores information about the features that are found in appropriately selected portions of the larger memory segments. When the original image is 18 pixels by 30 pixels in size, the selection can easily be accomplished with a counter that operates in modulus 5, where the full value of the counter is used to access the larger segments, while the whole number after division by the modulus is used to identify the cells in the 18 smaller memory segments.
The 270 memory locations of the smaller memory segments form the output of block 70 and make up, in effect, a vector that describes the character contained in the image.
The last function that needs to be carried out is to apply this vector to some network that would select the most likely candidate character for the given feature vector. This is the function of block 80.
Block 80 can be implemented in many ways. For example, the content-addressable teachings of Hopfield in the aforementioned U.S. patent 4,660,166 can be used to advantage. In accordance with his teachings, one can impart to the feedback network of his circuit the information about the characters in the subject set. With such information in place, the content-addressable memory identifies the feature vector of the character that is closest to the applied feature vector. The Hopfield network is very robust in making the "correct" choice even when the input appears to be quite distorted. It is a little difficult, however, to design the feedback network for the Hopfield circuit because all of the stored vectors are distributed throughout the feedback network and commingled with one another. This difficulty is compounded by the fact that we do not exactly know how we recognize a "4", or the limits of when we can recognize a "4" and when we are so unsure as to decline to make a decision. Yet, we know a "4" when we see one!
Current research attempts to solve this problem by having the classifier circuit "learn", through trial and error, to reach the correct decisions. One structure that has the potential for such "learning" is depicted in FIG. 6. This technique is commonly referred to as "back propagation." It is described, for example, by D. E. Rumelhart et al. in "Learning Internal Representations by Error Propagation", in D. E. Rumelhart, J. L. McClelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press, 1986, Chap. 8.
FIG. 6 comprises interconnection networks 81 and 82 that are serially connected. The input signal set is applied at the input of network 81, and the output signal set appears at the output of network 82. Each of the networks has a plurality of input and output leads and each input lead is connected to all of the output leads.
More specifically, each input lead i is connected to each output lead j through a connection weight wij. In our application, network 81 has 270 input leads and 40 output leads. Network 82 has 40 input leads and 10 output leads. The number of input leads of network 81 is dictated by the length of the feature vector. The number of outputs of network 82 is dictated by the number of characters in the classifying set. The number of intermediate leads (in this case, 40) is determined heuristically.
Training of the FIG. 6 circuit is carried out by applying a developed feature vector of a known character and adjusting the weights in both networks 81 and 82 to maximize the output signal at the designated output lead of network 82 corresponding to the applied known character. All available samples of all the characters in the set to be classified are applied to the network in this fashion, and each time, the weights in the interconnection network are adjusted to maximize the signal at the appropriate output lead. In this manner, a set of weights wij is developed for both networks.
It may be appropriate to explicitly mention that the connection weights wij are analog in nature and that the circuit operates in an analog fashion. That is, the voltage at any output lead of network 81 is a sum of the contributions of the "fired up" weights connected to that output lead. Each weight is "fired up" by a binary signal on the input lead to which the weight is connected. Thus, the output at lead j equals

    Σ_i B_i w_ij

where B_i is the value of the i-th input lead (0 or 1).
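As a software cross-check of this analog behavior, a minimal sketch of the 270-40-10 forward pass follows. The array sizes come from the text; treating the 40 intermediate leads as thresholded binary signals (so the second layer also sees 0/1 inputs, as the formula assumes) is an assumption.

// Sketch of the two-layer forward pass: binary inputs gate analog
// weights, and each output lead sums the weights that fire.
void forward(const char b[270], const float w1[270][40], const float w2[40][10],
             float out[10]) {
    char mid[40];
    for (int j = 0; j < 40; j++) {
        float s = 0.f;
        for (int i = 0; i < 270; i++)
            if (b[i]) s += w1[i][j];    // sum of fired-up weights
        mid[j] = (s > 0.f);             // assumed hard threshold between layers
    }
    for (int k = 0; k < 10; k++) {
        float s = 0.f;
        for (int j = 0; j < 40; j++)
            if (mid[j]) s += w2[j][k];
        out[k] = s;                     // largest output = chosen character
    }
}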
Though the concept of such a learning network is fairly well understood, the task remains to realize such an analog circuit efficiently and compactly. The requirements on such a circuit are not trivial. For example, the minimum weight change, or modification, must be fairly small if optimization of the network is to be achieved. The iterative improvement methodology described above is based on the heuristic assumption that better weights may be found in the neighborhood of good ones, but that heuristic fails when the granularity is not fine enough. We found that for a small network 81, at least 8 bits of analog depth are necessary. Larger networks may require even finer granularity. The weights must also represent both positive and negative values, and changes must be easily reversible. During the learning and training session the number of changes to the weights can be quite large. Therefore, a practical circuit must allow for quick modification of the weights.
Taking these and other requirements into account, we have created an efficient analog connection weight, or strength, circuit with MOS VLSI technology. Whereas each connection weight in FIG. 6 is depicted with merely a black dot, FIG. 7 presents a circuit for implementing these dots; or more particularly, FIG. 7 shows one connection weight circuit with its connection to input lines 83 and output line 84, as well as some common circuitry. Primarily, the interconnection weight portion of the FIG. 7 circuit includes capacitors 801 and 802, small MOS switches 803 and 804, a relatively large MOS transistor 805, a differential amplifier 806, and a multiplier 807. Secondarily, the circuit of FIG. 7 includes a charge-coupling switch 808, a sensing switch 809 and various control leads.
The circuit operates as follows. Capacitors 801 and 802 are charged to different voltage levels, and the difference in voltage levels is reflected in the output voltage of differential amplifier 806. Amplifier 806 has its two inputs connected to capacitors 801 and 802. The output of amplifier 806, which represents the connection weight, is connected to multiplier 807. Multiplier 807 can be any conventional transconductance amplifier. Also connected to multiplier 807 is input lead 83 of the interconnection network. The output of multiplier 807 is connected to an output lead of the interconnection network. Thus, multiplier 807 sends a current to the output lead that is a product of the signal at the input lead and the value of the connection weight. The connection weight is represented by the differential voltage developed by amplifier 806 in response to the difference in voltages between capacitors 801 and 802.
We have found that the difference in voltages on capacitors 801 and 802 is maintained for a long time (relative to the operations involved in OCR systems) and that no refreshing is necessary when the circuit is kept at reasonably low temperatures. For example, at 77 degrees Kelvin no detectable loss has been observed with time. It may be noted that one advantage of our circuit is that the weight is proportional to V_C801 - V_C802 and, therefore, even a loss in charge -- when it is the same at both capacitors -- results in no change to the weight.
Nevertheless, an avenue must clearly be provided for refreshing the information on capacitors 801 and 802. Moreover, an avenue must be provided for setting a voltage (charge) value on capacitors 801 and 802 and for modifying the set values to allow for the above-described "learning" procedure. This is where the remaining switches and controls come in.
To bring a connection weight to a desired level, switch 808 is closed momentarily to allow a fixed voltage level to be applied to capacitor 801 from voltage source 816. That voltage corresponds to a fixed charge. Thereafter, switch 808 is turned off. At this point, the weight of the connection is at a maximum positive level because capacitor 801 is connected to the non-inverting input of amplifier 806 and carries a positive voltage, while capacitor 802 is connected to the inverting input of amplifier 806. A change in the connection weight is accomplished in the following way.
First, transistors 803 and 805 are turned on. Transistor 803 is very small compared to transistor 805 and, for the sake of a better understanding of what happens, transistor 803 can be thought of as being merely a switch. By comparison, transistor 805 is long and narrow, and when it is on it can be thought of as a capacitor. When switch 803 is closed and transistor 805 (assuming it is an n channel device) is turned on, the charge on capacitor 801 is distributed between the capacitor (801) and the inversion charge on the turned-on transistor 805. Transistor 803 is then turned off, thereby trapping the charge in transistor 805. Transistor 804 is then turned on and, if transistor 805 is slowly turned off, the mobile charge in its channel will diffuse through switch 804 into capacitor 802.
The above steps thus move a quantum of charge from capacitor 801 to capacitor 802. That corresponds to a change in the capacitors' voltages and in the interconnection weight.
The above sequence can be repeated as many times as necessary to bring the connection weights to the desired levels. In this manner, the optimization of the connection weights can proceed during the training period, with the result that each interconnection weight in networks 81 and 82 is set to the correct level.
The above description addresses the training aspect of the circuit. Once the learning process is over, means should be provided for 1) determining the values of the weights and 2) refreshing the weights to compensate for losses with time, etc.
This is accomplished with the aid of sensing switch 809, an A/D converter, a D/A converter, and a non-volatile memory.
To determine the value of the weights in an interconnection network, all of the input leads are turned on, one at a time. Each time a lead is turned on, the sensing switches (809) of the weights connected to that input lead are sequentially turned on to allow each amplifier's voltage to appear on sensing bus 810. That voltage is applied to A/D converter 811 and the resulting digital information is stored in memory 812. All of the weights are converted to digital form in this manner and stored in memory 812. During a refresh operation, each connection weight is isolated in the manner described above, but this time the voltage output on sensing bus 810 is compared in amplifier 814 to the analog voltage of D/A converter 813, to which the digital output of memory 812 is applied. Of course, memory 812 is caused to deliver the digital output that corresponds to the refreshed connection weight. Based on the comparison results, the sequence of switching elements 803, 804, and 805 is controlled by the output signal of amplifier 814 to either increase or diminish the voltage of capacitor 801 relative to capacitor 802. The control of directing the output of bus 810 to either A/D converter 811 or to comparator amplifier 814 is effected by switch 815. Should it be necessary to completely discharge both capacitors 801 and 802, the voltage of source 816 can be reduced to zero and switches 803, 804, and 805 can be turned on.
APPENDIX
// fire.c 9nl88 LDJ
// check for broken images
// returns -1 if completely blank
// returns 0 if connected
// returns 1 if connected except for small flyspecks
// returns 2 if badly disconnected
// uses recursive brush-fire algorithm
// Diagnostic output: prints number of segments,
// and location and code of the largest
// Side effect: sets up Lseg (in rec2com)
// IMPORTANT ASSUMPTION: fire1 assumes img.pix black pixels are POSITIVE
// and white pixels are zero ---
// If you can't guarantee that, call fire() instead of fire1()
// negative pixels will cause trouble
// This routine modifies img.pix !!
#include <stdio.h>
#include "rec2types.h"
#include "err1.h"
#include "fire.h"

inline int imin(int a, int b) { return (a < b) ? a : b; }
inline int imax(int a, int b) { return (a > b) ? a : b; }
static int xd1;       // copy of img.x
static int yd1;       // and img.y
static char** pix1;   // and img.pix
static Pair* p1;
static Pair* p2;
static int list_size = -1;
static Seg myseg;     // segment being processed
Seg Lseg;             // longest segment there is
static int nseg;
static int totpix;    // total pixels in image

int fire1(Image img) {
    // make sure we have allocated enough room to keep our pair-lists:
    int lsiz = img.x * img.y;   // max possible size of list required
    if (lsiz > list_size) {
        if (p1) {
            delete p1;
            delete p2;
        }
        p1 = new Pair[lsiz];
        p2 = new Pair[lsiz];
        list_size = lsiz;
    }
    Lseg.list = 0;    // no longest segment yet
    Lseg.size = 0;    // first seg will beat this for sure
    nseg = 0;
    totpix = 0;
    // find first black pixel, so we can initiate the burn there:
    int xx; int yy;
    for (yy = 0; yy < img.y; yy++) {
        for (xx = 0; xx < img.x; xx++)
            if (img.pix[yy][xx] > 0) {
                // fprintf(stderr, "firstx = %d firsty = %d\n", xx, yy);
                // a lot of these things might logically be arguments to burn(),
                // but static variables are faster & simpler
                nseg++;   // count this segment
                myseg.ashes = -nseg;
                myseg.list = (Lseg.list != p1) ? p1 : p2;
                myseg.size = 0;
                xd1 = img.x; yd1 = img.y; pix1 = img.pix;
                burn(xx, yy);   // burn, baby, burn!
                if (myseg.size > Lseg.size) Lseg = myseg;
            }
    }
#ifdef TEST
    fprintf(stderr, "Saw %d segments\n", nseg);
    if (nseg) fprintf(stderr, "Longest (code %d) starts at %d %d\n",
        Lseg.ashes, Lseg.list[0].x, Lseg.list[0].y);
#endif
    if (nseg == 0) return -1;
    if (nseg == 1) return 0;
    float frac = float(Lseg.size) / float(totpix);
    const float minfrac = .9;
    if (frac >= minfrac) return 1;
    return 2;
}
// the magical recursive burning routine
// turns to ashes all points that are 8-connected to the initial point
void burn(
    int xcent, int ycent   // center of 3 x 3 region of interest
) {
    // if this point is off-scale:
    if (xcent < 0 || ycent < 0 || xcent >= xd1 || ycent >= yd1) return;
    // NOTE: this is indeed a check for > 0,
    // not just nonzero, so things don't burn twice.
    if (pix1[ycent][xcent] > 0) {
        int top = myseg.size++;   // keep track of length of segment
        totpix++;                 // count total pixels
        pix1[ycent][xcent] = myseg.ashes;   // turn this point to ashes
        myseg.list[top].x = xcent;
        myseg.list[top].y = ycent;
        burn(xcent+1, ycent+1);   // ignite neighbors
        burn(xcent+1, ycent  );
        burn(xcent+1, ycent-1);
        burn(xcent,   ycent+1);
        burn(xcent,   ycent-1);
        burn(xcent-1, ycent+1);
        burn(xcent-1, ycent  );
        burn(xcent-1, ycent-1);
#define jumpbreaks YES
#ifdef jumpbreaks
        int jump = imax(xd1, yd1);
        jump = jump / 20;
        if (jump < 3) jump = 3;
        if (jump > 1) {
            burn(xcent+jump, ycent+jump);   // ignite more distant neighbors
            burn(xcent+jump, ycent     );
            burn(xcent+jump, ycent-jump);
            burn(xcent,      ycent+jump);
            burn(xcent,      ycent-jump);
            burn(xcent-jump, ycent+jump);
            burn(xcent-jump, ycent     );
            burn(xcent-jump, ycent-jump);
        }
#endif
    }
    // if this point NOT set, or already burned, do nothing
    return;
}
// same as above, but does not assume that black pixels are positive;
// non-zero suffices.
15 int fire(Image img) {
for (int yy = 0; yy < img.y; yy++) ( for (int xx = 0; xx < img.x; xx++) {
img.pix[yy][xx] = img.pix[yy][xx] != 0;
}
20 }
return ( firel(img) );
}
****************************************************~*********
// do most of the work for linear transformation program 2~ 542 // p~rO~ linear transformations on post of fice data // i.e. convert to standard size and aspect ratio // see lin.plan for extended discussion S // Note: xypix[][] will contain small integers 0..9 // for graylevels below threshold, you get zero;
// for graylevels above threshold, you get the graylevel number // This gives you the option of treating it as a boolean // if you don't care about graylevels.
10 // Caller allocates the array; we fill it.
#include <stdio.h>
#include <m~th h>
#include "new_array.h"
#include "/nets/utiVutil.h"
15 #include "rec2types.h"
#include "do_lin.h"
inline float fmin(float a, float b){return(a<b? a: b);}
inline float fmax(float a, float b)lreturn(a>b? a: b);) inline int imin(int a, int b){return(a<b? a: b);}
20 inline int imax(int a, int b){return(a>b? a: b);}
void do_lin( const Image rawJ/ input image const int known_fit,// 1 ==~ char already fills box pdim by qdim.
25 Image des, // result: array of small integers FILE* param_fp,// parameter file filepointer /l O ==> all parameters take default values const char* sname// filename, for informational messages ~025~2 // provide "" if you can't do better )( Pair* bl;
bl = new Pair[raw.x*raw.y];// safe; probably overestimate # of black pixels 5 int ibl = 0;
int pp, qq;
for (qq = 0; qq < raw.y; qq++) {
for (pp = 0; pp < raw.x; pp++) if (raw.pix[qq][pp]) {
bl[ibl].x = pp;
10 bl[ibl].y = qq;
ibl++;
}
}
do_linl(raw.x, raw.y, bl, ibl, known_fit, des, param_fp, sname);
lS delete(bl);
_~W
void do_linl( const int pdimJ/ size of input array 20 const int qdimJ/ ..
const Pair* bl,// input: list of black pixels const int nbl, // size of said list const int known fit,// 1 ==> char already fills box pdim by qdim.
Image des, // result: array of small integers 25 FILE* param_fpJ/ parameter file filepointer // 0 ==> all p~lleters take default values const char* sname// fil~-.n~m~, for infc,lmational messages // provide "" if you can't do better ){
2~2`5~2 -Fget(kernel, 2);// convolution kernel (units of PQ rows/cols) Iget(mingray, 3);// this or larger: return graylevel, else return zero Fget(fcorn, .7);1/ fatness corner Iget(info, 0); // report miscellaneous information S Iget(inflate, 0);//1 ==> small chars can grow, fatness can change float pkern = kernel;
float qkern = kernel;
const int xO = 0;
const int yO = 0;
10 const float xmid = (des.x - xO) / 2.0;
const float ymid = (des.y - yO) / 2.0;
~D~a~
// find raw bounding box int pO, qO, p2, q2;
15 int ibl;
int pp, qq;
if (known_fit) {
pO = 0; qO = 0;
p2 = pdim; q2 = qdim;
20 } else {
pO = bl[O].x; qO = bl[O].y;
p2 = pO, q2 = qO;
for (ibl = 1; ibl ~ nbl; ibl++){
pp = bl[ibl].x; qq = bl[ibl].y;
25 pO = imin(pO, pp);
p2 = imax(p2, pp);
qO = imin(qO, qq);
q2 = imax(q2, qq);
}
30 p2++; q2++;
}
2t9~2S~2 -// calculate some moments:
float** xyflt = new_float(des.y, des.x);
// note that we are treating the pixels as BOXES of ink, not points Il so the (0,0) pixel extends from (0,0) to ( l-eps, l-eps) 5 ll and has its center at (.5, .5) float pmid = (p0 + p2) / 2.0;
float qmid = (q0 + q2) / 2.0;
I/ but if we shift the middle half a bit, t/ we can pretend the (0,0) pixel is centered at (0,0) 10 float pmx = pmid - .5;
float qmx = qmid - .5;
float mpq = 0.;/l PQ moment float mqq = 0.;// QQ moment for (ibl = 0; ibl < nbl; ibl++) {
15 pp = bl[ibl].x; qq = bl[ibl].y;
mpq += (qq - qmx)*(pp - pmx);
mqq += (qq - qmx)*(qq - qrnx);
}
float theta = mpq / mqq;
20 // Note: since pixels are numbered from UPPER left, Il positive theta corresponds to "
// negative theta corresponds to "/"
// (pt, qt) is the coordinate where (p,q) will go when the char is deskewed.
25 // This is not quite the same as (x,y) since the latter // has size changes as well.
Il Calculate min and max horiz coordinates, // measured relative to a line parallel to the sides of the parallelogram // [the reference line goes through (pmid, qmid) 30 // which is the middle of the raw PO rectangle]
#define pmap(p,q) (p-pmx - (q-qmx)*theta) float pt0 = pmap(bl[0].x, bl[0].y) ;// leftmost black pixel 2~(~Z54~%
float pt2 = ptO;// rightmost black pixel for (ibl = l; ibl < nbl; ibl++) {
float xxx = pmap(bl[ibl].x, bl[ibl].y);
ptO = fmin(ptO, xxx);
S pt2 = fmax(pt2, xxx);
}
// The points we just calculated are centers of parallelogram boxes // C~lc~ te how much the box sticks out from there.
pt2 += .5*fabs(theta) + .501;//.001 to catch roundoff errors 10 ptO -= .5*fabs(theta) + .501;
float dsw = pt2 - ptO ;// deskewed width float kf = dsw / (p2-pO);// krunch factor // usually (but not always) less than 1 // (not used in further calculations) 15 float conwid = dsw + pkern - 1.;// pretend wider to make room for convolution float conhgt = q2 - qO + qkern - 1;// and taller, also for convolution float qtmid = qmid + (qkern-l.) / 2.;// height of middle of char float ptmid = pmid + ptO + .5*conwid + theta*(qkern-1.)/2.;// p coord of middle of char float fat = conwid / conhgt;// fatness ratio of the parallelogram 20 float dfat = des.x / (float) des.y;// desired fatness ratio float nfat = fat / dfat;// norm~li7e~1 fatness ratio if (info) fprintf(stdout, "%s w: %d, h: %d, th: %5.2f, dsw: %5.2f, kf: %5.2f, nfat: %5.2fO, sname, p2-pO, q2-qO, theta, dsw, kf, nfat);
25 ~DI
Il calculate the coefficients of the linear transformation:
float dOO, dOl, dO2, dlO, dl 1, dl2;
if (inflate) { // old "inflationary" scheme float fif = nfat > fcorn ?
l./nfat: l./fcorn;// fatness increasing factor dlO = 0.; I/ pure skew dl 1 = (des.y-yO) / conhgt;// make output char fill its box vertically 2~0254Z
-dOO = dl 1 * fif;
dOl = - dl 1 * fif * theta;
dO2 = xrI~id - dOO*ptmid - dOl*qtmid;// center point is fixed point dl2 = ymid - dlO*ptmid - dl l*qtmid;
5 }
else { // never inflate scheme float ygrow = (des.y-yO) / conhgt;
float xgrow = (des.x-xO) / conwid;
float dogrow = fmin(xgrow, ygrow);// grow as little as possible // i.e. shrink so BOTH fit dogrow = fmin(l.O, dogrow);// but NEVER really inflate dlO = O; // pure skew dl 1 = dogrow;// shrink y dOO = dogrow;// and x equally 15 dOl = -dogrow * theta;
dO2 = xmid - dOO*ptmid - dOl*qtmid;// center point is fixed point dl2 = yrnid - dlO*ptmid - dll*qtmid;
}
// make the output space all white 20 int xx, yy;
for (yy = yO; yy < des.y; yy++)( for (xx = xO; xx < des.x; xx++) {
xyflt[yy][xx] = 0.;
}
25 }
// and start beque~thing blackness int spip - imax(l, int(lO*dl 1));// spip == scans per input pixel float step = pkern / spip;
float offset = step / 2.;
30 int ii;
float fy, fq;
int iy;
float fxO, fx2;
int ixO, ix2;
2~5~2 float rxO, rx2;
#define oops(X) {fprintf(stderr, "oops (X) %s %f %fO, sname, fxO, fx2); continue;}
for (ibl = O; ibl < nbl; ibl++) ~// loop over black input pixels pp = bl[ibl].x; qq = bl[ibl].y;
S fxO = dOO*pp + dOl*qq+dO2;// start of scan line (in x space) fx2 = dOO*(pkern+pp) +dOl*qq+dO2;// end of scan line (..) ixO = int(fxO);// integer parts ix2 = int(fx2);
rxO = fxO - ixO;// rern~in~rs 10 rx2 = fx2 - ix2;
if (ixO < O) oops(l) ;// error checks & defence if (ixO >= des.x) oops(2);
if (ix2 < O) oops(3);
if (ix2 >= des.x) oops(4);
15 for (ii = 0; ii < spip; ii++)~// loop over scan lines per input pixel fq = qq + offset + step*ii;
fy = dlO*pp + dl l*fq + dl2;
iy = (int) fy;
if (iy < O) oops(S);
20 if (iy ~= des.y) oops(6);
xyflt[iy][ixO] -= rxO;
xyflt[iy][ix2] += rx2;
for (int jj = ixO; jj < ix2; jj++) xyflt[iy]rjj] += 1.0; 5 }
}
}
// clip the bottom off of the gray-scale // using questionable threshold scheme 30 1/ first, find blackest output pixel float zmax = O;
for (yy = yO; yy < des.y; yy++ ) {
for (xx = xO; xx < des.x; xx++ ) ~
zmax = fmax(zmax, xyflt[yy][xx]);
`
Z~25~2 -// then clip away....
for (yy = yO; yy ~ des.y; yy++ ) S for (xx = xO; xx < des.x; xx++ ) float zz = xyflt[yy][xx];
int gr = int(9.9 * zz / zrnax);
if (gr ~ mingray) gr = O;
des.pix[yy][xx] = gr;
10 }
}
// put away all our toys:
delete2(xyflt);
15 }
The article briefly describes fourteen different known skeletonization algorithms, and then proposes its own algorithm (SPTA). All of the described skeletonization algorithms, including SPTA, are based on the concept of passing over the image a square window of three rows and three columns (commonly referred to as a 3x3 window). As the square 3x3 window is passed across the image, the algorithms evaluate the 8-pixel neighborhood surrounding the center pixel and, based on the evaluation, either convert a black center point to white, or leave it unaltered.
Pattern classification received a boost from another direction with recent advances in the field of connectionism. Specifically, highly parallel computation networks ("neural networks") have come to the fore with the work by Hopfield, disclosed in U.S. Patent 4,660,166, issued April 21, 1987. Also, work continued on robust learning algorithms for multi-layered networks in which "hidden" layers of neural elements permit separation of arbitrary regions of the feature space. This work, reported on, inter alia, by Gullichsen et al. in "Pattern Classification by Neural Networks: An Experimental System for Icon Recognition", Proceedings of the IEEE
First International Conference on Neural Networks, pp. IV-725-732, M. Caudill et al., Editors, concentrates on the character classification process. The system they describe uses some image preprocessing but not feature extraction. Instead, they rely entirely on the inherent classification intelligence that the neural networks acquire through the "back propagation" training process. The reported system apparently works but, as the authors suggest, many questions remained to be investigated. The system's performance is less than acceptable.
There exist many other character classification techniques, approaches, and algorithms. For purposes of this disclosure, however, the above references provide a reasonable description of the most relevant prior art. Suffice it to say that, with all the effort that has gone into solving the character recognition (i.e., classification) problem, the existing systems do not offer the accuracy that would permit reliable automatic character recognition to be implemented.
Summary of the Invention
Our invention offers a highly effective OCR system by selecting the right combination of techniques, and modifying those techniques for increased performance in terms of speed and accuracy. The general approach of our system is to remove meaningless variability and capture meaningful variability before attempting to classify. Specifically, in accordance with the principles of our invention, characters to be recognized are scanned to form a pixel image, and the image is scaled and "cleansed." The scaled and cleansed image is deskewed and skeletonized to form a thinned image, and the thinned image is applied to a neural network that extracts features of the image and forms an image map. Features are then combined to form "super features". Thereafter, the image is coarse blocked to reduce the dimensionality of the resulting feature vector. The feature vector is, lastly, applied to a classifier that identifies the character.
In accordance with one aspect of the invention there is provided a method for classifying an image, said method including a step of processing signals corresponding to said image to develop a processed image, a step of feature extraction to develop features of said processed image, and a step of classification CHARACTERIZED BY: said step of processing includes a substep of skeletonization of said image and a substep of deskewing of said image.
In accordance with another aspect of the invention there is provided a method for classifying characters in an image COMPRISING THE STEPS OF:
capturing said image in the form of an array of picture elements; processing said image to scale said array of picture elements to a chosen size and to delete extraneous image portions from said image, developing thereby a processed image; deskewing and skeletonizing said processed image to develop a thinner and a more upright version of said image; extracting features from said thinner and more upright image that characterize said image and developing feature identifiers to represent features of said image and their locations; coarse blocking said feature maps to develop more granular feature maps; and classifying said granular feature maps as representing one of a known set of symbols, or representing an unknown symbol.
In accordance with yet another aspect of the invention there is provided apparatus for identifying characters embedded in an applied image comprising: first means for capturing said image in the form of a matrix of picture elements; second means responsive to said first means for identifying the presence of a character in said image and conditioning said character by deskewing and skeletonizing said image to develop a conditioned image; third means responsive to said second means for extracting features of said conditioned image to develop feature maps pertaining to specified features of said image; and fourth means responsive to said third means for developing coarse blocked versions of said feature maps; and means for classifying said coarse blocked version of said feature maps as representing one of a known number of characters or an unknown character.
Brief Description of the Drawings
FIG. 1 presents a general flow diagram of the classification method of our invention;
FIG. 2 presents an example of a problem resulting from use of independent 3x3 templates;
FIG. 3 shows the set of thinning templates used with our invention, which includes templates greater than 3x3;
FIG. 4 depicts a set of feature extraction templates;
FIG. 5 illustrates the structure of a neural network decision circuit used in connection with the templates of FIGS. 3 and 4;
FIG. 6 depicts the structure of a two-layer neural network with analog-valued connection weights; and
FIG. 7 illustrates one realization for an analog-valued connection weights neural network.
Detailed Description
FIG. 1 presents a flow chart of our process for character or symbol classification. In block 10, the character image is captured and, advantageously, stored in a frame buffer such as a semiconductor memory. The image may be obtained through electronic transmission from a remote location, or it may be obtained "locally" with a scanning camera. Regardless of the source, in accordance with conventional practice the image is represented by an ordered collection (array) of pixels. The value of each pixel corresponds to the light (brightness, color, etc.) emanating from a particular small area of the image. The pixel values are stored in the memory.
Smudges and extraneous strokes are often found in proximity to characters. Their presence cannot help but make the recognition process more difficult. In accordance with our invention, block 20 follows block 10 and its function is to cleanse the image. This is the first step in our effort to remove meaningless variability from the image.
Usually, an image of a symbol or a character, such as a digit, contains one large group of pixels (contiguous) and a small number, possibly zero, of smaller groups. Our cleansing algorithm basically identifies all such groups and deletes all but the largest one. If the deleted groups, together, constitute more than a certain percentage of the original image, this fact is noted for later use, since it indicates that the image is anomalous. In the context of this description, it is assumed that the image symbols are composed of dark strokes on a light background. A "reversed" image can of course be handled with equal facility. The above cleansing algorithm also assumes that the symbol set that is expected in the image does not contain symbols that call for disjoint strokes. The digits 0-9 and the Latin alphabet form such sets (save for the lower case letters i and j), but most other alphabets (Hebrew, Chinese, Japanese, Korean, Arabic, etc.) contain many disjoint strokes. For such other sets a slightly different cleansing algorithm would have to be applied, such as looking at each disjoint area, rather than at the whole collection of such areas.
There are a number of processes that can be applied to detect and identify these extraneous areas. The process we use resembles a brush fire.
In accordance with our process, the image is raster-scanned from top to bottom in an effort to find "black" pixel groups. When such a group is found (i.e., when a black pixel is encountered that has not been considered before), the scanning is suspended and a "brush fire" is ignited. That is, the encountered pixel is marked with an identifier, and the marking initiates a spreading process. In the spreading process, each of the eight immediately neighboring pixels is considered. Those neighboring pixels that are black are similarly marked with the identifier, and each marking initiates its own spreading process. In this manner, the first encountered pixel of a "black" group causes the entire group to be quickly identified by the selected identifier. At this point in the process, the scanning of the image resumes so that other groups can be discovered and identified (with a different identifier). When scanning is completed and all of the "black" areas are identified, area calculations can be carried out. As indicated above, all but the largest group is deleted from the image (i.e., turned from dark to light, or turned OFF).
It may be noted at this point that in the character recognition art, it is more important not to identify a character incorrectly than it is to refuse to make a decision. For that reason, in a system that is designed to identify numerals or other character sets that do not have disconnected strokes, the area removal threshold should be set to a fairly low level.
Ordinarily it is expected that the pixels comprising the meaningful part of the image will be contiguous in a strict sense (in the aforementioned 0-9 character set and the Latin alphabet). On the other hand, an exception should be made, perhaps, when areas are separated only slightly, and external information leads one to believe that it is possible for a character stroke to be inadvertently broken (such as when writing with a poor pen or on a rough writing surface). To provide for such contingencies, our process for spreading the "fire" includes an option for defining the neighborhood to include eight additional pixels that are somewhat removed from the eight immediate pixels (the eight pixels being corners of a larger window and center pixels of sides of the larger window). In effect, we permit the "fire" to jump over a "fire break."
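For illustration, the group-identification pass can be sketched in a few lines of C++. This is only a simplified model of the fire.c routine listed in the appendix; the Img structure and the function names here are illustrative, and the appendix version additionally records segment lists and sizes and supports the "fire break" jump option.

// Sketch: brush-fire (connected group) labeling.
// Nonzero pixels are black; labels are negative so a marked
// pixel fails the "> 0" test and is never burned twice.
struct Img { int x, y; int** pix; };

static void burn(Img& im, int xc, int yc, int label) {
    if (xc < 0 || yc < 0 || xc >= im.x || yc >= im.y) return;
    if (im.pix[yc][xc] <= 0) return;   // white, or already burned
    im.pix[yc][xc] = label;            // turn this pixel to "ashes"
    for (int dy = -1; dy <= 1; dy++)       // spread to the eight
        for (int dx = -1; dx <= 1; dx++)   // immediate neighbors
            if (dx || dy) burn(im, xc + dx, yc + dy, label);
}

// Raster-scan; ignite a new fire at each unvisited black pixel.
// Returns the number of distinct black groups found.
int labelGroups(Img& im) {
    int ngroups = 0;
    for (int yy = 0; yy < im.y; yy++)
        for (int xx = 0; xx < im.x; xx++)
            if (im.pix[yy][xx] > 0) burn(im, xx, yy, -(++ngroups));
    return ngroups;
}

Once every group carries its own label, the areas can be tallied and all but the largest group turned OFF, as described above.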
The process of scaling the image to a given size, in block 25, follows the cleansing process. Scaling, of course, removes a meaningless variability of the image. The sequence of cleansing followed by scaling is imposed by the desire to not scale an image that includes smudges. The scaling process can use any one of a number of different algorithms. For example, in accordance with one algorithm, the image can be scaled in both dimensions by an equal factor, until one of the image dimensions reaches a fixed size. Another algorithm can scale independently in the two dimensions, subject to some constraint on the largest difference in the scaling factors of the two dimensions. Both approaches work well and, therefore, the choice of the algorithm and its implementation are left to the reader. We scale each character image into a convenient array size, such as 18x30.
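As a sketch of the first of these algorithms (equal scale factors in both dimensions), assuming the Img structure of the previous sketch, one might write the following; the nearest-pixel sampling is a simplification chosen for brevity (the appendix routine do_lin.c does the real work with sub-pixel scan conversion):

// Sketch: scale src by one common factor f so the character just
// fits the fixed dst box (e.g. 18x30).
void scaleEqual(const Img& src, Img& dst) {
    float fx = (float) dst.x / src.x;
    float fy = (float) dst.y / src.y;
    float f = (fx < fy) ? fx : fy;   // stop when one dimension fits
    for (int yy = 0; yy < dst.y; yy++)
        for (int xx = 0; xx < dst.x; xx++) {
            int sx = int(xx / f);    // nearest source pixel
            int sy = int(yy / f);
            dst.pix[yy][xx] = (sx < src.x && sy < src.y)
                              ? src.pix[sy][sx] : 0;
        }
}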
People generally write characters at a slant. The slant is different from one person to another. The slant, or skew, of the characters is another variability of written characters that carries no information. We wish to remove this variability.
Returning to FIG. 1, block 30, which follows block 25, deskews the image. Stated differently, it is the function of block 30 to make all characters more uniformly upright.
Block 30 can use any one of a number of conventional procedures for deskewing an image. One such procedure subjects the image to a transformation of the form

    [u]   [1   -mxy/myy] [x - x0]
    [v] = [0       1   ] [y - y0]

where x and y are the original coordinates of the image, x0 and y0 define an origin point, u and v are the coordinates in the transformed image, and mxy and myy are the image moments calculated by

    mxy = Σ(x,y) (x - x0)(y - y0) B(x,y)

and

    myy = Σ(x,y) (y - y0)^2 B(x,y).

In the above, B(x,y) assumes the value 1 when the pixel at position x,y is "black", and the value 0 otherwise. The effect of this transformation is to reduce the xy moment to essentially 0.
Scaling (block 25) and deskewing (block 30) are both linear transformations. Therefore, the composition of the two is also a linear transformation. It may be advantageous to apply the compound transformation to the cleansed image to produce the deskewed image directly. This combined operation allows us to avoid an explicit representation of the scaled image as an array of pixels, which eliminates a source of (computation) noise.
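A sketch of the moment-based deskewing taken by itself follows (in practice, as just noted, the shear is folded into the combined transformation, as in the appendix routine do_lin.c; the centroid origin used here is an illustrative choice of (x0, y0)):

// Sketch: deskew by a horizontal shear with slope theta = mxy/myy,
// computed about the black-pixel centroid (x0, y0).
void deskew(const Img& src, Img& dst) {
    float x0 = 0., y0 = 0.; int n = 0;
    for (int yy = 0; yy < src.y; yy++)
        for (int xx = 0; xx < src.x; xx++)
            if (src.pix[yy][xx]) { x0 += xx; y0 += yy; n++; }
    if (n == 0) return;
    x0 /= n; y0 /= n;
    float mxy = 0., myy = 0.;          // the two image moments
    for (int yy = 0; yy < src.y; yy++)
        for (int xx = 0; xx < src.x; xx++)
            if (src.pix[yy][xx]) {
                mxy += (xx - x0) * (yy - y0);
                myy += (yy - y0) * (yy - y0);
            }
    float theta = (myy != 0.) ? mxy / myy : 0.;
    for (int yy = 0; yy < dst.y; yy++)     // start all white
        for (int xx = 0; xx < dst.x; xx++) dst.pix[yy][xx] = 0;
    for (int yy = 0; yy < src.y; yy++)     // u = x - theta*(y - y0)
        for (int xx = 0; xx < src.x; xx++)
            if (src.pix[yy][xx]) {
                int u = int(xx - theta * (yy - y0) + .5);
                if (u >= 0 && u < dst.x && yy < dst.y)
                    dst.pix[yy][u] = src.pix[yy][xx];
            }
}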
Block 40, which in FIG. 1 follows block 30, thins the image. Thinning of the image also removes meaningless variability of the image. As indicated above, the prior art methods for skeletonization use a 3x3 window that is passed over the image. The center point of the 3x3 window is turned OFF if certain conditions are met; and those conditions, in most of the methods, involve repeated tests with different predefined window conditions. For example, the Ben-Lan and Montoto algorithm states that a dark center point is deleted (i.e., turned OFF or turned light) if it satisfies the following conditions:
1) the pixel has at least one light 4-neighbor, and
2) the neighborhood does not match any of 8 predefined 3x3 windows.
A 4-neighbor is a pixel that is east, north, west, or south of the pixel under consideration.
Algorithms like the one described above are quite acceptable in software implementations because, until recently, processors were able to handle only one task at a time anyway. However, these algorithms are necessarily slow because of their sequential nature. Furthermore, each of these prior art tests zeroes in on a certain characteristic of the pattern, but not on other characteristics. To thin strokes of different character (i.e., vertical lines and horizontal lines), different tests must be applied. Additionally, with prior art tests there is a need to perform at least some of these tests sequentially before one can be sure that a particular pixel may be deleted; and the pixel cannot be turned OFF until these tests are performed. The example of FIG. 2 illustrates the problem.
In FIG. 2, templates 100 and 110 are two 3x3 pixel windows. The three top pixels in template 100 are circle-hatched to designate searching for OFF pixels. The center pixel, and the pixel in the center of the bottom row, are crosshatched to designate searching for ON pixels. The remaining pixels are blank, to designate a "don't care" condition. Template 100 searches for the edge condition of light space (pixels 101, 102, and 103) above dark space (pixels 104 and 105), with the caveat that the dark space must be at least two pixels thick. When such a condition is encountered, according to template 100 the center pixel (104) is turned from ON to OFF (dark to light). Thus, template 100 provides a mechanism to nibble away from an ON area, from the top, until there is only one ON row left.
Template 110 operates similarly, except that it has the bottom row looking for OFF pixels while the center pixels of the first and second rows are looking for ON pixels. Template 110 nibbles ON (dark) areas from the bottom.
The above templates, which thin horizontal lines and do not thin vertical lines, illustrate the desirability of passing a number of different templates over the image, with the different templates being sensitive to different characteristics of the image. It is also desirable (from a speed standpoint) to pass the various templates concurrently. However, in the FIG. 2 image segment 106, templates 100 and 110 cannot be applied concurrently because, if that were done, the depicted 2-pixel wide horizontal line would be completely eliminated: the top row would be deleted by template 100, and the bottom row would be deleted by template 110.
If line thinning is to be performed efficiently, this interdependence between different templates must be broken.
We found that, unexpectedly, this interdependence can be broken by employing a window that is greater than 3x3. Hence, we use a template set which contains at least some templates that are greater than 3x3. Some are 3x3, some are 3x4, some are 4x3, and some are 5x5. The characteristic of the set is that the templates can be passed over the image concurrently. This capability comes about from the particular selection of templates, which allows the image to be altered in response to one template without having a deleterious effect on the ability of another template to independently alter the image. This fairly unique set of templates is shown in FIG. 3.
We discovered that the set of templates depicted in FIG. 3 is a sufficient set. Other sets are possible, of course, but, in accordance with our invention, such sets are characterized by the inclusion of at least one template that is greater than 3x3.
To describe the operation of the depicted templates, we start with templates 120 and 140. These templates correspond to templates 100 and 110 of FIG. 2. Template 120 is shown as a 5x5 array but, in essence, it forms a 3x3 window since the outer columns and rows are at a "don't care" condition. Template 120 differs from template 100 in that pixels 121 and 122 in template 120 test for ON pixels, whereas the correspondingly positioned pixels in template 100 are set to "don't care". That is, template 120 makes sure that the pixel nibbled away (turned light) is above a line that extends in both directions. Template 140, on the other hand, differs from template 110 in that, effectively, it is a 3x4 template. It includes a 3x3 portion that is similar to the 3x3 template 110 (other than pixels 141 and 142), and it also includes a pixel 143 at the center of the first row. Pixel 143, in effect, requires a horizontal line to be 3 pixels wide before a pixel is permitted to be nibbled away (from the bottom).
Templates 130 and 150 form a template pair like the template pair 120 and 140. Templates 130 and 150 thin vertical lines. Templates 160, 170, 180, and 190 thin "knees" pointing to the right, left, up and down, respectively; templates 200, 210, 220 and 230 thin slanted lines from above and from below; etc. It may be noted that templates 160-230 are all 5x5 templates.
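The mechanics of applying any one of these templates are simple: each cell demands ON, OFF, or don't-care, and a center pixel is deleted only when every cell is satisfied. The following sketch shows the test for a 3x3 template in the spirit of template 100 of FIG. 2, reusing the Img structure from the earlier sketches; the 1/0/-1 encoding is ours, and passing a set of templates concurrently amounts to collecting all matches against the unmodified image before any center pixel is turned OFF.

// Sketch: one thinning template and its matching test.
enum { OFF = 0, ON = 1, DC = -1 };   // cell requirements

// Light row above a stroke at least two pixels thick;
// a match deletes the center pixel (nibble from the top).
static const int nibbleTop[3][3] = {
    { OFF, OFF, OFF },
    { DC,  ON,  DC  },
    { DC,  ON,  DC  },
};

// Returns 1 if the neighborhood centered at (xc, yc) satisfies
// template t (cells outside the image are treated as OFF).
int matches(const Img& im, int xc, int yc, const int t[3][3]) {
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            int want = t[dy + 1][dx + 1];
            if (want == DC) continue;
            int x = xc + dx, y = yc + dy;
            int have = (x >= 0 && y >= 0 && x < im.x && y < im.y)
                       && im.pix[y][x] != 0;
            if (have != want) return 0;
        }
    return 1;
}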
Returning to FIG. 1, skeletonization block 40 is followed by feature extraction block 50. Although operationally similar, skeletonization is different from feature extraction from a functional standpoint. In the former, one identifies superfluous pixels and turns them from dark to light. In the latter, one identifies relatively macroscopic characteristics that help classify the character. The macroscopic characteristics identified are the kind of characteristics that are not dependent on the size or thickness of the character, but are the ones that give the character its particular "signature." Hence, it is these characteristics that block 50 seeks to identify.
Operationally, feature extraction is accomplished by passing a collection of windows over the image. Each window in our system is a 7x7 template, and each template detects the presence of a particular feature, such as an end point, diagonal lines, a horizontal line, a vertical line, etc. The detection works by a majority rule in the sense that when the majority of the 49 pixels (7x7) fit the template, it is concluded that the feature is present. In our system we employ 49 different 7x7 templates, as depicted in FIG. 4. For each of the templates we create a "feature map" which basically indicates the coordinates in the image array where the pattern of the template matches the image.
Having developed the 49 feature maps corresponding to the 49 templates of FIG. 4, we develop a number of super-feature maps in block 60 that are logical combinations (AND and OR) of the feature maps. We thus reduce the set from 49 to 18 maps (of 18x30 pixel arrays). This reduced number of maps has been determined heuristically.
We call the arrangements of the detected features "maps" because we structure an array (in the memory where we store them) and we place the feature detections in the appropriate locations in the array. In this manner we record the presence of a feature and its location. Other mechanisms for recording "hit" location designations can be used, but it is still conceptually simpler to think in terms of maps.
It turns out that the 18x30 array is too detailed for classification purposes. The detail can actually mask the character and make the classification task more difficult (as in the saying "you can't see the forest for the trees"). Accordingly, block 70 performs coarse blocking to reduce the 18x30 feature maps to feature maps that are only 3x5. This results in a final map, or vector, of 270 bits, which corresponds to 18 3x5 maps.
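Since 18x30 divides evenly into 3x5 blocks of 6x6 pixels each, the coarse blocking reduces to one pass per feature map. A sketch follows; the OR reduction over each block is our reading of the coarse blocking here, and a hit count with a threshold would serve equally well:

// Sketch: reduce one 18x30 binary feature map to 3x5 by OR-ing
// each 6x6 block.  fine[row][col] is indexed height x width.
void coarseBlock(const int fine[30][18], int coarse[5][3]) {
    for (int by = 0; by < 5; by++)
        for (int bx = 0; bx < 3; bx++) {
            int hit = 0;
            for (int yy = 0; yy < 6 && !hit; yy++)
                for (int xx = 0; xx < 6 && !hit; xx++)
                    hit = (fine[by*6 + yy][bx*6 + xx] != 0);
            coarse[by][bx] = hit;
        }
}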
Lastly, block 80 performs the classification algorithms to determine, from the given 270 bits, the most likely classification candidate. A simple algorithm, such as determining the lowest Hamming distance, will suffice once it is known what templates most likely correspond to the characters that are to be identified. The key, of course, lies in determining these templates; and that aspect calls for the learning methodologies (such as back propagation) that the art is currently dealing with.
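A minimal sketch of such a Hamming-distance classifier over the 270-bit vector follows; the prototype array, the class count of 10, and the rejection rule are illustrative, and the prototypes themselves must come from training:

// Sketch: nearest-prototype classification by Hamming distance.
enum { NBITS = 270, NCLASS = 10 };

// Returns the class whose prototype differs from v in the fewest
// bit positions; *dist gets that distance so the caller can refuse
// to decide when even the best match is poor.
int classify(const unsigned char v[NBITS],
             const unsigned char proto[NCLASS][NBITS], int* dist) {
    int best = -1, bestDist = NBITS + 1;
    for (int c = 0; c < NCLASS; c++) {
        int d = 0;
        for (int i = 0; i < NBITS; i++)
            d += (v[i] != proto[c][i]);
        if (d < bestDist) { bestDist = d; best = c; }
    }
    *dist = bestDist;
    return best;
}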
Hardware embodiment
Although FIG. 1 depicts the process of our OCR system, it is also quite representative of the hardware realization. The actual details of the signal flow would vary with the particular design, but that is perfectly well within the conventional circuit design arts. For purposes of the following discussion, it may be considered that our system operates in a pipelined fashion and that each electronic circuit block applies the necessary signals and controls to the following circuit block, together with the necessary identification as to which pixel is being considered.
As suggested earlier, block 10 comprises conventional apparatus that is tailored to the particular source of the image to be classified. It can simply be a video camera coupled to a commercial "frame grabber" and a memory. When the classification process begins, the memory is accessed to retrieve the center pixels and the 24 neighboring pixels, and the collection of retrieved signals is applied to block 20.
Blocks 20 and 30 are currently implemented on a SUN workstation with the simple programs presented in the appendix. Local memory is included with the microprocessors to store image signals and temporary computation results, as necessary. Practically any microprocessor can be similarly utilized, but if higher speed is required than is obtainable with a microprocessor, specific hardware can be designed in a conventional manner to carry out the needed calculations. In fact, since the operations required are merely additions, subtractions, comparisons, and rudimentary multiplications, a pipelined architecture can easily be designed that offers very high throughputs.
The output of block 30 is a sequence of signal sets, each having an associated center pixel and its neighboring pixels. Block 40 is implemented with the neural network of FIG. 5, which includes a series connection of a switch 400, a template match network 410, and a threshold network 420. The input signals, which correspond to the 25 pixel values of the image covered at any instant by the 5x5 window, are applied to switch 400, which ensures that these values are applied to network 410 simultaneously. Network 410 includes 25 input leads and a number of output leads that equals the number of templates stored.
Within network 410, all input leads are connected to each output lead through a column of preset connection nodes. Each such column of connection nodes (e.g., the column containing nodes 411-414) corresponds to a stored template. Thus, the signal of each output lead represents the affinity of the input signal to a different template. More specifically, the connection nodes are of three "varieties"; to wit, excitatory (E), inhibitory (I), and "don't care" (D). Response to a match or a mismatch differs with each of the varieties, in accordance with the truth table below.
input    synapse    output
  1         E         +1
  0         E          0
  1         I         -1
  0         I          0
0 or 1      D          0

Nodes such as 411 that implement this truth table are easily realized with gated amplifiers.
The information of whether a node is an E, I, or D node can be stored in a two flip-flop set associated with each node (when variability is desired). Alternatively, the information can be "hardwired" with an array of links associated with the array of nodes. The programming of the templates (i.e., connections) can be achieved through a burn-through of the appropriate links. Of course, if the templates are completely unchanging, one can design the template information directly into the integrated circuit mask of the nodes' array.
The current of the output lines flows into an impedance, and the flow causes the voltage of each output line of network 410 to rise to a level that is proportional to the degree of match between 1's in the set of input signals and the excitatory nodes. Of course, the voltage is also diminished by the degree of match between 1's in the set of input signals and the inhibitory nodes.
The output lines of network 410 are applied to threshold network 420, where that impedance can optionally be placed. Network 420 applies a set of thresholds to the output signals of network 410. Specifically, network 420 comprises a set of two-input amplifiers (e.g., 421-424) having one input responsive to the input leads of network 420, and a number of sources (e.g., 425-427) that connect to the second input of amplifiers 421-424. Each of the sources supplies a different current and, correspondingly, each amplifier 421-424 develops a voltage on its second lead that is related to the specific connection that its input has to sources 425-427. In this manner, different thresholds can be applied to the different amplifiers within network 420. The output leads of network 420 are the outputs of amplifiers 421-424, and they take on the logic value 1 or 0, depending on whether the signal input of an amplifier exceeds the threshold or not.
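In software terms, each output line of network 410 accumulates +1 for every 1-input on an excitatory node and -1 for every 1-input on an inhibitory node, and network 420 then compares the result to that template's threshold. A sketch of this model follows; the 'E'/'I'/'D' character encoding is ours:

// Sketch: software model of one output line of the template-match
// network 410 followed by the threshold network 420.
// syn[i] is 'E', 'I' or 'D' for input i; in[i] is the binary pixel.
int templateFires(const char* syn, const int* in, int n, int thresh) {
    int score = 0;
    for (int i = 0; i < n; i++) {
        if (!in[i]) continue;            // only 1-inputs contribute
        if (syn[i] == 'E') score++;      // excitatory raises the line
        else if (syn[i] == 'I') score--; // inhibitory pulls it down
        // 'D' (don't care) contributes nothing
    }
    return score > thresh ? 1 : 0;       // network 420's comparison
}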
Block 50 is constructed with a neural network such as the one depicted in FIG. 5. However, since the block 50 neural network deals with 7x7 templates as compared to the 5x5 templates of block 40, a memory 55 is interposed between the two neural networks to buffer the data.
Block 60 generates the 18 feature maps. It simply takes the outputs of block 50 and, together with the signal that specifies the identity of the center pixel, stores the appropriate information in a memory. The result is 18 memory segments, with each segment containing information about the features found in the image. Each such segment is, thus, one of our feature maps.
The coarse blocking of block 70 is achieved by using 18 additional smaller memory segments, perhaps in the same physical memory device. In those smaller memory segments, block 70 stores information about the features that are found in appropriately selected portions of the larger memory segments. When the original image is 18 pixels by 30 pixels in size, the selection can be easily accomplished with a counter that operates in modulus 5, where the full value of the counter is used to access the larger segments, while the whole number after division by the modulus is used to identify the cells in the 18 smaller memory segments.
The 270 memory locations of the smaller memory segments form the output of block 70 and make up, in effect, a vector that describes the character contained in the image.
The last function that needs to be carried out is to apply this vector to some network that would select the most likely candidate character for the given feature vector. This is the function of block 80.
Block 80 can be implemented in many ways. For example, the content-addressable teachings of Hopfield in the aforementioned U.S. Patent 4,660,166 can be used to advantage. In accordance with his teachings, one can impart to the feedback network of his circuit the information about the characters in the subject set. With such information in place, the content-addressable memory identifies the feature vector of the character that is closest to the applied feature vector. The Hopfield network is very robust in making the "correct" choice even when the input appears to be quite distorted. It is a little difficult, however, to design the feedback network for the Hopfield circuit because all of the stored vectors are distributed throughout the feedback network and commingled with one another. This difficulty is compounded by the fact that we do not exactly know how we recognize a "4", or the limits of when we can recognize a "4" and when we are so unsure as to decline to make a decision. Yet, we know a "4" when we see one!
Current research attempts to solve this problem by having the classifier circuit "learn", through trial and error, to reach the correct decisions. One structure that has the potential for such "learning" is depicted in FIG. 6. This technique is commonly referred to as "back propagation." It is described, for example, by D. E. Rumelhart et al. in "Learning Internal Representations by Error Propagation", in D. E. Rumelhart, J. L. McClelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press, 1986, Chap. 8.
FIG. 6 comprises interconnection networks 81 and 82 that are serially connected. The input signal set is applied at the input of network 81, and the output signal set appears at the output of network 82. Each of the networks has a plurality of input and output leads and each input lead is connected to all of the output leads.
More specifically, each input lead i is connected to each output lead j through a connection weight wij. In our application, network 81 has 270 input leads and 40 output leads. Network 82 has 40 input leads and 10 output leads. The number of input leads of network 81 is dictated by the length of the feature vector. The number of outputs of network 82 is dictated by the number of characters in the classifying set. The number of intermediate leads (in this case, 40) is determined heuristically.
Training of the FIG. 6 circuit is carried out by applying a developed feature vector of a known character and adjusting the weights in both networks 81 and 82 to maximize the output signal at the designated output lead of network 82 corresponding to the applied known character. All available samples of all the characters in the set to be classified are applied to the network in this fashion, and each time, the weights in the interconnection network are adjusted to maximize the signal at the appropriate output lead. In this manner, a set of weights wij is developed for both networks.
It may be appropriate to explicitly mention that the connection weights wij are analog in nature and that the circuit operates in an analog fashion. That is, the voltage at any output lead of network 81 is a sum of the contributions of the "fired up" weights connected to that output lead. Each weight is "fired up" by a binary signal on the input lead to which the weight is connected. Thus, the output at lead j equals

    Σi Bi wij

where Bi is the value of the i-th input lead (0 or 1).
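The forward pass through the two networks is then just two sums of that form, the 40 analog levels of network 81 feeding network 82. A sketch with illustrative array layouts follows (in the circuit, of course, the sums are currents rather than loops):

// Sketch: forward pass through network 81 (270 -> 40) and
// network 82 (40 -> 10).  Inputs are binary; intermediate and
// output levels are analog.
enum { NIN = 270, NHID = 40, NOUT = 10 };

void forward(const int b[NIN], const float w1[NIN][NHID],
             const float w2[NHID][NOUT], float out[NOUT]) {
    float hid[NHID] = { 0. };
    for (int i = 0; i < NIN; i++)        // out_j = sum_i B_i * w_ij
        if (b[i])                        // a 1-input "fires" its weights
            for (int j = 0; j < NHID; j++) hid[j] += w1[i][j];
    for (int j = 0; j < NOUT; j++) {
        out[j] = 0.;
        for (int i = 0; i < NHID; i++) out[j] += hid[i] * w2[i][j];
    }
    // the chosen character is the index of the largest out[j]
}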
Though the concept of such a learning network is fairly well understood, the task remains to realize such an analog circuit efficiently and compactly. The requirements on such a circuit are not trivial. For example, the minimum weight change, or modification, must be fairly small if optimization of the network is to be achieved. The iterative improvement methodology described above is based on the heuristic assumption that better weights may be found in the neighborhood of good ones, but that heuristic fails when the granularity is not fine enough. We found that for a small network 81, at least 8 bits of analog depth are necessary. Larger networks may require even finer granularity. The weights must also represent both positive and negative values, and changes must be easily reversible. During the learning and training session the number of changes to the weights can be quite large. Therefore, a practical circuit must allow for quick modification of the weights.
Taking these and other requirements into account, we have created an efficient analog connection weight, or strength, circuit with MOS VLSI technology.
Whereas each connection weight in FIG. 6 is depicted with merely a black dot, FIG. 7 presents a circuit for implementing these dots; or, more particularly, FIG. 7 shows one connection weight circuit with its connection to input line 83 and output line 84, as well as some common circuitry. Primarily, the interconnection weight portion of the FIG. 7 circuit includes capacitors 801 and 802, small MOS switches 803 and 804, a relatively large MOS transistor 805, a differential amplifier 806, and a multiplier 807. Secondarily, the circuit of FIG. 7 includes a charge-coupling switch 808, a sensing switch 809 and various control leads.
The circuit operates as follows. Capacitors 801 and 802 are charged to different voltage levels, and the difference in voltage levels is reflected in the output voltage of differential amplifier 806. Amplifier 806 has its two inputs connected to capacitors 801 and 802. The output of amplifier 806, which represents the connection weight, is connected to multiplier 807. Multiplier 807 can be any conventional transconductance amplifier. Also connected to multiplier 807 is input lead 83 of the interconnection network. The output of multiplier 807 is connected to an output lead of the interconnection network. Thus, multiplier 807 sends a current to the output lead that is a product of the signal at the input lead and the value of the connection weight. The connection weight is represented by the differential voltage developed by amplifier 806 in response to the difference in voltages between capacitors 801 and 802.
We have found that the difference in voltages on capacitors 801 and 802 is maintained for a long time (relative to the operations involved in OCR systems) and that no refreshing is necessary when the circuit is kept at reasonably low temperatures. For example, at 77 degrees Kelvin no detectable loss has been observed with time. It may be noted that one advantage of our circuit is that the weight is proportional to V801 - V802 and, therefore, even a loss in charge -- when it is the same at both capacitors -- results in no change to the weight.
Nevertheless, an avenue must clearly be provided for refreshing the information on capacitors 801 and 802. Moreover, an avenue must be provided for setting a voltage (charge) value on capacitors 801 and 802 and for modifying the set values to allow for the above-described "learning" procedure. This is where the remaining switches and controls come in.
To bring a connection weight to a desired level, switch 808 is closed momentarily to allow a fixed voltage level to be applied to capacitor 801 from voltage source 816. That voltage corresponds to a fixed charge. Thereafter, switch 808 is turned off. At this point, the weight of the connection is at a maximum positive level because capacitor 801 is connected to the non-inverting input of amplifier 806 and carries a positive voltage, while capacitor 802 is connected to the inverting input of amplifier 806. A change in the connection weight is accomplished in the following way.
First, transistors 803 and 805 are turned on. Transistor 803 is very small compared to transistor 805 and, for the sake of a better understanding of what happens, transistor 803 can be thought of as being merely a switch. By comparison, transistor 805 is long and narrow, and when it is on it can be thought of as a capacitor. When switch 803 is closed and transistor 805 (assuming it is an n-channel device) is turned on, the charge on capacitor 801 is distributed between the capacitor (801) and the inversion charge on the turned-on transistor 805. Transistor 803 is then turned off, thereby trapping the charge in transistor 805. Transistor 804 is then turned on and, if transistor 805 is slowly turned off, the mobile charge in its channel will diffuse through switch 804 into capacitor 802.
The above steps thus move a quantum of charge from capacitor 801 to capacitor 802. That corresponds to a change in the capacitors' voltages and in the interconnection weight.
The above sequence can be repeated as many times as necessary to bring the connection weights to the desired levels. In this manner, the optimization of the connection weights can proceed during the training period, with the result that each interconnection weight in networks 81 and 82 is set to the correct level.
The above description addresses the training aspect of the circuit. Once the learning process is over, means should be provided for 1) determining the values of the weights and 2) refreshing the weights to compensate for losses with time, etc. This is accomplished with the aid of sensing switch 809, an A/D converter, a D/A converter, and a non-volatile memory.
To determine the value of the weights in an interconnection network, all of the input leads are turned on, one at a time. Each time a lead is turned on, the sensing switches (809) of the weights connected to that input lead are sequentially turned on to allow each amplifier's voltage to appear on sensing bus 810. That voltage is applied to A/D converter 811 and the resulting digital information is stored in memory 812. All of the weights are converted to digital form in this manner and stored in memory 812. During a refresh operation, each connection weight is isolated in the manner described above, but this time the voltage output on sensing bus 810 is compared in amplifier 814 to the analog voltage of D/A converter 813, to which the digital output of memory 812 is applied. Of course, memory 812 is caused to deliver the digital output that corresponds to the refreshed connection weight. Based on the comparison results, the sequence of switching elements 803, 804, and 805 is controlled by the output signal of amplifier 814 to either increase or diminish the voltage of capacitor 801 relative to capacitor 802. The control of directing the output of bus 810 to either A/D converter 811 or to comparator amplifier 814 is effected by switch 815. Should it be necessary to completely discharge both capacitors 801 and 802, the voltage of source 816 can be reduced to zero and switches 803, 804, and 805 can be turned on.
APPENDIX
//////////////////////////////////////////////////////////////
// fire.c 9nl88 LDJ
// check for broken images
// returns -1 if completely blank
// returns 0 if connected
// returns 1 if connected except for small flyspecks
// returns 2 if badly disconnected
// uses recursive brushfire algorithm
// Diagnostic output: prints number of segments,
// and location and code of the largest
// Side effect: sets up Lseg (in rec2com)
// IMPORTANT ASSUMPTION: fire1 assumes img.pix black pixels are POSITIVE
// and white pixels are zero ---
// If you can't guarantee that, call fire() instead of fire1()
// negative pixels will cause trouble
// This routine modifies img.pix !!

#include <stdio.h>
#include "rec2types.h"
#include "errl.h"
#include "fire.h"

inline int imin(int a, int b){return(a<b? a: b);}
inline int imax(int a, int b){return(a>b? a: b);}

static int xd1;      // copy of img.x
static int yd1;      // and img.y
static char** pix1;  // and img.pix
static Pair* p1;
static Pair* p2;
static int list_size = -1;
static Seg myseg;    // segment being processed
Seg Lseg;            // longest segment there is
static int nseg;
static int totpix;   // total pixels in image

void burn(int xcent, int ycent);   // forward declaration

int fire1(Image img){
    // make sure we have allocated enough room to keep our pair-lists:
    int lsiz = img.x * img.y;   // max possible size of list required
    if (lsiz > list_size) {
        if (p1) {
            delete [] p1;
            delete [] p2;
        }
        p1 = new Pair[lsiz];
        p2 = new Pair[lsiz];
        list_size = lsiz;
    }
    Lseg.list = 0;   // no longest segment yet
    Lseg.size = 0;   // first seg will beat this for sure
    nseg = 0;
    totpix = 0;
    // find first black pixel, so we can initiate the burn there:
    int xx; int yy;
    for (yy = 0; yy < img.y; yy++) {
        for (xx = 0; xx < img.x; xx++) {
            if (img.pix[yy][xx] > 0) {
                // fprintf(stderr, "firstx = %d firsty = %d\n", xx, yy);
                // a lot of these things might logically be arguments to burn(),
                // but static variables are faster & simpler
                nseg++;   // count this segment
                myseg.ashes = -nseg;
                myseg.list = (Lseg.list != p1) ? p1 : p2;
                myseg.size = 0;
                xd1 = img.x; yd1 = img.y; pix1 = img.pix;
                burn(xx, yy);   // burn, baby, burn!
                if (myseg.size > Lseg.size) Lseg = myseg;
            }
        }
    }
#ifdef TEST
    fprintf(stderr, "Saw %d segments\n", nseg);
    if (nseg) fprintf(stderr, "Longest (code %d) starts at %d %d\n",
        Lseg.ashes, Lseg.list[0].x, Lseg.list[0].y);
#endif
    if (nseg == 0) return -1;
    if (nseg == 1) return 0;
    float frac = float(Lseg.size) / float(totpix);
    const float minfrac = .9;
    if (frac >= minfrac) return 1;
    return 2;
}
// the magical recursive burning routine
// turns to ashes all points that are 8-connected to the initial point
void burn(
    int xcent, int ycent   // center of 3 x 3 region of interest
){
    // if this point is off-scale:
    if (xcent < 0 || ycent < 0 || xcent >= xd1 || ycent >= yd1) return;
    // NOTE: this is indeed a check for > 0,
    // not just nonzero, so things don't burn twice.
    if (pix1[ycent][xcent] > 0) {
        int top = myseg.size++;   // keep track of length of segment
        totpix++;                 // count total pixels
        pix1[ycent][xcent] = myseg.ashes;   // turn this point to ashes
        myseg.list[top].x = xcent;
        myseg.list[top].y = ycent;
        burn(xcent+1, ycent+1);   // ignite neighbors
        burn(xcent+1, ycent  );
        burn(xcent+1, ycent-1);
        burn(xcent,   ycent+1);
        burn(xcent,   ycent-1);
        burn(xcent-1, ycent+1);
        burn(xcent-1, ycent  );
        burn(xcent-1, ycent-1);
#define jumpbreaks YES
#ifdef jumpbreaks
        int jump = imax(xd1, yd1);
        jump = jump / 20;
        if (jump < 3) jump = 3;
        if (jump > 1) {
            burn(xcent+jump, ycent+jump);   // ignite more distant neighbors
            burn(xcent+jump, ycent     );
            burn(xcent+jump, ycent-jump);
            burn(xcent,      ycent+jump);
            burn(xcent,      ycent-jump);
            burn(xcent-jump, ycent+jump);
            burn(xcent-jump, ycent     );
            burn(xcent-jump, ycent-jump);
        }
#endif
    }
    // if this point NOT set, or already burned, do nothing
    return;
}
//////////////////////////////////////////////////////////////

// same as above, but does not assume that black pixels are positive;
// non-zero suffices.
int fire(Image img) {
    for (int yy = 0; yy < img.y; yy++) {
        for (int xx = 0; xx < img.x; xx++) {
            img.pix[yy][xx] = img.pix[yy][xx] != 0;
        }
    }
    return ( fire1(img) );
}
//**************************************************************

// do most of the work for linear transformation program
// perform linear transformations on post office data
// i.e. convert to standard size and aspect ratio
// see lin.plan for extended discussion
// Note: xypix[][] will contain small integers 0..9
// for graylevels below threshold, you get zero;
// for graylevels above threshold, you get the graylevel number
// This gives you the option of treating it as a boolean
// if you don't care about graylevels.
// Caller allocates the array; we fill it.

#include <stdio.h>
#include <math.h>
#include "new_array.h"
#include "/nets/util/util.h"
#include "rec2types.h"
#include "do_lin.h"

inline float fmin(float a, float b){return(a<b? a: b);}
inline float fmax(float a, float b){return(a>b? a: b);}
inline int imin(int a, int b){return(a<b? a: b);}
inline int imax(int a, int b){return(a>b? a: b);}
void do_lin(
    const Image raw,       // input image
    const int known_fit,   // 1 ==> char already fills box pdim by qdim
    Image des,             // result: array of small integers
    FILE* param_fp,        // parameter file filepointer
                           // 0 ==> all parameters take default values
    const char* sname      // filename, for informational messages
                           // provide "" if you can't do better
){
    Pair* bl;
    bl = new Pair[raw.x*raw.y];   // safe; probably overestimate # of black pixels
    int ibl = 0;
    int pp, qq;
    for (qq = 0; qq < raw.y; qq++) {
        for (pp = 0; pp < raw.x; pp++) {
            if (raw.pix[qq][pp]) {
                bl[ibl].x = pp;
                bl[ibl].y = qq;
                ibl++;
            }
        }
    }
    do_lin1(raw.x, raw.y, bl, ibl, known_fit, des, param_fp, sname);
    delete [] bl;
}
void do_lin1(
    const int pdim,        // size of input array
    const int qdim,        // ..
    const Pair* bl,        // input: list of black pixels
    const int nbl,         // size of said list
    const int known_fit,   // 1 ==> char already fills box pdim by qdim
    Image des,             // result: array of small integers
    FILE* param_fp,        // parameter file filepointer
                           // 0 ==> all parameters take default values
    const char* sname      // filename, for informational messages
                           // provide "" if you can't do better
){
    Fget(kernel, 2);    // convolution kernel (units of PQ rows/cols)
    Iget(mingray, 3);   // this or larger: return graylevel, else return zero
    Fget(fcorn, .7);    // fatness corner
    Iget(info, 0);      // report miscellaneous information
    Iget(inflate, 0);   // 1 ==> small chars can grow, fatness can change

    float pkern = kernel;
    float qkern = kernel;
    const int x0 = 0;
    const int y0 = 0;
    const float xmid = (des.x - x0) / 2.0;
    const float ymid = (des.y - y0) / 2.0;
    // find raw bounding box
    int p0, q0, p2, q2;
    int ibl;
    int pp, qq;
    if (known_fit) {
        p0 = 0; q0 = 0;
        p2 = pdim; q2 = qdim;
    } else {
        p0 = bl[0].x; q0 = bl[0].y;
        p2 = p0; q2 = q0;
        for (ibl = 1; ibl < nbl; ibl++){
            pp = bl[ibl].x; qq = bl[ibl].y;
            p0 = imin(p0, pp);
            p2 = imax(p2, pp);
            q0 = imin(q0, qq);
            q2 = imax(q2, qq);
        }
        p2++; q2++;
    }
    // calculate some moments:
    float** xyflt = new_float(des.y, des.x);
    // note that we are treating the pixels as BOXES of ink, not points
    // so the (0,0) pixel extends from (0,0) to (1-eps, 1-eps)
    // and has its center at (.5, .5)
    float pmid = (p0 + p2) / 2.0;
    float qmid = (q0 + q2) / 2.0;
    // but if we shift the middle half a bit,
    // we can pretend the (0,0) pixel is centered at (0,0)
    float pmx = pmid - .5;
    float qmx = qmid - .5;
    float mpq = 0.;   // PQ moment
    float mqq = 0.;   // QQ moment
    for (ibl = 0; ibl < nbl; ibl++) {
        pp = bl[ibl].x; qq = bl[ibl].y;
        mpq += (qq - qmx)*(pp - pmx);
        mqq += (qq - qmx)*(qq - qmx);
    }
    float theta = mpq / mqq;
    // Note: since pixels are numbered from UPPER left,
    // positive theta corresponds to "\"
    // negative theta corresponds to "/"

    // (pt, qt) is the coordinate where (p,q) will go when the char is deskewed.
    // This is not quite the same as (x,y) since the latter
    // has size changes as well.

    // Calculate min and max horiz coordinates,
    // measured relative to a line parallel to the sides of the parallelogram
    // [the reference line goes through (pmid, qmid)
    // which is the middle of the raw PQ rectangle]
#define pmap(p,q) (p-pmx - (q-qmx)*theta)
    float pt0 = pmap(bl[0].x, bl[0].y);   // leftmost black pixel
    float pt2 = pt0;   // rightmost black pixel
    for (ibl = 1; ibl < nbl; ibl++) {
        float xxx = pmap(bl[ibl].x, bl[ibl].y);
        pt0 = fmin(pt0, xxx);
        pt2 = fmax(pt2, xxx);
    }
    // The points we just calculated are centers of parallelogram boxes
    // Calculate how much the box sticks out from there.
    pt2 += .5*fabs(theta) + .501;   // .001 to catch roundoff errors
    pt0 -= .5*fabs(theta) + .501;
    float dsw = pt2 - pt0;      // deskewed width
    float kf = dsw / (p2-p0);   // krunch factor
                                // usually (but not always) less than 1
                                // (not used in further calculations)
    float conwid = dsw + pkern - 1.;      // pretend wider to make room for convolution
    float conhgt = q2 - q0 + qkern - 1;   // and taller, also for convolution
    float qtmid = qmid + (qkern-1.) / 2.; // height of middle of char
    float ptmid = pmid + pt0 + .5*conwid + theta*(qkern-1.)/2.;   // p coord of middle of char
    float fat = conwid / conhgt;          // fatness ratio of the parallelogram
    float dfat = des.x / (float) des.y;   // desired fatness ratio
    float nfat = fat / dfat;              // normalized fatness ratio
    if (info) fprintf(stdout,
        "%s w: %d, h: %d, th: %5.2f, dsw: %5.2f, kf: %5.2f, nfat: %5.2f\n",
        sname, p2-p0, q2-q0, theta, dsw, kf, nfat);
    // calculate the coefficients of the linear transformation:
    float d00, d01, d02, d10, d11, d12;
    if (inflate) {   // old "inflationary" scheme
        float fif = nfat > fcorn ?
            1./nfat : 1./fcorn;        // fatness increasing factor
        d10 = 0.;                      // pure skew
        d11 = (des.y-y0) / conhgt;     // make output char fill its box vertically
        d00 = d11 * fif;
        d01 = - d11 * fif * theta;
        d02 = xmid - d00*ptmid - d01*qtmid;   // center point is fixed point
        d12 = ymid - d10*ptmid - d11*qtmid;
    }
    else {   // never inflate scheme
        float ygrow = (des.y-y0) / conhgt;
        float xgrow = (des.x-x0) / conwid;
        float dogrow = fmin(xgrow, ygrow);   // grow as little as possible
                                             // i.e. shrink so BOTH fit
        dogrow = fmin(1.0, dogrow);          // but NEVER really inflate
        d10 = 0;        // pure skew
        d11 = dogrow;   // shrink y
        d00 = dogrow;   // and x equally
        d01 = -dogrow * theta;
        d02 = xmid - d00*ptmid - d01*qtmid;   // center point is fixed point
        d12 = ymid - d10*ptmid - d11*qtmid;
    }
    // make the output space all white
    int xx, yy;
    for (yy = y0; yy < des.y; yy++) {
        for (xx = x0; xx < des.x; xx++) {
            xyflt[yy][xx] = 0.;
        }
    }
    // and start bequeathing blackness
    int spip = imax(1, int(10*d11));   // spip == scans per input pixel
    float step = pkern / spip;
    float offset = step / 2.;
    int ii;
    float fy, fq;
    int iy;
    float fx0, fx2;
    int ix0, ix2;
    float rx0, rx2;
#define oops(X) {fprintf(stderr, "oops (X) %s %f %f\n", sname, fx0, fx2); continue;}
    for (ibl = 0; ibl < nbl; ibl++) {   // loop over black input pixels
        pp = bl[ibl].x; qq = bl[ibl].y;
        fx0 = d00*pp + d01*qq + d02;            // start of scan line (in x space)
        fx2 = d00*(pkern+pp) + d01*qq + d02;    // end of scan line (..)
        ix0 = int(fx0);    // integer parts
        ix2 = int(fx2);
        rx0 = fx0 - ix0;   // remainders
        rx2 = fx2 - ix2;
        if (ix0 < 0) oops(1);   // error checks & defence
        if (ix0 >= des.x) oops(2);
        if (ix2 < 0) oops(3);
        if (ix2 >= des.x) oops(4);
        for (ii = 0; ii < spip; ii++) {   // loop over scan lines per input pixel
            fq = qq + offset + step*ii;
            fy = d10*pp + d11*fq + d12;
            iy = (int) fy;
            if (iy < 0) oops(5);
            if (iy >= des.y) oops(6);
            xyflt[iy][ix0] -= rx0;
            xyflt[iy][ix2] += rx2;
            for (int jj = ix0; jj < ix2; jj++) {
                xyflt[iy][jj] += 1.0;
            }
        }
    }
    // clip the bottom off of the gray-scale
    // using questionable threshold scheme
    // first, find blackest output pixel
    float zmax = 0;
    for (yy = y0; yy < des.y; yy++) {
        for (xx = x0; xx < des.x; xx++) {
            zmax = fmax(zmax, xyflt[yy][xx]);
        }
    }
    // then clip away....
    for (yy = y0; yy < des.y; yy++) {
        for (xx = x0; xx < des.x; xx++) {
            float zz = xyflt[yy][xx];
            int gr = int(9.9 * zz / zmax);
            if (gr < mingray) gr = 0;
            des.pix[yy][xx] = gr;
        }
    }
    // put away all our toys:
    delete2(xyflt);
}
Claims (16)
1. A method for classifying an image, said method including a step of processing signals corresponding to said image to develop a processed image, a step of feature extraction to develop features of said processed image, and a step of classification CHARACTERIZED BY:
said step of processing includes a substep of skeletonization of said image and a substep of deskewing of said image.
2. The method of claim 1 where said step of processing further includes a substep of scaling of said image.
3. The method of claim 2 wherein said substep of skeletonization follows said substeps of scaling and deskewing.
4. The method of claim 2 wherein said substep of skeletonization precedes said substeps of scaling and deskewing.
5. The method of claim 2 where said step of processing further includes a substep of cleansing said image prior to said substep of scaling.
6. The method of claim 2 where said step of processing further includes a substep of removing from said image disjoint strokes that are smaller in area than 10 percent of the area occupied by other strokes in said image.
7. The method of claim 1 wherein said step of classification determines the identity of a symbol contained in said image based on a stored set of weights derived from a training set of given sample symbols.
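As an illustration of claim 7, a classifier with stored weights can be pictured as scoring each candidate symbol by a dot product of the feature vector against weights fixed at training time, rejecting low scores as unknown. This is a minimal sketch; the function name, vector types, and rejection threshold are assumptions, not the patent's classifier.

#include <vector>

// Score every class as a dot product of the feature vector with a stored
// weight row, and return the best-scoring class, or -1 ("unknown") when
// no score clears the rejection threshold.
int classify(const std::vector<float> &features,
             const std::vector< std::vector<float> > &weights,  // one row per class
             float reject)
{
    int best = -1;
    float bestScore = reject;
    for (size_t c = 0; c < weights.size(); c++) {
        float s = 0.f;
        for (size_t i = 0; i < features.size(); i++)
            s += weights[c][i] * features[i];
        if (s > bestScore) { bestScore = s; best = (int) c; }
    }
    return best;
}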
8. The method of claim 1 wherein said step of feature extraction includes a step of developing feature identifiers that describe features of said processed image and their locations within said processed image, and between said step of feature extraction and said step of classification, the method includes a step of coarse blocking that encodes said feature identifiers into a reduced set of feature identifiers.
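One way to picture the coarse blocking of claim 8 is as a many-to-one reduction of a fine feature map onto a coarse grid, so that only approximate feature locations survive. A minimal sketch under that reading; the cell size parameter and the logical-OR reduction are assumptions.

#include <vector>

// Reduce a fine binary feature map to a coarse grid: a coarse cell is set
// if any feature pixel inside it is set, so only rough locations survive.
std::vector< std::vector<int> >
coarseBlock(const std::vector< std::vector<int> > &map, int cell)
{
    int rows = (int) map.size(), cols = (int) map[0].size();
    std::vector< std::vector<int> > out((rows + cell - 1) / cell,
        std::vector<int>((cols + cell - 1) / cell, 0));
    for (int y = 0; y < rows; y++)
        for (int x = 0; x < cols; x++)
            if (map[y][x])
                out[y / cell][x / cell] = 1;
    return out;
}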
9. The method of claim 1 wherein said step of feature extraction includes a substep of developing a plurality of feature maps that describe features of said processed image and their locations in said processed image, and a substep of combining developed feature maps into super-feature maps that correspond to composite features.
10. The method of claim 9 wherein said super-feature maps are developed through combinatorial logic that combines some of said feature maps.
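The combinatorial logic of claims 9 and 10 can be pictured as pixel-wise Boolean operations over aligned feature maps, for example marking a super-feature wherever either of two constituent maps fires. A sketch under that assumption; the function name and the choice of OR are illustrative.

#include <vector>

// Pixel-wise OR of two aligned binary feature maps; an AND, or any other
// Boolean combination, would have exactly the same shape.
std::vector< std::vector<int> >
combineMaps(const std::vector< std::vector<int> > &a,
            const std::vector< std::vector<int> > &b)
{
    std::vector< std::vector<int> > out(a.size(),
        std::vector<int>(a[0].size(), 0));
    for (size_t y = 0; y < a.size(); y++)
        for (size_t x = 0; x < a[0].size(); x++)
            out[y][x] = a[y][x] | b[y][x];
    return out;
}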
11. A method for classifying characters in an image COMPRISING THE STEPS OF:
capturing said image in the form of an array of picture elements;
processing said image to scale said array of picture elements to a chosen size and to delete extraneous image portions from said image, developing thereby a processed image;
deskewing and skeletonizing said processed image to develop a thinner and a more upright version of said image;
extracting features from said thinner and more upright image that characterize said image and developing feature identifiers to represent features of said image and their locations;
coarse blocking said feature maps to develop more granular feature maps; and
classifying said granular feature maps as representing one of a known set of symbols, or representing an unknown symbol.
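Read as code, claim 11 is a straight pipeline from capture to classification. The skeleton below mirrors the claimed steps one-to-one; every type and function name is illustrative, and the trivial stub bodies merely stand in for the processing the claim describes.

#include <vector>

typedef std::vector< std::vector<int> > Image;  // pixel grid (assumed binary)
typedef std::vector<Image> Maps;                // one map per feature type

// Illustrative stage stubs; the real bodies are the subject of the patent.
Image scaleAndCleanse(const Image &img)      { return img; }
Image deskewAndSkeletonize(const Image &img) { return img; }
Maps  extractFeatures(const Image &thin)     { return Maps(1, thin); }
Maps  coarseBlock(const Maps &m)             { return m; }
int   classify(const Maps &)                 { return -1; }  // -1 == unknown

int recognize(const Image &captured)            // captured array of pixels
{
    Image proc = scaleAndCleanse(captured);     // scale, drop extraneous parts
    Image thin = deskewAndSkeletonize(proc);    // thinner, more upright version
    Maps  maps = extractFeatures(thin);         // features and their locations
    return classify(coarseBlock(maps));         // known symbol, or unknown
}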
12. Apparatus for identifying characters embedded in an applied image comprising:
first means for capturing said image in the form of a matrix of picture elements;
second means responsive to said first means for identifying the presence of a character in said image and conditioning said character by deskewing and skeletonizing said image to develop a conditioned image;
third means responsive to said second means for extracting features of said conditioned image to develop feature maps pertaining to specified features of said image;
fourth means responsive to said third means for developing coarse blocked versions of said feature maps; and
means for classifying said coarse blocked version of said feature maps as representing one of a known number of characters or an unknown character.
13. The method of claim 1 wherein said substep of skeletonizing comprises:
a step of passing a plurality of templates, in parallel, over said image, and a step of unconditionally determining, for each template and based on a comparison between said template and said image, whether a selected portion of said image can be deleted from said image.
14. The method of claim 13 further comprising a step of deleting portions of said image in accordance with said step of unconditionally determining.
15. The method of claim 13 wherein said step of passing a plurality of templates passes said templates over the image in steps that successively align said templates with different portions of said image.
16. The method of claim 13 wherein at least some of said templates form an array of more than three pixels by three pixels.
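Claims 13 through 16 describe skeletonization as passing templates over the image and unconditionally marking deletable pixels wherever a template matches. The sketch below shows a single 3x3 pass under that reading; the don't-care convention and the separation of marking from deletion (claim 14) are noted in comments, and the approach to templates larger than 3x3 (claim 16) would be the same with wider loops. The template pattern is illustrative, not one of the patent's templates.

#include <vector>

typedef std::vector< std::vector<int> > Image;

// One pass of one 3x3 template over the image: wherever the neighborhood
// matches the template exactly (entries of -1 are "don't care"), the center
// pixel is unconditionally marked deletable.  Claims 13 and 15 run several
// such templates in parallel over successive alignments.
Image markDeletable(const Image &img, const int tmpl[3][3])
{
    int rows = (int) img.size(), cols = (int) img[0].size();
    Image mark(rows, std::vector<int>(cols, 0));
    for (int y = 1; y < rows - 1; y++) {
        for (int x = 1; x < cols - 1; x++) {
            bool match = true;
            for (int dy = -1; dy <= 1 && match; dy++)
                for (int dx = -1; dx <= 1 && match; dx++)
                    if (tmpl[dy + 1][dx + 1] >= 0 &&
                        tmpl[dy + 1][dx + 1] != img[y + dy][x + dx])
                        match = false;
            if (match) mark[y][x] = 1;
        }
    }
    return mark;   // the deletion itself is a separate step (claim 14)
}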
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US28833988A | 1988-12-20 | 1988-12-20 | |
US288,339 | 1988-12-20 | | |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2002542A1 CA2002542A1 (en) | 1990-06-20 |
CA2002542C true CA2002542C (en) | 1995-10-24 |
Family
ID=23106681
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA002002542A Expired - Fee Related CA2002542C (en) | 1988-12-20 | 1989-11-08 | Imaged symbol classification |
Country Status (2)
Country | Link |
---|---|
JP (1) | JPH02257381A (en) |
CA (1) | CA2002542C (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6403547B2 (en) * | 2014-11-18 | 2018-10-10 | 学校法人東京理科大学 | Steel component identification device and program thereof |
CN114359645B (en) * | 2022-01-12 | 2024-05-21 | 中国平安人寿保险股份有限公司 | Image expansion method, device, equipment and storage medium based on characteristic area |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS58169681A (en) * | 1982-03-31 | 1983-10-06 | Fujitsu Ltd | Picture processing circuit |
JPS60142481A (en) * | 1983-12-28 | 1985-07-27 | Fujitsu Ltd | character recognition device |
JPS63131287A (en) * | 1986-11-20 | 1988-06-03 | Ricoh Co Ltd | Character recognition system |
1989
- 1989-11-08 CA CA002002542A patent/CA2002542C/en not_active Expired - Fee Related
- 1989-12-15 JP JP1324184A patent/JPH02257381A/en active Pending
Also Published As
Publication number | Publication date |
---|---|
CA2002542A1 (en) | 1990-06-20 |
JPH02257381A (en) | 1990-10-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| EEER | Examination request | |
| MKLA | Lapsed | |