JPH0632079B2 - Character recognition device - Google Patents
- Publication number
- JPH0632079B2
- Authority
- JP
- Japan
- Prior art keywords
- character
- pixel
- recognition
- pixel point
- unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Landscapes
- Character Discrimination (AREA)
Description
Detailed description of the invention
Industrial field of application
The present invention relates to a character recognition device that recognizes printed and handwritten characters, such as those in newspapers and magazines, and converts them into character codes.
Prior art
In a conventional character recognition device, each side of the rectangle used to cut out the recognition target character pattern is coarsely divided into parts. From each part of each side, the pattern is scanned toward the opposite side, a histogram of the character background encountered before the first character portion is computed, and recognition is performed using this information. Fig. 2 shows the recognition target characters 『回』 and 『囲』 cut out as rectangles. Each side of the rectangles enclosing these two characters is divided into four parts, as shown in Fig. 3 (a) and (b). Scanning from each of the four parts of each side toward the opposite side and computing the histogram of the background up to the first character portion gives the results shown in Fig. 3 (a) and (b), and recognition is performed from these background histograms.
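The conventional peripheral feature described above can be sketched as follows. This is only an illustration under stated assumptions: the 8 × 8 bitmaps are crude hypothetical stand-ins for 『回』 and 『囲』 (not the patterns of Fig. 2), the "histogram" is taken to be the summed background run length per part of each side, and all function names are invented for this sketch.

```python
def parse(rows):
    """Convert '#'/'.' strings into a 0/1 bitmap (1 = character pixel)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def rotate_clockwise(bitmap):
    """Rotate 90 degrees clockwise so that another side becomes the top."""
    return [list(row) for row in zip(*bitmap[::-1])]


def background_depths_from_top(bitmap, parts=4):
    """Per part of the top side, sum the background run length of each column."""
    height, width = len(bitmap), len(bitmap[0])
    totals = [0] * parts
    for x in range(width):
        depth = 0
        # Walk down until the first character pixel (or the opposite side).
        while depth < height and bitmap[depth][x] == 0:
            depth += 1
        totals[x * parts // width] += depth
    return totals


def peripheral_features(bitmap, parts=4):
    """Background depths seen from the top, left, bottom and right sides."""
    features = {}
    for side in ("top", "left", "bottom", "right"):
        features[side] = background_depths_from_top(bitmap, parts)
        bitmap = rotate_clockwise(bitmap)
    return features


# Hypothetical stand-ins: both have the same outer frame, different interiors.
KAI_LIKE = parse([
    "########",
    "#......#",
    "#.####.#",
    "#.#..#.#",
    "#.#..#.#",
    "#.####.#",
    "#......#",
    "########",
])
I_LIKE = parse([
    "########",
    "#.#..#.#",
    "#.####.#",
    "#.#..#.#",
    "#.#..#.#",
    "#.####.#",
    "#.#..#.#",
    "########",
])

print(peripheral_features(KAI_LIKE) == peripheral_features(I_LIKE))  # True
```

Because every scan stops at the shared outer frame, the two stand-in patterns produce identical peripheral features, which is the weakness the next section describes.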
Problems to be solved by the invention
However, as is clear from Fig. 3 (a) and (b), the background histograms for the corresponding sides of the cutout rectangles of 『回』 and 『囲』 are identical, so it is difficult to distinguish the character 『回』 from the character 『囲』.
The present invention addresses this problem, and its object is to provide a character recognition device that can distinguish characters whose peripheral shapes are similar.
Means for solving the problems
To solve this problem, the present invention scans the rectangle in which a recognition target character has been cut out of the input image from each side toward the opposite side. When a pixel point at which the scan changes from the background portion to the character portion is detected, that pixel point is regarded as a change point, and the pixel column number (the number of the set to which a pixel point belongs) is incremented by 1 and assigned to that pixel point. A pixel point that is not a change point is given the pixel column number of the pixel point one pixel before it. A histogram of the number of pixels for each pixel column number is then computed for each of the M × N sub-regions obtained by dividing the rectangle into M parts horizontally and N parts vertically, and recognition is performed using these values as the feature values of the character.
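The numbering rule and the sub-region histogram can be written compactly as below. The notation is an assumption introduced here and does not appear in the patent: b(x, y) is the binarized pixel (1 for character, 0 for background), c(x, y) is the pixel column number, S_ij is a sub-region, and h_ij(k) is its histogram. The formulas are written for a scan from the top side, with y = 0 on that side; the other three directions would be analogous.

```latex
c(x, 0) = 1, \qquad
c(x, y) =
\begin{cases}
  c(x, y-1) + 1 & \text{if } b(x, y-1) = 0 \text{ and } b(x, y) = 1 \quad \text{(change point)} \\
  c(x, y-1)     & \text{otherwise}
\end{cases}
\qquad (y \ge 1)

h_{ij}(k) = \bigl|\{\, (x, y) \in S_{ij} : c(x, y) = k \,\}\bigr|,
\qquad i = 1, \dots, M, \quad j = 1, \dots, N
```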
Operation
With the technical means described above, even for similar characters whose background shapes are almost identical, differences in the interior of the characters are reflected in the sub-region histograms of pixel counts per pixel column number, so the characters can be recognized.
Embodiment
An embodiment of the present invention is described below with reference to the drawings.
Fig. 1 is a block diagram of one embodiment of the character recognition device of the present invention. Reference numeral 1 denotes an image input unit, which inputs an image containing the recognition target character. 2 denotes a character cutout unit, which cuts out the recognition target character pattern as a rectangle from the image input by the image input unit 1. 3 denotes a region-specific density calculation unit, which determines the pixel column number to which each pixel point belongs and computes a histogram of pixel points per pixel column number for each of the M × N sub-regions obtained by dividing the rectangle cut out by the character cutout unit 2 into M parts horizontally and N parts vertically. 4 denotes a recognition unit, which uses the histograms obtained by the region-specific density calculation unit 3 as the feature values of the recognition target character and compares them with a dictionary 5 to extract recognition candidate characters. 6 denotes a display unit, which displays the recognition result obtained by the recognition unit 4.
The operation of the character recognition device configured as above is described using the example character 『回』.
The example character 『回』 is binarized by the image input unit 1. The character cutout unit 2 cuts out the rectangular pattern R shown in Fig. 4 (a) from the binarized input image as the recognition target character pattern. The region-specific density calculation unit 3 scans the rectangle R obtained by the character cutout unit 2 from each side toward the opposite side. The pixel points on the starting side are given pixel column number 1. When a change point, that is, a pixel point at which the recognition target character pattern changes from the background portion to the character portion, is detected, the pixel column number is incremented by 1 and assigned to that pixel point; a pixel point that is not a change point is given the pixel column number of the pixel one before it. Fig. 4 (b) shows the pixel column numbers assigned to all pixel points when 『回』 is scanned from the top side. The cutout rectangle is divided into four parts horizontally (M = 4) and four parts vertically (N = 4), as indicated by the lines marked …… in Fig. 4 (a), and a histogram of pixel points per pixel column number is computed for each of the resulting M × N = 16 sub-regions. Fig. 5 shows the histograms obtained when 『回』 is scanned from the top side; these are feature values of the recognition target character pattern. For the recognition target character 『回』, scanning from the top, bottom, left, and right sides of the cutout rectangle yields histograms similar to those in Fig. 5, and the histograms for these four directions together form the feature values.
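The numbering and histogram steps of this embodiment might look like the following minimal sketch for a scan from the top side. The 8 × 8 bitmap is a hypothetical stand-in for 『回』 (not the pattern of Fig. 4), the function names are invented here, and the handling of rectangles whose size is not a multiple of M or N is an assumption. The other three directions could be obtained by rotating the bitmap before calling the same functions.

```python
from collections import Counter


def parse(rows):
    """Convert '#'/'.' strings into a 0/1 bitmap (1 = character pixel)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def number_from_top(bitmap):
    """Assign a pixel column number to every pixel, scanning each column downward."""
    height, width = len(bitmap), len(bitmap[0])
    numbers = [[0] * width for _ in range(height)]
    for x in range(width):
        current = 1  # pixel points on the starting side get number 1
        for y in range(height):
            # Change point: the previous pixel is background, this one is character.
            if y > 0 and bitmap[y - 1][x] == 0 and bitmap[y][x] == 1:
                current += 1
            numbers[y][x] = current
    return numbers


def subregion_histograms(numbers, m=4, n=4):
    """Count pixels per pixel column number in each of the m x n sub-regions."""
    height, width = len(numbers), len(numbers[0])
    histograms = [[Counter() for _ in range(m)] for _ in range(n)]
    for y in range(height):
        for x in range(width):
            histograms[y * n // height][x * m // width][numbers[y][x]] += 1
    return histograms


# Hypothetical stand-in for the example character.
KAI_LIKE = parse([
    "########",
    "#......#",
    "#.####.#",
    "#.#..#.#",
    "#.#..#.#",
    "#.####.#",
    "#......#",
    "########",
])

numbers = number_from_top(KAI_LIKE)
histograms = subregion_histograms(numbers, m=4, n=4)
print(numbers[7])              # [1, 2, 3, 4, 4, 3, 2, 1]
print(dict(histograms[3][1]))  # {2: 1, 3: 2, 4: 1}
```

The bottom row shows how the number grows with each stroke crossed on the way down, so interior strokes that the peripheral scan never reaches still change the feature values.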
The recognition unit 4 compares these feature values with the dictionary 5, which stores the feature values of each character extracted by the same method as for the example character 『回』, extracts recognition candidate characters, and displays the result on the display unit 6.
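The patent does not state how the comparison with the dictionary is carried out. As one possible sketch, assuming the histograms are flattened into equal-length vectors and compared by city-block distance (both assumptions), candidate extraction could look like this. The dictionary contents are made-up numbers for illustration only, and the function names are hypothetical.

```python
def city_block_distance(a, b):
    """Sum of absolute differences between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(p - q) for p, q in zip(a, b))


def recognition_candidates(features, dictionary, top_k=3):
    """Return up to top_k (character, distance) pairs, nearest first."""
    scored = [
        (char, city_block_distance(features, reference))
        for char, reference in dictionary.items()
    ]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


# Hypothetical toy dictionary; the values are invented for this example.
DICTIONARY = {
    "回": [4, 0, 0, 3, 1, 0, 1, 2],
    "囲": [4, 0, 0, 2, 2, 0, 0, 3],
    "口": [4, 0, 0, 3, 1, 0, 4, 0],
}

unknown = [4, 0, 0, 3, 1, 0, 1, 1]
print(recognition_candidates(unknown, DICTIONARY))
# [('回', 1), ('口', 4), ('囲', 5)]
```

Returning a ranked list rather than a single answer matches the text's wording that the recognition unit extracts candidate characters.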
In this embodiment, when the example character 『囲』 shown in Fig. 6 (a) is scanned from the top side of its cutout rectangle and pixel column numbers are assigned to all pixel points, the result is as shown in Fig. 6 (b), and the resulting feature values are those shown in Fig. 7. These values differ from the feature values of the example character 『回』 shown in Fig. 5, so 『回』 and 『囲』, which were misrecognized by the conventional method, can now be distinguished.
In this embodiment, histograms for all pixel column numbers were computed for all sub-regions of the cutout rectangle of the recognition target character, but it goes without saying that the same effect can be obtained using histograms for a limited set of pixel column numbers in a limited set of sub-regions.
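A restricted feature of this kind could be sketched as follows, assuming the histograms are held as a grid of per-sub-region counters. The grid layout, the function name, and the sample counts are assumptions made for this illustration.

```python
from collections import Counter


def restricted_feature_vector(histograms, regions, numbers):
    """Flatten only the chosen (row, col) sub-regions and pixel column numbers."""
    return [histograms[row][col][k] for (row, col) in regions for k in numbers]


# Hypothetical 2 x 2 grid of sub-region histograms (made-up counts).
histograms = [
    [Counter({1: 4}), Counter({1: 2, 2: 2})],
    [Counter({1: 3, 2: 1}), Counter({2: 1, 3: 2, 4: 1})],
]

# Keep only the bottom row of sub-regions and pixel column numbers 2 and 3.
vector = restricted_feature_vector(histograms, regions=[(1, 0), (1, 1)], numbers=[2, 3])
print(vector)  # [1, 0, 1, 2]
```

A number that never occurs in a sub-region contributes 0, so vectors built with the same selection stay the same length and remain comparable.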
Effects of the invention
The present invention divides the recognition target character pattern, cut out of the input image as a rectangle, into M parts horizontally and N parts vertically. For each of the resulting M × N regions, it computes a histogram of the number of pixels per pixel column number, where the pixel column number of each pixel point is determined by scanning from each side of the rectangle toward the opposite side. Performing recognition with these values as character features makes it possible to recognize even similar characters that differ only slightly.
Fig. 1 is a block diagram of a character recognition device according to one embodiment of the present invention. Fig. 2 shows the example characters 『回』 and 『囲』. Fig. 3 illustrates the feature values of 『回』 and 『囲』 under the conventional method. Fig. 4 illustrates how the feature values of each sub-region are obtained for 『回』 in the embodiment of the present invention. Fig. 5 illustrates the histograms of each sub-region for 『回』 in the same embodiment. Fig. 6 shows the other example character of the same embodiment, 『囲』, divided into sub-regions. Fig. 7 illustrates the histograms of each sub-region for 『囲』.
1: image input unit; 2: character cutout unit; 3: region-specific density calculation unit; 4: recognition unit; 5: dictionary unit; 6: display unit.
Claims (1)
1. A character recognition device comprising: an image input unit that inputs an image containing a recognition target character; a character cutout unit that cuts out a recognition target character pattern as a rectangle from the input image; a region-specific density calculation unit that scans the cutout rectangle obtained by the character cutout unit from each of its top, bottom, left, and right sides toward the opposite side, regards a pixel point at which the scan changes from the background portion to the character portion of the recognition target character as a change point, increments by 1 the pixel column number, which is the number of the set to which a pixel point belongs, and assigns it to that pixel point, gives a pixel point that is not a change point the same pixel column number as the pixel point one pixel before it, and computes a histogram of pixel points per pixel column number for each sub-region obtained by dividing the rectangle cut out by the character cutout unit into M parts horizontally and N parts vertically; and a recognition unit that extracts candidate characters using the histograms obtained by the region-specific density calculation unit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP60001581A JPH0632079B2 (en) | 1985-01-09 | 1985-01-09 | Character recognition device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP60001581A JPH0632079B2 (en) | 1985-01-09 | 1985-01-09 | Character recognition device |
Publications (2)
Publication Number | Publication Date |
---|---|
JPS61161584A (en) | 1986-07-22 |
JPH0632079B2 (en) | 1994-04-27 |
Family
ID=11505479
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP60001581A, Expired - Lifetime, JPH0632079B2 (en) | 1985-01-09 | 1985-01-09 | Character recognition device |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPH0632079B2 (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS61188681A (en) * | 1985-02-15 | 1986-08-22 | Matsushita Electric Ind Co Ltd | Character recognition device |
Also Published As
Publication number | Publication date |
---|---|
JPS61161584A (en) | 1986-07-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR900007009B1 (en) | Character recognition device | |
US5668892A (en) | Table recognition apparatus | |
JPH05242292A (en) | Separating method | |
JP2890306B2 (en) | Table space separation apparatus and table space separation method | |
JP3058791B2 (en) | Method of extracting figure of image recognition device | |
JP2661898B2 (en) | Character recognition device | |
JPH0632079B2 (en) | Character recognition device | |
JP2796561B2 (en) | Tabular document recognition method | |
JP3095470B2 (en) | Character recognition device | |
JP2537973B2 (en) | Character recognition device | |
JP2789622B2 (en) | Character / graphic area determination device | |
JPH0797390B2 (en) | Character recognition device | |
JP2795860B2 (en) | Character recognition device | |
JPH07160810A (en) | Character recognizing device | |
JP2797523B2 (en) | Drawing follower | |
JP2918363B2 (en) | Character classification method and character recognition device | |
JP2531800B2 (en) | Character cutting method in character reading device | |
JPH0215388A (en) | Character recognizing device | |
JPH05274472A (en) | Image recognizing device | |
JPH0576671B2 (en) | ||
JPS6120181A (en) | Character recognizing device | |
JPH0782525B2 (en) | Character recognition device | |
JPH08171609A (en) | High-speed character string extraction device | |
JPS63221495A (en) | Character recognizing device | |
JPH03160582A (en) | Method for separating ruled line and character in document picture data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
EXPY | Cancellation because of completion of term |