CN1570958A - Method for identifying multi-font multi-character size print form Tibetan character - Google Patents
Method for identifying multi-font multi-character size print form Tibetan character
- Publication number
- CN1570958A, CN200410034107A, CN200410034107
- Authority
- CN
- China
- Prior art keywords
- character
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 46
- 239000013598 vector Substances 0.000 claims abstract description 66
- 238000010606 normalization Methods 0.000 claims abstract description 52
- 238000012360 testing method Methods 0.000 claims abstract description 13
- 238000004458 analytical method Methods 0.000 claims abstract description 12
- 238000000605 extraction Methods 0.000 claims abstract description 11
- 230000005484 gravity Effects 0.000 claims abstract description 6
- 239000011159 matrix material Substances 0.000 claims description 33
- 238000012549 training Methods 0.000 claims description 27
- 230000009466 transformation Effects 0.000 claims description 27
- 230000006870 function Effects 0.000 claims description 22
- 230000008569 process Effects 0.000 claims description 15
- 239000000284 extract Substances 0.000 claims description 11
- 238000013461 design Methods 0.000 claims description 9
- 238000012545 processing Methods 0.000 claims description 9
- 238000002474 experimental method Methods 0.000 claims description 7
- 230000006835 compression Effects 0.000 claims description 5
- 238000007906 compression Methods 0.000 claims description 5
- 230000008859 change Effects 0.000 claims description 2
- 238000000926 separation method Methods 0.000 claims 2
- 230000001174 ascending effect Effects 0.000 claims 1
- 230000015572 biosynthetic process Effects 0.000 claims 1
- 238000006243 chemical reaction Methods 0.000 claims 1
- 238000012937 correction Methods 0.000 claims 1
- 230000008707 rearrangement Effects 0.000 claims 1
- 230000009467 reduction Effects 0.000 claims 1
- 238000012546 transfer Methods 0.000 claims 1
- 239000000203 mixture Substances 0.000 abstract description 6
- 239000000523 sample Substances 0.000 description 13
- 230000011218 segmentation Effects 0.000 description 9
- 238000009826 distribution Methods 0.000 description 5
- 238000004422 calculation algorithm Methods 0.000 description 4
- 238000004364 calculation method Methods 0.000 description 4
- 238000005516 engineering process Methods 0.000 description 4
- 238000007781 pre-processing Methods 0.000 description 4
- 238000011160 research Methods 0.000 description 4
- 238000010586 diagram Methods 0.000 description 2
- 230000010365 information processing Effects 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 238000003825 pressing Methods 0.000 description 2
- 230000003044 adaptive effect Effects 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 238000013095 identification testing Methods 0.000 description 1
- 238000012805 post-processing Methods 0.000 description 1
Landscapes
- Image Analysis (AREA)
Abstract
The method for recognizing multi-font, multi-size printed Tibetan characters belongs to the field of character recognition. It is characterized by a normalization scheme designed for the features of printed Tibetan characters, which are not square-shaped: the character image is split at the baseline (the upper horizontal line) into two non-overlapping sub-images, and each sub-image undergoes position normalization with a reference point that combines the center of gravity and the bounding box, and size normalization based on cubic B-spline interpolation. Four-direction line element features, which fully reflect the composition of Tibetan characters, are extracted and compressed by linear discriminant analysis (LDA) to obtain a compact character feature vector. Character classes are decided with a coarse-to-fine two-stage classification strategy based on confidence analysis; the coarse and fine classifiers use the Euclidean distance with deviation (EDD) and the modified quadratic discriminant function (MQDF), respectively. The invention achieves a recognition accuracy of 99.83% on a multi-font, multi-size printed Tibetan single-character test set, and a recognition rate above 99% on real text.
Description
Technical Field
The method for recognizing multi-font, multi-size printed Tibetan characters belongs to the field of character recognition.
Background Art
Tibetan character recognition technology is an important component of Chinese multilingual information processing systems and has great theoretical value and broad application prospects. Character recognition methods fall into two categories: statistical decision methods and syntactic-structural methods. In statistical decision methods, each character pattern is represented by a feature vector and regarded as a point in feature space, and recognition consists of assigning the pattern of the character to be recognized to its correct class in that space. Syntactic-structural methods, for a given character set, extract a limited number of indivisible minimal sub-patterns (primitives); combining these primitives in particular orders and according to particular rules can form any character in the set. Exploiting the similarity between character structure and language, character recognition can then describe and parse the structure of characters with grammars from formal linguistics (including syntactic rules).
The large number of characters, the complex glyph structure, the many font styles, and the high proportion of similar characters make research on Tibetan character recognition challenging. Research on Tibetan recognition at home and abroad is still very limited, and no successful algorithm or system has yet appeared. Although Tibetan is an alphabetic script and every character is composed of several components (letters and variant forms of letters), the structure of the components and the ways they connect to one another are complex, which makes it very difficult to separate the components of a character correctly. Considering also notable weaknesses of the syntactic-structural approach, such as poor robustness to noise, the present invention studies multi-font, multi-size printed Tibetan character recognition with a statistical decision approach, taking a whole single Tibetan character as the basic recognition unit.
In Chinese character recognition, direction line elements describe well the quantitative relationship among the four basic stroke units (horizontal, vertical, left-falling and right-falling) at different positions of the occupied space, and thus reflect the composition of a Chinese character comprehensively, accurately and stably. A Tibetan character is built from components stacked vertically in a fixed order, each component is built from strokes, and the connections between strokes within a component are fixed. Every Tibetan character therefore has a specific structure, and this structure can be reflected at the level of layers, parts and details; direction line elements are an effective means of describing these structural features.
On the basis of a thorough examination of the characteristics of Tibetan characters, the present invention selects an appropriate normalization method for their particular shapes, extracts highly descriptive direction line element features, and obtains recognition results with a two-stage statistical classifier based on confidence analysis, realizing a high-performance multi-font, multi-size Tibetan character recognition method that has not been used in any previous literature.
Summary of the Invention
The purpose of the present invention is to provide a method for recognizing multi-font, multi-size printed Tibetan characters. Taking a single Tibetan character as the processing object, the method first applies the necessary normalization, including position normalization and size normalization; it then extracts four-direction line element features that reflect the characteristics of the character well, compresses them with the LDA (linear discriminant analysis) method, and performs the classification decision with a coarse-to-fine two-stage statistical classifier based on confidence analysis. A very high single-character recognition accuracy is obtained in this way. A multi-font, multi-size printed Tibetan character recognition system has been implemented according to the method.
A complete printed Tibetan character recognition system also includes the collection of single-character samples: the system first scans printed Tibetan text and segments it into characters automatically. From the training sample database built in this way, direction line element features are extracted and transformed to obtain the feature database of the training samples, on the basis of which the classifier parameters are determined experimentally. For an unknown input character sample, features are extracted in the same way and then compared against the feature database by the classifier to decide the class of the input character.
The invention consists of the following parts: character normalization, four-direction line element feature extraction, feature transformation, and classifier design.
1. Character Normalization
1.1 Position Normalization
Let the original character image be [F(i,j)] of size W×H, where W is the image width, H the height, and F(i,j) the value of the pixel in row i and column j, i = 1, 2, ..., H, j = 1, 2, ..., W. According to the characteristics of Tibetan characters, [F(i,j)] can be viewed as the vertical concatenation of two non-overlapping sub-images [F1(i,j)] of size W×H1 and [F2(i,j)] of size W×H2, where [F1(i,j)] is the part of the image above the baseline (the upper horizontal line), i.e. the upper-vowel part, [F2(i,j)] is the part below the baseline, and H1 + H2 = H. The horizontal projection V(i), i = 1, 2, ..., H, of the character image is computed as V(i) = Σ_{j=1..W} F(i,j), and the ordinate P_I of the baseline is then determined from this projection.
H1 can be determined from P_I and the ordinate of the top of the character; in the coordinate system adopted by the invention (Fig. 4), H1 is numerically equal to P_I.
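The following Python sketch illustrates this step on a binary character image. The function name is ours, and taking the baseline at the row where the horizontal projection V(i) peaks is an assumption made for illustration; the invention determines P_I from V(i).

```python
import numpy as np

def split_at_baseline(F):
    """Split a binary character image (1 = stroke pixel) at the baseline.

    V(i) is the horizontal projection of row i.  The baseline ordinate P_I is
    assumed here to be the row with the maximal projection (the long upper
    horizontal stroke of a Tibetan character); this choice is our assumption.
    """
    H, W = F.shape
    V = F.sum(axis=1)                 # V(i) = sum over j of F(i, j)
    P_I = int(np.argmax(V)) + 1       # 1-based row index of the baseline
    F1 = F[:P_I, :]                   # part above the baseline (upper vowels), H1 = P_I
    F2 = F[P_I:, :]                   # part below the baseline, H2 = H - P_I
    return F1, F2
```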
Let the normalized character image be [G(i,j)] of size M×N, where M is the width, N the height, and G(i,j) the value of the pixel in row i and column j, i = 1, 2, ..., N, j = 1, 2, ..., M. Likewise, [G(i,j)] can be viewed as the vertical concatenation of two non-overlapping sub-images [G1(i,j)] of size M×N1 and [G2(i,j)] of size M×N2, where [G1(i,j)] is the part above the baseline and [G2(i,j)] the part below it; based on an analysis of where the baseline lies in Tibetan characters, N1 = N/4 and N2 = 3N/4 are used here. Normalization can then be regarded as mapping the input lattices [F1(i,j)] and [F2(i,j)] onto the target lattices [G1(i,j)] and [G2(i,j)], respectively. In this process, a reference point Uk(uIk, uJk), k = 1, 2, is selected in each input lattice [Fk(i,j)], and the input lattice is shifted so that this reference point lies at the center of the target lattice [Gk(i,j)], which completes the position normalization of the input character.
Let Ak(aIk, aJk) and Bk(bIk, bJk), k = 1, 2, denote the center of gravity of [Fk(i,j)] and the geometric center of its bounding box, respectively. Uk(uIk, uJk), k = 1, 2, is then taken as a point lying between Ak(aIk, aJk) and Bk(bIk, bJk), where β is a constant with 0 ≤ β ≤ 1 that controls the position of Uk between the two.
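A minimal sketch of the reference-point computation, assuming Uk is the linear interpolation β·Ak + (1 − β)·Bk between the center of gravity Ak and the bounding-box center Bk; the interpolation direction and the function name are our assumptions, since only the constraint 0 ≤ β ≤ 1 is stated.

```python
import numpy as np

def reference_point(Fk, beta=0.5):
    """Reference point for position normalization of one sub-image.

    A_k: center of gravity of the stroke pixels; B_k: geometric center of
    their bounding box.  U_k is assumed to be the linear interpolation
    beta * A_k + (1 - beta) * B_k, with 0 <= beta <= 1.
    """
    rows, cols = np.nonzero(Fk)               # coordinates of stroke (black) pixels
    A = np.array([rows.mean(), cols.mean()])  # center of gravity
    B = np.array([(rows.min() + rows.max()) / 2.0,
                  (cols.min() + cols.max()) / 2.0])  # bounding-box center
    return beta * A + (1.0 - beta) * B
```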
1.2 Size Normalization
Tibetan characters are not square-shaped: character widths are relatively stable, while heights differ greatly from character to character, so they cannot be normalized to a square lattice the way Chinese characters are. Statistics on the height-to-width ratio of the 710,400 characters in 1,200 collected sets of Tibetan character samples (6 fonts, 7 font sizes, 592 characters per set) show that a height-to-width ratio of 2 after normalization is reasonable; it is a compromise among the differing aspect ratios of the various fonts.
Comparing the input character image [Fk(i,j)] of size W×Hk, k = 1, 2, with the normalized target character lattice [Gk(i,j)] of size M×Nk, k = 1, 2, the relationship is
Gk(i,j) = Fk(i/ri, j/rj), k = 1, 2,
where ri and rj are the scale factors in the i and j directions: ri = Nk/Hk, rj = M/W. According to this formula, the point (i,j) of the output lattice corresponds to the point (i/ri, j/rj) of the input character. Fk(i,j) is a discrete function, and i/ri and j/rj are in general not integers, so the value of Fk at (i/ri, j/rj) must be estimated from its values at the known discrete points. The invention uses cubic B-spline interpolation for this purpose, to reduce distortions such as staircase edges in the normalized character lattice. For a given (i,j), the neighboring integer sample positions are obtained with the rounding function [·], and the interpolated value is a weighted sum of the sample values, with weights given by the cubic B-spline function RB(z), where W(z) denotes the unit step function.
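The following sketch shows the size normalization with cubic B-spline interpolation. The standard cubic B-spline kernel is used on the assumption that it matches RB(z); the 4×4 sample neighborhood, the boundary handling and the final re-binarization threshold are also our choices.

```python
import numpy as np

def cubic_bspline(z):
    """Standard cubic B-spline kernel; assumed to match the patent's R_B(z)."""
    z = abs(z)
    if z < 1.0:
        return 2.0 / 3.0 - z * z + 0.5 * z ** 3
    if z < 2.0:
        return (2.0 - z) ** 3 / 6.0
    return 0.0

def resize_bspline(Fk, M, Nk):
    """Scale one sub-image to M x Nk with cubic B-spline interpolation.

    G_k(i, j) = F_k(i / r_i, j / r_j) with r_i = Nk / Hk, r_j = M / W; the
    value at the non-integer source position is a weighted sum of the 4 x 4
    surrounding samples, weighted by the B-spline kernel.
    """
    Hk, W = Fk.shape
    ri, rj = Nk / Hk, M / W
    G = np.zeros((Nk, M))
    for i in range(Nk):
        for j in range(M):
            y, x = i / ri, j / rj            # source position in F_k
            y0, x0 = int(np.floor(y)), int(np.floor(x))
            val = 0.0
            for m in range(y0 - 1, y0 + 3):
                for n in range(x0 - 1, x0 + 3):
                    if 0 <= m < Hk and 0 <= n < W:
                        val += Fk[m, n] * cubic_bspline(y - m) * cubic_bspline(x - n)
            G[i, j] = val
    return (G > 0.5).astype(np.uint8)        # re-binarize the interpolated lattice
```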
2. Direction Line Element Feature Extraction
2.1 Extracting the Character Contour
Assume that in the character image the points belonging to strokes are black pixels and the background points are white pixels. A stroke pixel is called a contour point if its 8-neighborhood contains at least one white pixel and the pixel itself is not isolated (an isolated black pixel is one whose 8-neighborhood contains no black pixel). The contour image is extracted by scanning the whole character lattice: for a black pixel at a given position, if both the number of black pixels and the number of white pixels in its 8-neighborhood are greater than 0, the pixel is kept; otherwise its value in the lattice is set to 0. In this way the contour image [G′(i,j)] of size M×N is obtained from the normalized character image [G(i,j)].
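A direct sketch of the contour-extraction rule just described; treating missing neighbors at the image border as absent is our choice.

```python
import numpy as np

def extract_contour(G):
    """Keep a black pixel only if its 8-neighborhood contains both black and
    white pixels; everything else becomes background.  G is a binary N x M
    lattice with 1 for stroke (black) pixels."""
    N, M = G.shape
    Gp = np.zeros_like(G)
    for i in range(N):
        for j in range(M):
            if G[i, j] == 0:
                continue
            nb = G[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
            black = int(nb.sum()) - 1          # 8-neighbors that are black
            white = nb.size - 1 - black        # 8-neighbors that are white
            if black > 0 and white > 0:
                Gp[i, j] = 1
    return Gp
```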
2.2 Block Division and Composition of the Feature Vector
For every black pixel in the character contour lattice [G′(i,j)], one of four line elements, horizontal (0°), vertical (90°), left-falling (45°) or right-falling (135°), is assigned according to its positional relationship with two neighboring black pixels. Two cases are distinguished: if the three black pixels lie on the same straight line, only one line element is assigned to the central pixel, with value 2 (Fig. 9a-d); if they do not lie on the same straight line, two line elements are assigned to the central pixel, each with value 1 (Fig. 9e-p). In the case shown in Fig. 9k, for example, the central pixel receives the right-falling and vertical elements, each with value 1, and the remaining cases follow by analogy. Assigning line elements to every black pixel of the character lattice according to these principles yields, for each black pixel (i,j), a 4-dimensional vector X(i,j) = (xv, xk, xp, xo)T whose components give the amounts of the four line elements at that pixel.
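A sketch of the per-pixel line element assignment. The exact case table of Fig. 9 is not reproduced here, so the pairing rule below (value 2 for a pair of black neighbors collinear with the center, value 1 for each of the two directions otherwise) is an approximation of the assignment described above.

```python
import numpy as np

# Direction index for each of the 8 neighbor offsets (di, dj):
# 0 = horizontal (0 deg), 1 = vertical (90 deg), 2 = left-falling (45 deg),
# 3 = right-falling (135 deg).
DIRECTION = {(0, 1): 0, (0, -1): 0, (1, 0): 1, (-1, 0): 1,
             (-1, 1): 2, (1, -1): 2, (-1, -1): 3, (1, 1): 3}

def line_elements(Gp):
    """4-dimensional line-element vector X(i, j) for every contour pixel.

    Simplified reading of the rule: for each pair of black neighbors of a
    contour pixel, if the two lie on one straight line through the center the
    corresponding direction gets value 2, otherwise both directions get 1.
    """
    N, M = Gp.shape
    X = np.zeros((N, M, 4))
    for i in range(N):
        for j in range(M):
            if Gp[i, j] == 0:
                continue
            nbrs = [(di, dj) for (di, dj) in DIRECTION
                    if 0 <= i + di < N and 0 <= j + dj < M and Gp[i + di, j + dj]]
            for a in range(len(nbrs)):
                for b in range(a + 1, len(nbrs)):
                    p, q = nbrs[a], nbrs[b]
                    if p[0] == -q[0] and p[1] == -q[1]:      # collinear through the center
                        X[i, j, DIRECTION[p]] += 2
                    else:
                        X[i, j, DIRECTION[p]] += 1
                        X[i, j, DIRECTION[q]] += 1
    return X
```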
After this, the M×N lattice is divided evenly into sub-regions of width M0 and height N0 (Fig. 10); adjacent sub-regions overlap by M0/2 pixels horizontally and N0/2 pixels vertically, so the whole M×N lattice yields (2M/M0 − 1) × (2N/N0 − 1) sub-regions. Each sub-region is in turn composed of four blocks A, B, C, D (Fig. 11), and the direction line element feature vector XS = (xv, xk, xp, xo)T of a whole sub-region is the weighted sum of the feature vectors of the blocks it contains:
XS = αA XA + αB XB + αC XC + αD XD,
where αA, αB, αC, αD are constants between 0 and 1 that express how strongly the feature vector of each block contributes to the overall feature vector of the sub-region. A 4-dimensional feature vector is thus obtained from every sub-region, and arranging the feature vectors of all sub-regions in order gives the original direction line element feature vector of the input character, whose dimension is four times the number of sub-regions.
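A sketch of the block division and pooling, assuming a 48×96 lattice with M0 = N0 = 16 as in the embodiment below. The layout of the blocks A, B, C, D inside a sub-region follows Fig. 11, which is not reproduced here, so four nested frames around the sub-region center are used as a stand-in.

```python
import numpy as np

def subregion_features(X, M0=16, N0=16, alphas=(0.4, 0.3, 0.2, 0.1)):
    """Pool per-pixel line-element vectors X (shape N x M x 4) into
    sub-region vectors X_S and concatenate them into one feature vector.

    Sub-regions are M0 x N0 windows with half-window overlap.  Blocks A..D
    are modeled here as four nested frames around the window center,
    weighted by alpha_A..alpha_D from the inside out (an assumed layout).
    """
    N, M, _ = X.shape
    feats = []
    for top in range(0, N - N0 + 1, N0 // 2):
        for left in range(0, M - M0 + 1, M0 // 2):
            sub = X[top:top + N0, left:left + M0, :]
            # nested sums: nested[0] = whole window, nested[3] = innermost block
            nested = [sub[s * N0 // 8: N0 - s * N0 // 8,
                          s * M0 // 8: M0 - s * M0 // 8, :].sum(axis=(0, 1))
                      for s in range(4)]
            XA = nested[3]                     # innermost block
            XB = nested[2] - nested[3]         # frame around A
            XC = nested[1] - nested[2]
            XD = nested[0] - nested[1]         # outermost frame
            XS = alphas[0] * XA + alphas[1] * XB + alphas[2] * XC + alphas[3] * XD
            feats.append(XS)
    return np.concatenate(feats)               # 4 x (number of sub-regions) dimensions
```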
3. Feature Transformation
A high feature dimension combined with a shortage of training samples causes serious problems both for estimating the classifier parameters and for the computational cost of recognition. A common rule of thumb in classifier design is that the number of training samples should be at least ten times the feature dimension. To reduce the difficulties that the excessive original feature dimension and the relative shortage of training samples cause for classifier design and parameter estimation, the invention uses the LDA method to compress the high-dimensional original features.
Let the number of character classes be c (c = 592 in Tibetan character recognition) and the number of training samples of class ω be Oω, ω = 1, 2, ..., c. Extracting the four-direction line element features from the training samples of each class by the method above gives a set of feature vectors for that class.
First the center μω of the feature vectors of each character class ω (1 ≤ ω ≤ c) and the center μ of the feature vectors of all character classes are computed. Then the between-class scatter matrix Sb and the average within-class scatter matrix Sw are computed.
A transformation matrix Φ is sought that maximizes tr[(ΦT Sw Φ)−1(ΦT Sb Φ)], so that the ratio of between-class scatter to within-class scatter is maximized and the separability of the pattern classes increases.
The leading eigenvectors of Sw−1 Sb, computed with a matrix computation tool and sorted by decreasing eigenvalue, form the columns of Φ; the corresponding LDA feature transformation is Y = ΦT X, where Y is the most discriminative d-dimensional feature.
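A sketch of the LDA transformation described above. The per-class weighting used for Sb and Sw is our assumption; only the scatter matrices themselves and the criterion tr[(ΦT Sw Φ)−1(ΦT Sb Φ)] are named in the text.

```python
import numpy as np

def lda_transform(samples, d):
    """Compute the LDA transformation matrix Phi from training features.

    `samples` maps each class label to an (O_w x D) array of feature vectors.
    """
    mus = {w: x.mean(axis=0) for w, x in samples.items()}
    total = np.vstack(list(samples.values()))
    mu = total.mean(axis=0)                       # center of all classes
    D = total.shape[1]
    Sb = np.zeros((D, D))
    Sw = np.zeros((D, D))
    n = sum(len(x) for x in samples.values())
    for w, x in samples.items():
        diff = (mus[w] - mu)[:, None]
        Sb += len(x) / n * (diff @ diff.T)        # between-class scatter
        xc = x - mus[w]
        Sw += (xc.T @ xc) / n                     # sample-weighted within-class scatter
    # leading eigenvectors of Sw^-1 Sb give the columns of Phi
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-eigvals.real)[:d]
    Phi = eigvecs.real[:, order]
    return Phi                                    # transformed feature: Y = Phi.T @ X
```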
4. Classifier Design
Classifier design is one of the core technologies of character recognition, and researchers have proposed many pattern classifiers for different problems. Under the constraints of several practical factors, however, minimum-distance classifiers are still usually chosen for large-character-set recognition. The invention uses a coarse-to-fine two-stage classification strategy based on confidence analysis (Fig. 13) to decide the class of the input Tibetan character to be recognized.
4.1 Coarse Classification
The goal of coarse classification is to select quickly, from a large character set, a relatively small subset of candidate characters while keeping the probability that the subset contains the correct class of the character to be recognized as high as possible. This requires the coarse classifier to have a simple structure and a fast computation. For this purpose, the invention designs a Euclidean distance with deviation (EDD) classifier.
Let Y = (y1, y2, ..., yd)T be the d-dimensional feature vector of the unknown input character and Yω = (yω1, yω2, ..., yωd)T the standard feature vector of character class ω. The Euclidean distance with deviation is defined componentwise from the differences between Y and Yω, where σωk is the standard deviation of the k-th component of the feature vectors of class ω, θω and γω are constants associated with class ω, and C is a constant independent of the character class. The most important property of this definition is that second-order statistics of the character features are introduced into the Euclidean distance, which gives the classifier some ability to describe the spatial distribution of the features.
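The exact expression of the EDD is not reproduced above, so the following sketch is only an illustrative reading of a distance built from |yk − yωk|, σωk, θω, γω and C; the particular combination used here (a variance-based tolerance subtracted before squaring, plus the constant C) is an assumption, not the invention's formula.

```python
import numpy as np

def edd_distance(Y, Yw, sigma_w, theta_w, gamma_w, C=20.0):
    """Illustrative Euclidean-distance-with-deviation between an input feature
    Y and the standard feature Yw of one class.

    sigma_w: per-dimension standard deviations of the class; theta_w, gamma_w:
    class-dependent constants; C: class-independent constant.  The combination
    below is an assumed stand-in for the patented formula.
    """
    diff = np.abs(Y - Yw)
    tol = np.minimum(theta_w * sigma_w, gamma_w)     # class-dependent deviation term
    return float(np.sum(np.maximum(diff - tol, 0.0) ** 2) + C)
```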
4.2 Fine Classification
The Bayesian classifier is theoretically the optimal statistical classifier, and in practice one tries to approximate it as closely as possible. When the character features are Gaussian and the prior probabilities of the feature distributions of all classes are equal, the Bayesian classifier reduces to a Mahalanobis distance classifier. These conditions are usually hard to satisfy in practice, however, and the performance of the Mahalanobis distance classifier deteriorates badly as estimation errors in the covariance matrices grow. The invention uses the MQDF (modified quadratic discriminant function), a variant of the Mahalanobis distance, as the fine classification measure.
In the MQDF discriminant function, λωl and φωl are the l-th eigenvalue and eigenvector of the covariance matrix Σω of class ω, K is the number of principal eigenvectors retained, that is, the dimension of the principal subspace of the pattern class, whose optimal value is determined experimentally, and h2 is an experimental estimate of the small eigenvalues. The MQDF produces a quadratic decision surface; because only the first K principal eigenvectors of each class covariance matrix need to be estimated, the negative influence of estimation errors in the small eigenvalues is avoided. The MQDF discriminant distance can be viewed as the weighted sum of the Mahalanobis distance in the K-dimensional principal subspace and the Euclidean distance in the remaining (d−K)-dimensional space, with weighting factor 1/h2.
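A sketch of the MQDF distance in its usual form, which matches the description above (Mahalanobis distance in the K-dimensional principal subspace plus the residual Euclidean distance weighted by 1/h2); the log-eigenvalue terms included here follow the standard MQDF and are our assumption about the exact variant used.

```python
import numpy as np

def mqdf_distance(Y, mu_w, eigvals_w, eigvecs_w, K, h2):
    """Modified quadratic discriminant function distance of Y to class w.

    eigvals_w / eigvecs_w are the leading eigenvalues lambda_{w,l} and
    eigenvectors phi_{w,l} of the class covariance matrix Sigma_w (only the
    first K are needed).  h2 estimates the small eigenvalues.
    """
    diff = Y - mu_w
    proj = eigvecs_w[:, :K].T @ diff               # coordinates in the principal subspace
    maha = np.sum(proj ** 2 / eigvals_w[:K])       # Mahalanobis part in K dimensions
    resid = np.dot(diff, diff) - np.sum(proj ** 2) # energy outside the subspace
    d = len(Y)
    return (maha + resid / h2
            + np.sum(np.log(eigvals_w[:K])) + (d - K) * np.log(h2))
```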
4.3 Confidence Computation
Let the candidate set output by the coarse classifier be CanSet = {(e1, D1), (e2, D2), ..., (eL, DL)}, where L is the capacity of the candidate set, ek and Dk are the candidate characters and the corresponding coarse classification distances, and D1 ≤ D2 ≤ ... ≤ DL. The role of the fine classifier is to reorder CanSet according to the recomputed discriminant distances and to find the most likely class of the input character. If the coarse classification result is sufficiently reliable, in other words if e1 is already the correct class of the input character, fine classification is unnecessary. The invention therefore performs a confidence analysis on CanSet to decide whether fine classification is needed, computing the confidence from the distances output by the EDD.
When the confidence is below a threshold ConfTH, CanSet is passed to the fine classifier for processing; otherwise CanSet is output directly. The invention is characterized in that it is a printed Tibetan character recognition technique capable of recognizing multiple fonts and multiple font sizes. It comprises the following steps in order:
The input single Tibetan character first undergoes appropriate position normalization and size normalization, so as to remove as far as possible the differences in shape and pose caused by different font sizes and fonts; four-direction line element features that reflect the structural characteristics of Tibetan characters well are then extracted; on this basis an LDA transformation extracts the most discriminative features to reduce the feature dimension, and the transformed features are passed to a coarse-to-fine two-stage classifier based on recognition confidence analysis, which decides the class of the character. In a system consisting of an image acquisition device and a computer, the method comprises the following steps in order:
1. Collection of character samples
Text printed with multi-font, multi-size Tibetan characters is scanned in; after necessary preprocessing with existing algorithms, such as noise removal and binarization, the Tibetan text is segmented to separate individual characters, and each character image is labeled with the internal code of the correct character, completing the collection of Tibetan single-character samples for training and testing and establishing the training sample database.
2. Normalization, comprising linear normalization of character position and size
2.1 Locating the baseline of a single Tibetan character
Let the original character image be [F(i,j)] of size W×H, where W is the image width, H the image height, and F(i,j) the value of the pixel in row i and column j, i = 1, 2, ..., H, j = 1, 2, ..., W.
The horizontal projection V(i), i = 1, 2, ..., H, of the character image is computed as V(i) = Σ_{j=1..W} F(i,j), and the position P_I of the baseline is then determined from this projection.
2.2 Splitting the input image into two sub-images at the baseline
[F(i,j)] can be viewed as the vertical concatenation of two sub-images [F1(i,j)] of size W×H1 and [F2(i,j)] of size W×H2, where [F1(i,j)] is the part above the baseline, i.e. the upper-vowel part, and [F2(i,j)] is the part below the baseline. The two do not overlap but are combined vertically to form [F(i,j)], and H1 + H2 = H.
Correspondingly, the normalized target character image [G(i,j)] of size M×N can be viewed as the vertical concatenation of two sub-images [G1(i,j)] of size M×N1 and [G2(i,j)] of size M×N2, where M is the width of the target image and N its height. [G1(i,j)] is the part above the baseline, i.e. the upper-vowel part, and [G2(i,j)] is the part below it. These two likewise do not overlap but are combined vertically into [G(i,j)], with N1 = N/4 and N2 = 3N/4.
2.3 Selection of the position normalization reference point Uk(uIk, uJk), k = 1, 2
Let Ak(aIk, aJk) and Bk(bIk, bJk), k = 1, 2, be the center of gravity of [Fk(i,j)] and the center of its bounding box, respectively. Uk(uIk, uJk), k = 1, 2, is taken as a point between Ak(aIk, aJk) and Bk(bIk, bJk), where β is a constant with 0 ≤ β ≤ 1.
The input image lattice is shifted so that this reference point lies at the geometric center of the target lattice [Gk(i,j)] of size M×Nk, k = 1, 2, which completes the position normalization of the input character.
2.4 Size normalization
Since the relationship between [Fk(i,j)] of size W×Hk and [Gk(i,j)] of size M×Nk, k = 1, 2, is Gk(i,j) = Fk(i/ri, j/rj), where ri and rj are the scale factors in the i and j directions (ri = Nk/Hk, rj = M/W), cubic B-spline interpolation is used to estimate the value of Fk at the generally non-integer source positions and to reduce distortions such as staircase edges in the normalized characters. For a given (i,j), the neighboring integer sample positions are obtained with the rounding function [·], the interpolated value is expressed as a weighted sum of the sample values with weights given by the cubic B-spline function RB(z), and W(z) denotes the unit step function.
3. Extraction of the four-direction line element features of the Tibetan character
3.1 Character contour extraction
The whole character lattice is scanned; for a black pixel at a given position, the distribution of pixels in its 8-neighborhood decides whether the pixel is kept. In this way the contour image [G′(i,j)] of size M×N is obtained from the normalized character image [G(i,j)].
3.2 Extraction of the direction line element features
First, every black pixel (i,j) in the character contour lattice [G′(i,j)] is assigned line elements among horizontal (0°), vertical (90°), left-falling (45°) and right-falling (135°) according to its positional relationship with two neighboring black pixels, and the result is recorded as a 4-dimensional vector X(i,j) = (xv, xk, xp, xo)T.
The whole M×N character contour image [G′(i,j)] is divided evenly into overlapping sub-regions, each composed of blocks A, B, C and D.
The direction line element feature vector XS = (xv, xk, xp, xo)T of a whole sub-region is the weighted sum of the feature vectors of the blocks in that sub-region:
XS = αA XA + αB XB + αC XC + αD XD. A 4-dimensional feature vector is thus obtained from every sub-region, and arranging the feature vectors of all sub-regions in order gives the original feature vector representing the input character.
4. Feature transformation
Let the number of character classes be c and the number of training samples of class ω be Oω, ω = 1, 2, ..., c; extracting the four-direction line element features from the training samples of each class by the method above gives a set of feature vectors for that class.
The original features are compressed with the LDA transformation as follows.
First the center μω of the feature vectors of each character class ω (1 ≤ ω ≤ c), the center μ of the feature vectors of all character classes, the between-class scatter matrix Sb and the average within-class scatter matrix Sw are computed.
A transformation matrix Φ is then sought that maximizes tr[(ΦT Sw Φ)−1(ΦT Sb Φ)]; the corresponding LDA feature transformation is Y = ΦT X, where Y is the most discriminative d-dimensional feature.
5. Decision on the class of the input character: for a character image of unknown class, features are extracted and compared with the data already stored in the recognition library to determine the correct character code.
5.1 Classifier design
For the feature vectors Y obtained by LDA compression, the mean vector of each character class and the other quantities required by the coarse and fine classifiers are computed from the feature set of each Tibetan character class ω (1 ≤ ω ≤ c) and stored in the library file.
5.2 Classification decision
For an input character image of unknown class, position normalization and size normalization are carried out first, the four-direction line element feature X is then extracted, and the LDA linear transformation matrix Φ converts the original direction line element feature X into Y = ΦT X = (y1, y2, ..., yd)T, where d is the dimension of the transformed feature.
The mean vectors of all character classes are read from the library file, and the EDD distance from Y to every class is computed.
All the computed distances, ω = 1, 2, ..., c, are sorted in ascending order, and the first L (1 ≤ L ≤ c) distances together with the character class codes ek, k = 1, 2, ..., L, that they represent form the coarse classification candidate set CanSet = {(e1, D1), (e2, D2), ..., (eL, DL)}, with D1 ≤ D2 ≤ ... ≤ DL.
The recognition confidence Conf(CanSet) of the first candidate in CanSet is computed.
If Conf(CanSet) is higher than the threshold ConfTH, (e1, D1) is output directly as the recognition result for the input character, that is, the input character is taken to belong to the class corresponding to e1 and the recognition distance is D1. Otherwise, the MQDF discriminant distance from Y to the character class corresponding to each code in CanSet, ω = 1, 2, ..., L, is computed, and the class with the smallest MQDF distance is taken as the recognition result for the input character.
Experiments show that the invention achieves a recognition accuracy of 99.83% on the multi-font, multi-size printed Tibetan single-character test set, and a recognition rate above 99% on real text.
Brief Description of the Drawings
Fig. 1 Hardware configuration of a typical Tibetan character recognition system.
Fig. 2 Generation of Tibetan single-character samples.
Fig. 3 Structure of the Tibetan character recognition system.
Fig. 4 Image coordinate system used.
Fig. 5 Character normalization flow.
Fig. 6 Example of character normalization.
Fig. 7 Direction line element feature extraction flow.
Fig. 8 A normalized character and its contour.
Fig. 9 The four direction attributes (horizontal, vertical, left-falling, right-falling) of the four-direction line element features.
Fig. 10 Division of the image into sub-regions.
Fig. 11 The small blocks that make up a sub-region.
Fig. 12 Flow chart of the LDA feature transformation.
Fig. 13 Classification strategy.
Fig. 14 A multi-font, multi-size printed Tibetan character recognition system based on this algorithm.
Fig. 15 A multi-font printed Tibetan (mixed Chinese-English) document recognition system.
Detailed Description of the Embodiments
As shown in Fig. 1, a printed Tibetan character recognition system consists of two hardware parts: an image acquisition device and a computer. The image acquisition device is usually a scanner and is used to obtain digital images of Tibetan characters. The computer processes the digital images and performs the classification decision.
Fig. 2 shows the process of generating the Tibetan single-character samples used for training and testing. A printed Tibetan sample page is first scanned into the computer as a digital image. Preprocessing such as binarization and noise removal yields a binary image. The image is then segmented into text lines, each text line is segmented into individual Tibetan characters, and the character class of each character image is labeled. A check is then carried out, and errors produced during line and character segmentation and during class labeling are corrected manually. Finally, the original character images corresponding to each character class are extracted and saved, completing the collection of Tibetan single-character samples.
As shown in Fig. 3, the printed Tibetan character recognition algorithm is divided into two parts: a training system and a testing system. In the training system, every sample in the input Tibetan single-character training set is normalized appropriately, the four-direction line element features reflecting its composition are extracted, LDA transforms the features to reduce the original dimension, and a suitable classifier is then trained to obtain the feature library file. In the testing system, an input character image of unknown class is normalized and its features are extracted with the same methods as in the training system, the features are transformed with the transformation matrix obtained by the training system, and they are then passed to the classifier for classification to decide the class of the input character.
The realization of a practical multi-font, multi-size printed Tibetan character recognition system therefore involves the following aspects:
A) acquisition of Tibetan single-character samples;
B) implementation of the training system;
C) implementation of the testing system.
These three aspects are described in detail below.
A) Acquisition of Tibetan single-character samples
The process of obtaining printed Tibetan single-character samples is shown in Fig. 2. An input paper document printed in Tibetan is converted into a digital image by a scanner and read into the computer. The image then undergoes preprocessing such as noise removal and binarization. Removing noise with various filtering methods is extensively documented in the existing literature, and binarization can use existing global or locally adaptive methods. Layout analysis of the document then yields the character regions. Line segmentation and character segmentation of these regions with horizontal and vertical projection histograms produce individual characters; segmentation errors at this stage are corrected manually. The class of each obtained Tibetan character is then labeled, generally automatically by the computer, and errors are handled manually (corrected, deleted, and so on). Finally, the original character images of different fonts and sizes that correspond to characters with the same internal code are saved, giving the multi-font, multi-size printed Tibetan single-character samples.
B) Implementation of the training system
B.1 Character normalization
B.1.1 Position normalization
Let the original character image be [F(i,j)] of size W×H, where W is the width, H the height, and F(i,j) the value of the pixel in row i and column j, i = 1, 2, ..., H, j = 1, 2, ..., W. [F(i,j)] can be viewed as the vertical concatenation of two sub-images, the part above the baseline [F1(i,j)] of size W×H1 and the part below the baseline [F2(i,j)] of size W×H2, with H1 + H2 = H. The horizontal projection V(i), i = 1, 2, ..., H, of the character image is computed as V(i) = Σ_{j=1..W} F(i,j), and the ordinate P_I of the baseline is determined from it.
H1 can then be obtained from P_I and the ordinate of the top of the character; in the coordinate system adopted by the invention (Fig. 4), H1 is numerically equal to P_I.
Let the normalized character image be [G(i,j)] of size M×N, where M is the width, N the height, and G(i,j) the value of the pixel in row i and column j, i = 1, 2, ..., N, j = 1, 2, ..., M. Likewise, [G(i,j)] can be viewed as the vertical concatenation of two sub-images, the part above the baseline [G1(i,j)] of size M×N1 and the part below the baseline [G2(i,j)] of size M×N2, with N1 = N/4 and N2 = 3N/4. Normalization then maps the input lattices [F1(i,j)] and [F2(i,j)] onto the target lattices [G1(i,j)] and [G2(i,j)], respectively: a reference point Uk(uIk, uJk), k = 1, 2, is selected in each input lattice [Fk(i,j)], and the input lattice is shifted so that this point lies at the center of the target lattice [Gk(i,j)], completing the position normalization of the input character.
Let Ak(aIk, aJk) and Bk(bIk, bJk), k = 1, 2, denote the center of gravity of [Fk(i,j)] and the geometric center of its bounding box, respectively. Uk(uIk, uJk), k = 1, 2, is taken as a point between Ak(aIk, aJk) and Bk(bIk, bJk), where β is a constant with 0 ≤ β ≤ 1.
B.1.2 Size normalization
Comparing the input character image [Fk(i,j)] of size W×Hk with the normalized target character lattice [Gk(i,j)] of size M×Nk, k = 1, 2, the relationship is
Gk(i,j) = Fk(i/ri, j/rj), k = 1, 2,
where ri and rj are the scale factors in the i and j directions: ri = Nk/Hk, rj = M/W. According to this formula, the point (i,j) of the output lattice corresponds to the point (i/ri, j/rj) of the input character. Fk(i,j) is a discrete function, and i/ri and j/rj are in general not integers, so the value of Fk at (i/ri, j/rj) must be estimated from its values at the known discrete points. Cubic B-spline interpolation is used to reduce distortion of the normalized characters: for a given (i,j), the neighboring integer sample positions are obtained with the rounding function [·], the interpolated value is a weighted sum of the sample values with weights given by the cubic B-spline function RB(z), and W(z) denotes the unit step function.
B.2 Direction line element feature extraction
B.2.1 Extracting the character contour
The whole character lattice is scanned; for a black pixel at a given position, if both the number of black pixels and the number of white pixels in its 8-neighborhood are greater than 0, the pixel is kept, otherwise its value in the lattice is set to 0. In this way the contour image [G′(i,j)] of size M×N is obtained from the normalized character image [G(i,j)].
B.2.2 Block division and composition of the feature vector
For every black pixel in the character contour lattice [G′(i,j)], one of four line elements, horizontal (0°), vertical (90°), left-falling (45°) or right-falling (135°), is assigned according to its positional relationship with two neighboring black pixels. Two cases are distinguished: if the three black pixels lie on the same straight line, only one line element is assigned to the central pixel, with value 2; if they do not, two line elements are assigned to the central pixel, each with value 1. Assigning line elements to every black pixel in this way yields, for each black pixel (i,j), a 4-dimensional vector X(i,j) = (xv, xk, xp, xo)T whose components give the amounts of the four line elements at that pixel.
After this, the M×N lattice is divided evenly into sub-regions of width M0 and height N0; adjacent sub-regions overlap by M0/2 pixels horizontally and N0/2 pixels vertically, so the total number of sub-regions is (2M/M0 − 1) × (2N/N0 − 1). The direction line element feature vector XS = (xv, xk, xp, xo)T of a whole sub-region is the weighted sum of the feature vectors of the blocks it contains:
XS = αA XA + αB XB + αC XC + αD XD,
where αA, αB, αC, αD are constants between 0 and 1 that express how strongly the feature vector of each block contributes to the overall feature vector of the sub-region. A 4-dimensional feature vector is thus obtained from every sub-region, and arranging the feature vectors of all sub-regions in order gives the original direction line element feature vector of the input character.
B.3 Feature transformation
Let the number of character classes be c (c = 592 in Tibetan character recognition) and the number of training samples of class ω be Oω, ω = 1, 2, ..., c; the original direction line element feature vectors of each class form one set.
First the center μω of the feature vectors of each character class ω (1 ≤ ω ≤ c), the center μ of the feature vectors of all character classes, the between-class scatter matrix Sb and the average within-class scatter matrix Sw are computed.
A transformation matrix Φ is sought that maximizes tr[(ΦT Sw Φ)−1(ΦT Sb Φ)], so that the ratio of between-class scatter to within-class scatter is maximized and the separability of the pattern classes increases.
The leading eigenvectors of Sw−1 Sb, computed with a matrix computation tool and sorted by decreasing eigenvalue, form the columns of Φ, and the corresponding LDA feature transformation Y = ΦT X yields the most discriminative d-dimensional feature.
B.4 Classifier design
For the feature vectors Y obtained by the LDA transformation, the mean vector of each character class, together with the other statistics required by the coarse classifier (EDD) and the fine classifier (MQDF), is computed from the most discriminative feature set of each Tibetan character class ω (1 ≤ ω ≤ c) and stored in the feature library file.
C) Implementation of the testing system
For an input character image of unknown class, position normalization and size normalization are carried out first, the four-direction line element feature X is then extracted, and the LDA linear transformation matrix Φ converts the original feature X into Y = ΦT X = (y1, y2, ..., yd)T, where d is the dimension of the transformed feature.
The mean vectors of all character classes are read from the library file and the EDD distance from Y to every class is computed.
All the computed distances, ω = 1, 2, ..., c, are sorted in ascending order, and the first L (1 ≤ L ≤ c) distances together with the character class codes ek, k = 1, 2, ..., L, that they represent form the coarse classification candidate set CanSet = {(e1, D1), (e2, D2), ..., (eL, DL)}, with D1 ≤ D2 ≤ ... ≤ DL.
The recognition confidence Conf(CanSet) of the first candidate in CanSet is computed.
If Conf(CanSet) is higher than the threshold ConfTH, (e1, D1) is output directly as the recognition result for the input character, that is, the input character is taken to belong to the class corresponding to e1 and the recognition distance is D1. Otherwise, the MQDF discriminant distance from Y to the character class corresponding to each code in CanSet, ω = 1, 2, ..., L, is computed, and the class with the smallest MQDF distance is taken as the recognition result for the input character.
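A sketch of the coarse-to-fine decision described above, reusing the edd_distance and mqdf_distance sketches given earlier. The confidence measure of the invention is computed from the EDD distances of the candidates but its exact formula is not reproduced here, so the relative gap between the two best coarse distances serves as a stand-in; all function names and the parameter containers are ours.

```python
import numpy as np

def classify(Y, means, edd_params, mqdf_params, L=10, conf_th=0.9):
    """Coarse-to-fine decision for one transformed feature vector Y.

    means[w]: mean feature vector of class w; edd_params[w]: (sigma_w,
    theta_w, gamma_w); mqdf_params[w]: (mu_w, eigvals_w, eigvecs_w, K, h2).
    Returns the decided class code and its distance.
    """
    # coarse stage: EDD distance to every class, keep the L best candidates
    dists = {w: edd_distance(Y, means[w], *edd_params[w]) for w in means}
    canset = sorted(dists.items(), key=lambda kv: kv[1])[:L]
    (e1, D1), (e2, D2) = canset[0], canset[1]
    conf = (D2 - D1) / D2 if D2 > 0 else 1.0   # assumed confidence measure
    if conf >= conf_th:
        return e1, D1                          # coarse result is trusted directly
    # fine stage: rerank the candidate set with the MQDF distance
    fine = [(w, mqdf_distance(Y, *mqdf_params[w])) for w, _ in canset]
    return min(fine, key=lambda kv: kv[1])
```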
Example 1: multi-font, multi-size printed Tibetan character recognition system
A multi-font, multi-size printed Tibetan character recognition system based on the present invention is shown in Fig. 14a. Experiments were carried out on 1,200 collected sets of printed Tibetan documents, each containing all 592 modern Tibetan characters. Most of these sample documents were taken from the main current Tibetan publishing systems (Fangzheng and Huaguang), and a small number were printed directly from TrueType fonts. The fonts include not only the most common white, black and general styles but also round, elongated and bamboo styles, and the font sizes range from size 6 to the largest (chuhao). Sample quality varies, with normal, broken and touching characters in a ratio of roughly 2:1:1. After scanning, line and character segmentation and internal-code labeling, the 1,200 document sets were converted into 1,200 sets of single-character samples (that is, 1,200 samples per character class); 900 sets were drawn at random to form the training set and the remaining 300 sets were kept as test samples.
In the experiments, each Tibetan character was normalized to a 48×96 lattice with the method of the invention, with normalization parameter β = 0.5. For the four-direction line element features, the sub-regions were divided as shown in Fig. 10 with M0 = N0 = 16, and the weights of the block feature vectors within a sub-region were αA = 0.4, αB = 0.3, αC = 0.2, αD = 0.1. After the direction line element features were extracted following the flow of Fig. 7, LDA was used for feature compression and the transformed feature dimension d was set to 128 (Fig. 14c). The parameters of the coarse EDD classifier were θ1 = θ2 = ... = θ592 = 0.8, γ1 = γ2 = ... = γ592 = 2.2 and C = 20; the threshold used in the coarse classification confidence analysis was ConfTH = 0.9; the parameter of the fine MQDF classifier was K = 32 (Fig. 14b), and h2 was estimated by the mean of the K-th eigenvalues of the covariance matrices of the character classes. The experimental results on the test set are shown in Table 1.
Table 1 Recognition rates of the system on the test sample sets of the six Tibetan fonts
As Table 1 shows, the average recognition accuracy for multi-font, multi-size Tibetan characters reaches 99.83%, which demonstrates the effectiveness of the method proposed in the invention.
实施例2:多字体印刷藏文(混排汉英)文档识别系统Embodiment 2: multi-font printing Tibetan (mixed Chinese-English) document recognition system
多字体印刷藏文(混排汉英)文档识别系统的研究是为适应藏族地区办公自动化和促进中文多文种信息处理技术发展的需求而展开的,它的系统框图如图15所示。主要包括图像输入和预处理子系统、行字切分子系统、字符识别子系统和后处理子系统。本发明是字符识别子系统的主要组成部分,在汉字和英文识别核心的配合下对藏文占主体、夹杂一定汉字和英文、数字、符号的多字体印刷文档进行自动识别,将文档图像转换为计算机可“阅读”的文本。The research on multi-font printed Tibetan (mixed Chinese-English) document recognition system is carried out to meet the needs of office automation in Tibetan areas and to promote the development of Chinese multilingual information processing technology. Its system block diagram is shown in Figure 15. It mainly includes image input and preprocessing subsystem, line word cutting subsystem, character recognition subsystem and postprocessing subsystem. The present invention is the main component of the character recognition subsystem. With the cooperation of the Chinese character and English recognition cores, it automatically recognizes multi-font printed documents that are dominated by Tibetan and mixed with certain Chinese characters, English, numbers and symbols, and converts the document image into Text that a computer can "read".
The Tibetan character recognition part of this system uses the method proposed by the present invention, with the same parameters as in Embodiment 1, and the character feature library of Embodiment 1 was transplanted directly. The system passed the expert appraisal organized by the Ministry of Education in November 2003. For the appraisal test, 62 pages containing 95,583 characters were randomly selected from more than 500 pages (over 520,000 characters) of actual printed Tibetan documents provided by Northwest University for Nationalities, collected from books, newspapers, magazines and other publications. The results are as follows:
Table 2. Test performance of the multi-font printed Tibetan (mixed Chinese-English) document recognition system
Note: ACE is the error rate of errors identifiable as recognition errors, ASE is the error rate of errors identifiable as segmentation errors, and UTE is the error rate of errors whose type cannot be determined. The results show that the multi-font, multi-size printed Tibetan character recognition proposed by the present invention fully meets the needs of practical application, achieves good recognition performance, and has broad application prospects.
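For reference, if the three error categories reported in Table 2 are treated as disjoint and as covering all errors (an assumption made here only for illustration, since the table rows are not reproduced above), the overall character accuracy over the 95,583 test characters would be: accuracy = 1 - (ACE + ASE + UTE).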
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 200410034107 CN1251130C (en) | 2004-04-23 | 2004-04-23 | Method for identifying multi-font multi-character size print form Tibetan character |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1570958A (en) | 2005-01-26 |
CN1251130C CN1251130C (en) | 2006-04-12 |
Family
ID=34481469
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 200410034107 Expired - Fee Related CN1251130C (en) | 2004-04-23 | 2004-04-23 | Method for identifying multi-font multi-character size print form Tibetan character |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1251130C (en) |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101366017B (en) * | 2005-12-12 | 2010-06-16 | Microsoft Corp | Method and system for character recognition based on logical structure and layout |
CN100440250C (en) * | 2007-03-09 | 2008-12-03 | Tsinghua University | Printed Mongolian Character Recognition Method |
WO2009114967A1 (en) * | 2008-03-19 | 2009-09-24 | Dongguan BBK Educational Electronic Products Co Ltd | Motion scan-based image processing method and device |
CN101510259B (en) * | 2009-03-18 | 2011-04-06 | Northwest University for Nationalities | On-line identification method for 'ding' of handwriting Tibet character |
CN102184383A (en) * | 2011-04-18 | 2011-09-14 | Harbin Institute of Technology | Automatic generation method of image sample of printed character |
CN102184383B (en) * | 2011-04-18 | 2013-04-10 | Harbin Institute of Technology | Automatic generation method of image sample of printed character |
CN103999097B (en) * | 2011-07-11 | 2017-04-12 | Huawei Technologies Co Ltd | System and method for compact descriptor for visual search |
CN103999097A (en) * | 2011-07-11 | 2014-08-20 | Huawei Technologies Co Ltd | System and method for compact descriptor for visual search |
CN102360436A (en) * | 2011-10-24 | 2012-02-22 | Institute of Software, Chinese Academy of Sciences | Identification method for on-line handwritten Tibetan characters based on components |
CN102360436B (en) * | 2011-10-24 | 2012-11-07 | Institute of Software, Chinese Academy of Sciences | Identification method for on-line handwritten Tibetan characters based on components |
CN104809442B (en) * | 2015-05-04 | 2017-11-17 | Beijing Information Science and Technology University | A kind of Dongba pictograph grapheme intelligent identification Method |
CN104809442A (en) * | 2015-05-04 | 2015-07-29 | Beijing Information Science and Technology University | Intelligent recognition method for graphemes of Dongba pictographs |
CN107025452A (en) * | 2016-01-29 | 2017-08-08 | Fujitsu Ltd | Image-recognizing method and image recognition apparatus |
CN106355200A (en) * | 2016-08-29 | 2017-01-25 | Dalian Minzu University | Manchu handwritten recognition device |
CN106408002A (en) * | 2016-08-29 | 2017-02-15 | Dalian Minzu University | Hand-written manchu alphabet identification system |
CN106127266A (en) * | 2016-08-29 | 2016-11-16 | Dalian Minzu University | Hand-written Manchu alphabet recognition methods |
CN108932454A (en) * | 2017-05-23 | 2018-12-04 | Hangzhou Hikvision System Technology Co Ltd | A kind of character recognition method based on picture, device and electronic equipment |
CN107730511A (en) * | 2017-09-20 | 2018-02-23 | Beijing University of Technology | A kind of Tibetan language historical document line of text cutting method based on baseline estimations |
CN107730511B (en) * | 2017-09-20 | 2020-10-27 | Beijing University of Technology | A Text Line Segmentation Method of Tibetan Historical Documents Based on Baseline Estimation |
CN110858317A (en) * | 2018-08-24 | 2020-03-03 | Beijing Sogou Technology Development Co Ltd | Handwriting recognition method and device |
CN111553336A (en) * | 2020-04-27 | 2020-08-18 | Xidian University | A system and method for image recognition of printed Uyghur documents based on conjoined segments |
CN111553336B (en) * | 2020-04-27 | 2023-03-24 | Xidian University | Print Uyghur document image recognition system and method based on link segment |
CN111583217A (en) * | 2020-04-30 | 2020-08-25 | SonoScape Medical Corp | Tumor ablation curative effect prediction method, device, equipment and computer medium |
Also Published As
Publication number | Publication date |
---|---|
CN1251130C (en) | 2006-04-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN1251130C (en) | Method for identifying multi-font multi-character size print form Tibetan character | |
CN1158627C (en) | Method and device for character recognition | |
CN1156791C (en) | Pattern recognizing apparatus and method | |
CN1187952C (en) | Apparatus and method for correcting distortion of input image | |
CN1818927A (en) | Fingerprint identification method and system | |
CN1177407A (en) | Method and system for velocity-based head writing recognition | |
CN100336070C (en) | Method of robust human face detection in complicated background image | |
CN1794266A (en) | Biocharacteristics fusioned identity distinguishing and identification method | |
CN1151465C (en) | Model identification equipment using condidate table making classifying and method thereof | |
CN1924897A (en) | Image processing apparatus and method and program | |
CN1200387C (en) | Statistic handwriting identification and verification method based on separate character | |
CN1310825A (en) | Methods and apparatus for classifying text and for building a text classifier | |
CN1573742A (en) | Image retrieving system, image classifying system, image retrieving program, image classifying program, image retrieving method and image classifying method | |
CN1459761A (en) | Character identification technique based on Gabor filter set | |
CN1599913A (en) | Iris identification system and method, and storage media having program thereof | |
CN101055620A (en) | Shape comparison device and method | |
CN1215201A (en) | Character Recognition/Correction Method | |
CN1041773C (en) | Character recognition method and apparatus based on 0-1 pattern representation of histogram of character image | |
CN1552041A (en) | Face meta-data creation and face similarity calculation | |
CN1338703A (en) | Device for extracting drawing line from multiple value image | |
CN1664846A (en) | On-line Handwritten Chinese Character Recognition Method Based on Statistical Structural Features | |
CN1251128C (en) | Pattern ranked matching device and method | |
CN1973757A (en) | Computerized disease sign analysis system based on tongue picture characteristics | |
CN1310182C (en) | Method, device and storage medium for enhancing document, image and character recognition | |
CN1266643C (en) | Printed font character identification method based on Arabic character set |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20060412; Termination date: 20140423 |