CN105023029A

CN105023029A - Online handwritten Tibetan syllable recognition method and device

Info

Publication number: CN105023029A
Application number: CN201510370046.7A
Authority: CN
Inventors: 马龙龙; 吴健
Original assignee: Institute of Software of CAS
Current assignee: Institute of Software of CAS
Priority date: 2015-06-29
Filing date: 2015-06-29
Publication date: 2015-11-04
Anticipated expiration: 2035-06-29
Also published as: CN105023029B

Abstract

The invention provides an online handwritten Tibetan syllable recognition method and device. It relates to the technical field of character recognition and addresses the problem that the prior art cannot efficiently recognize Tibetan syllables continuously handwritten by users. The method includes: preprocessing the point trajectory of a Tibetan syllable continuously handwritten by a user; over-segmenting the preprocessed Tibetan syllable successively in the horizontal direction and the vertical direction to obtain a substructure block sequence of a two-layer labeling result; performing segmentation hypothesis verification on the substructure block sequence of the two-layer labeling result using a segmentation hypothesis verification method based on a semi-Markov conditional random field, to obtain the optimal segmentation path and the recognition result of the component string; and determining the category of the handwritten Tibetan syllable input by the user according to the optimal segmentation path and the recognition result of the component string. The invention is suitable for recognizing Tibetan syllables continuously handwritten by users.

Description

A method and device for online handwritten Tibetan syllable recognition

Technical Field

The invention relates to the technical field of character recognition, and in particular to an online handwritten Tibetan syllable recognition method and device.

Background Art

Tibetan text is entered mainly by handwriting input or keyboard input. Compared with keyboard input, handwriting input better matches how people naturally express themselves: it is an effective real-time tool that users adopt easily, and it is portable and simple to operate. With the advancement and widespread use of mobile terminal devices such as smartphones, tablet computers, electronic whiteboards, and iPads, research on online handwritten Tibetan input (pen input) algorithms has attracted growing application and attention. Current research focuses mainly on Tibetan character recognition, and handwriting input methods that use the Tibetan character as the input unit already exist. However, owing to the particular characteristics of the Tibetan language, people in Tibetan regions would prefer handwritten Tibetan input to support continuous writing with the Tibetan syllable as the handwriting input unit, which better matches their writing habits. Research on online handwritten Tibetan syllable recognition remains relatively scarce, and no literature or patents on this technology have been reported.

In the course of realizing the present invention, the inventors found at least the following technical problem in the prior art:

Existing online handwritten Tibetan syllable recognition methods cannot efficiently recognize Tibetan syllables entered by continuous handwriting, and therefore cannot meet the writing habits and needs of Tibetan users.

Summary of the Invention

The invention provides an online handwritten Tibetan syllable recognition method and device capable of efficiently recognizing Tibetan syllables continuously handwritten by users, thereby meeting the writing habits and needs of Tibetan users.

The online handwritten Tibetan syllable recognition method provided by the invention comprises:

preprocessing the point trajectory of a Tibetan syllable continuously handwritten by a user;

over-segmenting the preprocessed Tibetan syllable successively in the horizontal direction and then the vertical direction, to obtain a substructure block sequence of a two-layer labeling result;

performing segmentation hypothesis verification on the substructure block sequence of the two-layer labeling result using a segmentation hypothesis verification method based on a semi-Markov conditional random field, to obtain the optimal segmentation path and the recognition result of the component string; and

determining the category of the handwritten Tibetan syllable input by the user according to the optimal segmentation path and the recognition result of the component string.

The online handwritten Tibetan syllable recognition device provided by the invention comprises:

a preprocessing unit, configured to preprocess the point trajectory of a Tibetan syllable continuously handwritten by a user;

an over-segmentation unit, configured to over-segment the preprocessed Tibetan syllable successively in the horizontal direction and then the vertical direction, to obtain a substructure block sequence of a two-layer labeling result;

a segmentation hypothesis verification unit, configured to perform segmentation hypothesis verification on the substructure block sequence of the two-layer labeling result using a segmentation hypothesis verification method based on a semi-Markov conditional random field, to obtain the optimal segmentation path and the recognition result of the component string; and

a determination unit, configured to determine the category of the handwritten Tibetan syllable input by the user according to the optimal segmentation path and the recognition result of the component string.

In the online handwritten Tibetan syllable recognition method and device provided by the invention, the point trajectory of a Tibetan syllable continuously handwritten by a user is first preprocessed. The preprocessed syllable is then over-segmented successively in the horizontal and vertical directions to obtain a substructure block sequence of a two-layer labeling result, and segmentation hypothesis verification based on a semi-Markov conditional random field is performed on this sequence to obtain the optimal segmentation path and the recognition result of the component string. Finally, the category of the handwritten Tibetan syllable input by the user is determined from the optimal segmentation path and the component string recognition result. Compared with the prior art, the invention can efficiently recognize Tibetan syllables continuously handwritten by users and meets the writing habits and needs of Tibetan users.

Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of the online handwritten Tibetan syllable recognition method provided by Embodiment 1 of the present invention;

Fig. 2 is a schematic diagram of the structural composition of a Tibetan syllable according to Embodiment 2 of the present invention;

Fig. 3 is an example of horizontal character segmentation of a Tibetan syllable according to Embodiment 2 of the present invention;

Fig. 4 is an example of detecting an erroneous horizontal character segmentation of a Tibetan syllable and of the corrected segmentation, according to Embodiment 2 of the present invention;

Fig. 5 is an example of vertical component segmentation of Tibetan characters according to Embodiment 2 of the present invention;

Fig. 6 is a schematic structural diagram of the online handwritten Tibetan syllable recognition device provided by Embodiment 3 of the present invention.

Detailed Description of the Embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Embodiment 1

This embodiment provides an online handwritten Tibetan syllable recognition method. As shown in Fig. 1, the method comprises:

S11. Preprocess the point trajectory of the Tibetan syllable continuously handwritten by the user.

S12. Over-segment the preprocessed Tibetan syllable successively in the horizontal and vertical directions to obtain a substructure block sequence of a two-layer labeling result.

S13. Using a segmentation hypothesis verification method based on a semi-Markov conditional random field, perform segmentation hypothesis verification on the substructure block sequence of the two-layer labeling result to obtain the optimal segmentation path and the recognition result of the component string.

S14. Determine the category of the handwritten Tibetan syllable input by the user according to the optimal segmentation path and the recognition result of the component string.

In the online handwritten Tibetan syllable recognition method provided by this embodiment, the point trajectory of a Tibetan syllable continuously handwritten by a user is first preprocessed. The preprocessed syllable is then over-segmented successively in the horizontal and vertical directions to obtain a substructure block sequence of a two-layer labeling result, and segmentation hypothesis verification based on a semi-Markov conditional random field is performed on this sequence to obtain the optimal segmentation path and the recognition result of the component string. Finally, the category of the handwritten Tibetan syllable input by the user is determined from the optimal segmentation path and the component string recognition result. Compared with the prior art, the invention can efficiently recognize Tibetan syllables continuously handwritten by users and meets the writing habits and needs of Tibetan users.

Further, preprocessing the point trajectory of the Tibetan syllable continuously handwritten by the user may include: removing isolated points from the point trajectory, resampling it at equal distances, and smoothing it with a Gaussian filter.

Further, over-segmenting the preprocessed Tibetan syllable successively in the horizontal and vertical directions to obtain a substructure block sequence of a two-layer labeling result may include: performing horizontal character segmentation and then vertical component segmentation on the preprocessed Tibetan syllable, to obtain the substructure block sequence of the two-layer labeling result.

Further, performing segmentation hypothesis verification on the substructure block sequence of the two-layer labeling result using the semi-Markov conditional random field based method, to obtain the optimal segmentation path and the recognition result of the component string, may include: integrating the component classifier, the geometric context, and the language context into a unified recognition framework through different weights, verifying different segmentation hypotheses on the substructure block sequence of the two-layer labeling result, and obtaining the optimal segmentation path and the recognition result of the component string.

Optionally, the weights connecting the component classifier, the geometric context, and the language context, together with the respective parameters of the component classifier, the geometric context, and the language context, may be obtained by training under a criterion that minimizes a negative log-likelihood loss function.

Embodiment 2

This embodiment provides an online handwritten Tibetan syllable recognition method. It uses the MRG-OHTC sample database of the Multilingual Processing Research Group, National Engineering Research Center for Basic Software, Institute of Software, Chinese Academy of Sciences. The database contains handwritten Tibetan syllable samples from 150 different writers; each writer wrote the same 827 pre-selected high-frequency syllables, comprising 456 two-character syllables, 309 three-character syllables, and 62 four-character syllables. The sample sets of 130 writers are used for training and the remaining 20 sets for testing. All 150 sample sets were labeled at the character level and the syllable level with a semi-supervised labeling tool.

The online handwritten Tibetan syllable recognition method of this embodiment proceeds as follows:

(1) Point trajectory preprocessing

The input of an online handwritten Tibetan syllable is represented as the point sequence of the handwritten trajectory, (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ), where n is the number of points in the input syllable trajectory. The points are ordered by writing time, and strokes are separated by end-marker points. First, isolated points, i.e., strokes consisting of a single noise point, are removed so that they do not disturb character segmentation, component segmentation, or component recognition. Next, the syllable trajectory is resampled at equal distances. Finally, the points are smoothed with a Gaussian filter to suppress fluctuations of the trajectory points. The resampling distance between points is set to 0.5, and the variance of the Gaussian smoothing is set to 1.2.
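The three preprocessing steps can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: strokes are assumed to be lists of (x, y) tuples that have already been split at the end-marker points, the spacing 0.5 and variance 1.2 are taken from the text, and the function names, the kernel half-width of three standard deviations, and the clamping at stroke ends are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def remove_isolated_points(strokes: List[Stroke]) -> List[Stroke]:
    """Drop strokes consisting of a single point, treated here as isolated noise."""
    return [s for s in strokes if len(s) > 1]


def resample_equidistant(stroke: Stroke, step: float = 0.5) -> Stroke:
    """Emit a point every `step` units of arc length along the stroke polyline."""
    out = [stroke[0]]
    carry = 0.0  # arc length from the last emitted point to the start of the current segment
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg == 0:
            continue
        d = step - carry  # position of the next output point along this segment
        while d <= seg:
            t = d / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += step
        carry = seg - (d - step)
    return out


def gaussian_smooth(stroke: Stroke, variance: float = 1.2) -> Stroke:
    """Convolve x and y with a normalized Gaussian kernel, clamping indices at the stroke ends."""
    radius = max(1, math.ceil(3 * math.sqrt(variance)))  # kernel half-width: an assumption
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(k * k) / (2 * variance)) for k in offsets]
    total = sum(weights)
    weights = [w / total for w in weights]
    n = len(stroke)
    smoothed = []
    for i in range(n):
        sx = sy = 0.0
        for k, w in zip(offsets, weights):
            j = min(max(i + k, 0), n - 1)
            sx += w * stroke[j][0]
            sy += w * stroke[j][1]
        smoothed.append((sx, sy))
    return smoothed


def preprocess(strokes: List[Stroke], step: float = 0.5, variance: float = 1.2) -> List[Stroke]:
    """Isolated-point removal, then equidistant resampling, then Gaussian smoothing."""
    return [gaussian_smooth(resample_equidistant(s, step), variance)
            for s in remove_isolated_points(strokes)]
```

Resampling before smoothing means the Gaussian kernel acts over a fixed arc length rather than over however densely the digitizer happened to sample the pen.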

(2) Over-segmentation

The preprocessed Tibetan syllable is over-segmented into a two-layer labeling result, each layer of which consists of a sequence of substructure blocks. A substructure block is either a complete component or part of a component. A Tibetan syllable consists of 1 to 4 Tibetan characters arranged horizontally, and each character consists of one or more components stacked vertically from top to bottom, as shown in Fig. 2. A component is a sub-stroke sequence of a character: a structural primitive that computer segmentation algorithms extract easily and that is more stable than individual strokes. Because characters are built from components and different characters share the same components, there are far fewer component categories than character categories. Following the usual writing order of Tibetan syllables, over-segmentation proceeds as follows:

a. Horizontal character segmentation:

First, the Tibetan syllable is split into a character sequence along the horizontal writing direction. Each stroke initially forms its own substructure block, and any two substructure blocks with a large horizontal overlap are merged iteratively until no further merge is possible. Specifically, if two substructure blocks (stroke sequences) are separated by a horizontal gap, or overlap horizontally with an overlap degree less than 0.1, they are kept apart; if they overlap horizontally with an overlap degree greater than 0.1, they are merged. The horizontal overlap degree measures how much two substructure blocks overlap in the horizontal direction.
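The iterative merge can be sketched as below. The text gives the 0.1 threshold but not the exact overlap formula, so this sketch assumes the overlap degree is the overlap length divided by the width of the narrower block; each block is reduced to its horizontal extent, and the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (x_min, x_max) of a block's bounding box


def overlap_degree(a: Interval, b: Interval) -> float:
    """Assumed measure: overlap length divided by the narrower block's width."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    narrower = min(a[1] - a[0], b[1] - b[0])
    if narrower <= 0:
        # Zero-width (purely vertical) stroke: full overlap if it lies inside the other span.
        return 1.0 if inter >= 0 else 0.0
    return max(inter, 0.0) / narrower


def merge_blocks(spans: List[Interval], threshold: float = 0.1) -> List[List[int]]:
    """Start with one block per stroke and merge any pair whose overlap degree exceeds
    threshold, until no pair qualifies. Returns stroke indices per block, left to right."""
    blocks = [(span, [i]) for i, span in enumerate(spans)]
    changed = True
    while changed:
        changed = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                (a, sa), (b, sb) = blocks[i], blocks[j]
                if overlap_degree(a, b) > threshold:
                    blocks[i] = ((min(a[0], b[0]), max(a[1], b[1])), sorted(sa + sb))
                    del blocks[j]
                    changed = True
                    break
            if changed:
                break
    blocks.sort(key=lambda blk: blk[0][0])
    return [strokes for _, strokes in blocks]
```

For example, `merge_blocks([(0, 10), (2, 8), (12, 20), (19, 30)])` groups strokes 0 and 1 (full overlap) and strokes 2 and 3 (overlap 1/8 = 0.125), giving `[[0, 1], [2, 3]]`. Passing vertical extents with `threshold=0.2` would mirror the vertical component step described in the text.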

In handwritten Tibetan syllables there is usually a clear horizontal gap between characters, so the iterative procedure above merges strokes correctly; Fig. 3 shows a correct character segmentation result. However, because handwriting is casual and vowel signs are relatively wide, a character carrying a vowel sign often overlaps neighboring characters considerably in the horizontal direction. As shown in Fig. 4, the position of the vowel sign is detected and a forced break is inserted there, which resolves the erroneous merge.

b. Vertical component segmentation:

Starting from the horizontal character segmentation result, each character is segmented into components in the vertical direction. Merging uses an overlap computation similar to the horizontal overlap degree. Compared with the gaps or overlaps between characters, components usually have smaller blank gaps or larger overlaps between them in the vertical direction, so the empirical overlap threshold for merging is set to 0.2. Fig. 5 shows the vertical component segmentation result.

Components within a Tibetan character may be joined by connected strokes; a corner detection method breaks these connections so that the components are segmented correctly.
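The text does not name the corner detector. One simple possibility, shown here only as a hypothetical sketch, is to split a stroke wherever its turning angle exceeds a threshold; the 60-degree threshold, the one-point window, and the function names are all assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def turning_angle(p: Point, q: Point, r: Point) -> float:
    """Angle in degrees between direction p->q and direction q->r (0 means straight)."""
    a1 = math.atan2(q[1] - p[1], q[0] - p[0])
    a2 = math.atan2(r[1] - q[1], r[0] - q[0])
    d = abs(a2 - a1) % (2 * math.pi)
    return math.degrees(min(d, 2 * math.pi - d))


def split_at_corners(stroke: List[Point], angle_threshold: float = 60.0,
                     window: int = 1) -> List[List[Point]]:
    """Split a stroke at points whose turning angle exceeds angle_threshold.
    The corner point ends one piece and starts the next."""
    cuts: List[int] = []
    for i in range(window, len(stroke) - window):
        if turning_angle(stroke[i - window], stroke[i], stroke[i + window]) > angle_threshold:
            if not cuts or i - cuts[-1] > window:  # suppress cuts right next to the previous one
                cuts.append(i)
    pieces, start = [], 0
    for c in cuts:
        pieces.append(stroke[start:c + 1])
        start = c
    pieces.append(stroke[start:])
    return pieces
```

An L-shaped stroke such as `[(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]` is cut at its 90-degree corner into two three-point pieces that share the point `(2, 0)`, while a straight stroke is returned unchanged. Because the trajectory has already been resampled and smoothed, a small window is usually enough to avoid spurious corners from jitter.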

(3) Segmentation hypothesis verification based on a semi-Markov conditional random field

Tibetan syllable recognition is treated as the recognition of a two-layer component string, i.e., a component string segmented in both the horizontal and the vertical directions. The key problem is how to obtain the correct component string segmentation points and component recognition results from the substructure block sequence of the two-layer labeling result produced in step (2). The invention adopts a segmentation hypothesis verification method based on a semi-Markov conditional random field, which integrates the component classifier, the geometric context, and the language context into a unified recognition framework, verifies different segmentation hypotheses, and obtains the optimal segmentation path and the recognition result of the component string. Each model is described below:

a. Component classifier

The component classifier uses a multi-feature, multi-classifier fusion model based on deep neural networks: deep neural networks represent Tibetan characters (stacked letter units) from different perspectives, and different statistical classifiers then classify these representations, yielding a Tibetan component recognition method that fuses multiple features and multiple classifiers. For online features, an online handwritten Tibetan component consists of a stroke sequence: raw features are first extracted with a coordinate normalization method (NCFE), a deep belief network (DBN) then applies several layers of nonlinear transformation to obtain higher-level features, and a nearest prototype classifier (NPC) produces a classification result based on the online features. For offline features, the Tibetan component (a stroke sequence) is first rendered as a binary image whose raw pixels serve as the input to the feature representation, a deep convolutional neural network (DCNN) extracts features, and a modified quadratic discriminant function (MQDF) classifier produces a classification result based on the offline features. Finally, the online and offline classification results are fused to obtain the Tibetan component recognition result.
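The text does not state how the online and offline results are combined. The sketch below is a minimal Python illustration under stated assumptions: an NPC score is taken as the negative squared distance to the closest prototype of each class, the offline (MQDF) scores are simply passed in as a dict, and fusion is a weighted sum of softmax-normalized scores. The function names, the class labels, and the fusion rule are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Scores = Dict[str, float]


def npc_scores(feature: Sequence[float], prototypes: Dict[str, List[Sequence[float]]]) -> Scores:
    """Nearest prototype classifier: each class scores minus the squared Euclidean
    distance from the feature vector to that class's closest prototype."""
    return {
        label: -min(sum((f - p) ** 2 for f, p in zip(feature, proto)) for proto in protos)
        for label, protos in prototypes.items()
    }


def softmax(scores: Scores) -> Scores:
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}


def fuse(online: Scores, offline: Scores, w_online: float = 0.5) -> Tuple[str, Scores]:
    """Assumed fusion rule: weighted sum of the two classifiers' softmax-normalized scores."""
    p_on, p_off = softmax(online), softmax(offline)
    fused = {k: w_online * p_on[k] + (1.0 - w_online) * p_off[k] for k in p_on}
    return max(fused, key=fused.get), fused
```

Normalizing each classifier's scores before summing keeps one classifier's scale (squared distances versus MQDF discriminant values) from dominating; the weight `w_online` plays a role similar to the fusion weights the text leaves unspecified.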

This embodiment uses the Tibetan component as the basic recognition unit. The total number of component categories is about one fifth of the number of character categories, so the dictionary of the component classifier is small enough to meet the storage requirements of mobile devices. In addition, the smaller set of component categories contains far fewer mutually similar categories, which helps improve the final syllable recognition accuracy.

b. Geometric context

The geometric context comprises the geometric context between characters within a syllable and the geometric context between components within a character. The geometric context between characters within a syllable refers to information such as the height, width, and position of a candidate character pattern relative to the whole Tibetan syllable, and the distance and relative position between adjacent candidate characters. For this context, unary geometric features are built for each character class and binary geometric features for every two consecutive characters within a syllable, and the unary and binary features are each modeled with a separate quadratic discriminant function. This embodiment uses six unary geometric features: the width and height of the candidate character, the diagonal length of its bounding rectangle, and the distances from the center, upper boundary, and lower boundary of the bounding rectangle to the horizontal centerline of the string; all six are normalized by the average Tibetan character height. Four binary geometric features are used: the differences between the bounding rectangles of adjacent Tibetan characters in their upper boundaries, in their lower boundaries, between upper and lower boundaries, and in their horizontal centerlines; these are likewise normalized by the average Tibetan character height.
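The six unary and four binary character-level features can be computed from bounding rectangles roughly as follows. This is a sketch under assumptions: y grows downward, distances to the centerline are kept signed, the "upper versus lower boundary" feature is read as the right character's top minus the left character's bottom, and the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Bounding rectangle of a candidate pattern; y grows downward, so top < bottom."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.bottom - self.top

    @property
    def center_y(self) -> float:
        return (self.top + self.bottom) / 2


def unary_features(box: Box, line_center_y: float, avg_height: float) -> List[float]:
    """Width, height, diagonal, and signed offsets of the box center, top, and bottom
    from the string's horizontal centerline, all divided by avg_height."""
    raw = [
        box.width,
        box.height,
        math.hypot(box.width, box.height),
        box.center_y - line_center_y,
        box.top - line_center_y,
        box.bottom - line_center_y,
    ]
    return [v / avg_height for v in raw]


def binary_features(a: Box, b: Box, avg_height: float) -> List[float]:
    """Four features for consecutive characters a (left) and b (right)."""
    raw = [
        b.top - a.top,            # difference of upper boundaries
        b.bottom - a.bottom,      # difference of lower boundaries
        b.top - a.bottom,         # upper vs lower boundary (this pairing is an assumption)
        b.center_y - a.center_y,  # difference of horizontal centerlines
    ]
    return [v / avg_height for v in raw]
```

Dividing by the average character height makes the features independent of how large the user writes, so the same discriminant functions can score small and large handwriting.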

The geometric context between components within a character refers to information such as the height, width, and relative position of a candidate component pattern relative to the whole Tibetan character. For each character class, unary geometric features are built for each component within the character and binary geometric features for every two consecutive components within the character (ordered by upper boundary), and the unary and binary geometric features are each modeled with a separate Gaussian probability density function. The component-level unary and binary geometric features are extracted in the same way as the character-level ones, and all of them are normalized by the average Tibetan component height.
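A Gaussian density model for one feature type (for example, the unary features of one component position within one character class) can be sketched as follows. The diagonal covariance, the variance floor `min_var`, and the function names are assumptions made to keep the example short; the text does not specify the covariance structure.

```python
import math
from typing import List, Sequence, Tuple


def fit_diag_gaussian(samples: Sequence[Sequence[float]],
                      min_var: float = 1e-6) -> Tuple[List[float], List[float]]:
    """Maximum-likelihood mean and per-dimension variance (diagonal covariance assumed)."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    var = [max(sum((s[j] - mean[j]) ** 2 for s in samples) / n, min_var) for j in range(d)]
    return mean, var


def log_density(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log of the diagonal Gaussian density evaluated at x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))
```

The log density is a natural candidate for a feature function in the CRF energy, because it adds across independent terms just as the energy does.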

c. Language context

The categories of characters within a Tibetan syllable, and the categories of components within a Tibetan character, are related to one another; this is the language context. Language models are built for it at both the character level and the component level. Both levels use a bigram model: the feature function of the language context is defined as the logarithm of the bigram probability, and it is a binary feature function that depends on the character or component categories.
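A bigram language model of this kind, returning log P(b | a) for use as the feature function, can be sketched as below. The smoothing scheme is not given in the text; add-alpha smoothing, the `<s>`/`</s>` padding tokens, and the function name are assumptions, and the same function would serve for either character sequences or component sequences.

```python
import math
from collections import Counter
from typing import Callable, Sequence


def train_bigram(sequences: Sequence[Sequence[str]],
                 alpha: float = 1.0) -> Callable[[str, str], float]:
    """Return log P(b | a) estimated from label sequences with add-alpha smoothing.
    <s> and </s> pad each sequence so the first and last labels also get a bigram."""
    vocab = {t for seq in sequences for t in seq} | {"<s>", "</s>"}
    pair_counts: Counter = Counter()
    left_counts: Counter = Counter()
    for seq in sequences:
        padded = ["<s>", *seq, "</s>"]
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            left_counts[a] += 1
    size = len(vocab)

    def log_prob(a: str, b: str) -> float:
        return math.log((pair_counts[(a, b)] + alpha) / (left_counts[a] + alpha * size))

    return log_prob
```

With the training sequences `[["ka", "ya"], ["ka", "ra"]]` (placeholder labels), the vocabulary has five tokens and `log_prob("ka", "ya")` equals `log(2/7)`.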

Building all three models first requires labeling the Tibetan syllable samples at the character level and the component level. The labeling results are used to extract character and component samples from the syllable samples and to determine the character and component categories, of which there are 562 and 120 respectively. Labeling uses a semi-supervised learning method, which greatly reduces the amount of manual intervention.

After the component classifier, geometric context, and language context models are built, the segmentation hypothesis verification method based on a semi-Markov conditional random field is used to further verify the segmentation hypotheses on the substructure block sequence of the two-layer labeling result obtained in step (2).

Candidate component patterns formed from the substructure block sequence of the two-layer labeling result are classified to generate a candidate segmentation-recognition lattice, and a semi-Markov conditional random field model is built on this lattice. Let X be the over-segmentation result of the Tibetan syllable (the two-layer labeling result), let Y be the class sequence of a candidate path in the lattice, and let S be the corresponding segmentation (the candidate component sequence). Under the semi-Markov conditional random field model, the conditional probability P(S, Y | X) of a candidate path (S, Y) is:

$P(S, Y \mid X) = \frac{1}{Z(X)} \prod_{c \in S} \Psi_c(X, Y_c) = \frac{1}{Z(X)} \exp\{-E(X, S, Y)\}$

where c denotes a maximal clique of the random field, Y_c denotes the class of c, Ψ_c(X, Y_c) is the potential function defined on c, the normalization factor Z(X) is the sum of the path potentials over all candidate paths in the lattice, and E(X, S, Y) is the energy function:

$E(X, S, Y) = \sum_{c \in S} \sum_{k=1}^{K} \lambda_k f_k(X_c, Y_c)$

f_k(X_c, Y_c) is the k-th feature function defined on c. The feature functions describe, respectively, the component classification model, the unary and binary geometric features between characters within a syllable, the unary and binary geometric features between components within a character, the character-based language model, and the component-based language model. The segmentation hypothesis verification method based on the semi-Markov conditional random field integrates these submodels into a unified recognition framework through the weights λ_k; the weights λ_k and the parameters of each submodel are trained under a criterion that minimizes the negative log-likelihood loss.
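The two computations this model needs, finding the minimum-energy path and evaluating the negative log-likelihood −log P(S, Y | X) = E(X, S, Y) + log Z(X), can be sketched with a semi-Markov Viterbi search and a forward recursion. This is a simplified illustration, not the patented implementation: it assumes the two-layer labeling result is flattened into one sequence of n substructure blocks, that a candidate component spans at most `max_len` consecutive blocks, and that the hypothetical callbacks `seg_energy` and `trans_energy` already return the weighted sums of the unary terms (classifier, unary geometry) and pairwise terms (binary geometry, negated log bigram) respectively.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

START = "<s>"
SegEnergy = Callable[[int, int, str], float]  # energy of labeling blocks i..j-1 as one component y
TransEnergy = Callable[[str, str], float]     # energy of label y_prev followed by label y
Path = List[Tuple[int, int, str]]


def decode(n: int, labels: Sequence[str], max_len: int,
           seg_energy: SegEnergy, trans_energy: TransEnergy) -> Tuple[float, Path]:
    """Semi-Markov Viterbi: the minimum-energy (maximum-probability) path through the lattice."""
    best: List[Dict[str, Tuple[float, Optional[Tuple[int, str]]]]] = [{} for _ in range(n + 1)]
    best[0][START] = (0.0, None)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            for y_prev, (e_prev, _) in best[i].items():
                for y in labels:
                    e = e_prev + trans_energy(y_prev, y) + seg_energy(i, j, y)
                    if y not in best[j] or e < best[j][y][0]:
                        best[j][y] = (e, (i, y_prev))
    y = min(best[n], key=lambda lab: best[n][lab][0])
    energy, j, path = best[n][y][0], n, []
    while j > 0:  # follow back-pointers from the last block to the first
        i, y_prev = best[j][y][1]
        path.append((i, j, y))
        j, y = i, y_prev
    return energy, path[::-1]


def log_partition(n: int, labels: Sequence[str], max_len: int,
                  seg_energy: SegEnergy, trans_energy: TransEnergy) -> float:
    """Forward algorithm: log Z(X), the log of the sum of exp(-E) over all lattice paths."""
    alpha: List[Dict[str, float]] = [{} for _ in range(n + 1)]
    alpha[0][START] = 0.0
    for j in range(1, n + 1):
        for y in labels:
            terms = [a - trans_energy(y_prev, y) - seg_energy(i, j, y)
                     for i in range(max(0, j - max_len), j)
                     for y_prev, a in alpha[i].items()]
            m = max(terms)
            alpha[j][y] = m + math.log(sum(math.exp(t - m) for t in terms))
    m = max(alpha[n].values())
    return m + math.log(sum(math.exp(a - m) for a in alpha[n].values()))


def path_energy(path: Path, seg_energy: SegEnergy, trans_energy: TransEnergy) -> float:
    """E(X, S, Y) of one explicit path, accumulated segment by segment."""
    e, y_prev = 0.0, START
    for i, j, y in path:
        e += trans_energy(y_prev, y) + seg_energy(i, j, y)
        y_prev = y
    return e


def nll(gold: Path, n: int, labels: Sequence[str], max_len: int,
        seg_energy: SegEnergy, trans_energy: TransEnergy) -> float:
    """-log P(S, Y | X) = E(X, S, Y) + log Z(X) for the labeled (gold) path."""
    return path_energy(gold, seg_energy, trans_energy) + log_partition(
        n, labels, max_len, seg_energy, trans_energy)
```

Because Z(X) sums over exactly the paths that `decode` searches, `math.exp(-nll(path, ...))` is that path's probability under the model; summing `nll` over labeled training syllables gives the loss that the training criterion minimizes with respect to the λ_k.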

(4) Recognition output

Using the segmentation hypothesis verification result for the component string obtained in step (3), the dictionary that represents each syllable as a character string and the dictionary that represents each character as a component string are consulted to obtain the character categories contained in the syllable, which determines the category of the input Tibetan syllable.
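The two dictionary lookups can be sketched as follows, assuming the verified component string arrives already grouped by character (the horizontal layer) and that both dictionaries are plain Python dicts keyed by tuples. The function name and the labels in the usage lines are placeholders, not real MRG-OHTC categories.

```python
from typing import Dict, Optional, Sequence, Tuple


def lookup_syllable(component_groups: Sequence[Sequence[str]],
                    char_dict: Dict[Tuple[str, ...], str],
                    syllable_dict: Dict[Tuple[str, ...], str]) -> Optional[str]:
    """Map each character's component string to a character class, then map the
    resulting character string to a syllable class; return None if either lookup fails."""
    chars = []
    for group in component_groups:
        char = char_dict.get(tuple(group))
        if char is None:
            return None
        chars.append(char)
    return syllable_dict.get(tuple(chars))


# Placeholder labels for illustration only.
char_dict = {("comp_1", "comp_2"): "char_A", ("comp_3",): "char_B"}
syllable_dict = {("char_A", "char_B"): "syllable_AB"}
print(lookup_syllable([["comp_1", "comp_2"], ["comp_3"]], char_dict, syllable_dict))  # syllable_AB
```

Returning None rather than a best guess leaves the caller free to fall back to the next-best lattice path when a decoded component string is not in the dictionaries.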

Tables 1 and 2 list, respectively, the effect of fusing geometric context and linguistic context with the segmentation hypothesis verification method based on the semi-Markov conditional random field. Both geometric context and linguistic context improve Tibetan syllable recognition accuracy. Within the linguistic context, the component-based bi-gram contributes more to recognition accuracy than the character-based bi-gram, mainly because the whole syllable recognition framework is built at the component level.
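For readers unfamiliar with the bi-gram language models compared above, here is a minimal sketch of a component-level bi-gram with add-one (Laplace) smoothing. The smoothing scheme, the boundary symbols and the toy training data are assumptions; the source does not state how its bi-grams were estimated. A character-based bi-gram would be the same code applied to character sequences instead of component sequences.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"


def train_bigram(sequences):
    """Count component bi-grams, padding each sequence with boundary symbols."""
    history, pairs, vocab = Counter(), Counter(), set()
    for seq in sequences:
        tokens = [BOS, *seq, EOS]
        vocab.update(tokens)
        for a, b in zip(tokens, tokens[1:]):
            history[a] += 1
            pairs[(a, b)] += 1
    return history, pairs, vocab


def log_prob(seq, model):
    """log P(seq) under an add-one smoothed bi-gram model."""
    history, pairs, vocab = model
    v = len(vocab)
    tokens = [BOS, *seq, EOS]
    return sum(math.log((pairs[(a, b)] + 1) / (history[a] + v))
               for a, b in zip(tokens, tokens[1:]))


# Toy component strings; -log_prob could serve as a language-context cost.
model = train_bigram([["p1", "p2"], ["p1", "p2"], ["p1", "p3"]])
print(log_prob(["p1", "p2"], model) > log_prob(["p1", "p3"], model))  # True
```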

Table 1 Effect of geometric context on syllable recognition accuracy

| Component classifier | Geometric context | Recognition accuracy (%) |
|---|---|---|
| √ | | 73.51 |
| √ | √ | 74.87 |

Table 2 Effect of linguistic context on syllable recognition accuracy

| Component-based bi-gram | Character-based bi-gram | Recognition accuracy (%) |
|---|---|---|
| √ | | 79.65 |
| | √ | 77.72 |
| √ | √ | 81.23 |

Embodiment 3

This embodiment of the present invention provides an online handwritten Tibetan syllable recognition device. As shown in Figure 6, the device comprises:

a preprocessing unit 11, configured to preprocess the point trajectories of the Tibetan syllables continuously handwritten by the user;

an over-segmentation unit 12, configured to over-segment the preprocessed Tibetan syllables first in the horizontal direction and then in the vertical direction, to obtain a substructure block sequence with a two-level labeling result;

a segmentation hypothesis verification unit 13, configured to perform segmentation hypothesis verification on the substructure block sequence with the two-level labeling result using the segmentation hypothesis verification method based on the semi-Markov conditional random field, to obtain the optimal segmentation path and the recognition result of the component string; and

a determining unit 14, configured to determine the category of the handwritten Tibetan syllable input by the user according to the optimal segmentation path and the recognition result of the component string.
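Read as software, the four units form a linear pipeline. The sketch below assumes each unit is injected as a plain Python callable and only shows the data flow between units 11 to 14; the class name, field names and the stand-in units in the usage example are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


@dataclass
class SyllableRecognizer:
    """Hypothetical wiring of units 11-14; each unit is an injected callable."""
    preprocess: Callable[[Sequence[Stroke]], List[Stroke]]   # unit 11
    over_segment: Callable[[List[Stroke]], List[Any]]          # unit 12: labelled blocks
    verify: Callable[[List[Any]], Tuple[Any, List[str]]]       # unit 13: path, components
    determine: Callable[[Any, List[str]], str]                 # unit 14: syllable category

    def recognize(self, strokes: Sequence[Stroke]) -> str:
        clean = self.preprocess(strokes)
        blocks = self.over_segment(clean)
        path, components = self.verify(blocks)
        return self.determine(path, components)


# Usage with trivial stand-in units.
demo = SyllableRecognizer(
    preprocess=lambda s: [list(stroke) for stroke in s],
    over_segment=lambda strokes: [((0, 0), strokes)],
    verify=lambda blocks: ([(0, len(blocks))], ["p1"]),
    determine=lambda path, comps: "syllable_of_" + "_".join(comps),
)
print(demo.recognize([[(0.0, 0.0), (1.0, 1.0)]]))  # syllable_of_p1
```

Injecting the units keeps each stage independently replaceable, which mirrors how the embodiment describes them as separate units.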

The online handwritten Tibetan syllable recognition device provided by this embodiment first preprocesses the point trajectories of the Tibetan syllables continuously handwritten by the user, and then over-segments the preprocessed syllables first horizontally and then vertically to obtain a substructure block sequence with a two-level labeling result. It then performs segmentation hypothesis verification on this sequence using the method based on the semi-Markov conditional random field to obtain the optimal segmentation path and the recognition result of the component string, and finally determines the category of the handwritten Tibetan syllable input by the user from the optimal segmentation path and the recognition result of the component string. Compared with the prior art, the present invention can efficiently recognize Tibetan syllables that users write continuously, meeting the writing habits and needs of Tibetan users.

Further, the preprocessing unit 11 is configured to perform isolated-point removal, equidistant resampling and Gaussian smoothing on the point trajectories of the Tibetan syllables continuously handwritten by the user.
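The three preprocessing operations can be sketched as below. The isolated-point rule (an interior point farther than `max_jump` from both neighbours is discarded), the resampling step, and the Gaussian kernel width and radius are all assumptions chosen for illustration; the source names the operations but not their parameters.

```python
import math


def remove_isolated(stroke, max_jump):
    """Drop interior points farther than max_jump from both neighbours (assumed rule)."""
    if len(stroke) < 3:
        return list(stroke)
    kept = [stroke[0]]
    for prev, cur, nxt in zip(stroke, stroke[1:], stroke[2:]):
        if math.dist(prev, cur) > max_jump and math.dist(cur, nxt) > max_jump:
            continue  # treat the point as a noise spike
        kept.append(cur)
    kept.append(stroke[-1])
    return kept


def resample(stroke, step):
    """Resample a polyline at equal arc-length intervals of `step`."""
    if len(stroke) < 2:
        return list(stroke)
    out = [stroke[0]]
    carry = 0.0  # arc length travelled since the last emitted point
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        d = step - carry  # distance along this segment to the next sample
        while d <= seg:
            t = d / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += step
        carry = seg - (d - step)
    if math.dist(out[-1], stroke[-1]) > 1e-9:
        out.append(stroke[-1])  # keep the pen-up point
    return out


def gaussian_smooth(stroke, sigma=1.0, radius=2):
    """Smooth x and y with a truncated Gaussian kernel, renormalised at stroke ends."""
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in offsets]
    n, out = len(stroke), []
    for i in range(n):
        sx = sy = sw = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < n:
                sx += w * stroke[j][0]
                sy += w * stroke[j][1]
                sw += w
        out.append((sx / sw, sy / sw))
    return out


def preprocess(strokes, max_jump=50.0, step=2.0, sigma=1.0):
    """Isolated-point removal, equidistant resampling, then Gaussian smoothing."""
    return [gaussian_smooth(resample(remove_isolated(s, max_jump), step), sigma)
            for s in strokes if s]
```

Resampling before smoothing gives every stroke a uniform point density, so the fixed-width kernel smooths all strokes to a comparable degree regardless of pen speed.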

Further, the over-segmentation unit 12 is configured to perform character segmentation in the horizontal direction and then component segmentation in the vertical direction on the preprocessed Tibetan syllables, to obtain a substructure block sequence with a two-level labeling result.
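One plausible reading of this two-level over-segmentation is projection-based grouping: strokes whose x-projections overlap form a character block (Tibetan characters follow one another left to right), and within each character, strokes whose y-projections overlap form a component block (components are stacked vertically). The sketch below implements that assumed rule; the gap thresholds and the `(character index, component index)` label format are hypothetical, and the source does not specify its actual segmentation criteria.

```python
def _group(strokes, axis, gap):
    """Group strokes whose projections on `axis` (0 = x, 1 = y) overlap or lie within `gap`."""
    def lo(s):
        return min(p[axis] for p in s)

    def hi(s):
        return max(p[axis] for p in s)

    groups, end = [], None
    for s in sorted(strokes, key=lo):
        if groups and lo(s) <= end + gap:
            groups[-1].append(s)
            end = max(end, hi(s))
        else:
            groups.append([s])
            end = hi(s)
    return groups


def over_segment(strokes, h_gap=0.0, v_gap=0.0):
    """Two-level over-segmentation under an assumed projection rule.

    Level 1 splits along x into character blocks; level 2 splits each
    character block along y into component blocks. Each substructure
    block is labelled (character index, component index).
    """
    blocks = []
    for ci, char_strokes in enumerate(_group(strokes, 0, h_gap)):
        for pi, comp_strokes in enumerate(_group(char_strokes, 1, v_gap)):
            blocks.append(((ci, pi), comp_strokes))
    return blocks


a = [(0, 0), (2, 1)]   # upper component of character 0
b = [(0, 3), (2, 4)]   # component stacked below it
c = [(5, 0), (6, 4)]   # character 1
print([label for label, _ in over_segment([a, b, c])])  # [(0, 0), (0, 1), (1, 0)]
```

Small gap thresholds deliberately err toward splitting too much; merging wrongly split blocks is left to the later segmentation hypothesis verification.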

Further, the segmentation hypothesis verification unit 13 is configured to integrate the component classifier, the geometric context and the linguistic context into a unified recognition framework through different weights, and to verify different segmentation hypotheses on the substructure block sequence with the two-level labeling result, obtaining the optimal segmentation path and the recognition result of the component string.
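Finding the optimal segmentation path amounts to minimising $E(X,S,Y)$ over all segmentations and labelings, which for a first-order semi-Markov CRF can be done by dynamic programming over segments. The sketch assumes the λ-weighted unary terms (component classifier, unary geometric context) are folded into a hypothetical `seg_cost` and the binary terms (binary geometric context, component bi-gram) into a hypothetical `trans_cost`; the toy costs in the usage example are made up.

```python
import math


def semi_markov_viterbi(n, labels, max_seg, seg_cost, trans_cost):
    """Minimum-energy segmentation of blocks [0, n) into labelled segments.

    seg_cost(s, t, y): weighted unary energy of labelling blocks s..t-1 as y.
    trans_cost(y_prev, y): weighted binary energy; y_prev is None at the start.
    Returns (energy, [(s, t, y), ...]).
    """
    # best[t] maps the label of a segment ending at t to (energy, backpointer).
    best = [dict() for _ in range(n + 1)]
    best[0][None] = (0.0, None)
    for t in range(1, n + 1):
        for s in range(max(0, t - max_seg), t):
            for y_prev, (e_prev, _) in best[s].items():
                for y in labels:
                    e = e_prev + trans_cost(y_prev, y) + seg_cost(s, t, y)
                    if e < best[t].get(y, (math.inf, None))[0]:
                        best[t][y] = (e, (s, y_prev))
    # Backtrack from the lowest-energy label at the last block boundary.
    y = min(best[n], key=lambda k: best[n][k][0])
    energy = best[n][y][0]
    path, t = [], n
    while t > 0:
        _, (s, y_prev) = best[t][y]
        path.append((s, t, y))
        t, y = s, y_prev
    return energy, path[::-1]


# Toy usage with made-up costs for two blocks and labels "a"/"b".
COSTS = {(0, 1, "a"): 1.0, (0, 1, "b"): 3.0, (1, 2, "a"): 3.0,
         (1, 2, "b"): 1.0, (0, 2, "a"): 5.0, (0, 2, "b"): 5.0}


def toy_seg(s, t, y):
    return COSTS[(s, t, y)]


def toy_trans(prev, y):
    return 0.0 if prev is None or prev == y else 0.5


print(semi_markov_viterbi(2, ["a", "b"], 2, toy_seg, toy_trans))
# (2.5, [(0, 1, 'a'), (1, 2, 'b')])
```

Here keeping the two blocks separate (energy 1.0 + 0.5 + 1.0) beats merging them into one component (energy 5.0), so the decoder returns two segments.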

Optionally, the weights connecting the component classifier, the geometric context and the linguistic context, as well as the respective parameters of the component classifier, the geometric context and the linguistic context, are obtained by training under the criterion of minimizing the negative log-likelihood loss function.
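Since $P(S,Y \mid X) = \exp\{-E(X,S,Y)\}/Z(X)$, the negative log-likelihood of a ground-truth path $(S^*, Y^*)$ is $E(X,S^*,Y^*) + \log Z(X)$, and $\log Z(X)$ can be computed with the semi-Markov forward algorithm. The sketch below evaluates this loss, assuming a hypothetical `seg_cost` that holds the λ-weighted unary terms of a segment and a hypothetical `trans_cost` that holds the λ-weighted binary terms between adjacent segments. Minimising the loss over λ_k and the sub-model parameters (for example with a gradient-based optimiser) is not shown; the source does not say which optimiser was used.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def semi_markov_log_z(n, labels, max_seg, seg_cost, trans_cost):
    """log Z(X) via the semi-Markov forward algorithm (sum over all paths)."""
    # alpha[t][y]: log of summed exp(-E) over partial paths ending at t with label y.
    alpha = [dict() for _ in range(n + 1)]
    alpha[0][None] = 0.0
    for t in range(1, n + 1):
        for y in labels:
            terms = [a - trans_cost(y_prev, y) - seg_cost(s, t, y)
                     for s in range(max(0, t - max_seg), t)
                     for y_prev, a in alpha[s].items()]
            alpha[t][y] = logsumexp(terms)
    return logsumexp(list(alpha[n].values()))


def path_energy(path, seg_cost, trans_cost):
    """E(X, S, Y) of one labelled segmentation [(s, t, y), ...]."""
    e, y_prev = 0.0, None
    for s, t, y in path:
        e += trans_cost(y_prev, y) + seg_cost(s, t, y)
        y_prev = y
    return e


def nll(gold_path, n, labels, max_seg, seg_cost, trans_cost):
    """-log P(S*, Y* | X) = E(X, S*, Y*) + log Z(X)."""
    return (path_energy(gold_path, seg_cost, trans_cost)
            + semi_markov_log_z(n, labels, max_seg, seg_cost, trans_cost))


# Toy usage with made-up costs for two blocks and labels "a"/"b".
COSTS = {(0, 1, "a"): 1.0, (0, 1, "b"): 3.0, (1, 2, "a"): 3.0,
         (1, 2, "b"): 1.0, (0, 2, "a"): 5.0, (0, 2, "b"): 5.0}


def toy_seg(s, t, y):
    return COSTS[(s, t, y)]


def toy_trans(prev, y):
    return 0.0 if prev is None or prev == y else 0.5


print(nll([(0, 1, "a"), (1, 2, "b")], 2, ["a", "b"], 2, toy_seg, toy_trans))
```

The loss is always non-negative because the gold path contributes one of the terms summed into $Z(X)$.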

The online handwritten Tibetan syllable recognition method and device provided by the embodiments of the present invention are applicable to, but not limited to, recognizing Tibetan syllables continuously handwritten by users.

A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

The above are merely specific embodiments of the present invention, and the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims

1. An online handwritten Tibetan syllable recognition method, characterized by comprising:

preprocessing the point trajectories of Tibetan syllables continuously handwritten by a user;

over-segmenting the preprocessed Tibetan syllables successively in the horizontal direction and the vertical direction, to obtain a substructure block sequence with a two-level labeling result;

performing segmentation hypothesis verification on the substructure block sequence with the two-level labeling result using a segmentation hypothesis verification method based on a semi-Markov conditional random field, to obtain an optimal segmentation path and a recognition result of a component string; and

determining the category of the handwritten Tibetan syllable input by the user according to the optimal segmentation path and the recognition result of the component string.

2. The method according to claim 1, characterized in that preprocessing the point trajectories of the Tibetan syllables continuously handwritten by the user comprises: performing isolated-point removal, equidistant resampling and Gaussian smoothing on the point trajectories of the Tibetan syllables continuously handwritten by the user.

3. The method according to claim 1, characterized in that over-segmenting the preprocessed Tibetan syllables successively in the horizontal direction and the vertical direction to obtain the substructure block sequence with the two-level labeling result comprises: performing character segmentation in the horizontal direction and then component segmentation in the vertical direction on the preprocessed Tibetan syllables, to obtain the substructure block sequence with the two-level labeling result.

4. The method according to claim 1, characterized in that performing segmentation hypothesis verification on the substructure block sequence with the two-level labeling result using the segmentation hypothesis verification method based on the semi-Markov conditional random field, to obtain the optimal segmentation path and the recognition result of the component string, comprises: integrating a component classifier, geometric context and linguistic context into a unified recognition framework through different weights, and performing different segmentation hypothesis verifications on the substructure block sequence with the two-level labeling result, to obtain the optimal segmentation path and the recognition result of the component string.

5. The method according to claim 4, characterized in that the weights connecting the component classifier, the geometric context and the linguistic context, as well as the respective parameters of the component classifier, the geometric context and the linguistic context, are obtained by training under a criterion of minimizing a negative log-likelihood loss function.

6. An online handwritten Tibetan syllable recognition device, characterized by comprising:

a preprocessing unit, configured to preprocess the point trajectories of Tibetan syllables continuously handwritten by a user;

an over-segmentation unit, configured to over-segment the preprocessed Tibetan syllables successively in the horizontal direction and the vertical direction, to obtain a substructure block sequence with a two-level labeling result;

a segmentation hypothesis verification unit, configured to perform segmentation hypothesis verification on the substructure block sequence with the two-level labeling result using a segmentation hypothesis verification method based on a semi-Markov conditional random field, to obtain an optimal segmentation path and a recognition result of a component string; and

a determining unit, configured to determine the category of the handwritten Tibetan syllable input by the user according to the optimal segmentation path and the recognition result of the component string.

7. The device according to claim 6, characterized in that the preprocessing unit is configured to perform isolated-point removal, equidistant resampling and Gaussian smoothing on the point trajectories of the Tibetan syllables continuously handwritten by the user.

8. The device according to claim 6, characterized in that the over-segmentation unit is configured to perform character segmentation in the horizontal direction and then component segmentation in the vertical direction on the preprocessed Tibetan syllables, to obtain the substructure block sequence with the two-level labeling result.

9. The device according to claim 6, characterized in that the segmentation hypothesis verification unit is configured to integrate a component classifier, geometric context and linguistic context into a unified recognition framework through different weights, and to perform different segmentation hypothesis verifications on the substructure block sequence with the two-level labeling result, to obtain the optimal segmentation path and the recognition result of the component string.

10. The device according to claim 9, characterized in that the weights connecting the component classifier, the geometric context and the linguistic context, as well as the respective parameters of the component classifier, the geometric context and the linguistic context, are obtained by training under a criterion of minimizing a negative log-likelihood loss function.