CN104182966A

CN104182966A - Automatic splicing method of regular shredded paper

Info

Publication number: CN104182966A
Application number: CN201410340616.3A
Authority: CN
Inventors: 段倩; 金鑫; 浦志强; 李医民; 朱峰
Original assignee: Jiangsu University
Current assignee: Jiangsu University
Priority date: 2014-07-16
Filing date: 2014-07-16
Publication date: 2014-12-03
Anticipated expiration: 2034-07-16
Also published as: CN104182966B

Abstract

The invention belongs to the field of image processing technology and relates in particular to a method for automatically splicing regular shredded paper. The technical scheme is realized in six steps: (1) prepare the image data set and preprocess it; (2) classify the shredded paper by language (Chinese or English) and by single- or double-sided printing; (3) extract the local-area features of each image, such as the position and gray value of the boundary pixels of each fragment and the height of the upper (lower) boundary; for English fragments the extraction range is expanded, and the additional features include the row height, the horizontal position, and the line spacing of the English fragments; (4) reclassify the fragments according to the feature values extracted in step (3); (5) match the fragments locally, then perform row matching and column matching; (6) restore the matched image. The method provided by the invention can splice a large number of shredded paper pieces more accurately.

Description

A method for automatic splicing of regular shredded paper

Technical field

The invention belongs to the application field of image processing technology, and in particular relates to a method for automatically splicing regular shredded paper.

Background

Shredded paper splicing is an important research branch of digital image processing. It spatially matches and aligns a group of shredded paper pieces that overlap one another, so that they can be seamlessly stitched into a complete, wide-view image.

Automatic splicing and restoration of shredded paper has important applications in judicial evidence restoration, historical document repair, and military intelligence acquisition. In recent years, with the publication of the German Stasi document restoration project, research on shredded document recovery has attracted widespread attention.

The key to splicing shredded paper is the fragment matching technique. Traditional splicing of torn documents mostly extracts contour curves from the edge shapes of the fragments and splices them with a computer algorithm. With the widespread use of paper shredders, in more and more splicing problems the fragments have roughly the same edge shape, so edge-shape splicing no longer applies. For regularly shaped fragments, the parameters of the fragment boundaries are determined by image registration based on the text content at the paper edges, the fragments are matched, and seamless splicing is finally achieved. In practice, however, the more fragments there are to splice, the more edges carry similar text information, and the more similar they are. The resolution of scanned digital images is also limited, so a certain number of wrong splices occur. The goal of an ideal splicing technique is "zero error". As things stand, most existing shredded paper splicing methods target irregular shapes, and methods that can be applied effectively to large, wide-format, regularly cut paper are relatively rare.

The technical key to improving the quality of automatic splicing lies in obtaining the text or image information on the fragments with high quality. In general, the less information a fragment carries, the greater the chance of a splicing error or of a failure to splice at all. Therefore, to date, automatically splicing shredded paper images into a high-quality, wide-format result remains technically difficult.

Summary of the invention

The purpose of the present invention is to provide a method for automatically splicing regular shredded paper that can splice a large number of fragments more accurately.

The present invention is achieved through the following technical scheme, which mainly comprises six steps:

1. The preparation and preprocessing of the image data set specifically include:

1.1 Number the fragments from left to right and from top to bottom as 1, 2, 3...n; if the front and back need to be distinguished, record the front sides as a1, a2, a3...an and the back sides as b1, b2, b3...bn;

1.2 Digitize the image, taking the pixel as the smallest unit, extract the gray value and position of each pixel, and establish a function matrix;

1.3 Classify the pixels by gray value: a point with gray value "0" is a black point, a point with gray value "255" is a white point, and a point between "0" and "255" is a gray point;

1.4 Denoise: since the original information is a continuous analog signal, the digitized image should be a set of discrete points that still follows a continuous trend. Where points of one color completely surround a point of a different color, the color of that point is assimilated to the color of the surrounding points;
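Steps 1.3 and 1.4 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `classify` and `denoise`, the use of the 8-neighbourhood as the meaning of "completely surrounds", and the restriction to interior pixels are all assumptions made for the example.

```python
BLACK, WHITE = 0, 255

def classify(gray):
    """Map a gray value to the three classes of step 1.3."""
    if gray == BLACK:
        return "black"
    if gray == WHITE:
        return "white"
    return "gray"

def denoise(img):
    """Step 1.4 sketch: if all 8 neighbours of an interior pixel share one
    class and the pixel belongs to another class, copy a neighbour's value.
    (Assumption: 'surrounded' means the 8-neighbourhood.)"""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            neigh = [img[r + dr][c + dc]
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0)]
            classes = {classify(v) for v in neigh}
            if len(classes) == 1 and classify(img[r][c]) not in classes:
                out[r][c] = neigh[0]  # take the first neighbour's gray value
    return out
```

For a 3 x 3 all-white patch with a single black pixel in the centre, `denoise` returns an all-white patch; a pixel whose neighbours are of mixed classes is left unchanged.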

2. Classify the shredded paper as a whole into four cases according to language and sidedness: Chinese single-sided, Chinese double-sided, English single-sided, and English double-sided;

3. Extract the local-area features of each image. These features include the position and gray value of the boundary pixels of each fragment and the height of the upper (lower) boundary. For English fragments the extraction range is expanded, and the additional features include the row height, the horizontal position, and the line spacing of the English fragments;

The feature extraction methods are as follows:

i) Position and gray value of the outermost pixels of a fragment:

Define the leftmost (rightmost) column of pixels of a fragment as its left (right) boundary and the topmost (bottommost) row of pixels as its upper (lower) boundary, and extract the position and gray value of each boundary pixel;

ii) Upper (lower) boundary height:

Depending on whether the upper and lower boundaries of a fragment are completely white, boundary heights fall into two categories: white boundary height and black boundary height. The classification method is as follows:

Take the bottom edge of the fragment as the x-axis and the left edge, perpendicular to the x-axis and pointing upward, as the y-axis, with their intersection as the origin. Project every pixel of the image onto the y-axis, as shown in Figure 1. The projection of a black or gray point counts as one valid projection and increases the projection count by 1; the projection of a white point is invalid and leaves the count unchanged. Record the projection count f(h) at the projection point that lies h pixels from the origin.

$g(h) = \begin{cases} 0, & f(h) < n/10 \\ 255, & f(h) \geq n/10 \end{cases}$

When the projection count f(h) is less than 1/10 of the total number of pixels n in the row, the gray value g(h) of point h on the y-axis is recorded as "0"; when f(h) is greater than or equal to 1/10 of n, g(h) is recorded as "255".

On the projection axis, count downward from the upper boundary of the fragment until a point of a different color appears. The height of this run is the upper boundary height; the lower boundary height is obtained in the same way.
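The projection and the upper boundary height can be sketched as below. This is an illustrative sketch under stated assumptions: the image is a list of rows indexed from the top (the text measures h from the bottom edge instead), and the function names are hypothetical. The values of g(h) follow the formula above exactly.

```python
WHITE = 255

def projection_counts(img):
    """f(h): number of black or gray (non-white) pixels in each row."""
    return [sum(1 for v in row if v != WHITE) for row in img]

def row_marks(img):
    """g(h) as in the formula: 0 if f(h) < n/10, otherwise 255,
    where n is the number of pixels in a row."""
    n = len(img[0])
    return [0 if f < n / 10 else 255 for f in projection_counts(img)]

def upper_boundary_height(img):
    """Count rows from the top until g(h) changes value.
    Returns (mark of the top run, height of the run in rows)."""
    g = row_marks(img)
    height = 1
    while height < len(g) and g[height] == g[0]:
        height += 1
    return g[0], height
```

For a fragment with two blank rows followed by text rows, `upper_boundary_height` returns a mark of 0 and a height of 2. The lower boundary height can be obtained by passing the rows in reverse order (`img[::-1]`).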

iii) Row height of English fragments:

The heights of English letters, and the vertical positions they occupy within one line, are roughly the same. Therefore, after projecting as described above, the interval whose gray value is "1" is the valid letter interval, and the height of this interval is defined as the row height;

iv) Horizontal position of English fragments:

After the projection, the distances from the upper and lower boundaries of a letter's valid projection interval to the top of the fragment are called the horizontal position of that row of letters, which determines where the row lies on the fragment;

v) Line spacing:

Extract the vertical distance between two horizontal positions as the line spacing;
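The three additional English features can be derived from the row marks of the projection, for example as sketched below. The sketch assumes that text rows are the rows marked 255 by g(h), that rows are indexed from the top of the fragment, and that line spacing is measured between the top edges of consecutive text intervals; the text does not fix these details, and the function names are hypothetical.

```python
def text_intervals(marks, text_mark=255):
    """Return (top, bottom) row indices of each run of text rows."""
    intervals, start = [], None
    for h, m in enumerate(marks):
        if m == text_mark and start is None:
            start = h
        elif m != text_mark and start is not None:
            intervals.append((start, h - 1))
            start = None
    if start is not None:
        intervals.append((start, len(marks) - 1))
    return intervals

def line_features(marks):
    """Row heights, horizontal positions and line spacings of a fragment."""
    iv = text_intervals(marks)
    row_heights = [bottom - top + 1 for top, bottom in iv]
    positions = iv  # distances of each interval's edges from the fragment top
    # Assumption: spacing = distance between tops of consecutive text lines.
    spacings = [iv[k + 1][0] - iv[k][0] for k in range(len(iv) - 1)]
    return row_heights, positions, spacings
```

Two fragments from the same printed row should then yield the same positions and spacings, which is what the later classification step relies on.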

4. Classify the fragments according to the feature set extracted in step 3:

The specific steps are as follows:

i) According to whether the edge of the paper carries text stroke information, divide the fragments into three classes: upper and lower border fragments, left and right border fragments, and middle fragments;

ii) According to the line spacing feature, further classify each of the three classes; fragments with the same line spacing form one class;

iii) According to the upper (lower) boundary height, classify the three fragment sets formed in step i); fragments with the same or similar upper (lower) boundary height are placed in the same fragment set:

The classification must satisfy the following conditions:

(1) The number of fragments in each class must be equal to or slightly less than the number of longitudinal cuts of the paper;

(2) A class whose height is separated from the other heights does not form an independent class if it contains fewer than 1/5 of the number of fragments per class;

(3) Classes whose heights are consecutive are merged into one class;

(4) The final number of classes is the number of cross-cuts of the paper;

(5) If the class still cannot be determined, the bottom height is used in the same way as an auxiliary criterion.
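Conditions (2) and (3) can be illustrated with a short sketch that merges consecutive boundary heights and sets aside groups that are too small. The function name, the `min_size` parameter, and the sample counts in the usage note are hypothetical; the patent does not prescribe this procedure.

```python
def merge_height_classes(counts, min_size):
    """counts: {upper boundary height: number of fragments with that height}.
    Returns (kept groups, leftover groups), each group a list of heights."""
    groups = []
    for h in sorted(counts):
        if groups and h - groups[-1][-1] == 1:
            groups[-1].append(h)  # condition (3): consecutive heights merge
        else:
            groups.append([h])
    kept, leftovers = [], []
    for g in groups:
        size = sum(counts[h] for h in g)
        # condition (2): a small isolated group is not an independent class
        (kept if size >= min_size else leftovers).append(g)
    return kept, leftovers
```

With made-up counts `{3: 1, 10: 6, 11: 7, 12: 5, 30: 12, 31: 7}` and `min_size=10`, heights 10 to 12 and 30 to 31 form two classes, and height 3 is left over for reassignment, for example by the bottom height as in condition (5).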

iv) Using the horizontal position, further classify each fragment set formed in step ii); fragments at the same horizontal position are grouped into one class;

5. The specific steps for matching the fragments:

5.1 Match the fragments locally, that is, match pairs of fragments. Left-right matching is taken as the example below:

i. Define X_ij as the gray value of the pixel in row j on the right boundary of fragment i, and Y_i′j as the gray value of the pixel in row j on the left boundary of fragment i′ (i ≠ i′). Whether two fragments match depends on the degree of agreement between X_ij and Y_i′j. Using the feature set extracted in step 3, with the right-boundary feature as the reference, the criterion is defined as:

X_ij is white: Y_i′(j-1), Y_i′j, Y_i′(j+1) may be gray, white, or black, as long as they are not all black; this is normal and the rows can be matched;

X_ij is gray: any colors of Y_i′(j-1), Y_i′j, Y_i′(j+1) are normal and the rows can be matched;

X_ij is black: Y_i′(j-1), Y_i′j, Y_i′(j+1) are not all white; this is normal and the rows can be matched;

All other cases are abnormal and the rows cannot be matched.

The relationship between X_ij and Y_i′(j-1), Y_i′j, Y_i′(j+1) is shown in Figure 2.

where: X_ij is the gray value of the pixel in row j of the rightmost column of fragment i;

Y_i′j is the gray value of the pixel in row j of the leftmost column of fragment i′;

The specific flow of the boundary tracking algorithm is as follows:

(1) Select fragments i and i′;

(2) Assume that fragments i and i′ match each other;

(3) Read the gray value of pixel X_ij in row j of the right boundary of fragment i;

(4) Scan the pixels Y_i′(j-1), Y_i′j, Y_i′(j+1) in rows j-1, j, j+1 of the left boundary of fragment i′ and determine whether they are all white;

(5) If they are all white and the row range is exceeded, set j = j+1 and return to (3);

(6) If they are not all white, set j = j+1, read the next row, and determine whether X_ij is white;

(7) If it is white, return to (5);

(8) If it is not white, determine whether Y_i′(j-1), Y_i′j, Y_i′(j+1) are all white;

(9) If they are all white, return to (5);

(10) If they are not all white, set j = j+1, read the next row, and determine the color of X_ij;

(11) If it is white, return to (5);

(12) If it is gray, return to (5);

(13) If it is black, determine whether Y_i′(j-1), Y_i′j, Y_i′(j+1) are all white;

(14) If they are not all white, return to (5);

(15) If they are all white, the matching process for fragments i and i′ ends, and fragments i and i′ do not match;

(16) If j+1 exceeds the row range, the matching process for fragments i and i′ ends, and fragments i and i′ match.
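The row-by-row criterion of step i can be sketched directly, without reproducing the exact control flow of the flowchart. This is a simplified illustration under assumptions: both edges are lists of gray values of equal length, rows outside the fragment are simply ignored at the top and bottom, and the function names are hypothetical.

```python
BLACK, WHITE = 0, 255

def row_compatible(x, ys):
    """x: X_ij on the right edge of fragment i.
    ys: the existing values among Y_i'(j-1), Y_i'j, Y_i'(j+1)."""
    if x == WHITE:
        return not all(y == BLACK for y in ys)  # normal unless all black
    if x == BLACK:
        return not all(y == WHITE for y in ys)  # normal unless all white
    return True                                 # gray accepts any colors

def edges_compatible(right_edge, left_edge):
    """True if every row of the right edge of fragment i is compatible
    with the facing rows of the left edge of fragment i'."""
    n = len(right_edge)
    for j, x in enumerate(right_edge):
        ys = left_edge[max(0, j - 1):min(n, j + 2)]
        if not row_compatible(x, ys):
            return False
    return True
```

Looking at rows j-1, j and j+1 rather than row j alone tolerates a one-pixel vertical offset between the scans of two neighbouring fragments.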

ii. Based on the criterion of step i, the mathematical model of the image matching index is:

$S_{ii'} = \sum_{j=1}^{N} T_{i'}(X_{ij})$

where:

S_ii′: the matching index between fragment i and fragment i′;

N: the total number of pixels along the vertical height of a fragment;

X_ij: the gray value of the pixel in row j of the right boundary of fragment i;

T_i′(X_ij): the matching index between the right-boundary feature of row j of fragment i and the left-boundary feature of the corresponding rows of fragment i′;

This matching index is expressed as:

$T_{i'}(X_{ij}) = \begin{cases} T_1(X_{ij}), & X_{ij} = 0, \\ T_2(X_{ij}), & 0 < X_{ij} \leq 255. \end{cases}$

where T_2(X_ij) = 0.

Two fragments are regarded as matchable if and only if S_ii′ is 0. If it is not 0 they cannot be matched, and the larger the value, the worse the match.
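The matching index can be computed as sketched below. The text gives T_2 = 0 but does not spell out T_1, so the sketch assumes T_1 returns 1 when a black pixel on the right edge of fragment i faces three white pixels on the left edge of fragment i′ (the abnormal case of the criterion) and 0 otherwise. The function names are hypothetical.

```python
BLACK, WHITE = 0, 255

def t1(j, left_edge):
    """Assumed T_1: penalty 1 if the rows facing a black X_ij are all white."""
    ys = left_edge[max(0, j - 1):j + 2]  # rows j-1, j, j+1 that exist
    return 1 if all(y == WHITE for y in ys) else 0

def matching_index(right_edge, left_edge):
    """S_ii' = sum over j of T_i'(X_ij); non-black pixels contribute T_2 = 0."""
    return sum(t1(j, left_edge) if x == BLACK else 0
               for j, x in enumerate(right_edge))
```

Under this assumption, `matching_index` counts the black edge pixels that have no ink opposite them, so an index of 0 means that no such contradiction was found along the whole edge.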

5.2 Step 5.1 completes the local matching between pairs of fragments. The fragments that satisfy the matching condition in step 5.1 form small fragment sets, on which row matching and column matching are performed.

i. The specific process of row matching:

i) Take one fragment as the reference. If the local match between two fragments succeeds, merge them into a single fragment and put it into a new fragment set; if the local match fails, keep the reference fragment and continue local matching. When no fragment in the original set can be locally matched, put them all into the new fragment set;

ii) Repeat the above steps on the new fragments until all fragments are spliced into complete fragment rows.

ii. Based on the above process, the mathematical model of the image row matching index is:

Objective function:

W = min ∑ S_ii′

Constraints:

$\begin{cases} S_{ii'} = \sum_{j=1}^{N} T_{i'}(X_{ij}) \\ T_{i'}(X_{ij}) = \begin{cases} T_1(X_{ij}), & X_{ij} = 0, \\ T_2(X_{ij}), & 0 < X_{ij} \leq 255 \end{cases} \\ a \geq M \end{cases}$

where a is the number of S_ii′ terms;

the minimum value of W is 0;

M is the number of longitudinal cuts of the paper.

iii. The fragment sets that pass row matching form fragment rows; transpose the matrix of fragment rows and then perform column matching in the same way;

6. Restore the image from the matches obtained in step 5.

The beneficial effects of the invention are:

A relatively large number of fragments can be spliced in one pass, and a corresponding optimized solution is proposed for the matching and splicing process, mainly in the following respects:

(1) For Chinese fragments, only the boundaries are processed for features, and the mathematical model is built on the boundary features. Therefore, when the image sample database is updated, the range of the database to be scanned is greatly reduced, which gives a time advantage when a large number of fragments are to be spliced.

(2) The boundary tracking algorithm designed in the invention ensures uniqueness in the fragment matching process, which further improves the effectiveness and operability of the invention.

Description of drawings

Fig. 1 is the text projection diagram of a fragment in an embodiment of the invention;

Fig. 2 is the local matching diagram of an embodiment of the invention;

Fig. 3 is the flow chart of the boundary tracking algorithm of an embodiment of the invention.

Detailed description of the embodiments

The execution of the invention is briefly illustrated below using single-sided Chinese text as an example. A total of 209 fragment images are used, obtained from one A4 sheet with 10 crosswise cuts and 18 lengthwise cuts. The steps are as follows:

(1) Preprocessing

(a) Prepare and preprocess the image data set, including image digitization, denoising, and binarization;

(b) Arrange the fragments into a matrix of 11 rows and 19 columns, numbered from 1 to 209 from left to right and from top to bottom;

(2) Extract the feature set of the fragment matrix. The features include the position and gray value of the outermost pixels of each fragment and the upper boundary height;

(3) Classify the fragments using the feature values:

① According to whether the edge of the paper carries text stroke information, divide the fragments into three classes: 11 left and 11 right border fragments, 19 upper and 19 lower border fragments, and 149 middle fragments;

② Classify the fragments according to the extracted upper boundary height:

In general, fragments in the same row have roughly the same white upper boundary height or black upper boundary height. Compute the boundary height of every fragment, group fragments with the same boundary height into one class, and count the fragments in each class. The results are shown in Table 1.

Table 1. Number of fragments with the same upper boundary height

Table 1 shows 43 groups of white tops and black tops in total, whereas the image was cut into only 11 rows, so the classes need further processing.

(6) The number of fragments in each class must be equal to or slightly less than 19;

(7) A class whose height is separated from the other heights does not form an independent class if it contains fewer than 10 fragments (for example, height 3);

(8) Classes whose heights are consecutive are merged into one class;

(9) The final number of classes is 11;

(10) If the class still cannot be determined, the bottom height is used as an auxiliary criterion.

The classification after this further processing is shown in Table 2:

Table 2. Classes of upper boundary height and the corresponding number of fragments

(4) Perform row and column matching of the fragments based on the boundary tracking algorithm;

(5) Display the spliced image:

Table 3. Fragment numbers after splicing

049 054 065 143 186 002 057 192 178 118 190 095 011 022 129 028 091 188 141
061 019 078 067 069 099 162 096 131 079 063 116 163 072 006 177 020 052 036
168 100 076 062 142 030 041 023 147 191 050 179 120 086 195 026 001 087 018
038 148 046 161 024 035 081 189 122 103 130 193 088 167 025 008 009 105 074
071 156 083 132 200 017 080 033 202 198 015 133 170 205 085 152 165 027 060
014 128 003 159 082 199 135 012 073 160 203 169 134 039 031 051 107 115 176
094 034 084 183 090 047 121 042 124 144 077 112 149 097 136 164 127 058 043
125 013 182 109 197 016 184 110 187 066 106 150 021 173 157 181 204 139 145
029 064 111 201 005 092 180 048 037 075 055 044 206 010 104 098 172 171 059
007 208 138 158 126 068 175 045 174 000 137 053 056 093 153 070 166 032 196
089 146 102 154 114 040 151 207 155 140 185 108 117 004 101 113 194 119 114

Claims

1. A method for automatic splicing of regular shredded paper, characterized in that: the method mainly comprises the following six steps:

(1) Prepare and preprocess the image data set, including numbering the fragments, image digitization, gray-value classification of the image, and denoising;

(2) Classify the shredded paper as a whole into four cases according to language and sidedness: Chinese single-sided, Chinese double-sided, English single-sided, and English double-sided;

(3) Extract the local-area features of each image. For Chinese, these features include the position and gray value of the boundary pixels of each fragment and the heights of the upper and lower boundaries; for English fragments the extraction range is expanded, and the additional features include the row height, the horizontal position, and the line spacing of the English fragments;

(4) Classify the fragments according to the feature set extracted in step 3: first divide them into three classes, namely upper and lower border fragments, left and right border fragments, and middle fragments; then group fragments with the same features into one class;

(5) a: Match the fragments locally according to the boundary tracking algorithm and determine the matching index of the fragment boundary features; b: use the boundary tracking algorithm to perform row matching and column matching on the fragments that satisfy the matching condition in step a;

(6) Restore the image after step 5 matching.

2. The method for automatically splicing a piece of regular shredded paper according to claim 1, characterized in that: the numbering method described in step (1) is: number the shredded paper pieces sequentially from left to right and from top to bottom, It is recorded as 1, 2, 3...n; if it is necessary to distinguish the front and back, the front is recorded as a1, a2, a3...an; the reverse is recorded as b1, b2, b3...bn; step (1) The image digitization described above refers to: taking pixels as the minimum unit, and extracting the gray value and location of each pixel, and establishing a function matrix; the image value described in step (1) refers to: the gray value is "0 " is a black point, a point with a grayscale value of "255" is a white point, and a point between "0" and "255" is a gray point; the denoising point described in step (1) refers to: for the same color point Completely surround the heterochromatic point, and assimilate the color of the heterochromatic point to the color of the surrounding points.

3. The method for automatic splicing of a piece of regular shredded paper according to claim 1, characterized in that: the position and gray value feature and extraction method of the boundary pixel points of the shredded paper pieces described in step (3) are: define shredded paper The leftmost or rightmost column of pixels on the paper is the left or right boundary, and the topmost or bottommost row of pixels is the upper and lower boundaries, and the position and gray value of each boundary pixel are extracted;

The upper and lower boundary height features described in step (3) and the extraction method are: according to whether the upper and lower boundaries of each fragment are completely white, they are divided into two categories: white boundary height and black boundary height; the specific classification methods are as follows:

Take the bottom of the fragment as the x-axis, the left side of the fragment perpendicular to the x-axis as the y-axis, and the intersection point of the x-axis and the y-axis as the origin to establish a coordinate system, and horizontally project each pixel point on the picture to the y-axis; The projection of a black or gray point is recorded as a valid projection, and the number of projections is increased by 1, while the projection of a white point is invalid, and the number of projections does not change; record the number of projections f(h) on a projection point that is h pixels away from the origin;

g g ((h h)) = = \{\begin{matrix} 00,, f f ((h h)) < < n no / / 1010 \\ 255255,, f f ((h h)) &GreaterEqual; &Greater Equal; n no / / 1010 \end{matrix}

When the number of projections f(h) is less than 1/10 of the total number of pixels n in the row, record the gray value g(h) of point h on the y-axis as "0"; when the number of projections f(h) is greater than or equal to When 1/10 of the total number of pixels n in this row, the gray value g(h) of point h is recorded as "255";

On the projection axis, count down from the upper boundary of the fragments until points with different colors appear; the height of this section is the height of the upper boundary, and the height of the lower boundary is the same;

The line height feature of the English fragments described in step (3) and its extraction method are: English letters in the same row have roughly the same height and occupy roughly the same vertical extent; therefore, after an English fragment is horizontally projected, an interval whose gray value is "1" is a valid letter interval, the height of this valid interval is defined as the line height, and the line height is extracted;

The horizontal position feature of the English fragments described in step (3) and its extraction method are: after an English fragment is horizontally projected, the distances from the top of the fragment to the upper and lower boundaries of the valid projection interval of the letters are called the horizontal position of that row of letters, and the horizontal position feature is extracted;

The line spacing feature of the English fragments described in step (3) and its extraction method are: the vertical distance between two horizontal positions is defined as the line spacing, and the line spacing feature is extracted.
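The projection and feature extraction of claim 3 could be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the fragment is taken to be a grayscale 2-D NumPy array with 255 as white, rows are indexed from the top rather than from the bottom-left origin, rows where g(h) = 255 are treated as valid letter intervals, and line spacing is measured between the top edges of successive text rows. All function names are hypothetical.

```python
import numpy as np

WHITE = 255  # assumed gray value of blank paper


def projection_profile(img):
    """Return g(h) for every row of a grayscale fragment (row 0 is the top)."""
    n = img.shape[1]                            # total pixels in one row
    f = np.count_nonzero(img < WHITE, axis=1)   # valid projections: black or gray pixels
    return np.where(f >= n / 10, 255, 0)


def upper_boundary_height(g):
    """Count rows from the top until the profile value first changes."""
    change = np.flatnonzero(g != g[0])
    return int(change[0]) if change.size else len(g)


def text_runs(g):
    """Return (top, bottom) row indices, bottom exclusive, of each run with g == 255."""
    runs, start = [], None
    for h, v in enumerate(g):
        if v == 255 and start is None:
            start = h
        elif v != 255 and start is not None:
            runs.append((start, h))
            start = None
    if start is not None:
        runs.append((start, len(g)))
    return runs


def line_features(g):
    """Return (line heights, horizontal positions, line spacings) of a profile."""
    runs = text_runs(g)
    heights = [bottom - top for top, bottom in runs]
    positions = [top for top, _ in runs]        # assumed: distance of each top edge from the fragment top
    spacings = [b - a for a, b in zip(positions, positions[1:])]
    return heights, positions, spacings
```

The lower boundary height can be obtained from the same helper by passing the reversed profile, `upper_boundary_height(g[::-1])`.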

4. The method for automatic splicing of regular shredded paper according to claim 1, characterized in that: in step (4), the fragments are classified according to the feature set extracted in step (3); the specific steps are:

i) According to whether there is text stroke information on the border of the paper sheet, the shredded paper is divided into three categories: upper and lower border fragments, left and right border fragments and middle fragments;

ii) Further classify the above three categories of shredded paper by the line spacing feature, placing fragments with the same line spacing in one category;

iii) Classify the three fragment sets formed in step i by upper and lower boundary height, placing fragments with the same or similar upper and lower boundary heights in the same fragment set;

iv) Use the horizontal position feature to further classify each fragment set formed in step ii, placing fragments at the same horizontal position in one category.
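The successive classification in steps i to iii can be approximated by grouping fragments on a composite key. The sketch below is only one possible reading: the dictionary fields `id`, `border`, `spacing` and `top_height` are hypothetical names for features extracted earlier, and "similar" boundary heights are assumed to mean heights that fall in the same bin of width `tol` pixels.

```python
from collections import defaultdict


def classify(fragments, tol=2):
    """Group fragment ids by border class, line spacing and binned upper boundary height."""
    groups = defaultdict(list)
    for frag in fragments:
        key = (
            frag["border"],             # step i: upper/lower, left/right or middle fragment
            frag["spacing"],            # step ii: identical line spacing
            frag["top_height"] // tol,  # step iii: same or similar boundary height (assumed binning)
        )
        groups[key].append(frag["id"])
    return dict(groups)
```

Fixed-width binning can separate two heights that straddle a bin edge, so a tolerance-based clustering may be preferable in practice.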

5. The method for automatic splicing of regular shredded paper according to claim 1 or 4, characterized in that: the classification in step iii) must observe the following conditions:

(1) The number of fragments of each type must be equal to or slightly less than the number of longitudinal cuts of the paper;

(2) If a category separated out at some other height contains fewer fragments than 1/5 of the number of fragments per category, it is not an independent category;

(3) Several categories whose heights are continuous with one another are classified into the same category;

(4) The final total number of categories is the number of cross-cuts of the paper;

(5) If the category still cannot be determined, an auxiliary judgment is made based on the lower boundary height.
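Conditions (2) and (3) can be illustrated with a small sketch. It rests on assumptions that the claim does not spell out: heights are "continuous" when they differ by at most one pixel, `per_category` is the expected number of fragments per category from condition (1), and clusters that fail condition (2) are simply reported as non-independent rather than reassigned. The function name is hypothetical.

```python
from collections import Counter


def independent_height_categories(heights, per_category):
    """Merge continuous heights into clusters and keep those large enough to stand alone."""
    counts = Counter(heights)
    clusters = []
    for h in sorted(counts):
        if clusters and h - clusters[-1][-1] <= 1:   # condition (3): continuous heights
            clusters[-1].append(h)
        else:
            clusters.append([h])
    # condition (2): a cluster with fewer than per_category / 5 fragments is not independent
    return [c for c in clusters if sum(counts[h] for h in c) >= per_category / 5]
```

The number of clusters returned can then be compared with the number of cross-cuts required by condition (4).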

6. The method for automatic splicing of regular shredded paper according to claim 1, characterized in that: matching the fragments in step (5) comprises:

a. Partially match the fragments according to the boundary tracking algorithm, that is, match the fragments pairwise, and determine the matching index of the fragments' boundary features;

b. The shredded pieces of paper obtained in step a that meet the matching conditions are formed into small fragment sets, and the boundary tracking algorithm is used to perform row matching and column matching on the fragment sets.

7. The method for automatic splicing of regular shredded paper according to claim 6, characterized in that: the criterion for the partial matching described in step a is: define X_ij as the gray value of the pixel in row j on the right boundary of the i-th fragment, and define Y_i′j as the gray value of the pixel in row j on the left boundary of the i′-th fragment (i≠i′); whether two fragments match depends on the degree of matching between X_ij and Y_i′j; the feature set extracted in step (3) takes the right boundary feature as the reference, and the judgment standard is defined as:

If X_ij is white and Y_i′(j-1), Y_i′j, Y_i′(j+1) take any of the three colors gray, white and black but are not all black, the case is normal and can be matched;

If X_ij is gray, Y_i′(j-1), Y_i′j, Y_i′(j+1) may be of any color; the case is normal and can be matched;

If X_ij is black and Y_i′(j-1), Y_i′j, Y_i′(j+1) are not all white, the case is normal and can be matched;

All other cases are abnormal and cannot be matched;

The relationship between X_ij and Y_i′(j-1), Y_i′j, Y_i′(j+1) is shown in Table 1;

Among them: X_ij: the gray value of the pixel in row j of the rightmost column of the i-th fragment;

Y_i′j: the gray value of the pixel in row j of the leftmost column of the i′-th fragment.
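The per-row judgment standard of claim 7 can be written as a small predicate. This is a sketch under the assumption that white is gray value 255, black is 0, and anything in between is gray; the function name and constants are hypothetical.

```python
WHITE, BLACK = 255, 0  # assumed gray values; anything in between counts as gray


def row_matchable(x, neighbours):
    """Apply the claim 7 standard to X_ij and (Y_i'(j-1), Y_i'j, Y_i'(j+1))."""
    ys = list(neighbours)
    if x == WHITE:
        return not all(y == BLACK for y in ys)   # abnormal only if all three are black
    if x == BLACK:
        return not all(y == WHITE for y in ys)   # abnormal only if all three are white
    return True                                  # gray X is compatible with any colors
```

For example, `row_matchable(0, (255, 255, 255))` reports an abnormal row because a black stroke on the right boundary meets blank paper on the neighbouring left boundary.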

8. The method for automatic splicing of regular shredded paper according to claim 7, characterized in that: according to the criterion for partial matching, the mathematical model of the image matching index is determined, specifically:

S_{ii'} = \sum_{j=1}^{N} T_{i'}(X_{ij})

where:

S_ii′: the total matching index between the i-th fragment and the i′-th fragment;

N: the total number of pixels along the vertical height of the fragment;

X_ij: the gray value of the pixel in row j on the right boundary of fragment i;

T_i′(X_ij): the matching index between the right boundary feature data of the i-th fragment in row j and the left boundary feature data of the i′-th fragment in the corresponding rows;

The matching index is specifically expressed as:

T_{i'}(X_{ij}) = \begin{cases} T_1(X_{ij}), & X_{ij} = 0, \\ T_2(X_{ij}), & 0 < X_{ij} \leq 255. \end{cases}

where T_2(X_ij) = 0.

The two fragments are regarded as matching if and only if the index S_ii′ is 0; if it is not 0, they cannot be matched, and the larger the value, the worse the degree of matching.
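The matching index S_ii′ can be computed in vectorized form. The claim fixes T_2 = 0 but does not define T_1 here, so this sketch assumes T_1 = 1 when the three neighbouring left-boundary pixels are all white and 0 otherwise, in line with the black-pixel rule of claim 7. Rows at the top and bottom reuse the edge pixel as the missing neighbour, which is also an assumption; the function name is hypothetical.

```python
import numpy as np


def matching_index(right_edge, left_edge):
    """Return S_ii' for the right boundary of fragment i and the left boundary of fragment i'."""
    x = np.asarray(right_edge, dtype=int)
    y = np.asarray(left_edge, dtype=int)
    padded = np.pad(y, 1, mode="edge")           # supply neighbours for the first and last rows
    all_white = (padded[:-2] == 255) & (padded[1:-1] == 255) & (padded[2:] == 255)
    # assumed T_1: penalty of 1 when a black X_ij faces three white pixels; T_2 = 0 otherwise
    t = np.where(x == 0, all_white.astype(int), 0)
    return int(t.sum())
```

A result of 0 means the pair is accepted as matching; larger values indicate more rows where a stroke ends abruptly at the cut.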

9. The method for automatic splicing of regular shredded paper according to claim 6, characterized in that: the specific flow of the boundary tracking algorithm described in step a is as follows:

1) Select fragments i and i';

2) Assume fragments i and i' match each other;

3) Read the gray value of the pixel X_ij in row j on the right boundary of fragment i;

4) Scan the pixels Y_i′(j-1), Y_i′j, Y_i′(j+1) in rows j-1, j, j+1 on the left boundary of fragment i′ that are adjacent to X_ij, and judge whether they are all white;

5) If they are all white and the row range is not exceeded, increment j by 1 and return to 3);

6) If they are not all white, increment j by 1 to read the next row, and judge whether X_ij is white;

7) If it is white, return to 5);

8) If it is not white, judge whether Y_i′(j-1), Y_i′j, Y_i′(j+1) are all white;

9) If they are all white, return to 5);

10) If they are not all white, increment j by 1 to read the next row and judge the color of X_ij;

11) If it is white, return to 5);

12) If it is gray, return to 5);

13) If it is black, judge whether Y_i′(j-1), Y_i′j, Y_i′(j+1) are all white;

14) If they are not all white, return to 5);

15) If they are all white, the matching process of fragments i and i′ ends, and fragments i and i′ do not match;

16) If j+1 exceeds the row range, the matching process of fragments i and i′ ends, and fragments i and i′ match.
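A condensed reading of this flow is an early-exit scan over the rows. The sketch below keeps only the two terminating outcomes of the flow, rejection when a black X_ij faces three white neighbours (step 15) and acceptance once the row range is exhausted (step 16); the intermediate branching is collapsed into a single loop, which is a simplification rather than a transcription of the 16 steps. The function name and the clipping of neighbours at the first and last rows are assumptions.

```python
def boundary_tracking(right_edge, left_edge, white=255, black=0):
    """Return True if fragment i (right boundary) and fragment i' (left boundary) match."""
    n = len(right_edge)
    for j in range(n):
        if right_edge[j] != black:
            continue                     # white or gray X_ij: move on to the next row
        lo, hi = max(j - 1, 0), min(j + 1, n - 1)
        if all(left_edge[k] == white for k in range(lo, hi + 1)):
            return False                 # step 15: black pixel faces blank paper, no match
    return True                          # step 16: row range exhausted, fragments match
```

Stopping at the first abnormal row avoids scanning the full boundary for pairs that clearly do not fit.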

10. The method for automatic splicing of regular shredded paper according to claim 6, characterized in that: the specific process of the row matching and column matching described in step b is:

(1) The specific process of row matching:

i) Taking one fragment as the reference, if partial matching of two fragments succeeds, the two fragments are merged into one fragment and placed in a new fragment set; if partial matching fails, the reference fragment is retained and partial matching continues; when a fragment in the original fragment set cannot be partially matched with any other fragment, it is placed in the new fragment set;

ii) For the new fragment set, repeat the above steps until all the fragments are spliced into whole rows;

(2) According to the above process, determine the mathematical model of the image row matching index, specifically:

Objective function:

W = \min \sum S_{ii'}

Constraints:

\begin{cases} S_{ii'} = \sum_{j=1}^{N} T_{i'}(X_{ij}) \\ T_{i'}(X_{ij}) = \begin{cases} T_1(X_{ij}), & X_{ij} = 0, \\ T_2(X_{ij}), & 0 < X_{ij} \leq 255. \end{cases} \\ a \geq M \end{cases}

where a is the number of S_ii′ terms;

The minimum value of W is 0;

M is the number of times the shredded paper is cut horizontally;

(3) Form fragment rows from the fragment sets matched by row, transpose the fragment row matrix, and then perform column matching; the specific process of column matching is the same as that of row matching.
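The row matching procedure can be sketched as a greedy chain builder. This is an illustration under assumptions: `matches(a, b)` is a hypothetical predicate that returns True when the right boundary of fragment `a` fits the left boundary of fragment `b` (for example, when their matching index is 0), and the reference fragment is simply the first one remaining in the set.

```python
def chain_fragments(ids, matches):
    """Greedily merge fragments into left-to-right chains using a pairwise predicate."""
    remaining = list(ids)
    chains = []
    while remaining:
        chain = [remaining.pop(0)]               # reference fragment
        extended = True
        while extended:
            extended = False
            for cand in remaining:
                if matches(chain[-1], cand):     # candidate fits on the right of the chain
                    chain.append(cand)
                elif matches(cand, chain[0]):    # candidate fits on the left of the chain
                    chain.insert(0, cand)
                else:
                    continue
                remaining.remove(cand)
                extended = True
                break
        chains.append(chain)                     # unmatched fragments end up as single-item chains
    return chains
```

Column matching could reuse the same function on the assembled rows after transposition, with `matches` comparing the bottom boundary of one row against the top boundary of another.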