CN113095307A - Automatic identification method for financial voucher information

Info

Publication number: CN113095307A
Application number: CN202110640739.9A
Authority: CN
Inventors: 郑瑛; 孙泉辉; 程嵩; 高翔; 郭端宏; 潘军
Original assignee: State Grid Zhejiang Electric Power Co Ltd; Jinhua Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Current assignee: State Grid Zhejiang Electric Power Co Ltd; Jinhua Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date: 2021-06-09
Filing date: 2021-06-09
Publication date: 2021-07-09
Anticipated expiration: 2041-06-09
Also published as: CN113095307B

Abstract

The invention discloses an automatic identification method for financial voucher information. An image analysis module extracts image feature information from the image information of a paper financial voucher, analyzes the image feature information to obtain data information, and the data information is stored electronically. The invention stores not only images of financial vouchers but also their data in digital form, which saves labor costs, reduces manual recording errors, and, through the identification of verification information, further improves recognition accuracy.

Description

Automatic identification method for financial voucher information

Technical Field

The invention relates to the technical field of image recognition and storage, in particular to an automatic recognition method for financial voucher information.

Background

The standard cost accounting of an enterprise involves financial settlement. Enterprise financial settlement generally includes settlement between internal units, settlement with suppliers, settlement with banks, and the like, and financial vouchers are often used in the settlement process. An existing paper financial voucher generally contains a summary, an account subject, an amount, a voucher preparation date, a debit or credit direction, and the like. Paper financial vouchers, however, are difficult to preserve and troublesome to check and audit, so they need to be stored electronically. The existing electronization process merely captures and records an image of the paper financial voucher; it cannot automatically obtain the voucher's category information, amount information, or anti-counterfeiting information. This information must be identified manually, which is inefficient and error-prone.

The Chinese invention patent with publication No. CN112598062A, published on April 2, 2021 and entitled "a method and device for identifying image", discloses an image identification method and device. The method comprises: acquiring an image to be identified; randomly cutting an image block of a preset image size from the image to be identified; inputting the image block into a trained neural network classification model to obtain a classification result for the image block; determining a classification confidence from the classification result; determining from the classification confidence whether the current classification result is used as the final image recognition result; and, when the current classification result cannot be used as the final image recognition result, iteratively obtaining a next image block according to the feature map and the positioning strategy network and obtaining a next classification confidence from that image block, until the obtained classification confidence indicates that the current classification result can be used as the final image recognition result. The disadvantage of that invention is that it discloses only a general image identification method, which cannot be applied to image identification of paper financial vouchers and cannot reliably identify the amount information and the like of a paper financial voucher.

Disclosure of Invention

The invention aims to overcome the defects in the prior art and provide an automatic identification method of financial voucher information.

The purpose of the invention is realized by the following technical scheme:

an image analysis module extracts image characteristic information of image information of a paper financial voucher, then the image analysis module analyzes the image characteristic information to obtain data information, and the data information is electronically stored;

the image characteristic information comprises: the category information of the paper financial voucher, the amount information of the paper financial voucher and the verification information of the paper financial voucher,

in the identification of the category information of the paper financial voucher, the image analysis module determines the category of the paper financial voucher according to different category information corresponding to different types of paper financial vouchers;

in the identification of the amount information of the paper financial voucher, the image analysis module finishes the extraction of image characteristic information by the following steps:

step 1, the image analysis module obtains the category of the paper financial voucher and determines the range and category of amount information extraction according to the category of the paper financial voucher, wherein the category of amount information extraction is Arabic numeral information;

step 2, performing feature extraction on the amount information within the amount information extraction range, and then converting the image information of the amount information into numerical value information, specifically as follows: determining an image area for each numerical value in the amount information, the numerical value in the image area being a numerical value to be recognized; recognizing any image area through a convolutional neural network, and comparing the numerical value to be recognized in the image area with the numerical value information pre-trained by the convolutional neural network to determine a confidence level; if the confidence level between the numerical value to be recognized and one piece of numerical value information is greater than or equal to a set threshold value while its confidence levels with all other numerical value information are less than the set threshold value, determining the numerical value to be recognized as that numerical value information; and if the confidence levels between the numerical value to be recognized and all the numerical value information are less than the set threshold value, performing an error correction analysis step so that specific numerical value information is determined for the numerical value to be recognized;

step 3, finishing the extraction of the money information image characteristic information if all the numerical values to be identified are identified;

in the identification of the verification information of the paper financial voucher, the image analysis module restores the verification information according to a preset mapping rule to verify the authenticity.
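The threshold decision in step 2 can be sketched as follows. This is a minimal illustration, assuming the convolutional neural network has already produced one confidence per digit 0 to 9; the function name `decide_digit` and the example scores are hypothetical. The text only covers the cases "exactly one confidence reaches the threshold" and "none reaches it"; returning `None` when several reach it is an assumption of this sketch.

```python
from typing import Optional, Sequence


def decide_digit(confidences: Sequence[float], threshold: float) -> Optional[int]:
    """Return the digit whose confidence alone reaches the threshold, else None.

    confidences[d] is the confidence that the image area shows digit d.
    None means the caller should run the error correction analysis step.
    """
    above = [digit for digit, conf in enumerate(confidences) if conf >= threshold]
    if len(above) == 1:
        return above[0]
    # No digit (the case named in the text) or several digits (assumption)
    # reached the threshold: defer to error correction.
    return None


# Hypothetical confidences for one image area.
scores = [0.01, 0.02, 0.03, 0.91, 0.00, 0.00, 0.01, 0.00, 0.02, 0.00]
print(decide_digit(scores, threshold=0.8))
```

Returning `None` rather than a best guess keeps the fallback to error correction explicit in the calling code.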

The error correction analysis step specifically comprises:

the image area of the numerical value to be recognized is decomposed into a plurality of sub-image areas, and the numerical value information pre-trained by the convolutional neural network is correspondingly decomposed into a plurality of pieces of sub-numerical value information; any sub-image area is recognized through the convolutional neural network, and the sub-numerical value to be recognized in the sub-image area is compared with the sub-numerical value information pre-trained by the convolutional neural network to determine a sub-confidence level; if the confidence level between the sub-numerical value to be recognized in the sub-image area and one piece of sub-numerical value information is greater than or equal to a set sub-confidence threshold value while its confidence levels with the other sub-numerical value information are less than the set sub-confidence threshold value, the numerical value information corresponding to that sub-numerical value information is determined as the numerical value information of the numerical value to be recognized.

In this scheme, the paper financial voucher is divided into three types of areas to be recognized. During image feature information extraction, the category information of the paper financial voucher is extracted first. Category information is generally easy to recognize: it can be a two-dimensional code or a bar code, and colors can be combined to encode more categories, so the category information is read with high accuracy and errors are avoided. Once the category is determined, the preset amount information extraction range for that category is retrieved, so image feature information can be extracted from the amount information quickly; the scheme recognizes the amount information with a convolutional neural network and converts the image information into specific amount data for storage. The scheme also provides verification information, which is generally used for anti-counterfeiting identification of the paper financial voucher and can be an anti-counterfeiting code or an anti-counterfeiting watermark; besides establishing whether the voucher is genuine, it can also verify whether the category information was recognized correctly. In summary, the scheme stores not only images but also data for financial voucher records, which saves labor costs, reduces manual recording errors, and, through the identification of verification information, further improves recognition accuracy.
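The lookup of a preset extraction range by voucher category could be organized as a simple table. The sketch below is illustrative only: the category codes, region names, and pixel coordinates are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Region:
    """Rectangle on the scanned voucher, in pixels (assumed coordinate system)."""
    x: int
    y: int
    width: int
    height: int


# Hypothetical templates: voucher category -> named amount regions.
AMOUNT_REGIONS: Dict[str, Dict[str, Region]] = {
    "ELEC_INVOICE": {
        "arabic_total": Region(620, 410, 180, 32),
        "chinese_total": Region(180, 410, 400, 32),
    },
    "BANK_RECEIPT": {
        "arabic_total": Region(500, 220, 160, 30),
    },
}


def amount_regions_for(category: str) -> Dict[str, Region]:
    """Return the preset amount extraction regions for a recognized category."""
    try:
        return AMOUNT_REGIONS[category]
    except KeyError:
        raise ValueError(f"no extraction template for category {category!r}") from None


print(amount_regions_for("ELEC_INVOICE")["arabic_total"])
```

Keeping the ranges in data rather than code means a new voucher type only needs a new table entry.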

The amount information on a paper financial voucher may be handwritten or printed. Handwriting differs from person to person, so the size and shape of the numerals are not uniform, and printed data can vary in print density; in either case the convolutional neural network may extract the image feature information of the amount information incorrectly. The scheme therefore provides an error correction analysis step for extracting the image feature information of the amount information: the image area of the numerical value to be recognized is decomposed into a plurality of sub-image areas, and if any sub-image area matches the corresponding sub-numerical value information, the numerical value information corresponding to that sub-numerical value information can be determined as the numerical value information of the numerical value to be recognized. This design optimizes the capability of extracting the image feature information of the amount information.

Preferably, the error correction analyzing step further includes:

if the confidence levels between the sub-numerical value to be recognized in a sub-image area and at least two pieces of sub-numerical value information are greater than or equal to the set sub-confidence threshold value, the numerical value information corresponding to those pieces of sub-numerical value information is taken as candidate numerical value information, and confidence analysis is performed on the other sub-image areas of the numerical value to be recognized; if, in any other sub-image area, the confidence level between the sub-numerical value to be recognized and one piece of sub-numerical value information is greater than or equal to the set sub-confidence threshold value while its confidence levels with the other sub-numerical value information are less than the set sub-confidence threshold value, and the numerical value information corresponding to that sub-numerical value information is the same as one piece of the candidate numerical value information, then the numerical value information corresponding to that sub-numerical value information is the numerical value information of the numerical value to be recognized.

When the image area is decomposed into a plurality of sub-image areas, a given sub-image area may have confidence levels with at least two pieces of sub-numerical value information that are greater than or equal to the set sub-confidence threshold value. For example, when the lower half of the Arabic numeral 6 or 8 is used as the sub-image area, the confidence levels for both 6 and 8 as sub-numerical value information exceed the set sub-confidence threshold value. In this case the confidence levels of the other sub-image areas of the numerical value to be recognized are analyzed; because the other sub-image areas of 6 and 8 differ greatly in confidence, the two can be told apart. This design further improves the capability of extracting the image feature information of the amount information.
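The sub-region logic, including the 6-versus-8 case, can be sketched as below. The sketch assumes the per-region confidences have already been computed, with `sub_conf[r][d]` giving the confidence that sub-image area `r` matches the corresponding part of digit `d`; the function name and the numbers are hypothetical. Pooling candidates from every ambiguous region is a simplification of this sketch, since the text describes a single ambiguous region.

```python
from typing import List, Optional, Sequence


def error_correct(sub_conf: Sequence[Sequence[float]], sub_threshold: float) -> Optional[int]:
    """Resolve a digit from per-sub-region confidences, or return None."""
    # Digits reaching the sub-confidence threshold in each sub-image area.
    hits: List[List[int]] = [
        [digit for digit, conf in enumerate(region) if conf >= sub_threshold]
        for region in sub_conf
    ]
    # Regions matching two or more digits only supply candidates.
    candidates = set()
    for region_hits in hits:
        if len(region_hits) >= 2:
            candidates.update(region_hits)
    # A region matching exactly one digit decides the result; if candidates
    # exist, the deciding digit must be one of them.
    for region_hits in hits:
        if len(region_hits) == 1 and (not candidates or region_hits[0] in candidates):
            return region_hits[0]
    return None


# Upper half looks only like a 6; lower half fits both 6 and 8.
upper = [0.0] * 10
upper[6], upper[8] = 0.90, 0.30
lower = [0.0] * 10
lower[6], lower[8] = 0.85, 0.88
print(error_correct([upper, lower], sub_threshold=0.8))
```

Here the lower half alone is ambiguous, and the upper half settles the digit, which mirrors the example given in the text.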

Preferably, the plurality of sub-image areas are equal in size and divide the image area into equal horizontal parts or equal vertical parts; the error correction analysis step specifically comprises the following substeps:

substep a, for a single sub-image area, searching each piece of numerical value information for a first region that has the same size as the single sub-image area and the highest confidence level, recording the first region and its confidence level, and determining the highest confidence level among the confidence levels corresponding to all the numerical value information, wherein the numerical value information corresponding to the highest confidence level is candidate numerical value information;

substep b, selecting a sub-image area adjacent to the single sub-image area, searching the candidate numerical value information for a second region that has the same size as the adjacent sub-image area and the highest confidence level, and recording the second region;

substep c, comparing the first region with the second region; if the first region and the second region are adjacent, or the distance between them is less than a set error value, or the area of the overlapping region between them is less than a set area value, judging that the candidate numerical value information is the final numerical value information, and determining the numerical value information of the numerical value to be recognized as the recognized numerical value corresponding to the final numerical value information; otherwise, jumping to substep d;

substep d, deleting the numerical value information corresponding to the highest confidence level, and repeating substeps a to c until the recognized numerical value corresponding to the numerical value to be recognized is determined.

Handwritten amount information varies greatly from person to person. If, during extraction of the image feature information of the amount information, the image area of a numerical value to be recognized is decomposed into a plurality of sub-image areas and the numerical value information pre-trained by the convolutional neural network is correspondingly decomposed into a plurality of pieces of sub-numerical value information, determining confidence levels between corresponding sub-image areas and sub-numerical value information may fail to recognize handwriting. In this scheme, therefore, the pre-trained numerical value information is not simply decomposed into pieces of sub-numerical value information. Instead, each sub-image area is matched by traversal across the pre-trained numerical value information to obtain the best matching result, which first ensures the best possible match for handwritten amount information. The sub-image area adjacent to that sub-image area is then matched against the pre-trained numerical value information in the same way. The first region and the second region may not be exactly adjacent, or may overlap, but as long as the distance between them is less than a set error value or the area of their overlapping region is less than a set area value, the matching is considered successful.

Preferably, the substep d is executed at most twice, and if the final numerical value information is not determined after the substep d is executed twice, it is determined that the identification of the numerical value to be identified fails.
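Substeps a to d, with the limit of two executions of substep d, can be sketched as follows. This is a simplified illustration under several assumptions that are not in the patent: images are small binary grids, the value is split into two horizontal bands, "confidence" is the fraction of equal pixels, the search slides only vertically, and gap and overlap are measured in rows. The names `best_window` and `recognise` and the toy templates are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]  # binary pixels, row-major (assumed representation)


def best_window(strip: Image, template: Image) -> Tuple[int, float]:
    """Slide strip down template; return (top row, confidence) of the best fit."""
    h, w = len(strip), len(strip[0])
    best_top, best_conf = 0, -1.0
    for top in range(len(template) - h + 1):
        same = sum(
            strip[r][c] == template[top + r][c] for r in range(h) for c in range(w)
        )
        conf = same / (h * w)  # fraction of equal pixels (assumed measure)
        if conf > best_conf:
            best_top, best_conf = top, conf
    return best_top, best_conf


def recognise(strips: List[Image], templates: Dict[int, Image],
              max_gap: int = 2, max_overlap: int = 2,
              max_retries: int = 2) -> Optional[int]:
    """Return the recognized digit, or None if recognition fails."""
    remaining = dict(templates)
    first, second = strips[0], strips[1]
    height = len(first)
    retries = 0
    while remaining:
        # Substep a: best-fitting region of the first band in every template;
        # the template with the highest confidence becomes the candidate.
        fits = {d: best_window(first, t) for d, t in remaining.items()}
        candidate = max(fits, key=lambda d: fits[d][1])
        top1 = fits[candidate][0]
        # Substep b: best-fitting region of the adjacent band in the candidate.
        top2, _ = best_window(second, remaining[candidate])
        # Substep c: 0 means adjacent, positive is a gap, negative an overlap.
        gap = top2 - (top1 + height)
        if -max_overlap < gap < max_gap:
            return candidate
        # Substep d: drop the candidate and retry, at most max_retries times.
        if retries == max_retries:
            break
        del remaining[candidate]
        retries += 1
    return None


SEVEN = [[1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 1, 0]]
ONE = [[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(recognise([SEVEN[:2], SEVEN[2:]], {7: SEVEN, 1: ONE}))
```

In a real system the templates would be larger than the input so that the traversal has room to absorb variation in handwriting position; the toy grids here only exercise the control flow.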

Preferably, the amount information further includes Chinese upper case digit information. The Chinese upper case digit information is recognized through OCR; after recognition is completed, the numerical value corresponding to the Chinese upper case digit information is compared with the numerical value corresponding to the Arabic digit information. If the two numerical values are equal, it is determined that the Arabic digit information and the Chinese upper case digit information are successfully recognized; if they are not equal, a correction step is executed to ensure that the amount information is successfully recognized.

The amount information on a paper financial voucher is sometimes handwritten, in which case the recognized Chinese upper case digits and Arabic digits may be inconsistent. The scheme therefore uses the Arabic digit information and the Chinese upper case digit information that the financial voucher already contains to judge whether the convolutional neural network's feature recognition is wrong: the two are matched against each other, and inconsistent numerical values can be corrected into consistent ones.

Preferably, the correcting step specifically comprises: searching possible matching results with the confidence coefficient larger than the set Chinese upper case number threshold value in the Chinese upper case number information, searching possible matching results with the confidence coefficient larger than the set Arabic number threshold value in the Arabic number information, and correcting unequal numerical values into equal numerical values if the corresponding numerical values in the possible matching results are equal.
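The correction step can be sketched as an intersection of the plausible readings on both sides. The sketch assumes each recognizer returns a list of (amount, confidence) pairs with the amounts already converted to numbers; the function name, thresholds, and readings are hypothetical. Returning `None` when zero or several values agree is an assumption of this sketch, since the text does not say how such cases are handled.

```python
from decimal import Decimal
from typing import List, Optional, Tuple

Candidate = Tuple[Decimal, float]  # (amount read, confidence)


def reconcile(arabic: List[Candidate], chinese: List[Candidate],
              arabic_threshold: float, chinese_threshold: float) -> Optional[Decimal]:
    """Return the amount both recognizers find plausible, or None."""
    plausible_arabic = {value for value, conf in arabic if conf > arabic_threshold}
    plausible_chinese = {value for value, conf in chinese if conf > chinese_threshold}
    common = plausible_arabic & plausible_chinese
    if len(common) == 1:
        return common.pop()
    return None  # nothing agrees, or the agreement is ambiguous (assumption)


# Hypothetical readings: the top Arabic reading disagrees with the Chinese one,
# but the second Arabic reading matches it.
arabic_readings = [(Decimal("168.50"), 0.62), (Decimal("108.50"), 0.58)]
chinese_readings = [(Decimal("108.50"), 0.93)]
print(reconcile(arabic_readings, chinese_readings, 0.5, 0.8))
```

`Decimal` is used so that equal amounts compare equal without floating-point rounding.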

Preferably, the automatic identification method for financial voucher information further comprises a method for optimizing the convolutional neural network. In the initial convolutional neural network, each numerical value corresponds to an initial image; after several rounds of convolutional neural network training, each numerical value corresponds to a plurality of images. All images of any numerical value are compared with the initial image for similarity, and images whose similarity is less than or equal to a set threshold value are selected and retained as candidate images. The candidate images are then compared pairwise for similarity: if the similarity between any two candidate images is less than a set value, the one with the lower similarity to the initial image is retained and the one with the higher similarity is discarded; if the similarity between any two candidate images is greater than or equal to the set value, both candidate images are retained. All the retained candidate images and the initial image are kept as the images corresponding to the numerical value.

This design optimizes the training of the convolutional neural network: images that may plausibly appear for the same numerical value are kept as far as possible, and although these images differ from one another, each can be regarded as an image corresponding to that numerical value. The design improves the recognition efficiency of the convolutional neural network and further ensures the recognition accuracy of the amount information of the paper financial voucher.

The invention has the beneficial effects that:

1. The scheme stores not only images but also data for financial voucher records, which saves labor costs, reduces manual recording errors, and, through the identification of verification information, further improves recognition accuracy;

2. An error correction analysis step is provided for extracting the image feature information of the amount information, which optimizes the capability of extracting that information;

3. The convolutional neural network is optimally trained, which further ensures the recognition accuracy of the amount information of the paper financial voucher.

Drawings

FIG. 1 is a flow chart of the identification of the amount information of the paper financial voucher of the present invention;

FIG. 2 is a schematic view of the paper financial voucher of the present invention.

Detailed Description

The invention is further described below with reference to the figures and examples.

Example 1:

in the identification of the amount information of the paper financial voucher, the image analysis module completes the extraction of the image characteristic information by the following steps, as shown in fig. 1:

step 1, the image analysis module obtains the category of the paper financial voucher and determines the range and category of amount information extraction according to the category of the paper financial voucher, wherein the category of amount information extraction is Arabic numeral information;

The error correction analysis step specifically comprises:

The error correction analyzing step further includes:

The amount information also comprises Chinese upper case digit information, the Chinese upper case digit information is identified through OCR, the numerical value corresponding to the Chinese upper case digit information is compared with the numerical value corresponding to the Arabic digit information after the identification is finished, if the numerical values of the Chinese upper case digit information and the Arabic digit information are equal, the Arabic digit information and the Chinese upper case digit information are judged to be successfully identified, and if the numerical values of the Arabic digit information and the Chinese upper case digit information are not equal, a correction step is executed to ensure that the amount information is successfully identified.

The correction steps are specifically as follows: searching possible matching results with the confidence coefficient larger than the set Chinese upper case number threshold value in the Chinese upper case number information, searching possible matching results with the confidence coefficient larger than the set Arabic number threshold value in the Arabic number information, and correcting unequal numerical values into equal numerical values if the corresponding numerical values in the possible matching results are equal.

The automatic identification method for financial voucher information further comprises a method for optimizing the convolutional neural network. In the initial convolutional neural network, each numerical value corresponds to an initial image; after several rounds of convolutional neural network training, each numerical value corresponds to a plurality of images. All images of any numerical value are compared with the initial image for similarity, and images whose similarity is less than or equal to a set threshold value are selected and retained as candidate images. The candidate images are then compared pairwise for similarity: if the similarity between any two candidate images is less than a set value, the one with the lower similarity to the initial image is retained and the one with the higher similarity is discarded; if the similarity between any two candidate images is greater than or equal to the set value, both candidate images are retained. All the retained candidate images and the initial image are kept as the images corresponding to the numerical value.

The paper financial voucher of the invention is shown in FIG. 2; it is a simulated electricity charge invoice from an electric power company. In this embodiment, the invoice code is the category information of the paper financial voucher, and the category of the voucher can be determined from the invoice code. For the amount information, because the template of the electricity charge invoice is preset, the extraction range of the amount information can be obtained. The extraction range covers amounts in Arabic numerals and the amount in Chinese capital numerals: the Arabic numerals include the user account number, the start and end meter readings, the actual electricity consumption, the electricity price, the electricity charge, and the like, while the Chinese capital amount is the information in the "Total RMB (in capitals)" field. The electricity charge can be read simply as the total amount, or as each classified electricity charge together with the total electricity charge. If each classified amount is obtained, the classified amounts can be summed and the result checked against the total amount, which further improves the reliability of recognition.
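The check of classified amounts against the total can be sketched in a few lines. The function name, the optional tolerance parameter, and the example amounts are hypothetical.

```python
from decimal import Decimal
from typing import Iterable


def total_matches(items: Iterable[Decimal], total: Decimal,
                  tolerance: Decimal = Decimal("0.00")) -> bool:
    """Check that the recognized classified amounts add up to the recognized total."""
    return abs(sum(items, Decimal("0")) - total) <= tolerance


# Hypothetical classified electricity charges read from one invoice.
charges = [Decimal("52.30"), Decimal("18.70"), Decimal("4.00")]
print(total_matches(charges, Decimal("75.00")))
```

A failed check does not say which field was misread, only that at least one of the recognized amounts needs another look.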

The simulated electricity charge invoice shown in this embodiment carries printed data, which is regular but can still be unclear; for example, some parts may be printed too lightly to recognize, or the print may fade in places after long storage. The scheme therefore optimizes the recognition and reading of the amount information, which greatly improves the accuracy and effectiveness of reading it. Besides the simulated electricity charge invoice shown here, handwritten electricity charge invoices also exist. Because handwriting differs from person to person, conventional handwriting recognition requires extensive convolutional neural network training and is inefficient, whereas this method can recognize handwritten numeral information quickly and with high accuracy.

Example 2: the principle and implementation method of the automatic identification method of the financial voucher information are basically the same as the embodiment 1, and the difference is that: the sizes of the plurality of sub-image areas are equal, and the plurality of sub-image areas divide the image area into horizontal equal parts or vertical equal parts; the error correction analysis substep specifically comprises:

substep d, deleting the numerical value information corresponding to the highest confidence level, and repeating substeps a to c until the recognized numerical value corresponding to the numerical value to be recognized is determined. Substep d is executed at most twice; if the final numerical value information is still not determined after substep d has been executed twice, recognition of the numerical value to be recognized is judged to have failed.

The above-described embodiments are only preferred embodiments of the present invention, and are not intended to limit the present invention in any way, and other variations and modifications may be made without departing from the spirit of the invention as set forth in the claims.

Claims

1. An automatic identification method for financial voucher information, characterized in that an image analysis module extracts image feature information from the image information of a paper financial voucher, then analyzes the image feature information to obtain data information, and the data information is stored electronically;

The image feature information includes: category information of paper financial vouchers, amount information of paper financial vouchers, and verification information of paper financial vouchers,

In the identification of category information of paper financial vouchers, the image analysis module determines the category of the paper financial voucher according to the different category information corresponding to different types of paper financial vouchers;

In the identification of amount information for paper financial vouchers, the image analysis module completes the extraction of image feature information through the following steps:

Step 1, the image analysis module obtains the category of the paper financial voucher, determines the scope and category of the amount information extraction according to the category of the paper financial voucher, and the category of the amount information extraction is Arabic numeral information;

Step 2: perform feature extraction on the amount information within the scope of the amount information extraction, and then convert the image information of the amount information into numerical information, specifically as follows: determine the image area of each value in the amount information, the value in the image area being the value to be identified; for any image area, identify it through a convolutional neural network, and compare the value to be identified in the image area with the numerical information pre-trained by the convolutional neural network to determine the confidence; if the confidence between the value to be identified and a certain piece of numerical information is greater than or equal to the set threshold, and the confidence between the value to be identified and the other numerical information is less than the set threshold, the value to be identified is determined to be the numerical information whose confidence is greater than or equal to the set threshold; if the confidence between the value to be identified and all the numerical information is less than the set threshold, the error correction analysis step is performed so that specific numerical information is determined for the value to be identified;

Step 3: After all the values to be identified have been identified, the feature information extraction of the amount information image is completed;

In the identification of verification information of paper financial vouchers, the image analysis module restores the verification information to verify the authenticity according to the preset mapping rules;

The error correction analysis steps specifically include:

The image area of the value to be identified is decomposed into several sub-image areas, and the numerical information pre-trained by the convolutional neural network is correspondingly decomposed into several pieces of sub-value information. Any sub-image area is identified through the convolutional neural network, and the sub-value to be identified in the sub-image area is compared with the sub-value information pre-trained by the convolutional neural network to determine the sub-confidence. If the confidence between the sub-value to be identified in the sub-image area and a certain piece of sub-value information is greater than or equal to the set sub-confidence threshold, and the confidence between the sub-value to be identified and the other sub-value information is less than the set sub-confidence threshold, the numerical information corresponding to that sub-value information is determined as the numerical information of the value to be identified.

2. The automatic identification method for financial voucher information according to claim 1, characterized in that the error correction analysis step further comprises:

If the confidence between the sub-value to be identified in the sub-image area and at least two pieces of sub-value information is greater than or equal to the set sub-confidence threshold, the numerical information corresponding to those pieces of sub-value information is taken as candidate numerical information, and confidence analysis is performed on the other sub-image areas. If, in any other sub-image area, the confidence between the sub-value to be identified and a certain piece of sub-value information is greater than or equal to the set sub-confidence threshold, the confidence between that sub-value and every other piece of sub-value information is less than the set sub-confidence threshold, and the numerical information corresponding to that sub-value information is the same as one of the candidate numerical information, then the numerical information corresponding to that sub-value information is the numerical information of the value to be identified.
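A minimal sketch of the candidate rule in claim 2, under the same assumed input format as a list of per-sub-area confidence dictionaries. The name `resolve_with_candidates` is hypothetical.

```python
from typing import Dict, List, Optional


def resolve_with_candidates(
    sub_confidences: List[Dict[str, float]], sub_threshold: float
) -> Optional[str]:
    """Use an unambiguous sub-area to pick among candidates from an ambiguous one."""
    # For every sub-image area, collect the numerals that reach the threshold.
    passing = [
        {n for n, conf in confidences.items() if conf >= sub_threshold}
        for confidences in sub_confidences
    ]
    for i, candidates in enumerate(passing):
        if len(candidates) < 2:
            continue  # this area does not produce candidate numerals
        for j, other in enumerate(passing):
            if j == i or len(other) != 1:
                continue  # need a different area with exactly one match
            (numeral,) = other
            if numeral in candidates:
                return numeral
    return None


areas = [
    {"1": 0.90, "7": 0.85, "4": 0.20},  # "1" and "7" both pass: candidates
    {"1": 0.30, "7": 0.90, "4": 0.10},  # only "7" passes here
]
print(resolve_with_candidates(areas, sub_threshold=0.8))
```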

3. The automatic identification method for financial voucher information according to claim 1, characterized in that the sub-image areas are all equal in size and are obtained by dividing the image area equally in the horizontal or vertical direction, and the error correction analysis comprises the following sub-steps:

Sub-step a, for a single sub-image area, find in each piece of numerical information the first area that is equal in size to the single sub-image area and has the highest confidence, and record the first area and its confidence; among the confidences corresponding to all numerical information, determine the highest confidence, and take the numerical information corresponding to the highest confidence as the candidate numerical information;

Sub-step b, select a sub-image area adjacent to the single sub-image area, find in the candidate numerical information the second area that is equal in size to the adjacent sub-image area and has the highest confidence, and record the second area;

Sub-step c, compare the first area and the second area: if the first area and the second area are adjacent, or the distance between the first area and the second area is less than the set error value, or the area of the overlap between the first area and the second area is smaller than the set area value, the candidate numerical information is judged to be the final numerical information, and the value to be identified is determined to be the identification value corresponding to the final numerical information; otherwise, jump to sub-step d;

Sub-step d, delete the numerical information corresponding to the highest confidence, and repeat sub-steps a to c until the identification value corresponding to the value to be identified is determined.

4. The automatic identification method for financial voucher information according to claim 3, characterized in that sub-step d is performed at most twice; if the final numerical information has not been determined after sub-step d has been performed twice, recognition of the value to be identified is judged to have failed.
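Sub-steps a to d of claim 3, together with the two-deletion limit of claim 4, can be sketched as below. Several points are assumptions made for the illustration and are not stated in the claims: images are small grids of 0/1 pixels, the image area is divided into horizontal bands, confidence is the fraction of equal pixels (a stand-in for the network's score), and the three tests of sub-step c are read as "adjacent, or a small gap, or a small overlap". All names and default limits are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]  # rows of 0/1 pixels (hypothetical representation)


def best_match(strip: Image, template: Image) -> Tuple[int, float]:
    """Slide `strip` down `template`; return (top row, confidence) of the best fit."""
    height, width = len(strip), len(strip[0])
    best_top, best_conf = 0, -1.0
    for top in range(len(template) - height + 1):
        same = sum(
            a == b
            for r in range(height)
            for a, b in zip(strip[r], template[top + r])
        )
        conf = same / (height * width)  # assumed confidence measure
        if conf > best_conf:
            best_top, best_conf = top, conf
    return best_top, best_conf


def identify(
    strips: List[Image],
    templates: Dict[str, Image],
    max_gap_rows: int = 2,
    max_overlap_area: int = 4,
    max_deletions: int = 2,
) -> Optional[str]:
    """Run sub-steps a to d on the first two adjacent strips."""
    remaining = dict(templates)
    height, width = len(strips[0]), len(strips[0][0])
    deletions = 0
    while remaining:
        # Sub-step a: best-fitting first area in every remaining numeral.
        firsts = {n: best_match(strips[0], t) for n, t in remaining.items()}
        candidate = max(firsts, key=lambda n: firsts[n][1])
        first_top = firsts[candidate][0]
        # Sub-step b: best-fitting second area inside the candidate only.
        second_top, _ = best_match(strips[1], remaining[candidate])
        # Sub-step c: check how the two areas lie relative to each other.
        offset = abs(second_top - first_top)
        gap_rows = max(0, offset - height)
        overlap_area = max(0, height - offset) * width
        adjacent = offset == height
        small_gap = 0 < gap_rows < max_gap_rows
        small_overlap = 0 < overlap_area < max_overlap_area
        if adjacent or small_gap or small_overlap:
            return candidate
        # Sub-step d, capped at two deletions as in claim 4.
        if deletions == max_deletions:
            return None
        del remaining[candidate]
        deletions += 1
    return None


templates = {
    "7": [[1, 1, 1], [0, 0, 1], [0, 1, 0], [1, 0, 0]],
    "1": [[0, 1, 0], [1, 1, 0], [0, 1, 0], [1, 1, 1]],
}
upper = [[1, 1, 1], [0, 0, 1]]
lower = [[0, 1, 0], [1, 0, 0]]
print(identify([upper, lower], templates))
```

The position check in sub-step c is what separates this procedure from plain per-strip voting: a candidate is accepted only if neighbouring strips of the input also land on neighbouring regions of the same stored numeral.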

5. The automatic identification method for financial voucher information according to any one of claims 1-4, characterized in that the amount information further comprises Chinese capital numeral information, which is identified by OCR. After recognition is completed, the value corresponding to the Chinese capital numeral information is compared with the value corresponding to the Arabic numeral information. If the two values are equal, the Arabic numeral information and the Chinese capital numeral information are judged to have been identified successfully; if the two values are not equal, a correction step is performed to ensure that the amount information is identified successfully.
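Comparing the two readings requires turning the Chinese capital numeral string into a number. The sketch below shows one way to do that, assuming the OCR output is a clean string of standard financial capital numerals; the function names `capital_to_fen` and `amounts_agree` are hypothetical, and irregular spellings are not handled.

```python
from decimal import Decimal

DIGITS = {ch: i for i, ch in enumerate("零壹贰叁肆伍陆柒捌玖")}
SMALL_UNITS = {"拾": 10, "佰": 100, "仟": 1000}


def capital_to_fen(text: str) -> int:
    """Convert a capital-numeral amount such as 壹仟贰佰元整 to fen (1/100 yuan)."""
    yuan = 0     # value of the closed 万/亿 sections
    section = 0  # value inside the current section (below 10,000)
    digit = 0    # last digit read, still waiting for its unit
    fen = 0
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            section += (digit or 1) * SMALL_UNITS[ch]  # a bare 拾 means 10
            digit = 0
        elif ch == "万":
            yuan += (section + digit) * 10_000
            section = digit = 0
        elif ch == "亿":
            yuan = (yuan + section + digit) * 100_000_000
            section = digit = 0
        elif ch in "元圆":
            yuan += section + digit
            section = digit = 0
        elif ch == "角":
            fen += digit * 10
            digit = 0
        elif ch == "分":
            fen += digit
            digit = 0
        elif ch in "整正":
            continue  # closing marker, carries no value
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return (yuan + section + digit) * 100 + fen


def amounts_agree(arabic_text: str, capital_text: str) -> bool:
    """Check that the Arabic and capital-numeral readings give the same amount."""
    return Decimal(arabic_text) * 100 == capital_to_fen(capital_text)


print(amounts_agree("1280.00", "壹仟贰佰捌拾元整"))
```

Working in integer fen avoids floating-point rounding when the two amounts are compared.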

6. The automatic identification method for financial voucher information according to claim 5, characterized in that the correction step is specifically: in the Chinese capital numeral information, search for possible matching results whose confidence is greater than the set Chinese capital numeral threshold; in the Arabic numeral information, search for possible matching results whose confidence is greater than the set Arabic numeral threshold; if corresponding values in the possible matching results are equal, the unequal values are corrected to the equal values.
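The correction step of claim 6 can be sketched as an intersection of the plausible readings from both sources. The representation of a reading as an (amount in fen, confidence) pair, the function name `reconcile`, and the choice to give up when zero or several amounts are shared are assumptions of this illustration.

```python
from typing import List, Optional, Tuple

Reading = Tuple[int, float]  # (amount in fen, confidence); hypothetical format


def reconcile(
    arabic: List[Reading],
    capital: List[Reading],
    arabic_threshold: float,
    capital_threshold: float,
) -> Optional[int]:
    """Return the amount both readings can agree on, or None."""
    best_arabic = max(arabic, key=lambda r: r[1])[0]
    best_capital = max(capital, key=lambda r: r[1])[0]
    if best_arabic == best_capital:
        return best_arabic  # the two top readings already match
    # Keep only the alternatives that clear their own threshold.
    arabic_ok = {value for value, conf in arabic if conf > arabic_threshold}
    capital_ok = {value for value, conf in capital if conf > capital_threshold}
    shared = arabic_ok & capital_ok
    if len(shared) == 1:
        return shared.pop()  # correct both readings to the shared amount
    return None  # no shared amount, or more than one (assumed to fail)


arabic = [(728000, 0.60), (128000, 0.55)]   # "7280.00" vs "1280.00"
capital = [(128000, 0.93), (128500, 0.30)]
print(reconcile(arabic, capital, arabic_threshold=0.5, capital_threshold=0.5))
```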

7. The automatic identification method for financial voucher information according to claim 1, characterized by further comprising a method for optimizing the convolutional neural network: in the initial convolutional neural network, each numerical value corresponds to an initial quantity of images, and after several rounds of convolutional neural network training each numerical value also corresponds to several further images. For any numerical value, compare all of its images with the initial images for similarity, select the images whose similarity is less than or equal to the set threshold, and keep these images as candidate images; then compare the similarity between all candidate images. If the similarity between any two candidate images is less than the set value, then the two candidate images are compared with each other. If the similarity between any two candidate images is greater than or equal to the set value, the two candidate images are retained; all retained candidate images and the initial images are kept as the images corresponding to this value.