CN108133212B - A deep learning-based fixed invoice amount recognition system - Google Patents
- Publication number
- CN108133212B CN201810011763.4A
- Authority
- CN
- China
- Prior art keywords
- image
- module
- picture
- deep learning
- recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
- G06V10/243—Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/2163—Partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Character Input (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a deep learning-based quota (fixed-amount) invoice amount recognition system comprising an image acquisition module, an image rotation module, an image recognition module and a result storage module. The image acquisition module acquires the image file; the image rotation module corrects (deskews) the image file; the image recognition module locates the region to be recognized in the image file using a deep learning model and performs image recognition; and the result storage module stores the final recognition result. The invention improves the OCR recognition rate when the image is contaminated.
Description
Technical Field
The invention relates to the technical field of image recognition, in particular to a quota invoice amount recognition system based on deep learning.
Background
The concept of OCR (Optical Character Recognition) was proposed as early as the 1920s and has long been an important research direction in the field of pattern recognition.
In recent years, with the rapid iteration of mobile devices and the rapid development of the mobile internet, OCR has gained a much wider range of application scenarios, from character recognition on scanned documents to recognition of text in natural-scene pictures, such as the characters on identification cards, bank cards, house numbers, bills and various pictures from the web.
Conventional OCR techniques proceed as follows: first text localization, then skew correction, then segmentation into single characters, recognition of each character, and finally semantic error correction based on a statistical model (such as a hidden Markov model (HMM)). This pipeline can be divided into three stages: a preprocessing stage, a recognition stage and a post-processing stage. The preprocessing stage is the key: its quality directly determines the final recognition result, so it is described in detail below.
The preprocessing stage comprises three steps:
(1) Locating the text region in the picture. Text detection is mainly based on connected-component analysis: the main idea is to quickly separate text regions from non-text regions by clustering on character color, brightness and edge information. Two popular algorithms are the Maximally Stable Extremal Regions (MSER) algorithm and the Stroke Width Transform (SWT) algorithm. In natural scenes, because of interference from illumination, picture quality and text-like backgrounds, the detection results contain a large number of non-text regions. Two main methods are currently used to pick the true text regions out of the candidates: rule-based judgment, or a lightweight neural network classifier;
(2) Correcting the text-region image, mainly by rotation and affine transformations;
(3) Extracting single characters by row and column segmentation. Exploiting the gaps between character rows and columns, the split points are found by binarization and projection. This works well when the characters are clearly distinguished from the background; but in photographed pictures, under the influence of illumination and imaging quality, when the text is hard to separate from the background, mis-segmentation often occurs.
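Step (3) above can be sketched with a simple binarize-and-project routine. A minimal NumPy sketch, assuming dark text on a light background and a fixed threshold (a real system would use an adaptive threshold such as Otsu's method); the function name is ours, not from the patent:

```python
import numpy as np

def segment_by_projection(gray, thresh=128):
    """Split a text-line image into single characters by binarization and
    vertical projection, exploiting the gaps between characters."""
    binary = gray < thresh              # dark text -> foreground
    projection = binary.sum(axis=0)     # foreground pixels per column
    boxes, in_char, start = [], False, 0
    for x, count in enumerate(projection):
        if count > 0 and not in_char:   # entering a character
            in_char, start = True, x
        elif count == 0 and in_char:    # leaving a character
            in_char = False
            boxes.append((start, x))
    if in_char:                         # character touches the right edge
        boxes.append((start, len(projection)))
    return boxes
```

This is exactly the step that fails when text and background are hard to separate: once the projection valleys between characters fill in, the split points disappear.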
Because the conventional OCR recognition framework involves so many steps, errors accumulate easily and degrade the final recognition result.
Disclosure of Invention
The technical problem the invention aims to solve is to provide a deep learning-based quota invoice amount recognition system that improves the OCR recognition rate when the image is contaminated.
The technical scheme adopted by the invention to solve this problem is as follows: the system comprises an image acquisition module, an image rotation module, an image recognition module and a result storage module. The image acquisition module acquires the image file; the image rotation module corrects the image file; the image recognition module locates the region to be recognized in the image file using a deep learning model and performs image recognition; and the result storage module stores the final recognition result.
The image rotation module corrects the image file by combining Tesseract for coarse orientation with OpenCV rotation for fine angle adjustment.
The image rotation module extracts straight lines via the Hough transform: starting from the top pixel of each line, it computes, for several candidate angles, the distance from the corresponding origin to the line; it then traverses the pixels of the whole image, finds the most frequently repeated distance, obtains the corresponding line equation, and finally derives the rotation angle.
The image rotation module obtains the rotation angle of the image text using Tesseract.
The image recognition module comprises a sample processing unit, an image training unit and a test unit. The sample processing unit sorts the collected sample pictures and labels the picture categories, producing for each picture an xml file containing its category and position information. The image training unit adopts 24 convolutional layers and 2 fully connected layers; the convolutional layers extract features and the fully connected layers predict the results. The output of the last layer has k dimensions, where k = S × S × (B × 5 + C); k contains the category predictions and the bounding-box coordinate predictions, S is the number of grid divisions per side, B is the number of targets each grid cell is responsible for, and C is the number of categories. The test unit multiplies the category information predicted by each grid cell by the confidence information predicted for each bounding box to obtain a score for each bounding box, sets a threshold to filter out low-scoring results, and applies NMS (non-maximum suppression) to the retained results to obtain the final detection result.
Advantageous effects
Due to the adoption of the above technical scheme, compared with the prior art, the invention has the following advantages and positive effects: compared with the conventional OCR recognition framework, it has fewer steps, which reduces the influence of error accumulation on the final recognition result. The invention combines deep learning with OCR image recognition, greatly improves the OCR recognition rate when the image is contaminated, and is convenient to operate. Applied in the accounting field, the system can improve accountants' working efficiency and free them from tedious work.
Drawings
FIG. 1 is a block diagram of the system of the present invention;
FIG. 2 is an internal structural view of the present invention;
fig. 3A-3B are graphs of recognition results after an embodiment of the present invention is employed.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
The embodiment of the invention relates to a quota invoice amount recognition system based on deep learning, which comprises an image acquisition module, an image rotation module, an image recognition module and a result storage module, wherein the image acquisition module is used for acquiring an image file; the image rotation module is used for correcting the picture file; the image recognition module obtains the specific position of the image file to be recognized by using a deep learning model and performs image recognition; and the result storage module is used for storing the final recognition result.
As shown in fig. 2, the present embodiment identifies the amount of money, the invoice code, and the invoice number on a quota invoice scanned by the customer. Since the picture uploaded by the client may be tilted or inverted, a rotation step is added to facilitate later recognition, and the rotated picture is passed to OCR to obtain the above fields.
Image rotation is needed because some pictures uploaded from the user's scan are tilted or inverted. The image rotation module in this embodiment corrects the picture by combining Tesseract for coarse orientation with OpenCV rotation for small-angle adjustment.
OpenCV rotation adjustment
This embodiment mainly uses OpenCV line extraction and then obtains the inclination angle of the extracted line; the line extraction method is the Hough transform.
For any point O(x, y) in the rectangular coordinate system, any straight line through O satisfies y = kx + b, except lines perpendicular to the x axis, which have no finite slope. Because of this special case, it is convenient to convert to a polar coordinate system.
In the polar coordinate system, any straight line can be represented as ρ = x·cos θ + y·sin θ.
Suppose there is a straight line in a 10 × 10 image. Starting from the top pixel of the line, compute the distances from the corresponding origin to the line for angles of 180°, 135°, 90°, 45° and 0°. Repeat this for every pixel of the image, find the most frequently repeated distance, obtain the corresponding line equation, and thereby obtain the angle.
When several straight lines are found in one picture, the most frequent angle is taken as the rotation angle of the picture.
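The voting procedure described above can be sketched in a few lines. A minimal NumPy sketch restricted to the coarse angle set mentioned in the text (0°, 45°, 90°, 135°, 180°); a real implementation such as OpenCV's `cv2.HoughLines` sweeps a much finer angle grid:

```python
import numpy as np

def hough_dominant_line(binary):
    """For every foreground pixel, compute rho = x*cos(theta) + y*sin(theta)
    for each candidate angle, then pick the (angle, rho) pair that repeats
    most often -- that pair is the equation of the dominant straight line."""
    votes = {}
    ys, xs = np.nonzero(binary)
    for deg in (0, 45, 90, 135, 180):
        theta = np.deg2rad(deg)
        rhos = np.round(xs * np.cos(theta) + ys * np.sin(theta)).astype(int)
        for rho in rhos:
            votes[(deg, rho)] = votes.get((deg, rho), 0) + 1
    return max(votes, key=votes.get)   # (angle in degrees, distance rho)
```

With the winning (θ, ρ) pair known, the deviation of θ from the expected text direction gives the small rotation angle to undo.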
Tesseract rotation
Tesseract is an OCR engine developed by Ray Smith at Hewlett-Packard Laboratories between 1985 and 1995; it ranked among the leading engines in the 1995 UNLV accuracy test, but development essentially stopped after 1996. In 2006 Google invited Smith to join and restarted the project, which is now licensed under Apache 2.0. The project currently supports mainstream platforms such as Windows, Linux and macOS, but as an engine it only provides a command-line tool.
Tesseract can recognize most written languages (including Chinese) and can return both the text content of a picture and the rotation of its text (270°, 180°, 90° or 0°). Because its recognition accuracy is not high, this embodiment uses Tesseract only to obtain the rotation angle of the image text. Tesseract accepts only grayscale images, so a color input image must first be converted to grayscale.
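This orientation query can be sketched with the pytesseract wrapper. A sketch under the assumption that the tesseract binary and pytesseract are installed; the helper names are ours, not from the patent:

```python
def parse_osd_rotation(osd_output: str) -> int:
    """Extract the 'Rotate: N' value from Tesseract's OSD (orientation
    and script detection) output; N is 0, 90, 180 or 270."""
    for line in osd_output.splitlines():
        if line.startswith("Rotate:"):
            return int(line.split(":", 1)[1])
    return 0

def detect_text_rotation(gray_image):
    """Ask Tesseract for the coarse text orientation of a grayscale image."""
    import pytesseract  # deferred import: only needed when actually called
    return parse_osd_rotation(pytesseract.image_to_osd(gray_image))
```

The returned angle can then be undone with an OpenCV rotation before the fine Hough-based adjustment described earlier.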
In this embodiment, the image recognition module recognizes with a deep learning method, here the deep learning object detection method YOLO (You Only Look Once).
The idea of YOLO: the whole image is taken as the input of the network, and the position of each bounding box and the category it belongs to are regressed directly in the output layer, converting the object detection problem into a regression problem.
1. Sample processing:
The collected sample pictures are sorted, and the picture categories are labeled with the labelme software to obtain the corresponding xml files, which contain the category information and the positions of the objects in the pictures.
2. Image training:
First, the picture is normalized to 448 × 448 and divided into 7 × 7 grid cells; if the center of an object falls into a grid cell, that cell is responsible for predicting the object.
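The responsibility rule can be made concrete with a tiny helper. A sketch assuming the 448 × 448 input and 7 × 7 grid above; the function name is ours:

```python
def responsible_cell(cx, cy, img_size=448, S=7):
    """Return the (row, col) of the grid cell responsible for an object
    whose center is at pixel (cx, cy): the cell the center falls into."""
    cell = img_size / S          # 64 pixels per cell for 448 / 7
    return int(cy // cell), int(cx // cell)
```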
CNN feature extraction and prediction: the convolutional layers extract features; the fully connected part makes the predictions. The final layer output has k dimensions, where

k = S × S × (B × 5 + C)    (1)

k contains the class predictions and the bounding-box coordinate predictions. S is the number of grid cells per side, B is the number of targets each grid cell is responsible for, and C is the number of categories. The 5 values per box are the predicted center coordinates, the width and height, and the box confidence.
The box confidence is defined as Pr(Object) × IOU(pred, truth): the first factor is 1 if a ground-truth box (a manually labeled object) falls in the grid cell and 0 otherwise; the second factor is the IOU between the predicted bounding box and the actual ground-truth box.
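Plugging in YOLOv1's usual hyper-parameters (an assumption here: S = 7, B = 2, C = 20; the patent does not state its own B and C) shows the size of the output layer implied by equation (1):

```python
# Equation (1): each of the S*S cells predicts B boxes (x, y, w, h,
# confidence -- 5 numbers each) plus C class probabilities.
S, B, C = 7, 2, 20
k = S * S * (B * 5 + C)
print(k)   # 1470, the dimensionality of the final layer for these values
```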
The network structure follows GoogLeNet: 24 convolutional layers and 2 fully connected layers, except that GoogLeNet's inception modules are replaced with 1 × 1 reduction layers followed by 3 × 3 convolutional layers.
The design goal of the loss function is to balance the coordinate (x, y, w, h), confidence and classification terms.
For bbox predictions of different sizes, a given deviation is less tolerable for a small box than for a large box, yet the same offset contributes the same amount to the total weighted loss. To alleviate this problem, this embodiment replaces the raw box width and height with their square roots.
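A short numeric check illustrates why the square root helps (the numbers are illustrative, not from the patent):

```python
import math

def wh_loss_term(pred, truth):
    """Squared error on the square roots of width/height, as YOLO's loss
    uses, instead of on the raw values."""
    return (math.sqrt(pred) - math.sqrt(truth)) ** 2

# The same 2-pixel error is penalized more on a 10-px box than a 100-px box:
small = wh_loss_term(12, 10)     # ~0.091
large = wh_loss_term(102, 100)   # ~0.010
assert small > large
```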
Each grid cell predicts several bounding boxes, but during training we want only one bounding box predictor to be responsible for each object (one ground-truth box, one bbox). Specifically, the bounding box with the largest IOU with the ground-truth box is made responsible for predicting that object. This practice is called specialization of the bounding box predictors: each predictor becomes better and better at predicting objects of particular sizes or classes.
3. Test module:
At test time, the class probability Pr(Class_i | Object) predicted by each grid cell is multiplied by the confidence predicted for each bounding box to obtain a score for that box. After the best score of each bbox is obtained, a threshold is set to filter out low-scoring boxes, and NMS is applied to the retained boxes to obtain the final detection result. Figs. 3A-3B show recognition results after the invention is employed.
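The test-time filtering just described (score = class probability × box confidence, thresholding, then non-maximum suppression) can be sketched as follows; the threshold values are illustrative assumptions, not values from the patent:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def filter_and_nms(boxes, class_probs, confidences,
                   score_thresh=0.2, iou_thresh=0.5):
    """Score = class probability * box confidence; drop low scores,
    then greedy NMS on the survivors. Returns indices of kept boxes."""
    scores = np.asarray(class_probs) * np.asarray(confidences)
    order = [i for i in np.argsort(-scores) if scores[i] >= score_thresh]
    keep = []
    while order:
        best = order.pop(0)                 # highest-scoring remaining box
        keep.append(best)
        order = [i for i in order            # suppress heavy overlaps
                 if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```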
Compared with the conventional OCR recognition framework, the invention has fewer steps, which reduces the influence of error accumulation on the final recognition result. The invention combines deep learning with OCR image recognition, greatly improves the OCR recognition rate when the image is contaminated, and is convenient to operate. Applied in the accounting field, the system can improve accountants' working efficiency and free them from tedious work.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810011763.4A CN108133212B (en) | 2018-01-05 | 2018-01-05 | A deep learning-based fixed invoice amount recognition system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810011763.4A CN108133212B (en) | 2018-01-05 | 2018-01-05 | A deep learning-based fixed invoice amount recognition system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108133212A CN108133212A (en) | 2018-06-08 |
CN108133212B true CN108133212B (en) | 2021-06-29 |
Family
ID=62399437
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810011763.4A Expired - Fee Related CN108133212B (en) | 2018-01-05 | 2018-01-05 | A deep learning-based fixed invoice amount recognition system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108133212B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109086756B (en) * | 2018-06-15 | 2021-08-03 | 众安信息技术服务有限公司 | Text detection analysis method, device and equipment based on deep neural network |
CN109002768A (en) * | 2018-06-22 | 2018-12-14 | 深源恒际科技有限公司 | Medical bill class text extraction method based on the identification of neural network text detection |
CN109816118B (en) * | 2019-01-25 | 2022-12-06 | 上海深杳智能科技有限公司 | A method and terminal for creating structured documents based on deep learning model |
CN109886257B (en) * | 2019-01-30 | 2022-10-18 | 四川长虹电器股份有限公司 | Method for correcting invoice image segmentation result by adopting deep learning in OCR system |
CN109993160B (en) * | 2019-02-18 | 2022-02-25 | 北京联合大学 | Image correction and text and position identification method and system |
CN109948617A (en) * | 2019-03-29 | 2019-06-28 | 南京邮电大学 | An Invoice Image Positioning Method |
WO2020223859A1 (en) * | 2019-05-05 | 2020-11-12 | 华为技术有限公司 | Slanted text detection method, apparatus and device |
CN110348346A (en) * | 2019-06-28 | 2019-10-18 | 苏宁云计算有限公司 | A kind of bill classification recognition methods and system |
CN110781726A (en) * | 2019-09-11 | 2020-02-11 | 深圳壹账通智能科技有限公司 | Image data identification method and device based on OCR (optical character recognition), and computer equipment |
CN111160395A (en) * | 2019-12-05 | 2020-05-15 | 北京三快在线科技有限公司 | Image recognition method and device, electronic equipment and storage medium |
CN111401371B (en) * | 2020-06-03 | 2020-09-08 | 中邮消费金融有限公司 | Text detection and identification method and system and computer equipment |
CN112464872A (en) * | 2020-12-11 | 2021-03-09 | 广东电网有限责任公司 | Automatic extraction method and device based on NLP (non-line segment) natural language |
CN112686319B (en) * | 2020-12-31 | 2025-02-11 | 南京太司德智能电气有限公司 | A method for merging power signal model training files |
CN113159086B (en) * | 2020-12-31 | 2024-04-30 | 南京太司德智能电气有限公司 | Efficient electric power signal description model training method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103617415A (en) * | 2013-11-19 | 2014-03-05 | 北京京东尚科信息技术有限公司 | Device and method for automatically identifying invoice |
CN104573688A (en) * | 2015-01-19 | 2015-04-29 | 电子科技大学 | Mobile platform tobacco laser code intelligent identification method and device based on deep learning |
CN106096607A (en) * | 2016-06-12 | 2016-11-09 | 湘潭大学 | A kind of licence plate recognition method |
CN107341523A (en) * | 2017-07-13 | 2017-11-10 | 浙江捷尚视觉科技股份有限公司 | Express delivery list information identifying method and system based on deep learning |
CN107358232A (en) * | 2017-06-28 | 2017-11-17 | 中山大学新华学院 | Invoice recognition methods and identification and management system based on plug-in unit |
-
2018
- 2018-01-05 CN CN201810011763.4A patent/CN108133212B/en not_active Expired - Fee Related
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103617415A (en) * | 2013-11-19 | 2014-03-05 | 北京京东尚科信息技术有限公司 | Device and method for automatically identifying invoice |
CN104573688A (en) * | 2015-01-19 | 2015-04-29 | 电子科技大学 | Mobile platform tobacco laser code intelligent identification method and device based on deep learning |
CN106096607A (en) * | 2016-06-12 | 2016-11-09 | 湘潭大学 | A kind of licence plate recognition method |
CN107358232A (en) * | 2017-06-28 | 2017-11-17 | 中山大学新华学院 | Invoice recognition methods and identification and management system based on plug-in unit |
CN107341523A (en) * | 2017-07-13 | 2017-11-10 | 浙江捷尚视觉科技股份有限公司 | Express delivery list information identifying method and system based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN108133212A (en) | 2018-06-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108133212B (en) | A deep learning-based fixed invoice amount recognition system | |
CN109086714B (en) | Form recognition method, recognition system and computer device | |
CN111611643B (en) | Household vectorization data acquisition method and device, electronic equipment and storage medium | |
Huang et al. | Robust scene text detection with convolution neural network induced mser trees | |
CN105046252B (en) | A kind of RMB prefix code recognition methods | |
WO2022121039A1 (en) | Bankcard tilt correction-based detection method and apparatus, readable storage medium, and terminal | |
CN111461134A (en) | A low-resolution license plate recognition method based on generative adversarial network | |
CN108647681A (en) | A kind of English text detection method with text orientation correction | |
CN104951940B (en) | A kind of mobile payment verification method based on personal recognition | |
CN107103317A (en) | Fuzzy license plate image recognition algorithm based on image co-registration and blind deconvolution | |
US20100189316A1 (en) | Systems and methods for graph-based pattern recognition technology applied to the automated identification of fingerprints | |
CN111783757A (en) | An ID card identification method based on OCR technology in complex scenarios | |
CN108446699A (en) | Identity card pictorial information identifying system under a kind of complex scene | |
US20040086153A1 (en) | Methods and systems for recognizing road signs in a digital image | |
CN106529532A (en) | License plate identification system based on integral feature channels and gray projection | |
CN111695373B (en) | Zebra stripes positioning method, system, medium and equipment | |
CN106169080A (en) | A kind of combustion gas index automatic identifying method based on image | |
CN116503622A (en) | Data acquisition and reading method based on computer vision image | |
CN108681735A (en) | Optical character recognition method based on convolutional neural networks deep learning model | |
CN108319958A (en) | A kind of matched driving license of feature based fusion detects and recognition methods | |
CN116740758A (en) | Bird image recognition method and system for preventing misjudgment | |
CN114359538A (en) | A method for locating and identifying water meter readings | |
Wang et al. | Scene text recognition via gated cascade attention | |
CN115908774B (en) | Quality detection method and device for deformed materials based on machine vision | |
CN118537600A (en) | Data acquisition and reading method based on computer vision image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20210629 |