
CN113158895A - Bill identification method and device, electronic equipment and storage medium - Google Patents

Bill identification method and device, electronic equipment and storage medium

Info

Publication number
CN113158895A
CN113158895A (application CN202110426383.9A; granted as CN113158895B)
Authority
CN
China
Prior art keywords
bill
picture
text
text box
actual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110426383.9A
Other languages
Chinese (zh)
Other versions
CN113158895B (en)
Inventor
王仲
曾纪才
李飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ctj Info Tech Co ltd
Original Assignee
Beijing Ctj Info Tech Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ctj Info Tech Co ltd
Priority to CN202110426383.9A
Publication of CN113158895A
Application granted
Publication of CN113158895B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/414Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/416Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a bill identification method and device, an electronic device, and a storage medium, wherein the method comprises the following steps: acquiring a bill picture of a bill to be identified; identifying the actual inclination angle category of the bill picture, and correcting the inclination angle of the bill picture based on that category; and detecting text boxes in the corrected bill picture, extracting character information from the text boxes, identifying the actual type of the bill picture, and determining the actual classification of each text box based on that type, so as to extract the bill page information of the bill to be identified. This solves the problems in the related art that, when bills are identified, the character recognition effect is poor, the accuracy is low, the applicability of bill information extraction is poor, and the user experience is unsatisfactory.

Description

Bill identification method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for identifying a bill, an electronic device, and a storage medium.
Background
At present, bill recognition systems based on deep learning recognize the character information on a bill using deep learning techniques and perform structured extraction of that information. OCR (Optical Character Recognition) is the technique of recognizing characters in a picture.
In the related art, although the electronization of bills has been widely developed, paper bills still account for a large proportion in many fields. For these paper bills, the traditional processing method mainly uses a high-speed camera to scan and obtain a bill image, then stores the bill image into a database, and manually inputs the information on the bill into the database.
The manual entry described above is time-consuming and labor-intensive. With the development of computer technologies, especially OCR, a number of OCR-based bill recognition methods have appeared: a computer automatically recognizes the bill information and stores the extracted information in a database, greatly reducing the labor required to process bills. However, existing OCR-based bill recognition and processing systems have the following problems:
(1) for the pictures with inclination or direction rotation, the character recognition effect is not good. The existing OCR technology mostly adopts a straight line detection method to detect the straight line profile of a bill, calculate the slope of a straight line and further calculate the inclination angle of a picture, thereby realizing the correction of the picture direction. However, such methods are highly susceptible to image noise and are time consuming.
(2) Structured bill-face information cannot be extracted well from complex layouts. At present, structured extraction from the character recognition result of a bill mostly relies on rules and coordinates. This approach is not suitable for analyzing and extracting complex layout information, such as layouts with too many elements to extract, misplaced elements, tilted layouts, or keyword information obscured by stains.
Summary of the application
The application provides a bill identification method, a bill identification device, an electronic device and a storage medium, and aims to solve the problems that in the related technology, when bills are identified, the character identification effect is poor, the accuracy is not high, the applicability of bill information extraction is poor, the user experience is not high, and the like.
The embodiment of the first aspect of the application provides a bill identification method, which comprises the following steps: acquiring a bill picture of a bill to be identified; identifying the actual inclination angle category of the bill picture, and correcting the inclination angle of the bill picture based on the actual inclination angle category; detecting a text box of the corrected bill picture, extracting character information from the text box, identifying the actual type of the bill picture, and determining the actual classification of the text box based on the actual type, so as to extract bill page information of the bill to be identified.
Optionally, in an embodiment of the present application, the identifying an actual inclination angle category of the bill picture, and correcting the inclination angle of the bill picture based on the actual inclination angle category includes: collecting data for counterclockwise rotations of 0, 90, 180 and 270 degrees, respectively, to determine the actual inclination angle category; and rotating the bill picture clockwise by the correction angle corresponding to the actual inclination angle category.
Optionally, in an embodiment of the present application, the detecting a text box of the corrected bill picture, extracting character information from the text box, and, while identifying the actual type of the bill picture, determining the actual classification of the text box based on the actual type to extract bill page information of the bill to be identified includes: acquiring a rectangular area containing a text line by using a preset text detection algorithm to obtain the text box; determining the position of the text according to the coordinates of the four vertexes of the text box; and cropping a rectangular area picture according to the position of the text, and inputting the rectangular area picture into a preset text recognition network to obtain the character information.
Optionally, in an embodiment of the present application, the detecting a text box of the corrected bill picture, extracting character information from the text box, and, while identifying the actual type of the bill picture, determining the actual classification of the text box based on the actual type to extract bill page information of the bill to be identified includes: acquiring the image features of the text box by adopting a DenseNet network; converting the image features into a one-dimensional feature vector, and generating the final one-dimensional combined features by combining the geometric features of the text box; and inputting the one-dimensional combined features into a fully-connected network with a neuron number equal to the number of classifications, outputting the probability value of each classification by using a softmax function, and determining the actual classification.
Optionally, in an embodiment of the present application, after acquiring the bill picture of the bill to be identified, the method further includes: performing denoising, sharpening and binarization processing on the bill picture to obtain a bill picture whose contrast meets a preset condition.
The embodiment of the second aspect of the present application provides a bill identifying device, including: the acquisition module is used for acquiring a bill picture of the bill to be identified; the correction module is used for identifying the actual inclination angle category of the bill picture and correcting the inclination angle of the bill picture based on the actual inclination angle category; the identification module is used for detecting the text box of the corrected bill picture, extracting character information from the text box, identifying the actual type of the bill picture, and determining the actual classification of the text box based on the actual type so as to extract the bill page information of the bill to be identified.
Optionally, in an embodiment of the present application, the correcting module is further configured to collect data of 0 degree counterclockwise rotation, 90 degree counterclockwise rotation, 180 degree counterclockwise rotation, and 270 degree counterclockwise rotation, respectively, to determine the actual tilt angle category; and rotating the bill picture clockwise by the correction angle corresponding to the actual inclination angle category.
Optionally, in an embodiment of the present application, the identification module includes: a first acquisition unit, configured to acquire a rectangular area containing a text line by using a preset text detection algorithm to obtain the text box; determine the position of the text according to the coordinates of the four vertexes of the text box, crop a rectangular area picture according to the position of the text, and input the rectangular area picture into a preset text recognition network to obtain the character information; and a second acquisition unit, configured to acquire the image features of the text box by adopting a DenseNet network, convert the image features into a one-dimensional feature vector, generate the final one-dimensional combined features by combining the geometric features of the text box, input the one-dimensional combined features into a fully-connected network with a neuron number equal to the number of classifications, output the probability value of each classification by using a softmax function, and determine the actual classification.
An embodiment of a third aspect of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being configured to perform a ticket recognition method as described in the above embodiments.
A fourth aspect of the present application provides a computer-readable storage medium, which stores computer instructions for causing the computer to execute the method for identifying a bill according to the above embodiment.
The inclination angle is corrected based on the actual inclination angle category of the bill picture, so the angle at which the picture was shot need not be considered, and picture characters at any angle can be recognized, effectively improving the usability of bill identification. The layout is analyzed based on deep learning and machine learning to identify the bill page information, which gives strong robustness: recognition is not easily affected by factors such as picture inclination, text position deviation, occluded key points, or dense bill-face information, effectively improving the character recognition effect, ensuring the accuracy and practicality of recognition, and improving the user experience. Therefore, the problems in the related art that, when bills are identified, the character recognition effect is poor, the accuracy is low, the applicability of bill information extraction is poor, and the user experience is unsatisfactory are solved.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a method for identifying a bill according to an embodiment of the present application;
FIG. 2 is a diagram illustrating a deep learning based image angle classification model according to an embodiment of the present application;
FIG. 3 is a diagram illustrating a text box classification model based on deep learning according to an embodiment of the present application;
FIG. 4 is an exemplary diagram of a document identification device according to an embodiment of the present application;
fig. 5 is a block diagram of an electronic device according to an embodiment of the application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The following describes a bill identification method, apparatus, electronic device, and storage medium according to embodiments of the present application with reference to the drawings. Aiming at the problems mentioned in the Background section — poor character recognition, low accuracy, poor applicability of bill information extraction, and low user experience when recognizing bills in the related art — the application provides a bill identification method. In this method, the inclination angle is corrected based on the actual inclination angle category of the bill picture, so the shooting angle of the picture need not be considered and picture characters at any angle can be recognized, effectively improving the usability of bill identification; the layout is analyzed based on deep learning and machine learning to identify the bill page information, giving strong robustness against factors such as picture inclination, text position offset, occluded key points, and dense bill-face information, effectively improving the character recognition effect, ensuring the accuracy and practicality of recognition, and improving the user experience. Therefore, the problems in the related art that, when bills are identified, the character recognition effect is poor, the accuracy is low, the applicability of bill information extraction is poor, and the user experience is unsatisfactory are solved.
Specifically, fig. 1 is a schematic flowchart of a bill identification method according to an embodiment of the present application.
As shown in fig. 1, the bill identifying method includes the following steps:
in step S101, a ticket picture of a ticket to be recognized is acquired.
Optionally, in an embodiment of the present application, after acquiring the bill picture of the bill to be identified, the method further includes: performing denoising, sharpening and binarization processing on the bill picture to obtain a bill picture whose contrast meets a preset condition.
It should be understood by those skilled in the art that, in order to improve recognition efficiency and accuracy, the embodiments of the present application may first perform, but are not limited to, operations such as denoising, sharpening and binarization on the image, so as to obtain a relatively sharp image with high contrast.
In step S102, the actual inclination angle category of the bill picture is identified, and the inclination angle of the bill picture is corrected based on the actual inclination angle category.
Optionally, in an embodiment of the present application, identifying an actual tilt angle category of the ticket picture, and correcting the tilt angle of the ticket picture based on the actual tilt angle category includes: respectively collecting data of anticlockwise rotation of 0 degree, anticlockwise rotation of 90 degrees, anticlockwise rotation of 180 degrees and anticlockwise rotation of 270 degrees to determine the category of the actual inclination angle; and rotating the bill picture clockwise by the correction angle corresponding to the actual inclination angle category.
Specifically, for the image angle correction processing of the deep learning, the image inclination angle can be classified and the image angle can be corrected according to the embodiment of the application. The embodiment of the application specifically comprises four parts: the method comprises a data set generation step, a picture angle classification model training step based on deep learning, a picture angle classification model prediction step based on deep learning and an angle correction step.
For example, the data set may include data sets for four types of angles: counterclockwise rotation of 0, 90, 180 and 270 degrees. Define the classification angle as c, with c ∈ {0, 90, 180, 270}, and denote the counterclockwise rotation angle of a picture by n, with n ∈ [0, 360). For a picture rotated n degrees counterclockwise, its classification is the angle c for which |n − c| takes the minimum value. When the image is corrected, the image is first classified by angle, and then rotated clockwise by the angle corresponding to the classification. The angle classification model of the embodiment of the application can modify the final classification number to 4 on the basis of VGG16, which effectively ensures the classification effect.
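The classification-and-correction rule above can be sketched in a few lines. This is an illustrative sketch only, not the patented implementation: the function names are assumptions, and the |n − c| distance is taken circularly here so that, e.g., a picture rotated 350 degrees counterclockwise maps to the 0-degree class rather than the 270-degree one.

```python
def classify_angle(n):
    """Return the classification angle c in {0, 90, 180, 270} closest to
    the counterclockwise rotation n (n in [0, 360)), using circular
    distance so n = 350 maps to c = 0."""
    categories = (0, 90, 180, 270)
    return min(categories, key=lambda c: min(abs(n - c), 360 - abs(n - c)))


def correction_angle(n):
    """A picture classified as rotated c degrees counterclockwise is
    corrected by rotating it c degrees clockwise."""
    return classify_angle(n)
```

For example, a picture rotated 100 degrees counterclockwise falls into the 90-degree class and would be corrected by a 90-degree clockwise rotation.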
In step S103, a text box of the rectified bill picture is detected, text information is extracted from the text box, and an actual category of the bill picture is identified, and meanwhile, an actual classification of the text box is determined based on the actual category, so as to extract bill page information of the bill to be identified.
The method and the device can accurately identify the picture characters at any angle and carry out high-accuracy structured extraction on the ticket information.
In the practical implementation process, after the image preprocessing and the image angle correction processing based on the deep learning, the embodiment of the application can perform text detection and positioning processing based on the deep learning, layout analysis processing based on the deep learning, and information structured extraction processing, which are listed in the following embodiments and schematically illustrated below.
Optionally, in an embodiment of the present application, detecting a text box of the corrected bill picture, extracting character information from the text box, and, while identifying the actual type of the bill picture, determining the actual classification of the text box based on the actual type to extract bill page information of the bill to be identified includes: acquiring a rectangular area containing a text line by using a preset text detection algorithm to obtain a text box; determining the position of the text according to the coordinates of the four vertexes of the text box; and cropping the rectangular area picture according to the position of the text, and inputting the rectangular area picture into a preset text recognition network to obtain the character information.
In some embodiments, the deep learning based text detection and recognition step includes both text detection and text recognition. Text detection means obtaining a rectangular area containing a text line by using a text detection algorithm; this rectangular area is called a text box, and the position of the area where the text is located can be represented by the coordinates of the four vertexes of the text box. Text recognition means cropping the picture in the rectangular area using the coordinates obtained in the text detection step, and finally passing it into a text recognition network to recognize the text in the picture. The deep learning based text detection model may adopt the AdvancedEAST model, and the deep learning based text recognition model may adopt a DenseNet + CTC model.
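The vertex-coordinate cropping described above can be illustrated with a minimal sketch (the function name is an assumption; a real pipeline would crop the actual image with these bounds and feed the crop to the recognition network):

```python
def text_box_crop_bounds(vertices):
    """Given the four vertex coordinates (x, y) of a detected text box,
    return the axis-aligned crop rectangle (left, top, right, bottom)
    that contains it, ready for cropping and text recognition."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))
```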
Further, in the step of bill classification based on deep learning, various types of bill pictures can be automatically identified by using a deep learning network. The identifiable bill types include value-added tax special invoices, value-added tax general invoices, value-added tax electronic general invoices, value-added tax roll invoices, train tickets, bus tickets, airplane tickets, taxi tickets, machine-printed invoices, quota invoices, road tolls and the like. The step mainly comprises the following three parts: generating bill classification data, training a bill classification model, and predicting the bill type. The network adopted in this step can be modified on the basis of Inception-v3, replacing the final fully-connected layer with a fully-connected layer containing 11 neurons.
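The 11-way output of the modified classification head maps to bill types roughly as sketched below. The label order is an assumption for illustration; the text only lists which types are recognizable, not their index order.

```python
BILL_TYPES = [
    "VAT special invoice", "VAT general invoice",
    "VAT electronic general invoice", "VAT roll invoice",
    "train ticket", "bus ticket", "airplane ticket", "taxi ticket",
    "machine-printed invoice", "quota invoice", "road toll",
]


def predict_bill_type(probabilities):
    """Map the 11 softmax probabilities produced by the classification
    head (one per bill type) to the most likely bill-type label."""
    assert len(probabilities) == len(BILL_TYPES)
    return BILL_TYPES[probabilities.index(max(probabilities))]
```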
Optionally, in an embodiment of the present application, detecting a text box of the corrected bill picture, extracting character information from the text box, and, while identifying the actual type of the bill picture, determining the actual classification of the text box based on the actual type to extract bill page information of the bill to be identified includes: acquiring the image features of the text box by adopting a DenseNet network; converting the image features into a one-dimensional feature vector, and generating the final one-dimensional combined features by combining the geometric features of the text box; and inputting the one-dimensional combined features into a fully-connected network with a neuron number equal to the number of classifications, outputting the probability value of each classification by using a softmax function, and determining the actual classification.
In some embodiments, a deep learning based layout analysis step may classify text boxes, determining which field each text box belongs to, thereby facilitating structured extraction of the bill page information. It comprises a data set generation step, a text box classification model training step based on deep learning, and a text box classification model prediction step based on deep learning. The network structure of the text box classification model based on deep learning is as follows: a DenseNet network acquires the image features of the text box, the image features are converted into a one-dimensional feature vector, and the geometric features of the text box are then combined with it to generate the final one-dimensional combined features. Finally, the combined feature vector is input into a fully-connected network with a neuron number equal to the number of classifications, and the probability value of each classification is output using a softmax function. Taking value-added tax invoices as an example, the text box classification model can establish the following classifications: seller name, seller taxpayer identification number, buyer name, buyer taxpayer identification number, amount, invoice code, invoice number and date. The text box classification model thus determines which field classification each text box provided by the text detection and recognition step belongs to.
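The feature-combination and softmax classification just described can be sketched in pure Python. This is a toy stand-in, not the patented model: the weights here are placeholders for a trained DenseNet-plus-fully-connected head, and the class list follows the value-added-tax example above.

```python
import math

FIELD_CLASSES = [
    "seller name", "seller taxpayer id", "buyer name", "buyer taxpayer id",
    "amount", "invoice code", "invoice number", "date",
]


def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_text_box(image_features, geometric_features, weights, biases):
    """Concatenate the flattened image features with the geometric
    features, apply one fully-connected layer (one neuron per class),
    and return the most probable field class with its probabilities."""
    combined = list(image_features) + list(geometric_features)
    logits = [sum(w * x for w, x in zip(row, combined)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    return FIELD_CLASSES[probs.index(max(probs))], probs
```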
Specifically, the information structured extraction step extracts structured bill information using rules, the coordinates of the text boxes, and the classification information of the text boxes. For certain types of bills, the embodiment of the application can determine the coordinate relationships between fields; for example, for value-added tax invoices, the ordinate of the amount field must be larger than the ordinate of the date field. First, the required fields are extracted according to rules, which include keywords, regular expression matching and the like; in this process, the coordinate information of the text boxes is used to narrow the search range of the fields to be extracted. Finally, fields not matched by any rule are looked up according to the classification result of the text boxes and extracted as required.
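A minimal sketch of this rule-plus-coordinate extraction follows. The rule set, field names, and the text-box representation `(text, (x, y))` are all illustrative assumptions, not the patent's actual rules; only the amount-below-date ordinate constraint comes from the example above.

```python
import re


def extract_fields(text_boxes):
    """Rule-based extraction over recognized text boxes, each given as
    (recognized_text, (x, y)). Returns the first box matching each rule."""
    rules = {
        "invoice code": re.compile(r"^\d{10,12}$"),
        "date": re.compile(r"\d{4}年\d{1,2}月\d{1,2}日"),
        "amount": re.compile(r"[¥￥]\s*\d+(\.\d{2})?"),
    }
    result = {}
    for text, (x, y) in text_boxes:
        for field, pattern in rules.items():
            if field not in result and pattern.search(text):
                result[field] = {"text": text, "x": x, "y": y}
    # coordinate constraint from the description: for VAT invoices the
    # ordinate of the amount must be larger than that of the date
    if ("amount" in result and "date" in result
            and result["amount"]["y"] <= result["date"]["y"]):
        del result["amount"]
    return result
```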
In summary, compared with the existing bill identification processing technology, the embodiment of the application has the following advantages:
(1) A high-speed document scanner is not needed; only one high-definition camera is required, reducing the equipment cost.
(2) A new deep learning based angle correction model is provided: the shooting angle of the picture need not be considered, picture characters at any angle can be recognized, and product usability is improved.
(3) The layout analysis algorithm based on deep learning and machine learning has strong robustness and is not easily affected by factors such as picture inclination, text position deviation, occluded key points, or dense bill-face information.
The principle of the bill identification method of the present application is illustrated in a specific embodiment below.
Referring to fig. 2 and fig. 3: fig. 2 shows the deep learning based picture angle classification model, in which modules with a conv prefix are convolution modules, modules with a pool prefix are pooling modules, and modules with an fc prefix are fully-connected modules. The model is built on VGG16, with the last fully-connected layer of VGG16 replaced by a fully-connected layer of 4 neurons. Fig. 3 shows the deep learning based text box classification model, which uses a DenseNet network to extract text box picture features, combines them with the geometric features of the text box picture to generate the final text box classification features, inputs these into a fully-connected network whose neuron number equals the number of text box classifications, and finally outputs the probability of each classification using a softmax function.
For example, the S1 image preprocessing step:
firstly, a USM sharpening enhancement algorithm is adopted for carrying out drying removal and enhancement processing, and then graying and binarization processing are carried out on the obtained picture to obtain a clear picture with high contrast.
S2, a picture angle correction step based on deep learning:
further, the picture angle correction step based on the deep learning can classify the picture inclination angle and correct the picture angle. The image angle correction step based on the deep learning specifically comprises four parts: the method comprises the steps of data set generation, picture angle classification model training based on deep learning, picture angle classification model prediction based on deep learning and angle correction. The data set includes data sets for four types of angles: the rotation is 0 degree counterclockwise, 90 degrees counterclockwise, 180 degrees counterclockwise and 270 degrees counterclockwise. Defining the classification angle as c, c is then ∈ {0,90,180,270 }. Defining a picture counterclockwise rotation angle is denoted by n, where n ∈ [0,360 ]. For a picture rotated by n degrees counterclockwise, the classification angle corresponding to c is the classification of the current picture when | n-c | takes the minimum value.
The data set generation of the angle correction step operates as follows: brightness enhancement, chroma enhancement, contrast enhancement and sharpness enhancement are applied to the pictures at random, together with random color space exchange, rotation, translation and cropping. This series of operations implements data enhancement, greatly increasing the number of training samples so that the trained model generalizes better.
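A minimal augmentation pass along these lines, using Pillow's enhancers; the enhancement ranges and crop sizes are illustrative assumptions, and color space exchange can be added analogously by permuting the RGB channels.

```python
import random
from PIL import Image, ImageEnhance

def augment(img: Image.Image, rng: random.Random = random) -> Image.Image:
    """One random augmentation pass over a training picture."""
    # random brightness / chroma / contrast / sharpness enhancement
    for enhancer in (ImageEnhance.Brightness, ImageEnhance.Color,
                     ImageEnhance.Contrast, ImageEnhance.Sharpness):
        img = enhancer(img).enhance(rng.uniform(0.7, 1.3))
    img = img.rotate(rng.uniform(-10, 10), expand=True)  # random rotation
    w, h = img.size
    dx, dy = rng.randint(0, w // 10), rng.randint(0, h // 10)
    return img.crop((dx, dy, w - dx, h - dy))            # random crop/translation
```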
The training of the deep learning-based picture angle classification model operates as follows: the data set, divided into the four classes described above, is used for training. Parameters such as the number of training epochs, the batch size and the initial learning rate are set reasonably and adjusted according to the training effect, and the model is saved when training finishes. The network structure of the deep learning-based picture angle classification model is shown in fig. 2: on the basis of VGG16, the last fully-connected layer is replaced with a fully-connected layer of 4 neurons.
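A minimal Keras sketch of the described modification, assuming TensorFlow is available; weights=None keeps the sketch offline, whereas in practice the network would be trained on the four-class data set described above.

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

def build_angle_classifier() -> Model:
    """VGG16 with its last fully-connected layer replaced by a 4-neuron layer."""
    base = VGG16(weights=None, include_top=True)
    features = base.layers[-2].output  # the fc2 layer (4096 units)
    outputs = layers.Dense(4, activation="softmax", name="angle")(features)
    return Model(inputs=base.input, outputs=outputs)
```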
The prediction step of the deep learning-based picture angle classification model first reads the trained model file and then performs angle classification on the preprocessed picture; the class number yields the counterclockwise rotation angle of the picture.
In the angle correction step, the picture is rotated clockwise by the predicted counterclockwise rotation angle.
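Since Pillow's rotate is counterclockwise, the clockwise correction is a rotation by the negated predicted angle; in the four-class case, expand=True restores the original width and height exactly.

```python
from PIL import Image

def correct_angle(img: Image.Image, predicted_ccw: int) -> Image.Image:
    """Undo a predicted counterclockwise rotation by rotating clockwise."""
    # PIL rotates counterclockwise, so a negative angle rotates clockwise
    return img.rotate(-predicted_ccw, expand=True)
```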
S3, a text detection and recognition step based on deep learning:
further, the text detection and recognition step based on deep learning comprises a text detection step and a text recognition step. The text detection means that a rectangular area containing text lines is obtained by using a text detection algorithm, the rectangular area is called as a text box, and the position of the area where the text is located can be represented according to coordinates of four vertexes of the text box. And the text recognition refers to that the rectangular area picture is cut out by utilizing coordinates obtained in a text detection link, and finally the picture is transmitted into a text recognition network to recognize the text in the picture. The text detection model based on deep learning is an adopted advanced east model, and the text recognition model based on deep learning is an adopted DenseNet mode.
The text detection training data set includes a picture set and a text box coordinate data set. The text box coordinate data set is made as follows: the bill picture data set is labeled with a tool such as labelImg, producing for each picture a txt file of labeled text box coordinates in the format "X1,Y1,X2,Y2,X3,Y3,X4,Y4,text", where X1,Y1 through X4,Y4 are the coordinates of the four vertices of the quadrangle circumscribing the text line and text is the actual text content contained in that quadrangle. Each picture in the picture set corresponds to one such coordinate file.
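One line of such a coordinate file can be parsed as below; limiting the split to eight commas is an assumption that keeps commas inside the labeled text content intact.

```python
def parse_label_line(line: str):
    """Parse 'X1,Y1,X2,Y2,X3,Y3,X4,Y4,text' into vertices and text."""
    parts = line.rstrip("\n").split(",", 8)  # at most 8 splits: text keeps its commas
    coords = [int(p) for p in parts[:8]]
    vertices = list(zip(coords[0::2], coords[1::2]))  # [(X1,Y1), ..., (X4,Y4)]
    return vertices, parts[8]
```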
The text detection model can adopt the AdvancedEAST model; the four vertex coordinates of each text box are computed from the model output.
The text recognition model may employ DenseNet, with the text sequence decoded using CTC. The picture of the corresponding text is cropped out using the four vertex coordinates obtained by text detection and then fed into the text recognition model to obtain the text on the picture.
S4, a bill classification step based on deep learning:
further, in the step of bill classification based on deep learning, various types of bill pictures can be automatically identified by using a deep learning network, and the identifiable types of bills comprise value-added tax special bills, value-added tax general bills, value-added tax electronic common bills, value-added tax volume bills, train tickets, bus tickets, airplane tickets, taxi tickets, machine invoice, quota invoice and road toll. The method mainly comprises three parts: generating bill classification data, training a bill classification model and predicting bill types. The network adopted in this step can be modified on the basis of Inception V3, and the last fully-connected layer is replaced by a fully-connected layer containing 11 neurons.
For data set production, the bill pictures are divided into the 11 types listed above, each type placed under its own folder; data enhancement is then performed on the bill pictures of each type to increase the number of training samples. The bill data enhancement applies random operations such as horizontal and vertical stretching, cropping and blurring to the pictures.
For model training, transfer learning can be performed with Inception V3: the original weight parameters are used for feature extraction, and a classification layer of 11 neurons is added after the final bottleneck.
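A Keras sketch of this transfer-learning setup, assuming TensorFlow; weights=None keeps the sketch offline, whereas reusing the original weight parameters corresponds to weights="imagenet" in practice.

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import InceptionV3

def build_bill_classifier(num_classes: int = 11) -> Model:
    """Inception V3 feature extractor with an 11-way classification head."""
    base = InceptionV3(weights=None, include_top=False,
                       input_shape=(299, 299, 3))
    base.trainable = False  # keep the feature-extraction weights fixed
    x = layers.GlobalAveragePooling2D()(base.output)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return Model(base.input, outputs)
```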
For model testing, the trained model is first loaded, and the picture to be predicted is then fed into the model to obtain the bill type of the picture.
S5, a layout analysis step based on deep learning:
further, the layout analysis step based on deep learning can classify the text boxes and determine which field each text box belongs to, thereby facilitating the structured extraction of the bill page information. The method comprises a data set generation step, a text box classification model training step based on deep learning and a text box classification model prediction step based on deep learning. The structure of a text box classification model network based on deep learning is shown in fig. 3: and acquiring image features of the text box by adopting a DenseNet network, converting the image features into one-dimensional feature vectors, and then combining the geometric features of the text box to generate final one-dimensional combined features. And finally, inputting the one-dimensional feature vector into a full-connection network with the neuron number equal to the classification number, and outputting the probability value of each classification by using a softmax function. Taking a value-added tax invoice as an example, the text box classification model can establish the following classifications: seller name, seller taxpayer identification number, buyer name, buyer taxpayer identification number, amount, invoice number and date. We can determine the classification of which field the text box provided by the text detection and recognition step belongs to by the text box classification model.
For the data set generation step, the required data sets are text box slices and the vertex coordinates of the text boxes. These can be obtained either with a data annotation tool or from the results of the text recognition step: the text recognition step outputs the text box coordinates and the recognized text, and this step simultaneously generates a text file of text box coordinates in which each row stores the name of the picture the text box belongs to and the coordinates of the text box.
For deep learning-based text box classification model training, a text box classification model uses only text box pictures from one type of bill; for example, to perform train ticket layout analysis, the text box coordinates and pictures of train tickets are used to make the model training data set and train the model.
For prediction with the deep learning-based text box classification model, the text box picture is cut from the corresponding picture according to the coordinates in the text recognition result, and the text box picture together with the geometric features of the text box is fed into the network shown in fig. 3 to obtain the class of the text box. Taking a train ticket as an example, the classification fields include the ticket code, departure station, destination station, train number, seat class, date, passenger name and so on. The deep learning-based text box classification model thus determines which field a given text box belongs to.
S6, an information structured extraction step:
and the information structured extraction step extracts structured ticket information by using the rule, the coordinate of the text box and the classification information of the text box. For a certain kind of ticket, the relationship of the coordinates between the fields can be determined, for example for a value added tax invoice, the ordinate of the amount field must be greater than the ordinate of the date field. Firstly, extracting required fields according to rules, wherein the rules comprise keywords, regular expression matching and the like, and in the process, the coordinate information of the text box is used for narrowing the search range of the fields to be extracted. And finally, searching fields which are not matched with the rules according to the classification result of the text box, and extracting according to requirements.
For example, for a train ticket, if a text box obtained through the deep learning-based text box classification step belongs to the "date" class, its date information can be extracted by a rule based on a regular expression for the date format. On top of the classification predicted by the text box classification model, the position information of some text boxes can be used to further check the classification result: for example, if another text box of the train ticket is predicted to belong to the "departure station" field, the ordinate of the "departure station" text box should be smaller than that of the "date" text box.
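The date rule and the position check can be sketched as below; the Chinese date pattern and the box representation (four (x, y) vertices, top-left first) are illustrative assumptions.

```python
import re

DATE_RE = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日")  # e.g. 2021年04月20日

def extract_date(text: str):
    """Extract (year, month, day) from a text box classified as 'date'."""
    m = DATE_RE.search(text)
    return m.groups() if m else None

def departure_above_date(departure_box, date_box) -> bool:
    """Position check: the departure box ordinate should be smaller (higher up)."""
    # each box is four (x, y) vertices; compare the top-left ordinates
    return departure_box[0][1] < date_box[0][1]
```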
According to the bill identification method provided by the embodiments of the application, the inclination angle is corrected based on the actual inclination angle category of the bill picture, so the shooting angle of the picture need not be considered and picture text at any angle can be recognized, effectively improving the usability of bill identification. The page layout is analyzed based on deep learning and machine learning to identify the bill page information, giving strong robustness: the method is not easily affected by factors such as picture inclination, text position deviation, occluded key points and dense page information, effectively improving the character recognition effect, guaranteeing recognition accuracy and practicality, and improving the user experience.
Next, a bill identifying apparatus according to an embodiment of the present application will be described with reference to the drawings.
Fig. 4 is a block diagram of a bill identifying apparatus according to an embodiment of the present application.
As shown in fig. 4, the bill identifying apparatus 10 includes: an acquisition module 100, a correction module 200, and an identification module 300.
The acquiring module 100 is configured to acquire a ticket picture of a ticket to be identified.
The correction module 200 is configured to identify the actual inclination angle category of the bill picture and correct the inclination angle of the bill picture based on the actual inclination angle category.
The recognition module 300 is configured to detect a text box of the corrected bill picture, extract character information from the text box, recognize the actual type of the bill picture, and determine the actual classification of the text box based on the actual type, so as to extract the bill page information of the bill to be recognized.
Optionally, in an embodiment of the present application, the correction module 200 is further configured to collect data of counterclockwise rotation by 0 degrees, 90 degrees, 180 degrees and 270 degrees, respectively, to determine the actual tilt angle category, and to rotate the bill picture clockwise by the correction angle corresponding to the actual inclination angle category.
Optionally, in an embodiment of the present application, the identification module 300 includes: a first acquisition unit and a second acquisition unit.
The first acquiring unit is used for acquiring a rectangular area containing text lines by using a preset text detection algorithm to obtain a text box, determining the position of the text according to the current coordinates of the four vertexes of the text box, intercepting a rectangular area picture according to the position of the text, and inputting the rectangular area picture into a preset text recognition network to obtain character information.
And the second acquisition unit is used for acquiring the image characteristics of the text box by adopting a DenseNet network, converting the image characteristics into one-dimensional characteristic vectors, generating final one-dimensional combined characteristics by combining the geometric characteristics of the text box, inputting the one-dimensional combined characteristics into a full-connection network with the neuron number equal to the classification number, outputting the probability value of each classification by utilizing a softmax function, and determining the actual classification.
It should be noted that the foregoing explanation of the embodiment of the bill identifying method is also applicable to the bill identifying apparatus of this embodiment, and details are not described here.
According to the bill identifying apparatus provided by the embodiments of the application, the inclination angle is corrected based on the actual inclination angle category of the bill picture, so the shooting angle of the picture need not be considered and picture text at any angle can be recognized, effectively improving the usability of bill identification. The page layout is analyzed based on deep learning and machine learning to identify the bill page information, giving strong robustness: the apparatus is not easily affected by factors such as picture inclination, text position deviation, occluded key points and dense page information, effectively improving the character recognition effect, guaranteeing recognition accuracy and practicality, and improving the user experience.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
a memory 1201, a processor 1202, and a computer program stored on the memory 1201 and executable on the processor 1202.
The processor 1202, when executing the program, implements the ticket recognition method provided in the above-described embodiments.
Further, the electronic device further includes:
a communication interface 1203 for communication between the memory 1201 and the processor 1202.
A memory 1201 for storing computer programs executable on the processor 1202.
The memory 1201 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
If the memory 1201, the processor 1202 and the communication interface 1203 are implemented independently, the communication interface 1203, the memory 1201 and the processor 1202 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 5, but this does not represent only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 1201, the processor 1202 and the communication interface 1203 are implemented by being integrated on a single chip, the memory 1201, the processor 1202 and the communication interface 1203 may complete communication with each other through an internal interface.
Processor 1202 may be a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present Application.
The present embodiment also provides a computer-readable storage medium having a computer program stored thereon, wherein the program is executed by a processor to implement the ticket recognition method as above.
In the description herein, reference to the description of the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Moreover, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more N executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of implementing the embodiments of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., a sequential list of executable instructions that may be thought of as being useful for implementing logical functions, may be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that may fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or N wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: discrete logic circuits with logic gates for implementing logic functions on data signals, application specific integrated circuits with appropriate combinational logic gates, Programmable Gate Arrays (PGAs), Field Programmable Gate Arrays (FPGAs), etc.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware that can be related to instructions of a program, which can be stored in a computer-readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A bill identification method is characterized by comprising the following steps:
acquiring a bill picture of a bill to be identified;
identifying the actual inclination angle category of the bill picture, and correcting the inclination angle of the bill picture based on the actual inclination angle category; and
detecting a text box of the corrected bill picture, extracting character information from the text box, identifying the actual type of the bill picture, and determining the actual classification of the text box based on the actual type so as to extract bill page information of the bill to be identified.
2. The method of claim 1, wherein the identifying the actual tilt angle category of the ticket image and correcting the tilt angle of the ticket image based on the actual tilt angle category comprises:
respectively collecting data of 0 degree of anticlockwise rotation, 90 degrees of anticlockwise rotation, 180 degrees of anticlockwise rotation and 270 degrees of anticlockwise rotation to determine the actual inclination angle category;
and rotating the bill picture clockwise by the correction angle corresponding to the actual inclination angle category.
3. The method of claim 1, wherein the detecting a text box of the rectified bill picture and extracting text information from the text box, and identifying the actual type of the bill picture, and determining the actual classification of the text box based on the actual type to extract bill page information of the bill to be identified comprises:
acquiring a rectangular area containing text lines by using a preset text detection algorithm to obtain the text box; determining the position of the text according to the current coordinates of the four vertexes of the text box;
and intercepting a rectangular area picture according to the position of the text, and inputting the rectangular area picture into a preset text recognition network to obtain the character information.
4. The method of claim 1, wherein the detecting a text box of the rectified bill picture and extracting text information from the text box, and identifying the actual type of the bill picture, and determining the actual classification of the text box based on the actual type to extract bill page information of the bill to be identified comprises:
acquiring the image characteristics of the text box by adopting a DenseNet network;
converting the image features into one-dimensional feature vectors, and generating final one-dimensional combined features by combining the geometric features of the text boxes;
inputting the one-dimensional combined features into a full-connection network with the neuron number equal to the classification number, outputting the probability value of each classification by using a softmax function, and determining the actual classification.
5. The method according to claim 1, after acquiring the ticket picture of the ticket to be identified, further comprising:
and carrying out denoising, sharpening and binarization processing on the bill picture to obtain the bill picture with a contrast meeting the preset condition.
6. A bill identifying apparatus, comprising:
the acquisition module is used for acquiring a bill picture of a bill to be identified;
the correction module is used for identifying the actual inclination angle category of the bill picture and correcting the inclination angle of the bill picture based on the actual inclination angle category; and
the identification module is used for detecting the text box of the corrected bill picture, extracting character information from the text box, identifying the actual type of the bill picture, and determining the actual classification of the text box based on the actual type so as to extract the bill page information of the bill to be identified.
7. The apparatus of claim 6, wherein the correction module is further configured to collect data of 0 degree counterclockwise rotation, 90 degree counterclockwise rotation, 180 degree counterclockwise rotation, 270 degree counterclockwise rotation, respectively, to determine the actual tilt angle category;
and rotating the bill picture clockwise by the correction angle corresponding to the actual inclination angle category.
8. The apparatus of claim 6, wherein the identification module comprises:
the first obtaining unit is used for obtaining a rectangular area containing text lines by using a preset text detection algorithm to obtain the text box; determining the position of a text according to the current coordinates of the four vertexes of the text box, intercepting a rectangular area picture according to the position of the text, and inputting the rectangular area picture into a preset text recognition network to obtain the text information;
and the second acquisition unit is used for acquiring the image features of the text box by adopting a DenseNet network, converting the image features into one-dimensional feature vectors, generating final one-dimensional combined features by combining the geometric features of the text box, inputting the one-dimensional combined features into a full-connection network with the number of neurons equal to the number of classifications, outputting the probability value of each classification by utilizing a softmax function, and determining the actual classification.
9. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, the processor executing the program to implement the method of ticket identification according to any one of claims 1-5.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the program is executed by a processor for implementing a method for ticket recognition according to any one of claims 1-5.
CN202110426383.9A 2021-04-20 2021-04-20 Bill identification method and device, electronic equipment and storage medium Active CN113158895B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110426383.9A CN113158895B (en) 2021-04-20 2021-04-20 Bill identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110426383.9A CN113158895B (en) 2021-04-20 2021-04-20 Bill identification method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113158895A true CN113158895A (en) 2021-07-23
CN113158895B CN113158895B (en) 2023-11-14

Family

ID=76867586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110426383.9A Active CN113158895B (en) 2021-04-20 2021-04-20 Bill identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113158895B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780098A (en) * 2021-08-17 2021-12-10 北京百度网讯科技有限公司 Character recognition method, character recognition device, electronic equipment and storage medium
CN113850208A (en) * 2021-09-29 2021-12-28 平安科技(深圳)有限公司 Picture information structuring method, device, equipment and medium
CN114882489A (en) * 2022-07-07 2022-08-09 浙江智慧视频安防创新中心有限公司 Method, device, equipment and medium for horizontally correcting rotary license plate
CN115497114A (en) * 2022-11-18 2022-12-20 中国烟草总公司四川省公司 A method for extracting structured information of cigarette logistics receipt notes
CN118503466A (en) * 2024-07-19 2024-08-16 武汉辰亚科技有限公司 Lottery winning inquiry method and device based on deep learning

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH103517A (en) * 1996-06-18 1998-01-06 Fuji Xerox Co Ltd Device for detecting tilt angle of document picture
US20120288182A1 (en) * 2011-05-11 2012-11-15 Samsung Electronics Co., Ltd. Mobile communication device and method for identifying a counterfeit bill
CN105701073A (en) * 2015-12-31 2016-06-22 北京中科江南信息技术股份有限公司 Layout file generation method and device
CN108549890A (en) * 2018-03-22 2018-09-18 南京邮电大学 Invoice tilt detection based on image recognition and geometric correction method
WO2019037259A1 (en) * 2017-08-20 2019-02-28 平安科技(深圳)有限公司 Electronic device, method and system for categorizing invoices, and computer-readable storage medium
CN109919155A (en) * 2019-03-13 2019-06-21 厦门商集网络科技有限责任公司 A kind of tilt angle correction method and terminal of text image
CN110175603A (en) * 2019-04-01 2019-08-27 佛山缔乐视觉科技有限公司 A kind of engraving character recognition methods, system and storage medium
CN110533036A (en) * 2019-08-28 2019-12-03 湖南长城信息金融设备有限责任公司 A kind of bill scan image quick slant correction method and system
CN111104941A (en) * 2019-11-14 2020-05-05 腾讯科技(深圳)有限公司 Image direction correcting method and device and electronic equipment
CN111178345A (en) * 2019-05-20 2020-05-19 京东方科技集团股份有限公司 Bill analysis method, bill analysis device, computer equipment and medium
CN112287927A (en) * 2020-10-14 2021-01-29 中国人民解放军战略支援部队信息工程大学 Method and device for detecting tilt angle of text image
CN112560861A (en) * 2020-12-10 2021-03-26 上海亿保健康管理有限公司 Bill processing method, device, equipment and storage medium
CN112613367A (en) * 2020-12-14 2021-04-06 盈科票据服务(深圳)有限公司 Bill information text box acquisition method, system, equipment and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780098A (en) * 2021-08-17 2021-12-10 北京百度网讯科技有限公司 Character recognition method, character recognition device, electronic equipment and storage medium
CN113780098B (en) * 2021-08-17 2024-02-06 北京百度网讯科技有限公司 Character recognition method, character recognition device, electronic equipment and storage medium
CN113850208A (en) * 2021-09-29 2021-12-28 平安科技(深圳)有限公司 Picture information structuring method, device, equipment and medium
CN114882489A (en) * 2022-07-07 2022-08-09 浙江智慧视频安防创新中心有限公司 Method, device, equipment and medium for horizontally correcting rotary license plate
CN114882489B (en) * 2022-07-07 2022-12-16 浙江智慧视频安防创新中心有限公司 Method, device, equipment and medium for horizontally correcting rotating license plate
CN115497114A (en) * 2022-11-18 2022-12-20 中国烟草总公司四川省公司 A method for extracting structured information of cigarette logistics receipt notes
CN115497114B (en) * 2022-11-18 2024-03-12 中国烟草总公司四川省公司 Structured information extraction method for cigarette logistics receiving bill
CN118503466A (en) * 2024-07-19 2024-08-16 武汉辰亚科技有限公司 Lottery winning inquiry method and device based on deep learning

Also Published As

Publication number Publication date
CN113158895B (en) 2023-11-14

Similar Documents

Publication Publication Date Title
CN111325203B (en) An American license plate recognition method and system based on image correction
CN109948510B (en) Document image instance segmentation method and device
CN113158895B (en) Bill identification method and device, electronic equipment and storage medium
US8494273B2 (en) Adaptive optical character recognition on a document with distorted characters
US8442319B2 (en) System and method for classifying connected groups of foreground pixels in scanned document images according to the type of marking
US20190385054A1 (en) Text field detection using neural networks
CN109740606B (en) Image identification method and device
CN113780087B (en) A postal package text detection method and device based on deep learning
Chen et al. Shadow-based Building Detection and Segmentation in High-resolution Remote Sensing Image.
CN112949455B (en) Value-added tax invoice recognition system and method
CN114038004A (en) Certificate information extraction method, device, equipment and storage medium
CN115457565A (en) OCR character recognition method, electronic equipment and storage medium
CN113705576B (en) Text recognition method and device, readable storage medium and equipment
CN112052845A (en) Image recognition method, device, equipment and storage medium
CN109389115B (en) Text recognition method, device, storage medium and computer equipment
CN114463767B (en) Letter of credit identification method, device, computer equipment and storage medium
CN111738979A (en) Automatic certificate image quality inspection method and system
CN112883926A (en) Identification method and device for table medical images
CN116311292A (en) Document image information extraction method, device, computer equipment and storage medium
CN112364863A (en) Character positioning method and system for license document
CN114581928B (en) A table recognition method and system
CN112287763A (en) Image processing method, apparatus, device and medium
CN108090728B (en) Express information input method and system based on intelligent terminal
US20200134357A1 (en) Neural-network-based optical character recognition using specialized confidence functions
CN112613402B (en) Text region detection method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant