
CN117011871A - End-to-end certificate image analysis method, device, equipment and medium - Google Patents

End-to-end certificate image analysis method, device, equipment and medium

Info

Publication number
CN117011871A
Authority
CN
China
Prior art keywords
image
feature
module
extraction module
sample image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310855718.8A
Other languages
Chinese (zh)
Inventor
高大帅
李健
陈明
武卫东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sinovoice Technology Co Ltd
Original Assignee
Beijing Sinovoice Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sinovoice Technology Co Ltd
Priority to CN202310855718.8A
Publication of CN117011871A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Learning methods
    • G06N 3/09 Supervised learning
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/14 Image acquisition
    • G06V 30/146 Aligning or centring of the image pick-up or image-field
    • G06V 30/1475 Inclination or skew detection or correction of characters or of image to be recognised
    • G06V 30/18 Extraction of features or characteristics of the image
    • G06V 30/19 Recognition using electronic means
    • G06V 30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V 30/1918 Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
    • G06V 30/40 Document-oriented image-based pattern recognition
    • G06V 30/41 Analysis of document content

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the invention provide an end-to-end certificate image analysis method, device, equipment, and medium. The method includes: acquiring a certificate image; inputting the certificate image into an image analysis model for processing, where the image analysis model includes a feature extraction module, a tilt correction module, and a structured extraction module; obtaining shared features of the certificate image through the feature extraction module; performing tilt correction on the shared features through the tilt correction module to obtain a feature map; decoding the feature map through the structured extraction module to obtain an analysis result; and determining structured elements of the certificate image according to the analysis result, the structured elements including the type of the certificate image. The method realizes end-to-end intelligent document parsing: text line detection and recognition are no longer needed during structured extraction, and certificate classification is fused into document structuring, which maximizes document parsing performance and improves the customer experience.

Description

End-to-end certificate image analysis method, device, equipment and medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an end-to-end certificate image analysis method, apparatus, device, and medium.
Background
Document image parsing refers to extracting the text information contained in a received document image. Traditional image analysis mainly relies on hand-crafted feature extractors, which require expert knowledge and a complex parameter-tuning process; moreover, each such method is designed for a specific application and therefore generalizes poorly and lacks robustness. With the development of artificial intelligence, deep learning neural networks play an increasingly important role in image information extraction.
At present, the market technology for intelligently extracting information from photographed certificate images uploaded by users is a staged algorithm: the first step performs 0-360-degree tilt correction on the certificate image, the second step performs certificate classification, and the third step parses the certificate using text line detection, recognition, and post-processing. In the prior art, document tilt correction, certificate classification, and structured information extraction are carried out in completely independent stages; the pipeline is complex, and errors accumulate across stages, degrading the final result.
Disclosure of Invention
To solve the above problems, embodiments of the invention disclose an end-to-end certificate image analysis method, apparatus, device, and medium.
In a first aspect, an embodiment of the present invention provides an end-to-end certificate image analysis method, where the method includes:
acquiring a certificate image;
inputting the certificate image into an image analysis model for processing, wherein the image analysis model comprises a feature extraction module, a tilt correction module, and a structured extraction module; obtaining shared features of the certificate image through the feature extraction module; performing tilt correction on the shared features through the tilt correction module to obtain a feature map; and decoding the feature map through the structured extraction module to obtain an analysis result;
determining structured elements of the certificate image according to the analysis result; the structured elements include the type of the certificate image.
Optionally, the feature extraction module includes a shared convolutional neural network and a feature pyramid network, and obtaining the shared features of the certificate image through the feature extraction module includes:
batch processing is carried out on the certificate images;
extracting features of the certificate images after batch processing through the shared convolutional neural network to obtain features extracted by each convolutional layer of the shared convolutional neural network;
And carrying out feature fusion on the features extracted by each convolution layer through the feature pyramid network to obtain shared features.
Optionally, the tilt correction module includes a first spatial deformation network and a second spatial deformation network, and performing tilt correction on the shared features through the tilt correction module to obtain a feature map includes:
performing large-angle correction on the shared features through the first spatial deformation network and small-angle correction on the shared features through the second spatial deformation network to obtain the feature map.
Optionally, the structured extraction module includes a decoder network, and decoding the feature map through the structured extraction module to obtain an analysis result includes:
decoding the feature map, obtained after shared feature extraction and tilt correction, through the decoder network to obtain the structured elements of the certificate image.
Optionally, the image analysis model is trained by:
acquiring a first sample image and a second sample image, and performing angle rotation enhancement on the first sample image; wherein the first sample image is a simulated document image and the second sample image is a real document image;
inputting the first sample image into an image analysis model; acquiring shared features of the first sample image through the feature extraction module; performing tilt correction on the shared features through the tilt correction module to obtain a feature map; decoding the feature map through the structured extraction module to obtain an analysis result of the first sample image; and training the image analysis model based on a first loss function and a second loss function;
inputting the second sample image into the image analysis model; acquiring shared features of the second sample image through the feature extraction module; performing tilt correction on the shared features through the tilt correction module to obtain a feature map; decoding the feature map through the structured extraction module to obtain an analysis result of the second sample image;
If the analysis result of the second sample image does not meet the preset condition, continuing to train the image analysis model by using the first sample image; and if the analysis result of the second sample image meets the preset condition, obtaining the image analysis model.
Optionally, the training the image analysis model based on the first loss function and the second loss function includes:
performing supervised training on the first spatial deformation network and the second spatial deformation network of the tilt correction module through the first loss function;
and performing supervised training on the decoder network through the second loss function.
Optionally, if the analysis result of the second sample image meets a preset condition, obtaining the image analysis model includes:
acquiring a real structural element of the second sample image;
obtaining a test structural element of the second sample image from an analysis result of the second sample image;
and comparing the test structural element with the real structural element, and if the comparison results are the same, obtaining the image analysis model.
In a second aspect, an embodiment of the present invention discloses an end-to-end certificate image parsing apparatus, where the apparatus includes:
The image acquisition module is used for acquiring a certificate image;
the image processing module is used for inputting the certificate image into an image analysis model for processing, where the image analysis model includes a feature extraction module, a tilt correction module, and a structured extraction module; obtaining shared features of the certificate image through the feature extraction module; performing tilt correction on the shared features through the tilt correction module to obtain a feature map; and decoding the feature map through the structured extraction module to obtain an analysis result;
the structured element determining module is used for determining structured elements of the certificate image according to the analysis result; the structured elements include the type of the certificate image.
Optionally, the feature extraction module includes a shared convolutional neural network and a feature pyramid network, and the image processing module includes:
the batch processing sub-module is used for carrying out batch processing on the certificate images;
the feature extraction sub-module is used for extracting features of the certificate images after batch processing through the shared convolutional neural network to obtain features extracted by each convolutional layer of the shared convolutional neural network;
And the feature fusion sub-module is used for carrying out feature fusion on the features extracted by the convolution layers through the feature pyramid network to obtain shared features.
Optionally, the tilt correction module includes a first spatially deformed network and a second spatially deformed network, and the image processing module further includes:
and the angle correction sub-module is used for performing large-angle correction on the shared features through the first spatial deformation network and small-angle correction on the shared features through the second spatial deformation network to obtain the feature map.
Optionally, the structured extraction module includes a decoder network, and the image processing module further includes:
and the decoding sub-module is used for decoding the feature map, obtained after shared feature extraction and tilt correction, through the decoder network to obtain the structured elements of the certificate image.
Optionally, the image analysis model is trained by:
the sample acquisition module is used for acquiring a first sample image and a second sample image and carrying out angle rotation enhancement on the first sample image; wherein the first sample image is a simulated document image and the second sample image is a real document image;
the training module is used for inputting the first sample image into the image analysis model; acquiring shared features of the first sample image through the feature extraction module; performing tilt correction on the shared features through the tilt correction module to obtain a feature map; decoding the feature map through the structured extraction module to obtain an analysis result; and training the image analysis model based on a first loss function and a second loss function;
the test module is used for inputting the second sample image into the image analysis model; acquiring shared features of the second sample image through the feature extraction module; performing tilt correction on the shared features through the tilt correction module to obtain a feature map; decoding the feature map through the structured extraction module to obtain an analysis result of the second sample image;
the condition judgment module is used for continuing to train the image analysis model by using the first sample image if the analysis result of the second sample image does not meet the preset condition; and if the analysis result of the second sample image meets the preset condition, obtaining the image analysis model.
Optionally, the training module includes:
a first training sub-module for performing supervised training on the first and second spatially deformed networks of the tilt correction module by the first loss function;
and the second training submodule is used for performing supervision training on the decoder network through the second loss function.
Optionally, the condition judgment module includes:
the second structural element acquisition sub-module is used for acquiring the real structural elements of the second sample image;
a third structural element obtaining sub-module, configured to obtain a test structural element of the second sample image from an analysis result of the second sample image;
and the condition judgment sub-module is used for comparing the test structural element with the real structural element, and if the comparison result is the same, the image analysis model is obtained.
In a third aspect, the present invention provides an electronic device, comprising: a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the end-to-end certificate image analysis method described in any one of the above.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the end-to-end certificate image analysis method described in any one of the above.
The embodiment of the invention has the following advantages:
the image analysis model in the embodiment of the invention comprises a feature extraction module, a tilt correction module, and a structured extraction module: the features of the certificate image are extracted by the shared convolutional neural network in the feature extraction module and fused by the feature pyramid network to obtain shared features; the two spatial deformation networks in the tilt correction module perform large-angle and small-angle correction, respectively, on the shared features to obtain a feature map; and the decoder network in the structured extraction module decodes the feature map to obtain the structured elements of the certificate image. In the end-to-end certificate image analysis method provided by the invention, the three functions of document feature extraction, document tilt correction, and document parsing are handled within a single image analysis model, realizing end-to-end intelligent document parsing: text line detection and recognition are omitted during structured extraction, and certificate classification is fused into document structuring, which simplifies the image analysis pipeline, avoids the accumulation of errors across stages, maximizes document parsing performance, and improves the customer experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of steps of an end-to-end certificate image parsing method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps of an image analysis model training method according to an embodiment of the present invention;
FIG. 3 is a block diagram of an end-to-end document image analysis device according to an embodiment of the present invention;
fig. 4 is a block diagram of an image analysis model training apparatus according to an embodiment of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a step flow diagram of an end-to-end certificate image parsing method according to an embodiment of the present invention is shown, where the method specifically includes the following steps:
step 101, acquiring a certificate image;
in the embodiment of the present invention, the certificate image may be uploaded by a user through a client, where the client may be a mobile computing device such as a mobile phone, an iPad (tablet computer), or a notebook computer, which is not limited in the present invention. These clients may run operating systems such as Android, iOS (iPhone Operating System), or Windows. The certificate image to be parsed may include certificate-type documents (e.g., identity cards, passports, travel licenses, business cards, etc.) and bill-type documents (e.g., medical invoices, insurance documents, etc.). The carrier of a certificate-type or bill-type document may be paper, plastic, or the like.
For example, an image parsing program may be installed on the client. When the user wants to parse the information in a certificate image with the image parsing program, the user can start the program and upload the certificate image to be parsed to it, so that the program can acquire the certificate image. The certificate image to be parsed may be a certificate-type document or a bill-type document.
Step 102, inputting the certificate image into an image analysis model for processing, wherein the image analysis model comprises a feature extraction module, a tilt correction module, and a structured extraction module; obtaining shared features of the certificate image through the feature extraction module; performing tilt correction on the shared features through the tilt correction module to obtain a feature map; and decoding the feature map through the structured extraction module to obtain an analysis result;
in the embodiment of the invention, after the certificate image to be parsed is obtained, it is input into the image analysis model. The image analysis model may include a feature extraction module, a tilt correction module, and a structured extraction module: the shared features of the certificate image are obtained through the feature extraction module, the shared features are then tilt-corrected through the tilt correction module to obtain a feature map, and finally the feature map is decoded through the structured extraction module to obtain the analysis result. The image analysis model is therefore an end-to-end certificate image analysis model: in application, the three functions of document feature extraction, document tilt correction, and document parsing are handled within a unified model, realizing end-to-end intelligent document parsing.
The image analysis program may include an image analysis model, and after a user uploads a document image to be analyzed to a client, the image analysis model automatically processes the document image to be analyzed, extracts features of the document image, performs tilt correction on the document image, and decodes the processed document image to obtain an analysis result.
In an alternative embodiment of the present invention, the feature extraction module includes a shared convolutional neural network and a feature pyramid network, and the acquiring, by the feature extraction module, the shared feature of the image may include the following sub-steps:
Sub-step S11, batch processing the certificate images;
in the embodiment of the invention, after the certificate images are acquired, they are first batched, and feature extraction is then performed on the batched document images. During CNN (Convolutional Neural Network) training, the choice of parameters such as the learning rate, parameter initialization, and the weight decay coefficient is critical to the training result. Batching allows a relatively large initial learning rate to be chosen, which speeds up training, and it allows the training data to be thoroughly shuffled, preventing particular samples from being selected too frequently in each batch. The essence of neural network learning is learning the data distribution: once the distributions of training data and test data differ, the generalization ability of the network drops sharply, and once the distribution differs from batch to batch (in batch gradient descent), the network must adapt to a different distribution at each iteration, which greatly slows down training. Batching therefore accelerates network convergence, alleviates gradient dispersion, and improves the generalization ability of the network.
Sub-step S12, extracting features from the batched certificate images through the shared convolutional neural network to obtain the features extracted by each convolutional layer of the shared convolutional neural network;
in the embodiment of the invention, a shared CNN backbone such as a resnet50 can be used to extract features from the batched certificate images, yielding the features extracted by each convolutional layer of the CNN. For example, the batched certificate images may form an N x C x H x W tensor, where N is the batch size (the number of images in the batch), C is the number of channels (for example, C = 1 for a black-and-white image and C = 3 for an RGB color image), H is the height of the feature map in pixels, and W is its width in pixels. The CNN performs feature extraction on the N x C x H x W input, and the extracted features are self-learned abstract features.
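The following is a minimal, illustrative sketch (not part of the original disclosure) of how such a shared backbone could expose per-stage features in PyTorch. The class name and all sizes are assumptions, and a standard ResNet-50 is used, whose native scales are 1/4 to 1/32 rather than the 1/16 to 1/128 scales mentioned in the worked example later in this description.

```python
# Sketch only: shared ResNet-50 backbone returning per-stage features
# for a batch of N x 3 x H x W certificate images.
import torch
import torchvision

class SharedBackbone(torch.nn.Module):
    def __init__(self):
        super().__init__()
        r = torchvision.models.resnet50(weights=None)
        # Stem: conv1 + bn1 + relu + maxpool (downsamples to 1/4 resolution).
        self.stem = torch.nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.layer1, self.layer2 = r.layer1, r.layer2   # 1/4, 1/8
        self.layer3, self.layer4 = r.layer3, r.layer4   # 1/16, 1/32

    def forward(self, x):
        x = self.stem(x)
        c2 = self.layer1(x)    # N x 256  x H/4  x W/4
        c3 = self.layer2(c2)   # N x 512  x H/8  x W/8
        c4 = self.layer3(c3)   # N x 1024 x H/16 x W/16
        c5 = self.layer4(c4)   # N x 2048 x H/32 x W/32
        return c2, c3, c4, c5

feats = SharedBackbone()(torch.randn(2, 3, 1280, 1280))  # N=2 batched images
```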
Sub-step S13, performing feature fusion on the features extracted by each convolutional layer through the feature pyramid network to obtain the shared features.
In the embodiment of the invention, the FPN (Feature Pyramid Network) performs feature fusion on the features extracted by each convolutional layer of the CNN. A feature pyramid network is a feature extractor designed to improve both accuracy and speed. The FPN consists of two paths, bottom-up and top-down: the bottom-up path is an ordinary convolutional network for feature extraction, in which the spatial resolution decreases from bottom to top while the semantic value of each layer increases as higher-level structures are detected. Illustratively, the FPN may fuse the features of the 1/128, 1/64, 1/32, and 1/16 feature layers in the middle of the CNN to obtain the final shared feature of size N x C1 x H/16 x W/16.
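Below is a minimal sketch, an assumption rather than the patented implementation, of an FPN-style top-down fusion that returns a single shared feature map at 1/16 resolution; the channel widths and the choice of nearest-neighbour upsampling are illustrative.

```python
# Sketch only: top-down FPN fusion producing one shared feature map (N x C1 x H/16 x W/16).
import torch
import torch.nn.functional as F

class SimpleFPN(torch.nn.Module):
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=512):
        super().__init__()
        # Lateral 1x1 convs project every backbone stage to a common width.
        self.lateral = torch.nn.ModuleList(
            torch.nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        self.smooth = torch.nn.Conv2d(out_channels, out_channels, 3, padding=1)

    def forward(self, feats):
        # feats are the backbone stage outputs, ordered fine-to-coarse (1/4, 1/8, 1/16, 1/32).
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        # Top-down path: upsample the coarser map and add it to the next finer one.
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        # Take the fused map at 1/16 scale (index 2) as the shared feature.
        return self.smooth(laterals[2])
```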
In an alternative embodiment of the present invention, the tilt correction module includes a first spatial deformation network and a second spatial deformation network, and the tilt correction of the shared feature by the tilt correction module to obtain the feature map may include the following substeps:
Sub-step S21, performing large-angle correction on the shared features through the first spatial deformation network and small-angle correction on the shared features through the second spatial deformation network to obtain the feature map.
In the embodiment of the invention, two STNs (Spatial Transformer Networks, here called spatial deformation networks) perform large-angle and small-angle tilt correction on the shared features. The main idea of the STN is to learn a set of transformation parameters through a network, compute the coordinates in the original map corresponding to each position of the new map, and then fill the new map by some interpolation method; for example, a differentiable module is inserted into the existing convolutional architecture so that the neural network can actively transform the feature map, conditioned on the feature map itself. The transformation parameters may be affine transformation parameters, i.e. a linear transformation from two-dimensional coordinates (x, y) to two-dimensional coordinates (u, v). An affine transformation can realize translation, scaling, rotation, flipping, and similar operations, and it preserves straightness and parallelism. For example, the certificate images uploaded by users may be tilted anywhere from 0 to 360 degrees, where 0, 90, 180, and 270 degrees are the large angles and angles within plus or minus 45 degrees are the small angles; the N x C1 x H/16 x W/16 shared feature can be corrected for the large angle by the first STN and for the small angle by the second STN, yielding a feature map of size N x C1 x H/16 x W/16. This feature map can be used as the encoded input of the subsequent Transformer decoder.
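A minimal sketch of one possible lightweight STN is given below; it predicts a single rotation angle from the shared feature map and resamples it with an affine grid. Two such modules in series could handle the coarse and fine corrections described above. The layer sizes, the angle parameterization (radians), and the restriction to pure rotation are assumptions, not the patented design.

```python
# Sketch only: a rotation-only STN over the shared feature map.
import torch
import torch.nn.functional as F

class RotationSTN(torch.nn.Module):
    def __init__(self, channels=512):
        super().__init__()
        # Small localization network: a few convs + fully connected layers -> one angle.
        self.localization = torch.nn.Sequential(
            torch.nn.Conv2d(channels, 64, 3, stride=2, padding=1), torch.nn.ReLU(),
            torch.nn.Conv2d(64, 32, 3, stride=2, padding=1), torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
            torch.nn.Linear(32, 32), torch.nn.ReLU(),
            torch.nn.Linear(32, 1))          # predicted rotation angle in radians

    def forward(self, feat):
        theta = self.localization(feat).squeeze(-1)          # shape (N,)
        cos, sin = torch.cos(theta), torch.sin(theta)
        zeros = torch.zeros_like(cos)
        # 2x3 affine matrix encoding a pure rotation (no translation or scaling).
        mat = torch.stack([torch.stack([cos, -sin, zeros], dim=-1),
                           torch.stack([sin,  cos, zeros], dim=-1)], dim=1)
        grid = F.affine_grid(mat, feat.size(), align_corners=False)
        return F.grid_sample(feat, grid, align_corners=False), theta
```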
In an optional embodiment of the present invention, the structured extraction module includes a decoder network, and the decoding, by the structured extraction module, the feature map to obtain the parsing result may include the following substeps:
Sub-step S31, decoding the feature map, obtained after shared feature extraction and tilt correction, through the decoder network to obtain the structured elements of the certificate image.
In the embodiment of the invention, a Transformer decoder network such as a Bart decoder can be used to decode the feature map obtained after shared feature extraction and tilt correction, yielding the structured elements of the certificate image.
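The sketch below illustrates the general idea under stated assumptions: the corrected N x C1 x H/16 x W/16 feature map is flattened into a sequence of memory states and decoded autoregressively with a generic Transformer decoder standing in for the Bart decoder named in the text; the vocabulary size, model width, and token layout are hypothetical.

```python
# Sketch only: decoding the corrected feature map into structured tokens.
import torch

class StructuredDecoder(torch.nn.Module):
    def __init__(self, feat_channels=512, d_model=512, vocab_size=8000, num_layers=4):
        super().__init__()
        self.proj = torch.nn.Linear(feat_channels, d_model)
        self.embed = torch.nn.Embedding(vocab_size, d_model)
        layer = torch.nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = torch.nn.TransformerDecoder(layer, num_layers=num_layers)
        self.lm_head = torch.nn.Linear(d_model, vocab_size)

    def forward(self, feat_map, token_ids):
        # feat_map: N x C1 x h x w  ->  memory: N x (h*w) x d_model
        memory = self.proj(feat_map.flatten(2).transpose(1, 2))
        tgt = self.embed(token_ids)                       # N x T x d_model
        T = token_ids.size(1)
        # Causal mask so each position only attends to earlier structured tokens.
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.lm_head(out)                          # N x T x vocab_size logits
```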
Step 103, determining structured elements of the certificate image according to the analysis result; the structured elements include the type of the certificate image.
In the embodiment of the invention, the structured elements of the certificate image can be determined from the analysis result of the decoder network (the Bart decoder), and they include the type of the certificate image. Illustratively, the structured elements of a certificate image may be the eight elements on an identity card (name, sex, ethnicity, date of birth, address, identification number, issuing authority, and validity period) or the nine elements on a business license (social credit code, name, type, residence, legal representative, registered capital, date of establishment, business term, and business scope).
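Purely for illustration, a parse result for an identity card might be serialized as the following key-value structure; the field names mirror the elements listed above, and every value is a placeholder rather than real data.

```python
# Illustrative only: one possible structured output for an identity-card image.
parse_result = {
    "type": "id_card",                       # certificate type inferred by the decoder
    "name": "<name>", "sex": "<sex>", "ethnicity": "<ethnicity>",
    "date_of_birth": "<yyyy-mm-dd>", "address": "<address>",
    "id_number": "<18-digit number>",
    "issuing_authority": "<authority>", "validity_period": "<start> to <end>",
}
```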
In the embodiment of the invention, the image analysis model comprises a feature extraction module, a tilt correction module, and a structured extraction module: the features of the certificate image are extracted by the shared convolutional neural network in the feature extraction module and fused by the feature pyramid network to obtain shared features; the two spatial deformation networks in the tilt correction module perform large-angle and small-angle correction, respectively, on the shared features to obtain a feature map; and the decoder network in the structured extraction module decodes the feature map to obtain the structured elements of the certificate image. In the end-to-end certificate image analysis method provided by the invention, the three functions of document feature extraction, document tilt correction, and document parsing are handled within a single image analysis model, realizing end-to-end intelligent document parsing: text line detection and recognition are omitted during structured extraction, and certificate classification is fused into document structuring, which simplifies the image analysis pipeline, avoids the accumulation of errors across stages, maximizes document parsing performance, and improves the customer experience.
Referring to fig. 2, a step flowchart of an image analysis model training method provided by an embodiment of the present invention is shown, where the image analysis model may be obtained by training in the following manner:
Step 201, acquiring a first sample image and a second sample image, and performing angle rotation enhancement on the first sample image; wherein the first sample image is a simulated document image and the second sample image is a real document image;
in the embodiment of the invention, data simulation can be performed using the fixed layout or semi-fixed layout of a certificate image: in a fixed layout the relative positions of all elements on the image are completely fixed, as on an identity card, while in a semi-fixed layout the relative positions of the elements are not fixed, as on a business card. With a suitable simulation algorithm, 100,000 simulated identity-card images and 100,000 simulated business-license images can be selected as first sample images for network training, and the tilt angle, type, and structured information of each first sample image are recorded at simulation time. Angle rotation enhancement is then applied to the first sample images so that the image analysis model can accurately determine the tilt angle of an image. In addition, 100 pieces of real identity-card and business-license data provided by users can be annotated with structured information and used as second sample images for network testing.
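A minimal sketch of such angle rotation enhancement is shown below; the split into a coarse 0/90/180/270 component plus a fine component within plus or minus 45 degrees follows the correction scheme described earlier, but the exact ranges, the white fill color, and the RGB-image assumption are illustrative.

```python
# Sketch only: rotation augmentation of a simulated sample, keeping the angle as the label.
import random
from PIL import Image

def rotate_augment(img: Image.Image):
    coarse = random.choice([0, 90, 180, 270])       # large-angle component
    fine = random.uniform(-45.0, 45.0)              # small-angle component
    angle = (coarse + fine) % 360
    # expand=True keeps the whole rotated certificate inside the canvas (RGB image assumed).
    rotated = img.rotate(angle, expand=True, fillcolor=(255, 255, 255))
    return rotated, angle

# rotated_img, tilt_label = rotate_augment(Image.open("synthetic_id_card.png"))
```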
Step 202, inputting the first sample image into the image analysis model; acquiring shared features of the first sample image through the feature extraction module; performing tilt correction on the shared features through the tilt correction module to obtain a feature map; decoding the feature map through the structured extraction module to obtain an analysis result of the first sample image; and training the image analysis model based on a first loss function and a second loss function;
In the embodiment of the invention, after the first sample image is acquired, it can be input into the image analysis model. The image analysis model may include a feature extraction module, a tilt correction module, and a structured extraction module: the shared features of the first sample image are obtained through the feature extraction module, the shared features are tilt-corrected through the tilt correction module to obtain a feature map, and the feature map is decoded through the structured extraction module to obtain the analysis result of the first sample image. Parsing the first sample image yields its structured elements, which constitute the training output of the image analysis model. The image analysis model is trained iteratively based on the first loss function and the second loss function, so that the structured elements obtained by parsing the first sample image match the simulated structured information of the first sample image as closely as possible.
In an alternative embodiment of the present invention, the training the image analysis model based on the first loss function and the second loss function may comprise the sub-steps of:
Sub-step S41, performing supervised training on the first and second spatial deformation networks of the tilt correction module through the first loss function;
In the embodiment of the invention, the first loss function may be a Smooth L1 loss, which is used for regression tasks. Smooth L1 converges quickly and is insensitive to outliers: it penalizes very large or very small outliers less heavily and its gradient changes relatively smoothly, which makes the model more robust. The two spatial deformation networks of the tilt correction module are trained under supervision with the Smooth L1 loss, and the tilt correction module is adjusted according to the tilt-angle error computed in each training step, which improves the tilt-correction accuracy of the image analysis model.
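A minimal sketch of this supervision is shown below, assuming the two STN branches each regress a tilt angle and that angles are encoded in radians; the simple sum of the two terms is also an assumption.

```python
# Sketch only: Smooth L1 supervision of the coarse and fine tilt-angle predictions.
import math
import torch

smooth_l1 = torch.nn.SmoothL1Loss()

def tilt_loss(pred_coarse, pred_fine, gt_coarse_deg, gt_fine_deg):
    # Ground-truth angles come from the simulation / rotation-augmentation step (degrees).
    gt_coarse = gt_coarse_deg * math.pi / 180.0
    gt_fine = gt_fine_deg * math.pi / 180.0
    return smooth_l1(pred_coarse, gt_coarse) + smooth_l1(pred_fine, gt_fine)
```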
Sub-step S42, performing supervised training on the decoder network through the second loss function.
In the embodiment of the invention, the second loss function may be a seq2seq cross-entropy loss, of the kind typically used in machine-translation tasks that convert one sequence into another. Its advantages are a simple structure with no manual feature engineering and no complex upstream or downstream components: the model is an end-to-end neural network optimized as a whole. The decoder network of the structured extraction module is trained under supervision with the seq2seq cross-entropy loss, and the structured extraction module is adjusted according to the structured-information error computed in each training step, which improves the structured-extraction accuracy of the image analysis model.
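A minimal sketch of such a loss is shown below, assuming the decoder produces per-position logits and that padding positions are marked with a hypothetical PAD_ID.

```python
# Sketch only: seq2seq cross-entropy over the decoder logits, ignoring padding.
import torch

PAD_ID = 0
seq_ce = torch.nn.CrossEntropyLoss(ignore_index=PAD_ID)

def structuring_loss(logits, target_ids):
    # logits: N x T x V decoder outputs, target_ids: N x T next-token targets.
    return seq_ce(logits.reshape(-1, logits.size(-1)), target_ids.reshape(-1))
```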
Step 203, inputting the second sample image into the image analysis model; acquiring shared features of the second sample image through the feature extraction module; performing tilt correction on the shared features through the tilt correction module to obtain a feature map; decoding the feature map through the structured extraction module to obtain an analysis result of the second sample image;
in the embodiment of the invention, after the second sample image is acquired, the second sample image may be input to the image analysis model. The image analysis model can comprise a feature extraction module, an inclination correction module and a structural extraction module, the shared features of the second sample image can be obtained through the feature extraction module, then the shared features are subjected to inclination correction through the inclination correction module to obtain a feature image, and finally the feature image is decoded through the structural extraction module to obtain an analysis result of the second sample image. And analyzing the second sample image to obtain structural elements of the second sample image, namely the test result of the image analysis model.
Step 204, if the analysis result of the second sample image does not meet the preset condition, continuing to train the image analysis model by using the first sample image; and if the analysis result of the second sample image meets the preset condition, obtaining the image analysis model.
In the embodiment of the invention, whether the image analysis model is trained or not can be judged through the test result of the second sample image on the image analysis model. If the analysis result of the second sample image does not meet the preset condition, the image analysis model is not trained, and the image analysis model needs to be trained by continuously using the first sample image; if the analysis result of the second sample image meets the preset condition, the completion of the training of the image analysis model is indicated, and the image analysis model can be obtained.
In an optional embodiment of the present invention, if the analysis result of the second sample image meets a preset condition, obtaining the image analysis model may include the following substeps:
a substep S51 of obtaining a real structural element of the second sample image;
in the embodiment of the invention, the real structural elements of the second sample image can be obtained from 100 pieces of real data of the identity card and the business license provided by the user.
A substep S52, obtaining a test structural element of the second sample image from the analysis result of the second sample image;
in the embodiment of the invention, after the second sample image is acquired, the second sample image may be input to the image analysis model. The shared features of the second sample image can be obtained through the feature extraction module, the shared features are subjected to inclination correction through the inclination correction module to obtain a feature image, and finally the feature image is decoded through the structural extraction module to obtain an analysis result of the second sample image, namely the test structural elements of the second sample image.
Sub-step S53, comparing the test structural elements with the real structural elements; if they are the same, the image analysis model is obtained.
In the embodiment of the invention, the test structural element of the second sample image can be compared with the real structural element of the second sample image to judge whether the image analysis model is trained. If the test structural elements of the second sample image are not identical to the real structural elements of the second sample image, the image analysis model is not trained, and the first sample image is required to be continuously used for training the image analysis model; if the test structural element of the second sample image is the same as the real structural element of the second sample image, the image analysis model training is completed, and the image analysis model can be obtained.
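One possible, purely illustrative realization of this check is sketched below; the patent only requires the comparison results to be the same, and the helper names are hypothetical.

```python
# Sketch only: field-by-field comparison of parsed (test) vs. annotated (real) elements.
def parsing_is_correct(test_elements: dict, real_elements: dict) -> bool:
    return (test_elements.keys() == real_elements.keys() and
            all(test_elements[k] == real_elements[k] for k in real_elements))

def model_training_done(results) -> bool:
    # results: iterable of (test_elements, real_elements) pairs over the second sample images.
    return all(parsing_is_correct(t, r) for t, r in results)
```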
In order that those skilled in the art may better understand the embodiments of the present invention, the following description of the embodiments of the present invention is provided by way of example:
(1) Performing data simulation on the first sample image, and performing angle rotation enhancement;
(2) The longest side of every first sample image is scaled to 1280 in equal proportion and the short side is padded, giving a color image of size 1280 x 1280 x 3; after batching, an N x 1280 x 1280 x 3 input is obtained and fed to the feature extraction module;
(3) Features are extracted from the batched input with a resnet50, yielding feature maps at 1/16, 1/32, 1/64, and 1/128 of the original resolution; these are fused by the FPN to finally obtain a feature map of size N x 80 x 80 x 512;
(4) Two lightweight STN networks (each comprising three convolutional layers and two fully connected layers) perform large-angle and small-angle tilt correction on the feature map, respectively; the corrected feature map has size N x 80 x 80 x 512. The two lightweight STN networks are trained under supervision with the Smooth L1 loss function;
(5) The tilt-corrected feature map is decoded with a 4-layer Bart decoder network to obtain the certificate type and the corresponding structured elements of the certificate image; the Bart decoder network is trained under supervision with the seq2seq cross-entropy loss function. A schematic code sketch of this example pipeline is given after this list.
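The sketch below wires the pieces together in the spirit of steps (2) to (5), assuming the illustrative modules defined earlier in this description (SharedBackbone, SimpleFPN, RotationSTN, StructuredDecoder) are in scope; it is a schematic composition under those assumptions, not the patented implementation.

```python
# Sketch only: end-to-end composition of the example pipeline.
import torch

class EndToEndParser(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = SharedBackbone()                     # shared CNN features
        self.fpn = SimpleFPN(out_channels=512)               # fused 1/16 shared feature
        self.stn_coarse = RotationSTN(512)                   # 0/90/180/270 correction
        self.stn_fine = RotationSTN(512)                     # within +/-45 degree correction
        self.decoder = StructuredDecoder(feat_channels=512)  # 4-layer decoder analogue

    def forward(self, images, token_ids):
        shared = self.fpn(self.backbone(images))             # N x 512 x 80 x 80 for 1280 input
        corrected, angle_coarse = self.stn_coarse(shared)
        corrected, angle_fine = self.stn_fine(corrected)
        logits = self.decoder(corrected, token_ids)          # structured-token logits
        return logits, (angle_coarse, angle_fine)
```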
In the embodiment of the invention, the three functional modules of document feature extraction, document tilt correction, and document parsing can be placed in one image analysis model for training, yielding an end-to-end intelligent document parsing model: the features of the certificate image are extracted by the shared convolutional neural network in the feature extraction module and fused by the feature pyramid network to obtain shared features; the two spatial deformation networks in the tilt correction module perform large-angle and small-angle correction, respectively, on the shared features to obtain a feature map; and the decoder network in the structured extraction module decodes the feature map to obtain the structured elements of the certificate image. Meanwhile, the two spatial deformation networks of the tilt correction module are trained under supervision with the first loss function, and the decoder network of the structured extraction module is trained under supervision with the second loss function, which improves the accuracy with which the image analysis model extracts the structured elements of a certificate image to be parsed. The end-to-end certificate image analysis method provided by the invention thus realizes end-to-end intelligent document parsing, omits text line detection and recognition during structured extraction, fuses certificate classification into document structuring, maximizes document parsing performance, and improves the customer experience.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Referring to fig. 3, a block diagram of an end-to-end certificate image analysis device according to an embodiment of the present invention is shown, which may specifically include the following modules:
an image acquisition module 301, configured to acquire a document image;
the image processing module 302 is configured to input the certificate image into an image analysis model for processing, where the image analysis model includes a feature extraction module, a tilt correction module, and a structured extraction module; obtain shared features of the certificate image through the feature extraction module; perform tilt correction on the shared features through the tilt correction module to obtain a feature map; and decode the feature map through the structured extraction module to obtain an analysis result;
a structured element determining module 303, configured to determine structured elements of the certificate image according to the analysis result; the structured elements include the type of the certificate image.
In an embodiment of the present invention, the feature extraction module includes a shared convolutional neural network and a feature pyramid network, and the image processing module 302 includes:
the batch processing sub-module is used for carrying out batch processing on the certificate images;
the feature extraction sub-module is used for extracting features of the certificate images after batch processing through the shared convolutional neural network to obtain features extracted by each convolutional layer of the shared convolutional neural network;
and the feature fusion sub-module is used for carrying out feature fusion on the features extracted by the convolution layers through the feature pyramid network to obtain shared features.
In an embodiment of the present invention, the tilt correction module includes a first spatial distortion network and a second spatial distortion network, and the image processing module 302 further includes:
and the angle correction sub-module is used for performing large-angle correction on the shared features through the first spatial deformation network and small-angle correction on the shared features through the second spatial deformation network to obtain the feature map.
In an embodiment of the present invention, the structured extraction module includes a decoder network, and the image processing module 302 further includes:
and the decoding sub-module is used for decoding the feature map, obtained after shared feature extraction and tilt correction, through the decoder network to obtain the structured elements of the certificate image.
Referring to fig. 4, a structural block diagram of an image analysis model training device provided by an embodiment of the present invention is shown, which may specifically include the following modules:
a sample acquisition module 401, configured to acquire a first sample image and a second sample image, and perform angular rotation enhancement on the first sample image; wherein the first sample image is a simulated document image and the second sample image is a real document image;
a training module 402, configured to input the first sample image into the image analysis model; acquire shared features of the first sample image through the feature extraction module; perform tilt correction on the shared features through the tilt correction module to obtain a feature map; decode the feature map through the structured extraction module to obtain an analysis result; and train the image analysis model based on a first loss function and a second loss function;
A test module 403, configured to input the second sample image into the image analysis model; acquiring shared features of the second sample image through the feature extraction module; performing tilt correction on the shared features through the tilt correction module to obtain a feature map; decoding the feature map through the structured extraction module to obtain an analysis result of the second sample image;
a condition judgment module 404, configured to continuously train the image analysis model by using the first sample image if the analysis result of the second sample image does not meet a preset condition; and if the analysis result of the second sample image meets the preset condition, obtaining the image analysis model.
In an embodiment of the present invention, the training module 402 includes:
a first training sub-module for performing supervised training on the first and second spatially deformed networks of the tilt correction module by the first loss function;
and the second training submodule is used for performing supervision training on the decoder network through the second loss function.
In the embodiment of the present invention, the condition determining module 404 includes:
The second structural element acquisition sub-module is used for acquiring the real structural elements of the second sample image;
a third structural element obtaining sub-module, configured to obtain a test structural element of the second sample image from an analysis result of the second sample image;
and the condition judgment sub-module is used for comparing the test structural element with the real structural element, and if the comparison result is the same, the image analysis model is obtained.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
The embodiment of the invention also provides electronic equipment, which comprises: the processor, the memory and the computer program stored in the memory and capable of running on the processor, when the computer program is executed by the processor, the processes of the end-to-end certificate image analysis method embodiment are realized, the same technical effects can be achieved, and in order to avoid repetition, the description is omitted here.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, realizes the processes of the end-to-end certificate image analysis method embodiment, and can achieve the same technical effects, and in order to avoid repetition, the description is omitted here.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The end-to-end certificate image parsing method and apparatus provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principle and implementation of the invention, and the description of these examples is intended only to help in understanding the method and its core idea. Meanwhile, those skilled in the art may make changes to the specific embodiments and the scope of application in accordance with the idea of the present invention. In view of the above, the content of this description should not be construed as limiting the present invention.

Claims (10)

1. An end-to-end certificate image parsing method, comprising:
acquiring a certificate image;
inputting the certificate image into an image analysis model for processing, wherein the image analysis model comprises a feature extraction module, a tilt correction module and a structured extraction module; obtaining shared features of the certificate image through the feature extraction module; performing tilt correction on the shared features through the tilt correction module to obtain a feature map; and decoding the feature map through the structured extraction module to obtain an analysis result; and
determining structured elements of the certificate image according to the analysis result, wherein the structured elements include a type of the certificate image.
2. The method of claim 1, wherein the feature extraction module comprises a shared convolutional neural network and a feature pyramid network, and obtaining the shared features of the certificate image through the feature extraction module comprises:
performing batch processing on the certificate image;
extracting features from the batch-processed certificate image through the shared convolutional neural network to obtain features extracted by each convolutional layer of the shared convolutional neural network; and
performing feature fusion on the features extracted by each convolutional layer through the feature pyramid network to obtain the shared features.
3. The method of claim 1, wherein the tilt correction module comprises a first spatial deformation network and a second spatial deformation network, and performing tilt correction on the shared features through the tilt correction module to obtain the feature map comprises:
performing large-angle correction on the shared features through the first spatial deformation network, and performing small-angle correction on the shared features through the second spatial deformation network, to obtain the feature map.
4. The method of claim 1, wherein the structured extraction module comprises a decoder network, and decoding the feature map through the structured extraction module to obtain the analysis result comprises:
decoding the feature map obtained after shared feature extraction and tilt correction through the decoder network to obtain the structured elements of the certificate image.
5. The method of claim 1, wherein the image analysis model is trained by:
acquiring a first sample image and a second sample image, and performing angle rotation enhancement on the first sample image, wherein the first sample image is a simulated certificate image and the second sample image is a real certificate image;
inputting the first sample image into the image analysis model; acquiring shared features of the first sample image through the feature extraction module; performing tilt correction on the shared features through the tilt correction module to obtain a feature map; decoding the feature map through the structured extraction module to obtain an analysis result of the first sample image; and training the image analysis model based on a first loss function and a second loss function;
inputting the second sample image into the image analysis model; acquiring shared features of the second sample image through the feature extraction module; performing tilt correction on the shared features through the tilt correction module to obtain a feature map; decoding the feature map through the structured extraction module to obtain an analysis result of the second sample image;
if the analysis result of the second sample image does not meet a preset condition, continuing to train the image analysis model using the first sample image; and if the analysis result of the second sample image meets the preset condition, obtaining the image analysis model.
6. The method of claim 5, wherein training the image analysis model based on the first loss function and the second loss function comprises:
performing supervised training on the first spatial deformation network and the second spatial deformation network of the tilt correction module through the first loss function;
and performing supervised training on the decoder network through the second loss function.
7. The method of claim 5, wherein obtaining the image analysis model if the analysis result of the second sample image meets the preset condition comprises:
acquiring a real structured element of the second sample image;
obtaining a test structured element of the second sample image from the analysis result of the second sample image; and
comparing the test structured element with the real structured element, and if the two are the same, obtaining the image analysis model.
8. An end-to-end certificate image parsing apparatus, comprising:
an image acquisition module, used for acquiring a certificate image;
an image processing module, used for inputting the certificate image into an image analysis model for processing, wherein the image analysis model comprises a feature extraction module, a tilt correction module and a structured extraction module; obtaining shared features of the certificate image through the feature extraction module; performing tilt correction on the shared features through the tilt correction module to obtain a feature map; and decoding the feature map through the structured extraction module to obtain an analysis result; and
a structured element determining module, used for determining structured elements of the certificate image according to the analysis result, wherein the structured elements include a type of the certificate image.
9. An electronic device, comprising a processor, a memory, and a computer program stored on the memory and capable of running on the processor, wherein the computer program, when executed by the processor, implements the steps of the end-to-end certificate image parsing method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the end-to-end certificate image parsing method as claimed in any one of claims 1 to 7.
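
The sketches that follow are illustrative aids only and form no part of the claims. This first sketch shows one way the three modules recited in claims 1 and 8 could be wired together; it assumes PyTorch, and every class and attribute name (DocumentParser, feature_extractor, tilt_corrector, structured_decoder) is hypothetical rather than taken from the patent.

# Illustrative sketch (not part of the claims): wiring the three modules of claims 1 and 8.
import torch.nn as nn

class DocumentParser(nn.Module):
    def __init__(self, feature_extractor, tilt_corrector, structured_decoder):
        super().__init__()
        self.feature_extractor = feature_extractor      # shared CNN + feature pyramid network (claim 2)
        self.tilt_corrector = tilt_corrector            # two spatial deformation networks (claim 3)
        self.structured_decoder = structured_decoder    # decoder network (claim 4)

    def forward(self, certificate_image):
        shared_features = self.feature_extractor(certificate_image)   # shared features of the image
        feature_map = self.tilt_corrector(shared_features)            # tilt-corrected feature map
        return self.structured_decoder(feature_map)                   # analysis result; structured
                                                                      # elements (e.g. certificate type)
                                                                      # are read from this result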
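
A minimal sketch of the feature extraction module of claim 2, assuming PyTorch and torchvision: a small shared convolutional backbone whose per-layer features are fused by a feature pyramid network. The backbone depth, channel counts, and the choice of which fused level to return are assumptions, not details taken from the patent.

# Sketch of claim 2 (hypothetical backbone; torchvision's FeaturePyramidNetwork does the fusion).
from collections import OrderedDict
import torch
import torch.nn as nn
from torchvision.ops import FeaturePyramidNetwork

class FeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        def stage(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                                 nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
        self.c2, self.c3, self.c4 = stage(3, 64), stage(64, 128), stage(128, 256)
        self.fpn = FeaturePyramidNetwork([64, 128, 256], out_channels=128)

    def forward(self, batched_images):
        f2 = self.c2(batched_images)                     # features from each convolutional layer
        f3 = self.c3(f2)
        f4 = self.c4(f3)
        fused = self.fpn(OrderedDict(p2=f2, p3=f3, p4=f4))  # feature fusion across layers
        return fused["p2"]                               # one fused level used as the shared features

batch = torch.randn(4, 3, 256, 384)                      # batch processing: images stacked into one tensor
shared = FeatureExtractor()(batch)                       # shape (4, 128, 128, 192)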
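
A sketch of the tilt correction module of claim 3, under the assumption that each "spatial deformation network" behaves like a spatial-transformer-style block that regresses a rotation angle and resamples the shared features; the first block is given a wide angle range (large-angle correction) and the second a narrow one (small-angle correction). The angle ranges and localizer architecture are illustrative, not the patent's exact design.

# Sketch of claim 3: two cascaded angle-correction blocks (coarse, then fine).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class AngleCorrection(nn.Module):
    def __init__(self, channels, max_angle_deg):
        super().__init__()
        self.max_angle = math.radians(max_angle_deg)
        self.localizer = nn.Sequential(
            nn.Conv2d(channels, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1), nn.Tanh())

    def forward(self, feats):
        theta = self.localizer(feats).squeeze(-1) * self.max_angle   # predicted rotation angle
        cos, sin = torch.cos(theta), torch.sin(theta)
        zeros = torch.zeros_like(cos)
        mat = torch.stack([torch.stack([cos, -sin, zeros], dim=-1),
                           torch.stack([sin,  cos, zeros], dim=-1)], dim=-2)
        grid = F.affine_grid(mat, feats.shape, align_corners=False)
        return F.grid_sample(feats, grid, align_corners=False)       # resampled (corrected) features

class TiltCorrector(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.coarse = AngleCorrection(channels, max_angle_deg=180)   # first network: large angles
        self.fine = AngleCorrection(channels, max_angle_deg=15)      # second network: small angles

    def forward(self, shared_features):
        return self.fine(self.coarse(shared_features))               # corrected feature map

Cascading a coarse corrector before a fine one lets the first network handle arbitrary document orientation while the second only needs to refine a small residual skew, which matches the large-angle/small-angle split recited in claim 3.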
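
A sketch of the structured extraction module of claim 4, assuming the decoder network is a Transformer decoder that attends over the corrected feature map and emits a token sequence from which the structured elements are read. The patent only requires "a decoder network", so the Transformer choice, vocabulary size, and the teacher-forced training interface shown here are assumptions.

# Sketch of claim 4: decoding the corrected feature map into a structured-output token sequence.
import torch
import torch.nn as nn

class StructuredDecoder(nn.Module):
    def __init__(self, channels=128, vocab_size=6000, d_model=256, num_layers=4):
        super().__init__()
        self.proj = nn.Conv2d(channels, d_model, 1)           # map feature channels to d_model
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.out = nn.Linear(d_model, vocab_size)             # logits over structured-output tokens

    def forward(self, feature_map, target_tokens):
        memory = self.proj(feature_map).flatten(2).transpose(1, 2)   # (B, H*W, d_model)
        tgt = self.embed(target_tokens)                              # teacher-forced during training
        t = target_tokens.size(1)
        mask = torch.triu(torch.full((t, t), float("-inf"), device=tgt.device), diagonal=1)
        hidden = self.decoder(tgt, memory, tgt_mask=mask)
        return self.out(hidden)                                      # decoded analysis result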
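
A sketch of the training flow of claims 5 to 7, assuming the first loss function is an L1 loss on the angles predicted by the two spatial deformation networks, the second loss function is a cross-entropy loss on the decoder output, and the model additionally exposes those predicted angles. The patent does not fix the loss types; it only states which modules each loss supervises, and that training on simulated (first) samples continues until a real (second) sample parses to structured elements identical to the ground truth.

# Sketch of claims 5-7 (loss choices and model interface are assumptions).
import torch
import torch.nn.functional as F

def train_step(model, sim_image, angle_labels, token_labels, optimizer):
    optimizer.zero_grad()
    pred_angles, logits = model(sim_image, token_labels[:, :-1])     # angle-rotation-enhanced simulated sample
    first_loss = F.l1_loss(pred_angles, angle_labels)                # supervises both spatial deformation networks
    second_loss = F.cross_entropy(logits.transpose(1, 2), token_labels[:, 1:])  # supervises the decoder network
    (first_loss + second_loss).backward()
    optimizer.step()

def meets_preset_condition(model, real_image, real_structured_elements, decode_fn):
    with torch.no_grad():
        test_elements = decode_fn(model, real_image)                 # parse the real (second) sample
    return test_elements == real_structured_elements                 # identical => stop; otherwise keep training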
CN202310855718.8A 2023-07-12 2023-07-12 End-to-end certificate image analysis method, device, equipment and medium Pending CN117011871A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310855718.8A CN117011871A (en) 2023-07-12 2023-07-12 End-to-end certificate image analysis method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310855718.8A CN117011871A (en) 2023-07-12 2023-07-12 End-to-end certificate image analysis method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN117011871A true CN117011871A (en) 2023-11-07

Family

ID=88571977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310855718.8A Pending CN117011871A (en) 2023-07-12 2023-07-12 End-to-end certificate image analysis method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN117011871A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination